idnits 2.17.1 

draft-nir-qcr-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 448.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 459.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 466.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 472.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 16, 2008) is 5886 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'IDr' is mentioned on line 192, but not defined

  ** Obsolete normative reference: RFC 4306 (Obsoleted by RFC 5996)

  ** Obsolete normative reference: RFC 4718 (Obsoleted by RFC 5996)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                             Y. Nir
3	Internet-Draft                                               Check Point
4	Intended status: Standards Track                          March 16, 2008
5	Expires: September 17, 2008

7	                 A Quick Crash Recovery Method for IKE
8	                          draft-nir-qcr-00.txt

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on September 17, 2008.

35	Copyright Notice

37	   Copyright (C) The IETF Trust (2008).

39	Abstract

41	   This document describes an extension to the IKEv2 protocol that
42	   allows for faster crash recovery using a saved token method.

44	   When an IPsec tunnel between two IKEv2 implementations is
45	   disconnected due to a restart of one peer, it can take as much as
46	   several minutes to recover.  In this text we propose an extension to
47	   the protocol that allows for recovery within a few seconds of the
48	   reboot.

50	Table of Contents

52	   1.  Introduction
53	     1.1.  Conventions Used in This Document
54	   2.  RFC 4306 Crash Recovery
55	   3.  Protocol Outline
56	   4.  Formats and Exchanges
57	     4.1.  Notification Format
58	     4.2.  Authentication Exchange
59	     4.3.  Informational Exchange
60	   5.  Token Generation and Verification
61	     5.1.  A Stateful Method of Token Generation
62	     5.2.  A Stateless Method of Token Generation
63	     5.3.  Token Lifetime
64	   6.  Alternative Solutions
65	     6.1.  Why not Save the Entire IKE SA
66	     6.2.  Initiating a new IKE SA
67	   7.  Operational Considerations
68	   8.  Security Considerations
69	   9.  IANA Considerations
70	   10. Normative References
71	   Author's Address
72	   Intellectual Property and Copyright Statements

74	1.  Introduction

76	   IKEv2, as described in [RFC4306], has a method for recovering from a
77	   reboot of one peer.  As long as traffic flows in both directions, the
78	   rebooted peer should re-establish the tunnels immediately.  However,
79	   in many cases the rebooted peer is a VPN gateway that protects only
80	   servers, or else the non-rebooted peers have a dynamic IP address.
81	   In such cases, the rebooted peer will not re-establish the tunnels.

83	   Section 2 describes the current procedure, and explains why crash
84	   recovery can take up to several minutes.  The method proposed here
85	   is to send a token in the IKE_AUTH exchange that establishes the
86	   tunnel.  That token can be maintained on the peer in some kind of
87	   persistent storage such as a disk or a database, and can be used to
88	   delete the IKE SA after a crash.  Deleting the IKE SA results in a
89	   quick re-establishment of the IPsec tunnel.

91	1.1.  Conventions Used in This Document

93	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
94	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
95	   document are to be interpreted as described in [RFC2119].

97	2.  RFC 4306 Crash Recovery

99	   When one peer reboots, the other peer does not get any notification,
100	   so IPsec traffic can still flow.  The rebooted peer will not be able
101	   to decrypt it, however, and the only remedy is to send an unprotected
102	   INFORMATIONAL exchange with an INVALID_SPI notification as described
103	   in section 3.10.1 of [RFC4306].  That section also describes the
104	   processing of such a notification: "If this Informational Message is
105	   sent outside the context of an IKE_SA, it should be used by the
106	   recipient only as a "hint" that something might be wrong (because it
107	   could easily be forged)."

109	   Since the INVALID_SPI can only be used as a hint, the non-rebooted
110	   peer has to determine whether the IPsec SA, and indeed the parent IKE
111	   SA are still valid.  The method of doing this is described in section
112	   2.4 of [RFC4306].  This method, called "liveness check", involves
113	   sending a protected empty INFORMATIONAL message, and awaiting a
114	   response.  This procedure is sometimes referred to as "Dead Peer
115	   Detection" or DPD.

117	   Section 2.4 does not mandate how many times the INFORMATIONAL message
118	   should be retransmitted, or for how long, but does recommend the
119	   following: "It is suggested that messages be retransmitted at least a
120	   dozen times over a period of at least several minutes before giving
121	   up on an SA".  Clearly, implementations differ, but all will take a
122	   significant amount of time.

124	3.  Protocol Outline

126	   Supporting implementations will send a notification, called a "QCR
127	   token", as described in Section 4.1 in the last packets of the
128	   IKE_AUTH exchange.  These are the final request and final response
129	   that contain the AUTH payloads.  The generation of these tokens is a
130	   local matter for implementations, but considerations are described in
131	   Section 5.

133	   A supporting implementation receiving such a token SHOULD store it in
134	   such a way that it will survive a reboot.  When a supporting
135	   implementation receives a protected IKE request message with unknown
136	   IKE SPIs, it should scan its saved token store.  If a token matching
137	   the IKE SPIs is found, it SHOULD send it to the requesting peer in an
138	   unprotected IKE message as described in Section 4.3.
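
The storage and lookup step above can be sketched as follows.  This is
only an illustration and not part of the proposal: it assumes SQLite as
the persistent storage, and the class name TokenStore, the table name
qcr_tokens and the method names are all hypothetical.

```python
import sqlite3


class TokenStore:
    """Hypothetical persistent map from (SPI-I, SPI-R) to the peer's QCR token."""

    def __init__(self, path):
        # A file path survives a reboot; ":memory:" is useful only for testing.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS qcr_tokens ("
            " spi_i BLOB NOT NULL, spi_r BLOB NOT NULL, token BLOB NOT NULL,"
            " PRIMARY KEY (spi_i, spi_r))"
        )
        self.db.commit()

    def save(self, spi_i, spi_r, token):
        # Called when a QCR token arrives in the IKE_AUTH exchange.
        self.db.execute(
            "INSERT OR REPLACE INTO qcr_tokens VALUES (?, ?, ?)",
            (spi_i, spi_r, token),
        )
        self.db.commit()

    def lookup(self, spi_i, spi_r):
        # Called after a reboot, when a protected request with unknown
        # IKE SPIs arrives.  Returns None if no token was saved.
        row = self.db.execute(
            "SELECT token FROM qcr_tokens WHERE spi_i = ? AND spi_r = ?",
            (spi_i, spi_r),
        ).fetchone()
        return row[0] if row else None

    def delete(self, spi_i, spi_r):
        # Called when the IKE SA is deleted or expires.
        self.db.execute(
            "DELETE FROM qcr_tokens WHERE spi_i = ? AND spi_r = ?",
            (spi_i, spi_r),
        )
        self.db.commit()
```

Keying the table on the SPI pair keeps the post-reboot lookup to a
single indexed query.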

140	   When a supporting implementation receives the QCR notification token
141	   in an unprotected INFORMATIONAL exchange, it MUST verify that the
142	   TOKEN_SECRET_DATA field is associated with the IKE SPIs in the
143	   IKE_SPI fields of the IKE packet.  If the verification fails, it
144	   SHOULD log the event.  If it succeeds, it MUST delete the IKE SA
145	   associated with the IKE_SPI fields, and all dependent child SAs.
146	   This event MAY also be logged.
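
A minimal sketch of the processing described above, as seen by the peer
that generated the token.  The SA table layout (a dict keyed by the SPI
pair, holding "qcr_token" and "child_sas") and the function name are
assumptions made for illustration.  The constant-time comparison is an
implementation choice that this document does not require.

```python
import hmac
import logging

log = logging.getLogger("qcr")


def handle_qcr_token(ike_sas, spi_i, spi_r, received_token):
    """Process a QCR token from an unprotected INFORMATIONAL message.

    ike_sas is a hypothetical dict mapping (spi_i, spi_r) to a dict with
    the keys "qcr_token" (bytes) and "child_sas" (list).
    Returns True if the IKE SA was deleted, False otherwise.
    """
    sa = ike_sas.get((spi_i, spi_r))
    expected = sa["qcr_token"] if sa else None
    if expected is None or not hmac.compare_digest(expected, received_token):
        # Verification failed: the event SHOULD be logged, nothing is deleted.
        log.warning("QCR token verification failed")
        return False
    # Verification succeeded: delete the IKE SA and all dependent child SAs.
    sa["child_sas"].clear()
    del ike_sas[(spi_i, spi_r)]
    log.info("IKE SA deleted after valid QCR token")  # MAY be logged
    return True
```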

148	   A supporting implementation MAY immediately create new SAs using an
149	   Initial exchange, or it may wait for subsequent traffic to trigger
150	   the creation of new SAs.

152	   There is ongoing work on IKEv2 Session Resumption [resumption].  The
153	   current proposal is orthogonal to Session Resumption.  In fact, by
154	   using Session Resumption instead of a regular IKE exchange, the new
155	   SA can be created with minimal overhead.

157	4.  Formats and Exchanges

159	4.1.  Notification Format

161	   The notification payload called "QCR token" is formatted as follows:

163	                            1                   2                   3
164	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
165	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
166	       ! Next Payload  !C!  RESERVED   !         Payload Length        !
167	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
168	       !  Protocol ID  !   SPI Size    ! QCR Token Notify Message Type !
169	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
170	       !                                                               !
171	       ~                       TOKEN_SECRET_DATA                       ~
172	       !                                                               !
173	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

175	   o  Protocol ID (1 octet) MUST contain 1, as this message is related
176	      to an IKE SA.
177	   o  SPI Size (1 octet) MUST be zero, in conformance with [RFC4306].
178	   o  QCR Token Notify Message Type (2 octets) MUST be xxxxx, the
179	      value assigned for QCR token notifications.  TBA by IANA.
180	   o  TOKEN_SECRET_DATA (16-256 octets) contains a generated token as
181	      described in Section 5.
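
The layout above can be sketched as an encoder and decoder.  Since the
notify message type has not yet been assigned, the constant
QCR_TOKEN_NOTIFY_TYPE below is a made-up placeholder, and the function
names are hypothetical.  The Payload Length is taken to cover the whole
payload including its 8-octet header, as for other IKEv2 payloads.

```python
import struct

# Placeholder only: the real value is to be assigned by IANA.
QCR_TOKEN_NOTIFY_TYPE = 0xFFFF

# Next Payload, C flag + RESERVED, Payload Length,
# Protocol ID, SPI Size, Notify Message Type (network byte order).
NOTIFY_HEADER = struct.Struct("!BBHBBH")


def encode_qcr_notify(token, next_payload=0, critical=False):
    """Build a QCR token notification payload around TOKEN_SECRET_DATA."""
    if not 16 <= len(token) <= 256:
        raise ValueError("TOKEN_SECRET_DATA must be 16-256 octets")
    flags = 0x80 if critical else 0x00
    length = NOTIFY_HEADER.size + len(token)
    header = NOTIFY_HEADER.pack(
        next_payload, flags, length,
        1,   # Protocol ID: 1, the message relates to an IKE SA
        0,   # SPI Size: zero
        QCR_TOKEN_NOTIFY_TYPE,
    )
    return header + token


def decode_qcr_notify(payload):
    """Return TOKEN_SECRET_DATA, or raise ValueError if the payload is malformed."""
    if len(payload) < NOTIFY_HEADER.size:
        raise ValueError("payload too short")
    _next, _flags, length, protocol_id, spi_size, msg_type = (
        NOTIFY_HEADER.unpack_from(payload)
    )
    if length != len(payload):
        raise ValueError("Payload Length does not match")
    if protocol_id != 1 or spi_size != 0 or msg_type != QCR_TOKEN_NOTIFY_TYPE:
        raise ValueError("not a QCR token notification")
    token = payload[NOTIFY_HEADER.size:]
    if not 16 <= len(token) <= 256:
        raise ValueError("TOKEN_SECRET_DATA must be 16-256 octets")
    return token
```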

183	4.2.  Authentication Exchange

185	   For clarity, only the EAP version of an AUTH exchange will be
186	   presented here.  The non-EAP version is very similar.  The figure
187	   below is based on appendix A.3 of [RFC4718].

189	    first request       --> IDi,
190	                            [N(INITIAL_CONTACT)],
191	                            [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+],
192	                            [IDr],
193	                            [CP(CFG_REQUEST)],
194	                            [N(IPCOMP_SUPPORTED)+],
195	                            [N(USE_TRANSPORT_MODE)],
196	                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
197	                            [N(NON_FIRST_FRAGMENTS_ALSO)],
198	                            SA, TSi, TSr,
199	                            [V+]

201	    first response      <-- IDr, [CERT+], AUTH,
202	                            EAP,
203	                            [V+]

205	                      / --> EAP
206	    repeat 1..N times |
207	                      \ <-- EAP

209	    last request        --> AUTH,
210	                            [N(QCR_TOKEN)]

212	    last response       <-- AUTH,
213	                            [N(QCR_TOKEN)],
214	                            [CP(CFG_REPLY)],
215	                            [N(IPCOMP_SUPPORTED)],
216	                            [N(USE_TRANSPORT_MODE)],
217	                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
218	                            [N(NON_FIRST_FRAGMENTS_ALSO)],
219	                            SA, TSi, TSr,
220	                            [N(ADDITIONAL_TS_POSSIBLE)],
221	                            [V+]

223	   Note that the QCR_TOKEN notification is marked as optional because it
224	   is not required by this specification that both sides send QCR
225	   tokens.  If only one peer sends the QCR token, then a reboot of the
226	   other peer will not be recoverable by this method.  This may be
227	   acceptable if traffic typically originates from the other peer.

229	   In any case, the lack of a QCR_TOKEN notification MUST NOT be taken
230	   as an indication that the peer does not support this standard.
231	   Conversely, if a peer does not understand this notification, it will
232	   simply ignore it.  Therefore a peer MAY send this notification
233	   freely, even if it doesn't know whether the other side supports it.

235	4.3.  Informational Exchange

237	   This informational exchange is non-protected, and is sent as a
238	   response to a protected IKE request, which uses an IKE SA that is
239	   unknown.

241	               request             --> N(QCR_TOKEN)

243	               response            <--

245	   The QCR_TOKEN is the only notification in the request.  Similar to
246	   the description in section 2.21 of [RFC4306], the IKE SPI and message
247	   ID fields in the packet headers are taken from the protected IKE
248	   request.

250	   If the QCR_TOKEN verifies OK, an empty response MUST be sent.  If the
251	   QCR_TOKEN cannot be validated, a response SHOULD NOT be sent.
252	   Section 5 defines token verification.
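
As an illustration of how the header fields are copied, the following
sketch builds the unprotected message from the SPIs and message ID of
the protected request.  The function name is hypothetical.  This
document does not say how the header flags are set, so they are left as
a caller-supplied parameter.  The notification payload is passed in as
already-encoded bytes.

```python
import struct

# IKE header: SPI-I, SPI-R, Next Payload, Version, Exchange Type,
# Flags, Message ID, Length (RFC 4306, section 3.1).
IKE_HEADER = struct.Struct("!8s8sBBBBII")

NEXT_PAYLOAD_NOTIFY = 41     # Notify payload
IKE_VERSION_2_0 = 0x20       # major version 2, minor version 0
EXCHANGE_INFORMATIONAL = 37  # INFORMATIONAL exchange


def build_unprotected_message(spi_i, spi_r, message_id, notify_payload, flags=0):
    """Wrap an encoded QCR token notification in an unprotected IKE message.

    spi_i, spi_r and message_id are copied from the protected request
    that arrived with unknown IKE SPIs.
    """
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    total_length = IKE_HEADER.size + len(notify_payload)
    header = IKE_HEADER.pack(
        spi_i, spi_r,
        NEXT_PAYLOAD_NOTIFY, IKE_VERSION_2_0, EXCHANGE_INFORMATIONAL,
        flags, message_id, total_length,
    )
    return header + notify_payload
```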

254	5.  Token Generation and Verification

256	   No token generation method is mandated by this document.  Two methods
257	   are documented in Section 5.1 and Section 5.2, but they only serve as
258	   examples.

260	   The following lists the requirements from a token generation
261	   mechanism:
262	   o  Tokens should be at least 16 octets long, and no more than 256
263	      octets long, to facilitate storage.
264	   o  It should not be possible for an external attacker to guess the
265	      QCR token generated by an implementation.  Cryptographic
266	      mechanisms such as PRNG and hash functions are RECOMMENDED.
267	   o  The peer that generated the QCR token should be able to
268	      immediately verify it, provided that the IKE SPIs are given, and
269	      that the IKE SA has not expired or been otherwise deleted.

271	5.1.  A Stateful Method of Token Generation

273	   This describes a stateful method of generating a token:
274	   o  Before sending the QCR token, 32 random octets are generated using
275	      a secure random number generator or a PRNG.
276	   o  Those 32 bytes are used as the TOKEN_SECRET_DATA field, and stored
277	      as part of the IKE SA.
278	   o  For verification, the IKE implementation simply retrieves the IKE
279	      SA, and compares the TOKEN_SECRET_DATA field from the notification
280	      to the TOKEN_SECRET_DATA field stored with the SA.
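
The stateful method above might look like the following sketch.  The
function names are hypothetical, and the use of a constant-time
comparison is an added precaution rather than something this document
requires.

```python
import hmac
import secrets


def new_stateful_token():
    """Generate 32 random octets to use as TOKEN_SECRET_DATA."""
    return secrets.token_bytes(32)


def verify_stateful_token(stored_token, received_token):
    """Compare the received token with the one stored in the IKE SA.

    stored_token is None when no IKE SA (or no token) was found.
    """
    if stored_token is None:
        return False
    return hmac.compare_digest(stored_token, received_token)
```

The generated value is stored as part of the IKE SA at the time it is
sent, so verification needs nothing beyond an SA lookup.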

282	5.2.  A Stateless Method of Token Generation

284	   This describes a stateless method of generating a token.
285	   o  At startup, the IKE implementation generates a 32-octet random
286	      buffer using a cryptographically secure PRNG.  This buffer is
287	      called the QCR_SECRET.
288	   o  For each QCR token, the TOKEN_SECRET_DATA field is generated by
289	      calculating a SHA-256 hash over a concatenation of the QCR_SECRET
290	      and the IKE SPI as follows:

292	            TOKEN_SECRET_DATA = HASH(QCR_SECRET | SPI-I | SPI-R)

294	   o  Verification uses the same calculation, and works even if the IKE
295	      SA has been deleted.  Still, if the IKE SA is no longer valid, the
296	      notification MUST NOT be acknowledged, as this could be used in an
297	      attempt to guess the QCR_SECRET.
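
A sketch of the stateless method, reading "|" in the formula above as
concatenation.  The function names and the sa_is_valid parameter are
illustrative assumptions.  The caller is expected to determine whether
the IKE SA still exists before acknowledging the notification.

```python
import hashlib
import hmac
import secrets

# Generated once at startup: a 32-octet random buffer.
QCR_SECRET = secrets.token_bytes(32)


def make_stateless_token(qcr_secret, spi_i, spi_r):
    """TOKEN_SECRET_DATA = SHA-256(QCR_SECRET | SPI-I | SPI-R)."""
    return hashlib.sha256(qcr_secret + spi_i + spi_r).digest()


def verify_stateless_token(qcr_secret, spi_i, spi_r, received_token, sa_is_valid):
    """Recompute the token and compare it with the received one.

    The calculation works without the IKE SA, but the notification
    MUST NOT be acknowledged if the IKE SA is no longer valid.
    """
    if not sa_is_valid:
        return False
    expected = make_stateless_token(qcr_secret, spi_i, spi_r)
    return hmac.compare_digest(expected, received_token)
```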

299	5.3.  Token Lifetime

301	   The token is associated with a single IKE SA, and SHOULD be deleted
302	   when the SA is deleted or expires.  More formally, the token is
303	   associated with the pair (SPI-I, SPI-R).

305	6.  Alternative Solutions

307	6.1.  Why not Save the Entire IKE SA

309	   IKEv2 does not assume the existence of a persistent storage module.
310	   If we are adding such a module, why not use it to save the entire IKE
311	   SA across reboots, nullifying the need for a crash recovery
312	   procedure?

314	   There are several reasons why we believe that this is not a good
315	   idea:
316	   1.  A token is only 16-256 octets, and is much more compact than all
317	       the data needed to store an IKE SA.
318	   2.  A token is valid for the life of an IKE SA.  An IKE SA state is
319	       updated whenever a message is sent, because of the requirement to
320	       keep the sequence of message IDs.  It may not be acceptable to
321	       update the persistent storage whenever an IKE message is sent.
322	   3.  A reboot is usually an unpredictable event, and as such, we
323	       cannot know how long it will last.  By the time the machine has
324	       rebooted, the peer may have attempted some type of protected
325	       exchange (liveness check, create-child-SA or delete), timed out,
326	       and deleted the SA.  It is far better to reboot without SAs and
327	       with only a token for quick recovery.

329	6.2.  Initiating a new IKE SA

331	   Instead of sending a QCR token, we could have the rebooted
332	   implementation start an Initial exchange with the peer, including the
333	   INITIAL_CONTACT notification.  This would have the same effect,
334	   instructing the peer to erase the old IKE SA, as well as establishing
335	   a new IKE SA with fewer rounds.

337	   The disadvantage here is that in IKEv2 an authentication exchange
338	   MUST have a piggy-backed Child SA set up.  Since our use case is such
339	   that the rebooted implementation does not have traffic flowing to the
340	   peer, there are no good selectors for such a child SA.

342	   Additionally, when authentication is asymmetric, such as when EAP is
343	   used, it is not possible for the rebooted implementation to initiate
344	   IKE.

346	7.  Operational Considerations

348	   To support this standard, an implementation needs to have access to a
349	   persistent storage module.  This could be an internal hard disk, a
350	   local or remote database application, or any other method that
351	   persists across reboots.  This storage module and the data links
352	   between the storage module and the IKE module must meet the
353	   performance requirements of the IKE module.  The storage module MUST
354	   support insertion and deletion rates equal to peak IKE SA setup rates
355	   and it SHOULD support query rates that are fast enough.

357	   See Section 8 for security considerations for this storage mechanism.

359	   In order to limit the effects of DoS attacks, an implementation
360	   SHOULD limit the rate of queries into the token storage so as not to
361	   overload it.  If excessive amounts of IKE requests protected with
362	   unknown IKE SPIs arrive, the IKE module SHOULD revert to the behavior
363	   described in section 2.21 of [RFC4306] and either send an
364	   INVALID_IKE_SPI notification, or ignore it entirely.
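
One possible way to limit the query rate is a token bucket, sketched
below.  This document does not mandate any particular algorithm.  The
class name, the parameters and the injectable clock are assumptions
made for illustration.

```python
import time


class QueryRateLimiter:
    """Token-bucket limiter for queries into the QCR token storage."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec      # sustained queries per second
        self.capacity = burst         # maximum burst size
        self.available = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True if a storage query may be made now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.available = min(
            self.capacity, self.available + (now - self.last) * self.rate
        )
        self.last = now
        if self.available >= 1.0:
            self.available -= 1.0
            return True
        # Over the limit: the caller falls back to the behavior in
        # section 2.21 of [RFC4306] (send INVALID_IKE_SPI or ignore).
        return False
```

When allow() returns False the storage is not queried at all, so a
flood of requests with unknown SPIs cannot overload it.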

366	8.  Security Considerations

368	   Tokens MUST be hard to guess.  This is critical, because if an
369	   attacker can guess the token associated with the IKE SA, she can tear
370	   down the IKE SA and associated tunnels at will.  When the token is
371	   delivered in the IKE_AUTH exchange, it is encrypted.  When it is sent
372	   back in an informational exchange it is not encrypted, but that is
373	   the last use of that token.

375	   An aggregation of some tokens generated by one peer together with the
376	   related IKE SPIs MUST NOT give an attacker the ability to guess other
377	   tokens.  Specifically, if one peer does not properly secure the QCR
378	   tokens and an attacker gains access to them, this attacker MUST NOT
379	   be able to guess other tokens generated by the same peer.  This is
380	   the reason that the QCR_SECRET in Section 5.2 needs to be long.

382	   The persistent storage MUST be protected from access by other
383	   parties.  Anyone gaining access to the contents of the storage will
384	   be able to delete all the IKE SAs described in it.

386	   The tokens associated with expired and deleted IKE SAs MUST be
387	   deleted from the storage, so that a future compromise of the storage
388	   does not reveal enough tokens to facilitate an attack against the QCR
389	   tokens.

391	   The QCR token is sent by the rebooted peer in an unprotected message.
392	   A message like that is subject to modification, deletion and replay
393	   by an attacker.  However, these attacks will not compromise the
394	   security of either side.  Modification is meaningless because a
395	   modified token is simply an invalid token.  Deletion will only cause
396	   the protocol not to work, resulting in a delay in tunnel re-
397	   establishment as described in Section 2.  Replay is also meaningless,
398	   because the IKE SA has been deleted after the first transmission.

400	9.  IANA Considerations

402	   IANA is requested to assign a notify message type from the error
403	   types range (43-8191) of the "IKEv2 Notify Message Types" registry
404	   with name "QUICK_CRASH_RECOVERY".

406	10.  Normative References

408	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
409	              Requirement Levels", BCP 14, RFC 2119, March 1997.

411	   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
412	              RFC 4306, December 2005.

414	   [RFC4718]  Eronen, P. and P. Hoffman, "IKEv2 Clarifications and
415	              Implementation Guidelines", RFC 4718, October 2006.

417	   [resumption]
418	              Sheffer, Y., Tschofenig, H., Dondeti, L., and V.
420	              Narayanan, "IPsec Gateway Failover Protocol",
421	              draft-sheffer-ipsec-failover-02 (work in progress),
422	              November 2007.

424	Author's Address

426	   Yoav Nir
427	   Check Point Software Technologies Ltd.
428	   5 Hasolelim st.
429	   Tel Aviv  67897
430	   Israel

432	   Email: ynir@checkpoint.com

434	Full Copyright Statement

436	   Copyright (C) The IETF Trust (2008).

438	   This document is subject to the rights, licenses and restrictions
439	   contained in BCP 78, and except as set forth therein, the authors
440	   retain all their rights.

442	   This document and the information contained herein are provided on an
443	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
444	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
445	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
446	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
447	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
448	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

450	Intellectual Property

452	   The IETF takes no position regarding the validity or scope of any
453	   Intellectual Property Rights or other rights that might be claimed to
454	   pertain to the implementation or use of the technology described in
455	   this document or the extent to which any license under such rights
456	   might or might not be available; nor does it represent that it has
457	   made any independent effort to identify any such rights.  Information
458	   on the procedures with respect to rights in RFC documents can be
459	   found in BCP 78 and BCP 79.

461	   Copies of IPR disclosures made to the IETF Secretariat and any
462	   assurances of licenses to be made available, or the result of an
463	   attempt made to obtain a general license or permission for the use of
464	   such proprietary rights by implementers or users of this
465	   specification can be obtained from the IETF on-line IPR repository at
466	   http://www.ietf.org/ipr.

468	   The IETF invites any interested party to bring to its attention any
469	   copyrights, patents or patent applications, or other proprietary
470	   rights that may cover technology that may be required to implement
471	   this standard.  Please address the information to the IETF at
472	   ietf-ipr@ietf.org.

474	Acknowledgment

476	   Funding for the RFC Editor function is provided by the IETF
477	   Administrative Support Activity (IASA).