idnits 2.17.1 

draft-nir-ike-qcd-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 929.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 940.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 947.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 953.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 13, 2008) is 5766 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'IDr' is mentioned on line 459, but not defined

  == Missing Reference: 'CERTREQ' is mentioned on line 656, but not defined

  == Missing Reference: 'TSi' is mentioned on line 676, but not defined

  == Missing Reference: 'TSr' is mentioned on line 676, but not defined

  ** Obsolete normative reference: RFC 4306 (Obsoleted by RFC 5996)

  ** Obsolete normative reference: RFC 4718 (Obsoleted by RFC 5996)


     Summary: 3 errors (**), 0 flaws (~~), 5 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                             Y. Nir
3	Internet-Draft                                               Check Point
4	Intended status: Standards Track                             F. Detienne
5	Expires: January 14, 2009                                       P. Sethi
6	                                                                   Cisco
7	                                                           July 13, 2008

9	                 A Quick Crash Detection Method for IKE
10	                        draft-nir-ike-qcd-01.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on January 14, 2009.

37	Abstract

39	   This document describes an extension to the IKEv2 protocol that
40	   allows for faster crash recovery using a saved token.

42	   When an IPsec tunnel between two IKEv2 implementations is
43	   disconnected due to a restart of one peer, it can take as much as
44	   several minutes for the other peer to discover that the reboot has
45	   occurred, thus delaying recovery.  In this text we propose an
46	   extension to the protocol that allows recovery immediately
47	   following the reboot.

49	Table of Contents

51	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
52	     1.1.  Conventions Used in This Document  . . . . . . . . . . . .  3
53	   2.  RFC 4306 Crash Recovery  . . . . . . . . . . . . . . . . . . .  3
54	   3.  Protocol Outline . . . . . . . . . . . . . . . . . . . . . . .  4
55	   4.  Stateless Variant Outline  . . . . . . . . . . . . . . . . . .  5
56	     4.1.  Introducing CHECK_SPI  . . . . . . . . . . . . . . . . . .  5
57	     4.2.  Stateless Recovery . . . . . . . . . . . . . . . . . . . .  6
58	     4.3.  Wait before rekey  . . . . . . . . . . . . . . . . . . . .  6
59	     4.4.  Throttling and Dampening . . . . . . . . . . . . . . . . .  7
60	       4.4.1.  Invalid SPI throttling . . . . . . . . . . . . . . . .  8
61	       4.4.2.  Dampening  . . . . . . . . . . . . . . . . . . . . . .  8
62	       4.4.3.  User controls  . . . . . . . . . . . . . . . . . . . .  9
63	   5.  Formats and Exchanges  . . . . . . . . . . . . . . . . . . . .  9
64	     5.1.  Notification Format  . . . . . . . . . . . . . . . . . . .  9
65	     5.2.  check_fmt  . . . . . . . . . . . . . . . . . . . . . . . .  9
66	     5.3.  Stateless IKE Recovery VendorID  . . . . . . . . . . . . . 10
67	     5.4.  Authentication Exchange  . . . . . . . . . . . . . . . . . 10
68	     5.5.  Informational Exchange . . . . . . . . . . . . . . . . . . 12
69	   6.  Token Generation and Verification  . . . . . . . . . . . . . . 12
70	     6.1.  A Stateless Method of Token Generation . . . . . . . . . . 13
71	     6.2.  Token Lifetime . . . . . . . . . . . . . . . . . . . . . . 13
72	   7.  Backup Gateways  . . . . . . . . . . . . . . . . . . . . . . . 13
73	   8.  Alternative Solutions  . . . . . . . . . . . . . . . . . . . . 13
74	     8.1.  Initiating a new IKE SA  . . . . . . . . . . . . . . . . . 14
75	     8.2.  Birth Certificates . . . . . . . . . . . . . . . . . . . . 14
76	   9.  Interaction with IFARE . . . . . . . . . . . . . . . . . . . . 14
77	   10. Operational Considerations . . . . . . . . . . . . . . . . . . 15
78	     10.1. Who should implement this specification  . . . . . . . . . 15
79	     10.2. Response to unknown child SPI  . . . . . . . . . . . . . . 16
80	     10.3. Stateless IKE Recovery cookie  . . . . . . . . . . . . . . 17
81	   11. Security Considerations  . . . . . . . . . . . . . . . . . . . 17
82	     11.1. Security Considerations for the Stateful Method  . . . . . 18
83	     11.2. Security Considerations for the Stateless Method . . . . . 18
84	   12. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 19
85	   13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19
86	   14. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 19
87	     14.1. Changes from draft-nir-ike-qcd-00  . . . . . . . . . . . . 19
88	     14.2. Changes from draft-nir-qcr-00  . . . . . . . . . . . . . . 19
89	   15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19
90	     15.1. Normative References . . . . . . . . . . . . . . . . . . . 19
91	     15.2. Informative References . . . . . . . . . . . . . . . . . . 20
92	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20
93	   Intellectual Property and Copyright Statements . . . . . . . . . . 22

95	1.  Introduction

97	   IKEv2, as described in [RFC4306], has a method for recovering from a
98	   reboot of one peer.  As long as traffic flows in both directions, the
99	   rebooted peer should re-establish the tunnels immediately.  However,
100	   in many cases the rebooted peer is a VPN gateway that protects only
101	   servers, or else the non-rebooted peer has a dynamic IP address.  In
102	   such cases, the rebooted peer will not be able to re-establish the
103	   tunnels.  Section 2 describes how recovery works under RFC 4306, and
104	   explains why it takes several minutes.

106	   The method proposed here is to send a token in the IKE_AUTH exchange
107	   that establishes the tunnel.  That token can be stored on the peer as
108	   part of the IKE SA.  After a reboot, the rebooted implementation can
109	   re-generate the token and send it to the non-rebooted peer so as to
110	   delete the IKE SA.  Deleting the IKE SA results in a quick re-
111	   establishment of the IPsec tunnels.  This is described in Section 3.

113	   Finally, Section 4 describes a variant that does not require storing
114	   state on the non-rebooted peer, but does require an extra round-trip.

116	1.1.  Conventions Used in This Document

118	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
119	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
120	   document are to be interpreted as described in [RFC2119].

122	   The term "token" refers to an octet string that an implementation can
123	   generate using only the IKE SPIs as input.  A conforming
124	   implementation MUST be able to generate the same token from the same
125	   input even after rebooting.

127	   The term "token maker" refers to an implementation that generates a
128	   token and sends it to the peer in the IKE_AUTH exchange.

130	   The term "token taker" refers to an implementation that stores such a
131	   token or a digest thereof, after receiving it in an IKE_AUTH
132	   exchange.

134	2.  RFC 4306 Crash Recovery

136	   When one peer reboots, the other peer does not get any notification,
137	   so IPsec traffic can still flow.  The rebooted peer will not be able
138	   to decrypt it, however, and the only remedy is to send an unprotected
139	   INVALID_SPI notification as described in section 3.10.1 of [RFC4306].
140	   That section also describes the processing of such a notification:
141	   "If this Informational Message is sent outside the context of an
142	   IKE_SA, it should be used by the recipient only as a "hint" that
143	   something might be wrong (because it could easily be forged)."

145	   Since the INVALID_SPI can only be used as a hint, the non-rebooted
146	   peer has to determine whether the IPsec SA, and indeed the parent IKE
147	   SA are still valid.  The method of doing this is described in section
148	   2.4 of [RFC4306].  This method, called "liveness check", involves
149	   sending a protected empty INFORMATIONAL message, and awaiting a
150	   response.  This procedure is sometimes referred to as "Dead Peer
151	   Detection" or DPD.

153	   Section 2.4 does not mandate how many times the liveness check
154	   message should be retransmitted, or for how long, but does recommend
155	   the following: "It is suggested that messages be retransmitted at
156	   least a dozen times over a period of at least several minutes before
157	   giving up on an SA".  Clearly, implementations differ, but all will
158	   take a significant amount of time.

160	3.  Protocol Outline

162	   Supporting implementations will send a notification, called a "QCD
163	   token", as described in Section 5.1, in the last packets of the
164	   IKE_AUTH exchange.  These are the final request and final response
165	   that contain the AUTH payloads.  The generation of these tokens is a
166	   local matter for implementations, but considerations are described in
167	   Section 6.  Implementations that send such a token will be called
168	   "token makers".

170	   A supporting implementation receiving such a token SHOULD store it as
171	   part of the IKE SA.  Implementations that support this part of the
172	   protocol will be called "token takers".  Section 10.1 has
173	   considerations for which implementations need to be token takers, and
174	   which should be token makers.  Implementations that are not token
175	   takers will silently ignore QCD tokens.

177	   When a token maker receives a protected IKE request message with
178	   unknown IKE SPIs, it MUST generate a new token that is identical to
179	   the previous token, and send it to the requesting peer in an
180	   unprotected IKE message as described in Section 5.5.

182	   When a token taker receives the QCD token in an unprotected
183	   notification, it MUST verify that the TOKEN_SECRET_DATA matches the
184	   token stored in the matching IKE SA.  If the verification fails,
185	   or if the IKE SPIs in the message do not match any existing IKE SA,
186	   it SHOULD log the event.  If it succeeds, it MUST delete the IKE SA
187	   associated with the IKE_SPI fields, and all dependent child SAs.
188	   This event MAY also be logged.  The token taker MUST accept such
189	   tokens from any address, so as to allow different kinds of high-
190	   availability configuration of the token maker.
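The token taker's processing can be sketched as follows. This is a minimal, non-normative Python sketch. The class and method names (`TokenTaker`, `on_ike_auth`, `on_unprotected_qcd`) are hypothetical, IKE SPIs are modeled as 8-octet `bytes` values, and child SAs are tracked in a plain dictionary. The sketch stores one token per IKE SA, compares tokens in constant time, and deletes the IKE SA together with its child SAs only when the token matches.

```python
import hmac
import logging
from typing import Dict, List, Tuple

SpiPair = Tuple[bytes, bytes]  # (SPI-I, SPI-R), 8 octets each

log = logging.getLogger("qcd")


class TokenTaker:
    """Hypothetical token-taker state: one stored token per IKE SA."""

    def __init__(self) -> None:
        self.tokens: Dict[SpiPair, bytes] = {}
        self.child_sas: Dict[SpiPair, List[int]] = {}

    def on_ike_auth(self, spis: SpiPair, token: bytes) -> None:
        # Store the QCD token received in the IKE_AUTH exchange.
        self.tokens[spis] = token
        self.child_sas.setdefault(spis, [])

    def on_unprotected_qcd(self, spis: SpiPair, token: bytes) -> bool:
        """The source address is deliberately not checked."""
        stored = self.tokens.get(spis)
        if stored is None:
            log.info("QCD token for unknown IKE SA %s", spis)
            return False
        # Constant-time comparison avoids leaking how many octets matched.
        if not hmac.compare_digest(stored, token):
            log.warning("QCD token mismatch for IKE SA %s", spis)
            return False
        # Verified: delete the IKE SA's token and all dependent child SAs.
        del self.tokens[spis]
        self.child_sas.pop(spis, None)
        log.info("IKE SA %s deleted after QCD token verification", spis)
        return True
```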

192	   A supporting token taker MAY immediately create new SAs using an
193	   Initial exchange, or it may wait for subsequent traffic to trigger
194	   the creation of new SAs.

196	   There is ongoing work on IKEv2 Session Resumption [resumption].  See
197	   Section 9 for a short discussion about this protocol's interaction
198	   with session resumption.

200	4.  Stateless Variant Outline

202	   Sometimes, a QCD token is not available to the non-rebooted
203	   implementation.  This can happen for several reasons:
204	   o  Perhaps the rebooted peer has not implemented the "token maker"
205	      part of the protocol.
206	   o  Perhaps the non-rebooted peer is resource-constrained, and cannot
207	      spare the memory needed to save the token, so it did not implement
208	      the "token taker" part of the protocol.

210	   In such cases, we also define a stateless variant of the protocol,
211	   that does not require any state on the non-rebooted peer, but does
212	   require an extra round-trip.

214	   A supporting implementation will advertise this capability with a
215	   special VID payload as defined in Section 5.3.  When such an
216	   implementation reboots and sends an INVALID_SPI or INVALID_IKE_SPI
217	   notification to the non-rebooted peer, which has no QCD token, the
218	   non-rebooted peer uses a CHECK_SPI notification (see Section 4.1) to
219	   poll its peer about whether or not the SPI is actually invalid.

221	4.1.  Introducing CHECK_SPI

223	   In order to achieve stateless IKE recovery, this memo introduces a
224	   new notify type called CHECK_SPI.  The CHECK_SPI payload carries an
225	   SPI (IKE_SA or Child SA) and one of three sub-types (QUERY, ACK,
226	   NACK).  The semantics of the CHECK_SPI subtypes are as follows:
227	   o  QUERY: a peer queries the remote peer's SA database for the
228	      presence of the SA whose SPI is carried in the payload.
229	   o  ACK: a peer confirms it has the SA specified in the payload.
230	   o  NACK: a peer confirms it does not have the SA specified in the
231	      payload.

233	   The payload format of the CHECK_SPI notify is covered in Section 5.2.

235	4.2.  Stateless Recovery

237	   After receiving the INVALID_SPI or INVALID_IKE_SPI notifications, the
238	   non-rebooted peer (called Peer Y in the figure) will send an
239	   unprotected IKE message as follows.  Note that Peer Y MUST NOT send
240	   this unless Peer X has advertised this capability in the IKE_AUTH
241	   exchange.

243	      Peer X                                                  Peer Y

245	                HDR(A,B) INVALID_IKE_SPI(A,B)
246	               -------------------------------------------->

248	                HDR(A,B) CHECK_SPI(QUERY,(A,B)), N(Cookie)
249	               <--------------------------------------------

251	                HDR(A,B) CHECK_SPI(ACK|NACK,(A,B)), N(Cookie)
252	               -------------------------------------------->

254	   In this figure, A and B represent the IKE SPIs, and the Cookie is a
255	   stateless cookie subject to considerations similar to those for the
256	   stateless cookie described in section 2.6 of [RFC4306].  The cookie
257	   SHOULD depend on the IKE SPIs and a saved secret.
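One way to build such a cookie is to key a MAC with the locally saved secret and feed it the two IKE SPIs. This lets Peer Y check the cookie echoed in Peer X's reply without keeping per-query state. The sketch below assumes HMAC-SHA-256, which the draft does not mandate, and the function names are hypothetical.

```python
import hashlib
import hmac


def make_cookie(secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    # Depends only on the saved secret and the (fixed-size) IKE SPIs,
    # so the querying peer keeps no per-query state.
    return hmac.new(secret, spi_i + spi_r, hashlib.sha256).digest()


def cookie_is_valid(secret: bytes, spi_i: bytes, spi_r: bytes,
                    cookie: bytes) -> bool:
    # Recompute and compare in constant time.
    expected = make_cookie(secret, spi_i, spi_r)
    return hmac.compare_digest(expected, cookie)
```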

259	   A similar exchange happens when the peer sends an INVALID_SPI
260	   notification:

262	      Peer X                                                  Peer Y

264	                HDR(0,0) INVALID_SPI(a)
265	               -------------------------------------------->

267	                HDR(A,B) CHECK_SPI(QUERY,(A,B)), N(Cookie)
268	               <--------------------------------------------

270	                HDR(A,B) CHECK_SPI(ACK|NACK,(A,B)), N(Cookie)
271	               -------------------------------------------->

273	   The difference here is that Peer Y had to locate the IKE SPIs
274	   associated with the SPI mentioned in the INVALID_SPI notification.

276	4.3.  Wait before rekey

278	   There exists a particular attack where a man-in-the-middle can snoop
279	   and inject traffic but cannot block or drop packets.  The attacker
280	   can spoof an INVALID_SPI (allegedly from X), forcing a
281	   CHECK_SPI(QUERY) from Y, and then spoof back a CHECK_SPI(NACK)
282	   to force an undue rekey.  Since the attacker cannot block packets,
283	   the CHECK_SPI(QUERY) will also reach X, which will reply with
284	   CHECK_SPI(ACK).

286	   Y receives the CHECK_SPI(NACK) first and MAY wait a few milliseconds
287	   before creating a new SA.  Y will then receive BOTH a CHECK_SPI(ACK)
288	   and a CHECK_SPI(NACK), which is suspicious.  The stateless IKE
289	   recovery (SIR) process should then stop, log an error and keep the SA.
289	   stop and log an error, saving the SA.

291	   The process is illustrated below:

293	         X                 Attacker                Y
294	                               Inv SPI
295	                               ------------------>

297	                                  CHECK_SPI(QUERY)
298	            <-------------------------------------

300	                               CHECK_SPI(NACK)
301	                               ------------------> Should rekey
302	                                                   but wait a few msec

304	            CHECK_SPI(ACK)
305	            -------------------------------------> Hint of attack
306	                                                   => no rekey

308	   Ideally, the round-trip time (RTT) should be measured during the IKE
309	   exchange, and Y should wait for a full RTT before initiating a rekey.
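Peer Y's decision can be sketched as a small function over the CHECK_SPI replies collected during the wait window (for example, one measured RTT). This is a hypothetical Python sketch. The caller is assumed to gather replies for the whole window before calling `decide`. How to treat a window with no reply at all is not specified here, so the sketch simply keeps the SA and leaves it to the ordinary liveness check.

```python
from enum import Enum
from typing import Iterable


class Reply(Enum):
    ACK = 1
    NACK = 2


class Decision(Enum):
    KEEP = "keep SA"
    REKEY = "rekey"
    SUSPECT = "log error, keep SA"


def decide(replies: Iterable[Reply]) -> Decision:
    """Called once the wait window (e.g. one RTT) has elapsed."""
    seen = set(replies)
    if Reply.ACK in seen and Reply.NACK in seen:
        # Contradictory answers hint at a spoofed NACK: stop and log.
        return Decision.SUSPECT
    if Reply.NACK in seen:
        return Decision.REKEY
    # Only ACKs (SA is valid) or no reply (assumption: defer to the
    # normal liveness check): keep the SA.
    return Decision.KEEP
```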

311	   Given that IKE itself is subject to DH computation by a man-in-the-
312	   middle, and that SAs are dampened after creation (see Section
313	   4.4.2), the staging complexity and limited payoff of this attack
314	   make it rather impractical.  An implementation MAY decide to
315	   implement this final safety wait, but this is strictly optional.

317	4.4.  Throttling and Dampening

319	   An important aspect of the security in stateless IKE recovery has to
320	   do with limiting CPU utilization.  In order to thwart flooding
321	   denial-of-service attacks, strict rate limiting and throttling
322	   mechanisms have to be enforced.

324	   All the notifications that are exchanged during IKE recovery SHOULD
325	   be rate limited.  This section describes how rate limiting should
326	   be performed.

328	4.4.1.  Invalid SPI throttling

330	   The sending of all Invalid SPI notifies MUST be rate limited in one
331	   way or another.  The rate limiting SHOULD be performed on a per-peer
332	   basis but dynamic state creation SHOULD be avoided as much as
333	   possible.  A recommended tradeoff is to limit the number of flows
334	   that can undergo recovery at one point in time and avoid sending
335	   Invalid SPI notifies for flows that are potentially already under
336	   recovery.
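One way to rate-limit per peer without creating state dynamically is a fixed-size array of token buckets indexed by a hash of the peer address. Memory is allocated once, and peers whose addresses collide simply share a bucket. This is a hypothetical Python sketch. The slot count, rate and burst values are placeholders, and Section 4.4.3 requires them to be administratively configurable.

```python
import hashlib
import time
from typing import Callable, List, Optional


class FixedRateLimiter:
    """Token buckets in a preallocated array; no per-peer allocation."""

    def __init__(self, slots: int = 1024, rate: float = 1.0,
                 burst: float = 5.0,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.rate = rate      # tokens added per second
        self.burst = burst    # bucket capacity
        self.clock = clock or time.monotonic
        now = self.clock()
        self.tokens: List[float] = [burst] * slots
        self.stamp: List[float] = [now] * slots

    def _slot(self, peer: str) -> int:
        # Map the peer address to a fixed slot; collisions share a bucket.
        digest = hashlib.sha256(peer.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.tokens)

    def allow(self, peer: str) -> bool:
        i = self._slot(peer)
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.stamp[i]
        self.tokens[i] = min(self.burst, self.tokens[i] + elapsed * self.rate)
        self.stamp[i] = now
        if self.tokens[i] >= 1.0:
            self.tokens[i] -= 1.0
            return True
        return False
```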

338	   Invalid SPI rate limiting protects against natural dangling SA
339	   occurrences: normal traffic conditions may cause unrecognized SPIs
340	   to be received, which makes this the most important message to
341	   rate limit.  It is not realistic to send one notification per bad
342	   ESP packet received.  On high-speed links, this could mean
343	   thousands of IKE notifies sent for the same offending SPI.

345	   The receipt of unauthenticated Invalid SPI notifies MUST also be
346	   rate limited.  Again, the rate limiting SHOULD be performed on a
347	   per-peer basis without dynamic state creation.  In normal
348	   circumstances, the peer receiving Invalid SPI notifies has an SA with
349	   the peer sending those notifies and already maintains peer-related
350	   data structures that can help in maintaining adequate counters.

352	   Authenticated Invalid SPI notifies can be accepted without
353	   throttling.

355	4.4.2.  Dampening

357	   Dampening applies after any of the following events:
358	   o  the natural creation or rekey of one or more SAs
359	   o  the recovery of one or more SAs
360	   o  a failure to recover an SA owned by the local security
361	      gateway
362	   o  the logging of an error or warning message involving an SA owned
363	      by the local security gateway

365	   The peer with which SAs were created or attempted, or about which a
366	   log was emitted, SHOULD be dampened.  This means that all
367	   unauthenticated Invalid SPI and CHECK_SPI messages emitted by that
368	   peer MUST be ignored for a chosen duration.
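Dampening can be sketched as a per-peer deadline. Any of the listed events pushes the deadline forward, and unauthenticated Invalid SPI and CHECK_SPI messages from that peer are dropped until it passes. The class name and the 60-second default are hypothetical. The duration is a local parameter, subject to the administrative controls of Section 4.4.3.

```python
import time
from typing import Callable, Dict, Optional


class Dampener:
    def __init__(self, duration: float = 60.0,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.duration = duration  # seconds; hypothetical default
        self.clock = clock or time.monotonic
        self.until: Dict[str, float] = {}

    def record_event(self, peer: str) -> None:
        # Called on SA creation/rekey, recovery, recovery failure,
        # or an error/warning log involving a local SA with this peer.
        self.until[peer] = self.clock() + self.duration

    def accept_unauthenticated(self, peer: str) -> bool:
        # Unauthenticated Invalid SPI / CHECK_SPI messages are ignored
        # while the peer is dampened.
        return self.clock() >= self.until.get(peer, float("-inf"))
```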

370	   This protection prevents a man-in-the-middle from forcing the fast
371	   recreation of SAs and potentially depleting the entropy of systems
372	   under attack.  It also deals efficiently with race conditions that may
373	   occur after a rekey.

375	4.4.3.  User controls

377	   Because throttling in general is related to speed, the network
378	   environment around the security gateways has a major influence on
379	   the appropriate values of the parameters controlling rate limiting.
380	   It is difficult to recommend good absolute values for the rate
381	   limiters, considering that these are implementation dependent.

383	   As such, for the sake of fitness in practical deployments, a system
384	   implementing this memo MUST provide administrative controls over the
385	   rate limiter parameters.

387	5.  Formats and Exchanges

389	5.1.  Notification Format

391	   The notification payload called "QCD token" is formatted as follows:

393	                            1                   2                   3
394	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
395	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
396	       ! Next Payload  !C!  RESERVED   !         Payload Length        !
397	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
398	       !  Protocol ID  !   SPI Size    ! QCD Token Notify Message Type !
399	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
400	       !                                                               !
401	       ~                       TOKEN_SECRET_DATA                       ~
402	       !                                                               !
403	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

405	   o  Protocol ID (1 octet) MUST contain 1, as this message is related
406	      to an IKE SA.
407	   o  SPI Size (1 octet) MUST be zero, in conformance with [RFC4306].
408	   o  QCD Token Notify Message Type (2 octets) - MUST be xxxxx, the
409	      value assigned for QCD token notifications.  TBA by IANA.
410	   o  TOKEN_SECRET_DATA (16-128 octets) contains a generated token as
411	      described in Section 6.
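The layout above can be serialized with Python's `struct` module, as sketched below. The notify message type has not been assigned ("TBA by IANA"), so `QCD_TOKEN_TYPE` is a hypothetical placeholder. The 16 to 128 octet bound follows the requirement in Section 6. As for any IKEv2 payload, Payload Length counts the generic payload header, so it is 8 plus the token length.

```python
import struct

QCD_TOKEN_TYPE = 40960    # hypothetical placeholder; real value TBA by IANA
PROTO_IKE = 1             # Protocol ID for an IKE SA
MIN_TOKEN, MAX_TOKEN = 16, 128


def encode_qcd_notify(token: bytes, next_payload: int = 0) -> bytes:
    if not MIN_TOKEN <= len(token) <= MAX_TOKEN:
        raise ValueError("token must be 16 to 128 octets")
    length = 8 + len(token)  # generic header + notify fields + data
    # "!" = network byte order, no padding; C bit and RESERVED are zero.
    header = struct.pack("!BBHBBH", next_payload, 0, length,
                         PROTO_IKE, 0, QCD_TOKEN_TYPE)
    return header + token


def decode_qcd_notify(data: bytes) -> bytes:
    _, _, length, proto, spi_size, msg_type = struct.unpack(
        "!BBHBBH", data[:8])
    if length < 8 or len(data) < length:
        raise ValueError("truncated payload")
    if proto != PROTO_IKE or spi_size != 0 or msg_type != QCD_TOKEN_TYPE:
        raise ValueError("not a QCD token notification")
    return data[8:length]
```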

413	5.2.  check_fmt

415	   The notification payload called "CHECK_SPI" is formatted as follows:

417	                            1                   2                   3
418	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
419	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
420	       ! Next Payload  !C!  RESERVED   !         Payload Length        !
421	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
422	       !  Protocol ID  !   SPI Size    ! CHECK_SPI Notify Message Type !
423	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
424	       ! Operation     !
425	       +-+-+-+-+-+-+-+-+

427	   o  Protocol ID (1 octet) MUST contain 1, as this message is related
428	      to an IKE SA.
429	   o  SPI Size (1 octet) MUST be zero, in conformance with [RFC4306].
430	   o  CHECK_SPI Notify Message Type (2 octets) - MUST be xxxxx, the
431	      value assigned for CHECK_SPI notifications.  TBA by IANA.
432	   o  Operation (1 octet) - This field determines the operation being
433	      performed (Query, Reply_ACK, Reply_NACK).

435	   The list of operations and their corresponding values:
436	   o  Query: 0
437	   o  Reply_ACK: 1
438	   o  Reply_NACK: 2
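A corresponding sketch for CHECK_SPI follows, again with a hypothetical placeholder for the unassigned notify type. The single Operation octet follows the 8-octet notify header, giving a Payload Length of 9. Because the format has no SPI field (SPI Size is zero), this sketch assumes the queried IKE SPIs are those carried in the IKE header, as in HDR(A,B) of Section 4.2.

```python
import struct
from enum import IntEnum

CHECK_SPI_TYPE = 40961   # hypothetical placeholder; real value TBA by IANA
PROTO_IKE = 1


class Operation(IntEnum):
    QUERY = 0
    REPLY_ACK = 1
    REPLY_NACK = 2


def encode_check_spi(op: Operation, next_payload: int = 0) -> bytes:
    # 4-octet generic header, 4 octets of notify fields, 1 operation octet.
    return struct.pack("!BBHBBHB", next_payload, 0, 9,
                       PROTO_IKE, 0, CHECK_SPI_TYPE, op)


def decode_check_spi(data: bytes) -> Operation:
    _, _, length, proto, spi_size, msg_type, op = struct.unpack(
        "!BBHBBHB", data[:9])
    if (length != 9 or proto != PROTO_IKE or spi_size != 0
            or msg_type != CHECK_SPI_TYPE):
        raise ValueError("not a CHECK_SPI notification")
    return Operation(op)  # raises ValueError for unknown operations
```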

440	5.3.  Stateless IKE Recovery VendorID

442	   The stateless IKE recovery VendorID or SIR_VID is as follows:

444	   "SIR STATELESS" hex: 53 49 52 20 53 54 41 54 45 4c 45 53 53

446	   This VendorID payload MUST be sent in the first IKE_AUTH message of
447	   any implementation that supports the stateless variant of this
448	   protocol.
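The hex string above is the plain ASCII encoding of "SIR STATELESS" with no terminator, which can be checked directly. Detecting the capability then amounts to searching the received Vendor ID payloads for that value. The helper name is hypothetical, and requiring an exact match is a choice of this sketch.

```python
from typing import Iterable

SIR_VID = b"SIR STATELESS"


def peer_supports_sir(vendor_ids: Iterable[bytes]) -> bool:
    # This sketch requires an exact match of the VID payload data.
    return any(vid == SIR_VID for vid in vendor_ids)


print(SIR_VID.hex(" "))  # 53 49 52 20 53 54 41 54 45 4c 45 53 53
```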

450	5.4.  Authentication Exchange

452	   For clarity, only the EAP version of an AUTH exchange will be
453	   presented here.  The non-EAP version is very similar.  The figure
454	   below is based on appendix A.3 of [RFC4718].

456	    first request       --> IDi,
457	                            [N(INITIAL_CONTACT)],
458	                            [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+],
459	                            [IDr],
460	                            [CP(CFG_REQUEST)],
461	                            [N(IPCOMP_SUPPORTED)+],
462	                            [N(USE_TRANSPORT_MODE)],
463	                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
464	                            [N(NON_FIRST_FRAGMENTS_ALSO)],
465	                            SA, TSi, TSr,
466	                            [V(SIR_VID)],
467	                            [V+]

469	    first response      <-- IDr, [CERT+], AUTH,
470	                            EAP,
471	                            [V(SIR_VID)],
472	                            [V+]

474	                      / --> EAP
475	    repeat 1..N times |
476	                      \ <-- EAP

478	    last request        --> AUTH,
479	                            [N(QCD_TOKEN)]

481	    last response       <-- AUTH,
482	                            [N(QCD_TOKEN)],
483	                            [CP(CFG_REPLY)],
484	                            [N(IPCOMP_SUPPORTED)],
485	                            [N(USE_TRANSPORT_MODE)],
486	                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
487	                            [N(NON_FIRST_FRAGMENTS_ALSO)],
488	                            SA, TSi, TSr,
489	                            [N(ADDITIONAL_TS_POSSIBLE)],
490	                            [V+]

492	   Note that the QCD_TOKEN notification is marked as optional because it
493	   is not required by this specification that every implementation be
494	   both token maker and token taker.  If only one peer sends the QCD
495	   token, then a reboot of the other peer will not be recoverable by
496	   this method.  This may be acceptable if traffic typically originates
497	   from the other peer.

499	   In any case, the lack of a QCD_TOKEN notification MUST NOT be taken
500	   as an indication that the peer does not support this standard.
501	   Conversely, if a peer does not understand this notification, it will
502	   simply ignore it.  Therefore a peer MAY send this notification
503	   freely, even if it does not know whether the other side supports it.

505	5.5.  Informational Exchange

507	   The QCD_TOKEN notification is unprotected and is sent in response to
508	   a protected IKE request that uses an unknown IKE SA.

510	            request             --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+

512	            response            <--

514	   If child SPIs are persistently mapped to IKE SPIs as described in
515	   Section 10.2, we may get the following exchange in response to an ESP
516	   or AH packet.

518	            request             --> N(INVALID_SPI), N(QCD_TOKEN)+

520	            response            <--

522	   The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to
523	   support both implementations that conform to this specification and
524	   implementations that do not.  Similar to the description in section
525	   2.21 of [RFC4306], the IKE SPI and message ID fields in the packet
526	   headers are taken from the protected IKE request.

528	   To support a periodic rollover of token generation constants, the
529	   token taker MUST support at least four QCD_TOKEN notifications in a
530	   single packet.  The token is considered verified if any of the
531	   QCD_TOKEN notifications matches.  The token maker MAY generate up to
532	   four QCD_TOKEN notifications, based on several generations of keys.

534	   If the QCD_TOKEN is verified, an empty response MUST be sent.  If the
535	   QCD_TOKEN cannot be validated, a response SHOULD NOT be sent.
536	   Section 6 defines token verification.
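The verification rule above can be sketched as follows. The token is accepted if any of the received QCD_TOKEN notifications matches. On success an empty response is sent, and otherwise nothing is sent. The function name is hypothetical, and returning `None` stands for sending no response. The sketch examines at most four tokens, the minimum the draft requires a token taker to support.

```python
import hmac
from typing import Optional, Sequence

MAX_TOKENS_CHECKED = 4  # the draft requires support for at least four


def handle_unprotected_qcd(stored: bytes,
                           received: Sequence[bytes]) -> Optional[bytes]:
    """Return the response body to send, or None to send nothing."""
    for token in received[:MAX_TOKENS_CHECKED]:
        if hmac.compare_digest(stored, token):
            return b""  # verified: send an empty response
    return None         # not verified: no response
```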

538	6.  Token Generation and Verification

540	   No token generation method is mandated by this document.  A method is
541	   documented in Section 6.1, but only serves as an example.

543	   The following lists the requirements from a token generation
544	   mechanism:
545	   o  Tokens MUST be at least 16 octets long, and no more than 128 octets
546	      long, to facilitate storage and transmission.  Tokens SHOULD be
547	      indistinguishable from random data.
548	   o  It should not be possible for an external attacker to guess the
549	      QCD token generated by an implementation.  Cryptographic
550	      mechanisms such as PRNG and hash functions are RECOMMENDED.

552	   o  The token maker MUST be able to re-generate or retrieve the token
553	      based on the IKE SPIs even after it reboots.

555	6.1.  A Stateless Method of Token Generation

557	   This describes a stateless method of generating a token:
558	   o  At installation or immediately after the first boot of the IKE
559	      implementation, 32 random octets are generated using a secure
560	      random number generator or a PRNG.
561	   o  Those 32 octets, called the "QCD_SECRET", are stored in non-
562	      volatile storage on the machine, and kept indefinitely.
563	   o  The TOKEN_SECRET_DATA is calculated as follows:

565	            TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R)

567	   o  If key rollover is required by policy, the implementation MAY
568	      periodically generate a new QCD_SECRET and keep up to 3 previous
569	      generations.  When sending an unprotected QCD_TOKEN, as many as 4
570	      notification payloads may be sent, each from a different
571	      QCD_SECRET.
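The method above can be sketched in Python. The draft leaves HASH unspecified, so this sketch assumes SHA-256, which yields 32-octet tokens within the 16 to 128 octet bound. It also reads `|` as concatenation of the 8-octet SPI-I and SPI-R. The class name is hypothetical, and persisting the secrets to non-volatile storage is left out.

```python
import hashlib
import os
from typing import List


class QcdTokenMaker:
    MAX_GENERATIONS = 4  # current secret plus up to 3 previous ones

    def __init__(self, secret: bytes) -> None:
        # A real implementation keeps these in non-volatile storage.
        self.secrets: List[bytes] = [secret]  # newest first

    @staticmethod
    def _token(secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
        # TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R), HASH = SHA-256
        return hashlib.sha256(secret + spi_i + spi_r).digest()

    def rollover(self) -> None:
        # New QCD_SECRET; drop anything older than 3 previous generations.
        self.secrets.insert(0, os.urandom(32))
        del self.secrets[self.MAX_GENERATIONS:]

    def token_for_ike_auth(self, spi_i: bytes, spi_r: bytes) -> bytes:
        return self._token(self.secrets[0], spi_i, spi_r)

    def tokens_for_recovery(self, spi_i: bytes, spi_r: bytes) -> List[bytes]:
        # One QCD_TOKEN per stored generation, at most four.
        return [self._token(s, spi_i, spi_r) for s in self.secrets]
```

A backup gateway holding the same QCD_SECRET, as Section 7 recommends, computes identical tokens from the same SPIs.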

573	6.2.  Token Lifetime

575	   The token is associated with a single IKE SA, and SHOULD be deleted
576	   by the token taker when the SA is deleted or expires.  More formally,
577	   the token is associated with the pair (SPI-I, SPI-R).

579	7.  Backup Gateways

581	   Making crash recovery quick is important, but since rebooting a
582	   gateway takes a non-zero amount of time, many implementations choose
583	   to have a stand-by gateway ready to take over as soon as the primary
584	   gateway fails for any reason.

586	   If such a configuration is available, it is RECOMMENDED that the
587	   stand-by gateway be able to generate the same token as the active
588	   gateway.  If the method described in Section 6.1 is used, this means
589	   that the QCD_SECRET must be identical in both gateways.  This makes
590	   crash recovery available immediately.

592	8.  Alternative Solutions
593	8.1.  Initiating a new IKE SA

595	   Instead of sending a QCD token, we could have the rebooted
596	   implementation start an Initial exchange with the peer, including the
597	   INITIAL_CONTACT notification.  This would have the same effect,
598	   instructing the peer to erase the old IKE SA, as well as establishing
599	   a new IKE SA with fewer rounds.

601	   The disadvantage here is that in IKEv2 an authentication exchange
602	   MUST have a piggy-backed Child SA set up.  Since our use case is such
603	   that the rebooted implementation does not have traffic flowing to the
604	   peer, there are no good selectors for such a Child SA.

606	   Additionally, when authentication is asymmetric, such as when EAP is
607	   used, it is not possible for the rebooted implementation to initiate
608	   IKE.

610	8.2.  Birth Certificates

612	   Here we should explain why Birth Certificates are not used.

614	9.  Interaction with IFARE

616	   IFARE, specified in [resumption], proposes to make setting up a new
617	   IKE SA consume fewer computing resources.  This is particularly useful
618	   in the case of a remote access gateway that has many tunnels.  A
619	   failure of such a gateway would require all of those remote access
620	   clients to establish an IKE SA either with the rebooted gateway or
621	   with a backup gateway.  This tunnel re-establishment would occur
622	   within a short period of time, creating a burden on the remote access
623	   gateway.  IFARE addresses this problem by having the clients store an
624	   encrypted derivative of the IKE SA for quick re-establishment.

626	   What IFARE does not help with is detecting that the peer gateway has
627	   failed.  A failed gateway may go undetected for as long as the
628	   lifetime of a child SA, because IPsec does not have packet
629	   acknowledgement.  Before establishing a new IKE SA using IFARE, a
630	   client MUST ascertain that the gateway has indeed failed.  This could
631	   be done using either a liveness check (as in [RFC4306]) or the QCD
632	   tokens described in this document.

634	   A remote access client conforming to both specifications will store
635	   QCD tokens, as well as the IFARE state, if provided by the gateway.
636	   A remote access gateway conforming to both specifications will
637	   generate a QCD token for the client.  When the gateway reboots, the
638	   client will discover this in either of two ways:

640	   1.  The client does regular liveness checks, or the time for some
641	       other IKE exchange has come.  Since the gateway is still down,
642	       the IKE exchange times out after several minutes.  In this case
643	       QCD does not help.
644	   2.  Either the primary gateway or a backup gateway (see Section 7) is
645	       ready and sends a QCD token to the client.  In that case the
646	       client will quickly re-establish the IPsec tunnel, either with
647	       the rebooted primary gateway, the backup gateway as described in
648	       this document, or another gateway as described in [resumption].

650	   The full combined protocol looks like this:

652	        Initiator                Responder
653	        -----------              -----------
654	       HDR, SAi1, KEi, Ni  -->

656	                           <--    HDR, SAr1, KEr, Nr, [CERTREQ]

658	       HDR, SK {IDi, [CERT,]
659	       [CERTREQ,] [IDr,]
660	       AUTH, N(QCD_TOKEN),
661	       SAi2, TSi, TSr,
662	       N(TICKET_REQUEST)}  -->
663	                           <--    HDR, SK {IDr, [CERT,] AUTH, SAr2, TSi,
664	                                  TSr, N(TICKET_OPAQUE)
665	                                  [,N(TICKET_GATEWAY_LIST)]}

667	                ---- Reboot -----

669	       HDR, {}             -->
670	                           <--  HDR, N(QCD_TOKEN)

672	       HDR, Ni, N(TICKET_OPAQUE),
673	       [N+,], SK {IDi, [IDr,]
674	       SAi2, TSi, TSr,
675	       [CP(CFG_REQUEST)]}  -->
676	                           <--  HDR, SK {IDr, Nr, SAr2, [TSi, TSr],
677	                                [CP(CFG_REPLY)]}

679	10.  Operational Considerations

681	10.1.  Who should implement this specification

683	   Throughout this document, we have referred to reboot time
684	   alternately as the time that the implementation crashes and the
685	   time when it is ready to process IPsec packets and IKE exchanges.
686	   Depending on the hardware and software platforms and the cause of the
687	   reboot, rebooting may take anywhere from a few seconds to several
688	   minutes.  If the implementation is down for a long time, the benefits
689	   of this protocol extension are reduced.  For this reason critical
690	   systems should implement backup gateways as described in Section 7.
691	   Note that the lower-case "should" in the previous sentence is
692	   intentional, as we do not specify this in the sense of RFC 2119.

694	   Implementing the "token maker" side of QCD makes sense for IKE
695	   implementations where protected connections originate from the peer,
696	   such as inter-domain VPNs and remote access gateways.  Implementing
697	   the "token taker" side of QCD makes sense for IKE implementations
698	   where protected connections originate locally, such as inter-domain
699	   VPNs and remote access clients.

701	   To clarify the requirements:
702	   o  A remote-access client MUST be a token taker and MAY be a token
703	      maker.
704	   o  A remote-access gateway MAY be a token taker and MUST be a token
705	      maker.
706	   o  An inter-domain VPN gateway MUST be both token maker and token
707	      taker.

709	   In order to limit the effects of DoS attacks, a token taker SHOULD
710	   limit the rate of QCD_TOKENs verified from a particular source.
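   One way to honor this rate limit is a per-source token bucket.  The
   sketch below is an assumption, not something this document specifies:
   the class name PerSourceRateLimiter, the rate and burst values, and
   keying on the source IP address are all illustrative, and the
   "credits" in each bucket are unrelated to QCD tokens.

```python
import time
from typing import Callable, Dict, Tuple


class PerSourceRateLimiter:
    """Hypothetical token-bucket limiter for QCD_TOKEN verifications."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate    # credits regained per second
        self.burst = burst  # maximum credits a source can accumulate
        self.clock = clock
        # source -> (remaining credits, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, source: str) -> bool:
        # Return True if a QCD_TOKEN from `source` may be verified now.
        now = self.clock()
        credits, last = self._buckets.get(source, (float(self.burst), now))
        credits = min(float(self.burst), credits + (now - last) * self.rate)
        if credits < 1.0:
            self._buckets[source] = (credits, now)
            return False
        self._buckets[source] = (credits - 1.0, now)
        return True
```

   The injectable clock makes the limiter testable; a production version
   would also expire idle per-source entries so the table cannot grow
   without bound under a spoofed-source flood.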

712	   If an excessive number of IKE requests protected with unknown IKE
713	   SPIs arrive at a token maker, the IKE module SHOULD revert to the
714	   behavior described in Section 2.21 of [RFC4306] and either send an
715	   INVALID_IKE_SPI notification or ignore the requests entirely.

717	10.2.  Response to unknown child SPI

719	   After a reboot, it is more likely that an implementation receives
720	   IPsec packets than IKE packets.  In that case, the rebooted
721	   implementation will send an INVALID_SPI notification, triggering a
722	   liveness check.  The token will only be sent in a response to the
723	   liveness check, thus requiring an extra round-trip.

725	   To avoid this, an implementation that has access to non-volatile
726	   storage MAY store a mapping of child SPIs to owning IKE SPIs.  If
727	   such a mapping is available and persistent across reboots, the
728	   rebooted implementation MAY respond to the IPsec packet with an
729	   INVALID_SPI notification, along with the appropriate QCD_TOKEN
730	   notifications.  A token taker SHOULD verify a QCD token that
731	   arrives with an INVALID_SPI notification in the same way as if it
732	   arrived with the IKE SPIs of the parent IKE SA.

734	   However, a persistent storage module might not be updated in a timely
735	   manner, and could be populated with IKE SPIs that have already been
736	   rekeyed.  A token taker MUST NOT take an invalid QCD token sent along
737	   with an INVALID_SPI notification as evidence that the peer is either
738	   malfunctioning or attacking, but it SHOULD limit the rate at which
739	   such notifications are processed.
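   A minimal sketch of such a mapping, assuming a JSON file as the
   non-volatile store (this document does not specify a format, and
   ChildSpiMap and its file layout are hypothetical).  The file is
   rewritten through a temporary file and os.replace(), so a crash
   mid-write leaves the previous version intact.  Entries must be
   refreshed when an IKE SA is rekeyed, which is exactly the timeliness
   concern raised above.

```python
import json
import os
from typing import Dict, List, Optional, Tuple


class ChildSpiMap:
    """Hypothetical persistent map: child SA SPI -> (SPI-I, SPI-R)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._entries: Dict[str, List[str]] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="ascii") as f:
                self._entries = json.load(f)

    @staticmethod
    def _key(child_spi: int) -> str:
        return f"{child_spi:08x}"  # ESP/AH SPIs are 32-bit values

    def add(self, child_spi: int, spi_i: bytes, spi_r: bytes) -> None:
        # Also call after an IKE SA rekey so entries point at the new SPIs.
        self._entries[self._key(child_spi)] = [spi_i.hex(), spi_r.hex()]
        self._flush()

    def remove(self, child_spi: int) -> None:
        self._entries.pop(self._key(child_spi), None)
        self._flush()

    def owner(self, child_spi: int) -> Optional[Tuple[bytes, bytes]]:
        # After a reboot: which IKE SA did this stray IPsec packet belong to?
        entry = self._entries.get(self._key(child_spi))
        if entry is None:
            return None
        return bytes.fromhex(entry[0]), bytes.fromhex(entry[1])

    def _flush(self) -> None:
        # Write a temporary file, then atomically replace the old copy.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="ascii") as f:
            json.dump(self._entries, f)
        os.replace(tmp, self.path)
```

   After rebooting, the implementation would reopen the map from the
   same path and, if owner() returns an SPI pair, send INVALID_SPI
   together with the QCD tokens for that pair.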

741	10.3.  Stateless IKE Recovery cookie

743	   The cookie information is chosen by the peer that emits it.  Since
744	   the cookie has no meaning for the remote peer, the emitting peer
745	   may construct it as it sees fit.  This section provides
746	   recommendations on how to generate and validate those cookies.

748	   When an IKE endpoint X sends an unauthenticated CHECK_SPI, the cookie
749	   payload following the notify is computed as follows:

751	                  Cookie = VersionIDofSecret
752	                           | H( SECRET | CHECK_SPI(..., Query)
753	                           | ip.src | ip.dst
754	                           | udp.src | udp.dst)

756	   where
757	   o  SECRET is a randomly generated secret known only to the
758	      implementation and periodically changed.
759	   o  VersionIDofSecret should be changed whenever SECRET is
760	      regenerated.
761	   o  CHECK_SPI(..., Query) is the content of the CHECK_SPI notify
762	      payload where the operation subtype has been set to Query (cf.
763	      Section 4.1).
764	   o  ip.src is the source IP address of the IKE packet.
765	   o  ip.dst is the destination IP address of the IKE packet.
766	   o  udp.src is the source UDP port of the IKE packet.
767	   o  udp.dst is the destination UDP port of the IKE packet.

769	   Upon reception of a CHECK_SPI notify (ACK or NACK) followed by an
770	   N(Cookie), a peer can verify whether this is the reply to a Query it
771	   sent by recomputing the cookie and comparing it to the COOKIE in
772	   the IKE message.

774	   In order to minimize the range of cryptographic attacks on SECRET,
775	   its value SHOULD have a limited lifetime.
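   The cookie recipe above can be sketched as follows.  This is an
   illustrative sketch only: H is taken to be SHA-256,
   VersionIDofSecret is assumed to be a single octet, the CHECK_SPI
   payload (defined in Section 4.1) is treated as opaque bytes, and
   retaining one previous SECRET so that replies in flight during a
   rotation still verify is a design choice of the sketch.  When
   verifying a reply, the caller passes the fields of its own original
   Query: the CHECK_SPI subtype reset to Query, and the addresses and
   ports as they appeared in the Query (swapped relative to the reply).

```python
import hashlib
import hmac
import ipaddress
import secrets
import struct
from typing import Dict


class CheckSpiCookie:
    """Hypothetical generator/verifier for the stateless CHECK_SPI cookie."""

    def __init__(self) -> None:
        self.version = 0  # VersionIDofSecret, assumed to be one octet
        self._secrets: Dict[int, bytes] = {0: secrets.token_bytes(32)}

    def rotate(self) -> None:
        # Regenerate SECRET and bump its version; keep current and previous.
        previous = self.version
        self.version = (self.version + 1) % 256
        self._secrets = {previous: self._secrets[previous],
                         self.version: secrets.token_bytes(32)}

    @staticmethod
    def _digest(secret: bytes, query: bytes, ip_src: str, ip_dst: str,
                udp_src: int, udp_dst: int) -> bytes:
        data = (secret + query
                + ipaddress.ip_address(ip_src).packed
                + ipaddress.ip_address(ip_dst).packed
                + struct.pack("!HH", udp_src, udp_dst))
        return hashlib.sha256(data).digest()

    def make(self, query: bytes, ip_src: str, ip_dst: str,
             udp_src: int, udp_dst: int) -> bytes:
        # Cookie = VersionIDofSecret | H(SECRET | Query | IPs | ports)
        secret = self._secrets[self.version]
        return bytes([self.version]) + self._digest(
            secret, query, ip_src, ip_dst, udp_src, udp_dst)

    def verify(self, cookie: bytes, query: bytes, ip_src: str, ip_dst: str,
               udp_src: int, udp_dst: int) -> bool:
        if len(cookie) != 1 + hashlib.sha256().digest_size:
            return False
        secret = self._secrets.get(cookie[0])
        if secret is None:  # that SECRET generation has been retired
            return False
        expected = self._digest(secret, query, ip_src, ip_dst,
                                udp_src, udp_dst)
        return hmac.compare_digest(cookie[1:], expected)
```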

777	11.  Security Considerations
778	11.1.  Security Considerations for the Stateful Method

780	   Tokens MUST be hard to guess.  This is critical, because if an
781	   attacker can guess the token associated with the IKE SA, she can tear
782	   down the IKE SA and associated tunnels at will.  When the token is
783	   delivered in the IKE_AUTH exchange, it is encrypted.  When it is sent
784	   again in an unprotected notification, it is not, but that is the last
785	   time this token is ever used.

787	   An aggregation of some tokens generated by one peer together with the
788	   related IKE SPIs MUST NOT give an attacker the ability to guess other
789	   tokens.  Specifically, if one peer does not properly secure the QCD
790	   tokens and an attacker gains access to them, this attacker MUST NOT
791	   be able to guess other tokens generated by the same peer.  This is
792	   the reason that the QCD_SECRET in Section 6.1 needs to be
793	   sufficiently long.

795	   The QCD_SECRET MUST be protected from access by other parties.
796	   Anyone gaining access to this value will be able to delete all the
797	   IKE SAs for this token maker.

799	   The QCD token is sent by the rebooted peer in an unprotected message.
800	   A message like that is subject to modification, deletion and replay
801	   by an attacker.  However, these attacks will not compromise the
802	   security of either side.  Modification is meaningless because a
803	   modified token is simply an invalid token.  Deletion will only cause
804	   the protocol not to work, resulting in a delay in tunnel re-
805	   establishment as described in Section 2.  Replay is also meaningless,
806	   because the IKE SA has been deleted after the first transmission.

808	11.2.  Security Considerations for the Stateless Method

810	   IKE recovery self-protection is discussed throughout this document,
811	   which describes several mechanisms to thwart denial-of-service attacks.

813	   IKE recovery is subject to a man-in-the-middle attack that can let
814	   the attacker trigger a renegotiation.  Note, however, that an
815	   attacker able to block ESP and/or IKE packets can already cause IKE
816	   itself to tear down and rekey IKE SAs.  With throttling and
817	   dampening enabled, IKE recovery is able to reduce the number of
818	   rekeys and renegotiations to a rate as low as that of IKEv2.

820	   Overall, IKE recovery is not more vulnerable than IKEv2, and even
821	   improves on the security of IKEv2 by resynchronizing SAs more
822	   rapidly, which is important with dynamic policies.

824	12.  IANA Considerations

826	   IANA is requested to assign a notify message type from the error
827	   types range (43-8191) of the "IKEv2 Notify Message Types" registry
828	   with name "QUICK_CRASH_DETECTION".

830	   IANA is requested to assign a notify message type from the status
831	   types range (16406-40959) of the "IKEv2 Notify Message Types"
832	   registry with name "CHECK_SPI".

834	13.  Acknowledgements

836	   We would like to thank Hannes Tschofenig and Yaron Sheffer for their
837	   comments about IFARE.

839	14.  Change Log

841	   This section lists all changes in this document.

843	   NOTE TO RFC EDITOR: Please remove this section in the final RFC.

845	14.1.  Changes from draft-nir-ike-qcd-00

847	   o  Merged proposal with draft-detienne-ikev2-recovery [recovery]
848	   o  Changed the protocol so that the rebooted peer generates the
849	      token.  This eliminates the need for persistent storage.
851	   o  Added discussion of birth certificates.

853	14.2.  Changes from draft-nir-qcr-00

855	   o  Changed name to reflect that this relates to IKE.  Also changed
856	      from quick crash recovery to quick crash detection to avoid
857	      confusion with IFARE.
858	   o  Added more operational considerations.
859	   o  Added interaction with IFARE.
860	   o  Added discussion of backup gateways.

862	15.  References

864	15.1.  Normative References

866	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
867	              Requirement Levels", BCP 14, RFC 2119, March 1997.

869	   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
870	              RFC 4306, December 2005.

872	   [RFC4718]  Eronen, P. and P. Hoffman, "IKEv2 Clarifications and
873	              Implementation Guidelines", RFC 4718, October 2006.

875	15.2.  Informative References

877	   [recovery]
878	              Detienne, F. and P. Sethi, "Safe IKE Recovery",
879	              draft-detienne-ikev2-recovery-00 (work in progress),
880	              June 2008.

882	   [resumption]
883	              Sheffer, Y., Tschofenig, H., Dondeti, L., and V.
884	              Narayanan, "IPsec Gateway Failover Protocol",
885	              draft-sheffer-ipsec-failover-03 (work in progress),
886	              March 2008.

888	Authors' Addresses

890	   Yoav Nir
891	   Check Point Software Technologies Ltd.
892	   5 Hasolelim st.
893	   Tel Aviv  67897
894	   Israel

896	   Email: ynir@checkpoint.com

898	   Frederic Detienne
899	   Cisco Systems, Inc.
900	   De Kleetlaan, 7
901	   Diegem  B-1831
902	   Belgium

904	   Phone: +32 2 704 5681
905	   Email: fd@cisco.com
906	   Pratima Sethi
907	   Cisco Systems, Inc.
908	   O'Shaugnessy Road, 11
909	   Bangalore, Karnataka  560027
910	   India

912	   Phone: +91 80 4154 1654
913	   Email: psethi@cisco.com

915	Full Copyright Statement

917	   Copyright (C) The IETF Trust (2008).

919	   This document is subject to the rights, licenses and restrictions
920	   contained in BCP 78, and except as set forth therein, the authors
921	   retain all their rights.

923	   This document and the information contained herein are provided on an
924	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
925	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
926	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
927	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
928	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
929	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

931	Intellectual Property

933	   The IETF takes no position regarding the validity or scope of any
934	   Intellectual Property Rights or other rights that might be claimed to
935	   pertain to the implementation or use of the technology described in
936	   this document or the extent to which any license under such rights
937	   might or might not be available; nor does it represent that it has
938	   made any independent effort to identify any such rights.  Information
939	   on the procedures with respect to rights in RFC documents can be
940	   found in BCP 78 and BCP 79.

942	   Copies of IPR disclosures made to the IETF Secretariat and any
943	   assurances of licenses to be made available, or the result of an
944	   attempt made to obtain a general license or permission for the use of
945	   such proprietary rights by implementers or users of this
946	   specification can be obtained from the IETF on-line IPR repository at
947	   http://www.ietf.org/ipr.

949	   The IETF invites any interested party to bring to its attention any
950	   copyrights, patents or patent applications, or other proprietary
951	   rights that may cover technology that may be required to implement
952	   this standard.  Please address the information to the IETF at
953	   ietf-ipr@ietf.org.