idnits 2.17.1 

draft-ietf-ipsecme-failure-detection-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 3, 2010) is 4984 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'IDr' is mentioned on line 256, but not defined

  == Missing Reference: 'KEi' is mentioned on line 315, but not defined

  == Missing Reference: 'KEr' is mentioned on line 317, but not defined

  == Missing Reference: 'CERTREQ' is mentioned on line 604, but not defined


     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	IPsecME Working Group                                        Y. Nir, Ed.
3	Internet-Draft                                               Check Point
4	Intended status: Standards Track                           D. Wierbowski
5	Expires: March 7, 2011                                               IBM
6	                                                       September 3, 2010

8	                 A Quick Crash Detection Method for IKE
9	                draft-ietf-ipsecme-failure-detection-00

11	Abstract

13	   This document describes an extension to the IKEv2 protocol that
14	   allows for faster detection of SA desynchronization using a saved
15	   token.

17	   When an IPsec tunnel between two IKEv2 peers is disconnected due to a
18	   restart of one peer, it can take as much as several minutes for the
19	   other peer to discover that the reboot has occurred, thus delaying
20	   recovery.  In this text we propose an extension to the protocol that
21	   allows for recovery immediately following the restart.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on March 7, 2011.

40	Copyright Notice

42	   Copyright (c) 2010 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	     1.1.  Conventions Used in This Document
59	   2.  RFC 4306 Crash Recovery
60	   3.  Protocol Outline
61	   4.  Formats and Exchanges
62	     4.1.  Notification Format
63	     4.2.  Passing a Token in the AUTH Exchange
64	     4.3.  Replacing Tokens After Rekey or Resumption
65	     4.4.  Replacing the Token for an Existing SA
66	     4.5.  Presenting the Token in an INFORMATIONAL Exchange
67	   5.  Token Generation and Verification
68	     5.1.  A Stateless Method of Token Generation
69	     5.2.  A Stateless Method with IP addresses
70	     5.3.  Token Lifetime
71	   6.  Backup Gateways
72	   7.  Alternative Solutions
73	     7.1.  Initiating a new IKE SA
74	     7.2.  SIR
75	     7.3.  Birth Certificates
76	     7.4.  Reducing Liveness Check Length
77	   8.  Interaction with Session Resumption
78	   9.  Operational Considerations
79	     9.1.  Who should implement this specification
80	     9.2.  Response to unknown child SPI
81	     9.3.  Using Tokens that Depend on IP Addresses
82	   10. Security Considerations
83	     10.1. QCD Token Generation and Handling
84	     10.2. QCD Token Transmission
85	     10.3. QCD Token Enumeration
86	   11. IANA Considerations
87	   12. Acknowledgements
88	   13. Change Log
89	     13.1. Changes from draft-nir-ike-qcd-07
90	     13.2. Changes from draft-nir-ike-qcd-03 and -04
91	     13.3. Changes from draft-nir-ike-qcd-02
92	     13.4. Changes from draft-nir-ike-qcd-01
93	     13.5. Changes from draft-nir-ike-qcd-00
94	     13.6. Changes from draft-nir-qcr-00
95	   14. References
96	     14.1. Normative References
97	     14.2. Informative References
98	   Authors' Addresses

100	1.  Introduction

102	   IKEv2, as described in [IKEv2bis] and its predecessor RFC 4306, has a
103	   method for recovering from a reboot of one peer.  As long as traffic
104	   flows in both directions, the rebooted peer should re-establish the
105	   tunnels immediately.  However, in many cases the rebooted peer is a
106	   VPN gateway that protects only servers, or else the non-rebooted peer
107	   has a dynamic IP address.  In such cases, the rebooted peer will not
108	   be able to re-establish the tunnels.  Section 2 describes how
109	   recovery works under RFC 4306, and explains why it may take several
110	   minutes.

112	   The method proposed here is to send an octet string, called a "QCD
113	   token", in the IKE_AUTH exchange that establishes the tunnel.  That
114	   token can be stored on the peer as part of the IKE SA.  After a
115	   reboot, the rebooted implementation can re-generate the token, and
116	   send it to the peer, so as to delete the IKE SA.  Deleting the IKE SA
117	   results in a quick establishment of new IPsec tunnels.  This is
118	   described in Section 3.

120	1.1.  Conventions Used in This Document

122	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
123	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
124	   document are to be interpreted as described in [RFC2119].

126	   The term "token" refers to an octet string that an implementation can
127	   generate using only the properties of a protected IKE message (such
128	   as IKE SPIs) as input.  A conforming implementation MUST be able to
129	   generate the same token from the same input even after rebooting.

131	   The term "token maker" refers to an implementation that generates a
132	   token and sends it to the peer as specified in this document.

134	   The term "token taker" refers to an implementation that stores such a
135	   token or a digest thereof, in order to verify that a new token it
136	   receives is identical to the old token it has stored.

138	   The term "non-volatile storage" in this document refers to a data
139	   storage module that persists across restarts of the token maker.
140	   Examples of such a storage module include an internal disk, an
141	   internal flash memory module, an external disk and an external
142	   database.  A small non-volatile storage module is required for a
143	   token maker, but a larger one can be used to enhance performance, as
144	   described in Section 9.2.

146	2.  RFC 4306 Crash Recovery

148	   When one peer loses state or reboots, the other peer does not get any
149	   notification, so unidirectional IPsec traffic can still flow.  The
150	   rebooted peer will not be able to decrypt it, however, and the only
151	   remedy is to send an unprotected INVALID_SPI notification as
152	   described in section 3.10.1 of [IKEv2bis].  That section also
153	   describes the processing of such a notification:

155	         "If this Informational Message is sent outside the
156	     context of an IKE_SA, it should be used by the recipient
157	     only as a "hint" that something might be wrong (because it
158	     could easily be forged)."

160	   Since the INVALID_SPI can only be used as a hint, the non-rebooted
161	   peer has to determine whether the IPsec SA, and indeed the parent IKE
162	   SA are still valid.  The method of doing this is described in section
163	   2.4 of [IKEv2bis].  This method, called "liveness check" involves
164	   sending a protected empty INFORMATIONAL message, and awaiting a
165	   response.  This procedure is sometimes referred to as "Dead Peer
166	   Detection" or DPD.

168	   Section 2.4 does not mandate how many times the liveness check
169	   message should be retransmitted, or for how long, but does recommend
170	   the following:

172	                                                               "It is
173	    suggested that messages be retransmitted at least a dozen times over
174	    a period of at least several minutes before giving up on an SA..."

176	   Those "at least several minutes" are a time during which both peers
177	   are active, but IPsec cannot be used.

179	3.  Protocol Outline

181	   Supporting implementations will send a notification, called a "QCD
182	   token", as described in Section 4.1 in the last IKE_AUTH exchange
183	   messages.  These are the final IKE_AUTH request and final IKE_AUTH
184	   response that contain the AUTH payloads.  The generation of these
185	   tokens is a local matter for implementations, but considerations are
186	   described in Section 5.  Implementations that send such a token will
187	   be called "token makers".

189	   A supporting implementation receiving such a token MUST store it (or
190	   a digest thereof) along with the IKE SA.  Implementations that
191	   support this part of the protocol will be called "token takers".
192	   Section 9.1 has considerations for which implementations need to be
193	   token takers, and which should be token makers.  Implementations that
194	   are not token takers will silently ignore QCD tokens.

196	   When a token maker receives a protected IKE request message with
197	   unknown IKE SPIs, it SHOULD generate a new token that is identical to
198	   the previous token, and send it to the requesting peer in an
199	   unprotected IKE message as described in Section 4.5.

201	   When a token taker receives the QCD token in an unprotected
202	   notification, it MUST verify that the TOKEN_SECRET_DATA matches the
203	   token stored with the matching IKE SA.  If the verification fails, or
204	   if the IKE SPIs in the message do not match any existing IKE SA, it
205	   SHOULD log the event.  If it succeeds, it MUST silently delete the
206	   IKE SA associated with the IKE_SPI fields, and all dependent child
207	   SAs.  This event MAY also be logged.  The token taker MUST accept
208	   such tokens from any IP address and port combination, so as to allow
209	   different kinds of high-availability configurations of the token
210	   maker.
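
The token taker behaviour outlined above might be sketched as follows. This is a minimal illustration, not part of the specification: the class name TokenTaker, its method names, the in-memory dictionary, and the choice of SHA-256 for the stored digest are all assumptions made for the example. A real implementation would also delete the dependent child SAs.

```python
import hashlib
import hmac
import logging

log = logging.getLogger("qcd")


class TokenTaker:
    """Hypothetical token taker keeping a digest of each peer's QCD token."""

    def __init__(self):
        # Maps (SPI-I, SPI-R) to a SHA-256 digest of the stored token.
        self._sas = {}

    def store(self, spi_i: bytes, spi_r: bytes, token: bytes) -> None:
        # Called when a QCD_TOKEN arrives in a protected exchange.
        self._sas[(spi_i, spi_r)] = hashlib.sha256(token).digest()

    def has_sa(self, spi_i: bytes, spi_r: bytes) -> bool:
        return (spi_i, spi_r) in self._sas

    def on_unprotected_token(self, spi_i: bytes, spi_r: bytes,
                             token: bytes) -> bool:
        """Return True if the IKE SA was deleted, False otherwise."""
        stored = self._sas.get((spi_i, spi_r))
        if stored is None:
            log.warning("QCD token for unknown IKE SPIs")
            return False
        received = hashlib.sha256(token).digest()
        # Constant-time comparison of the two digests.
        if not hmac.compare_digest(stored, received):
            log.warning("QCD token verification failed")
            return False
        # Verification succeeded: silently delete the IKE SA.
        del self._sas[(spi_i, spi_r)]
        return True
```

Note that the source address of the unprotected message is deliberately not an input to the check, since tokens are to be accepted from any IP address and port combination.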

212	   A supporting token taker MAY immediately create new SAs using an
213	   Initial exchange, or it may wait for subsequent traffic to trigger
214	   the creation of new SAs.

216	   See Section 8 for a short discussion about this extension's
217	   interaction with IKEv2 Session Resumption ([RFC5723]).

219	4.  Formats and Exchanges

221	4.1.  Notification Format

223	   The notification payload called "QCD token" is formatted as follows:

225	                            1                   2                   3
226	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
227	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
228	       ! Next Payload  !C!  RESERVED   !         Payload Length        !
229	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
230	       !  Protocol ID  !   SPI Size    ! QCD Token Notify Message Type !
231	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
232	       !                                                               !
233	       ~                       TOKEN_SECRET_DATA                       ~
234	       !                                                               !
235	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

237	   o  Protocol ID (1 octet) MUST be 1, as this message is related to an
238	      IKE SA.

240	   o  SPI Size (1 octet) MUST be zero, in conformance with section 3.10
241	      of [IKEv2bis].
242	   o  QCD Token Notify Message Type (2 octets) - MUST be xxxxx, the
243	      value assigned for QCD token notifications.  TBA by IANA.
244	   o  TOKEN_SECRET_DATA (16-128 octets) contains a generated token as
245	      described in Section 5.
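
The layout above can be illustrated with a short Python sketch that encodes and decodes the payload. It is an illustration under stated assumptions: the function names are hypothetical, and because the Notify Message Type is still to be assigned by IANA, the constant below is a placeholder taken from the IKEv2 private-use range for status notifications, not the real value.

```python
import struct

# Placeholder only (assumption): the real value is "TBA by IANA".
QCD_TOKEN_NOTIFY_TYPE = 40960

# Next Payload, C bit + RESERVED, Payload Length,
# Protocol ID, SPI Size, Notify Message Type (network byte order).
HEADER = struct.Struct("!BBHBBH")


def build_qcd_notify(token: bytes, next_payload: int = 0,
                     notify_type: int = QCD_TOKEN_NOTIFY_TYPE) -> bytes:
    """Encode a QCD token notification payload."""
    if not 16 <= len(token) <= 128:
        raise ValueError("TOKEN_SECRET_DATA must be 16-128 octets")
    length = HEADER.size + len(token)
    # Critical bit clear, Protocol ID 1 (IKE SA), SPI Size 0.
    return HEADER.pack(next_payload, 0, length, 1, 0, notify_type) + token


def parse_qcd_notify(payload: bytes,
                     notify_type: int = QCD_TOKEN_NOTIFY_TYPE) -> bytes:
    """Return TOKEN_SECRET_DATA, or raise ValueError if malformed."""
    if len(payload) < HEADER.size:
        raise ValueError("payload too short")
    _next, _flags, length, proto, spi_size, ntype = HEADER.unpack_from(payload)
    if length != len(payload) or proto != 1 or spi_size != 0:
        raise ValueError("malformed notification header")
    if ntype != notify_type:
        raise ValueError("not a QCD token notification")
    token = payload[HEADER.size:]
    if not 16 <= len(token) <= 128:
        raise ValueError("TOKEN_SECRET_DATA must be 16-128 octets")
    return token
```

The Payload Length field counts the eight header octets as well as the token, so a 32-octet token yields a 40-octet payload.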

247	4.2.  Passing a Token in the AUTH Exchange

249	   For brevity, only the EAP version of an AUTH exchange will be
250	   presented here.  The non-EAP version is very similar.  The figures
251	   below are based on appendix C.3 of [IKEv2bis].

253	    first request       --> IDi,
254	                            [N(INITIAL_CONTACT)],
255	                            [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+],
256	                            [IDr],
257	                            [CP(CFG_REQUEST)],
258	                            [N(IPCOMP_SUPPORTED)+],
259	                            [N(USE_TRANSPORT_MODE)],
260	                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
261	                            [N(NON_FIRST_FRAGMENTS_ALSO)],
262	                            SA, TSi, TSr,
263	                            [V+]

265	    first response      <-- IDr, [CERT+], AUTH,
266	                            EAP,
267	                            [V+]

269	                      / --> EAP
270	    repeat 1..N times |
271	                      \ <-- EAP

273	    last request        --> AUTH
274	                            [N(QCD_TOKEN)]

276	    last response       <-- AUTH,
277	                            [N(QCD_TOKEN)]
278	                            [CP(CFG_REPLY)],
279	                            [N(IPCOMP_SUPPORTED)],
280	                            [N(USE_TRANSPORT_MODE)],
281	                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
282	                            [N(NON_FIRST_FRAGMENTS_ALSO)],
283	                            SA, TSi, TSr,
284	                            [N(ADDITIONAL_TS_POSSIBLE)],
285	                            [V+]

287	   Note that the QCD_TOKEN notification is marked as optional because it
288	   is not required by this specification that every implementation be
289	   both token maker and token taker.  If only one peer sends the QCD
290	   token, then a reboot of the other peer will not be recoverable by
291	   this method.  This may be acceptable if traffic typically originates
292	   from the other peer.

294	   In any case, the lack of a QCD_TOKEN notification MUST NOT be taken
295	   as an indication that the peer does not support this standard.
296	   Conversely, if a peer does not understand this notification, it will
297	   simply ignore it.  Therefore a peer MAY send this notification
298	   freely, even if it does not know whether the other side supports it.

300	   The QCD_TOKEN notification is related to the IKE SA and MUST follow
301	   the AUTH payload and precede the Configuration payload and all
302	   payloads related to the child SA.

304	4.3.  Replacing Tokens After Rekey or Resumption

306	   After rekeying an IKE SA, the IKE SPIs are replaced, so the new SA
307	   also needs to have a token.  If only the responder in the rekey
308	   exchange is the token maker, this can be done within the
309	   CREATE_CHILD_SA exchange.  If the initiator is a token maker, then we
310	   need an extra informational exchange.

312	   The following figure shows the CREATE_CHILD_SA exchange for rekeying
313	   the IKE SA.  Only the responder sends a QCD token.

315	      request             --> SA, Ni, [KEi]

317	      response            <-- SA, Nr, [KEr], N(QCD_TOKEN)

319	   If the initiator is also a token maker, it SHOULD soon initiate an
320	   INFORMATIONAL exchange as follows:

322	      request             --> N(QCD_TOKEN)

324	      response            <--

326	   For session resumption, as specified in [RFC5723], the situation is
327	   similar.  The responder, which is necessarily the peer that has
328	   crashed, SHOULD send a new ticket within the protected payload of the
329	   IKE_SESSION_RESUME exchange.  If the Initiator is also a token maker,
330	   it needs to send a QCD_TOKEN in a separate INFORMATIONAL exchange.

332	   The INFORMATIONAL exchange described in this section can also be used
333	   if QCD tokens need to be replaced due to a key rollover.  However,
334	   since token takers are required to verify at least 4 QCD tokens, this
335	   is only necessary if secret QCD keys are rolled over more than four
336	   times as often as IKE SAs are rekeyed.

338	4.4.  Replacing the Token for an Existing SA

340	   With some token generation methods, such as that described in
341	   Section 5.2, a QCD token may sometimes become invalid, although the
342	   IKE SA is still perfectly valid.

344	   In such a case, the token maker MUST send the new token in a
345	   protected message under that IKE SA.  That exchange could be a simple
346	   INFORMATIONAL, such as in the last figure in the previous section, or
347	   else it can be part of a MOBIKE INFORMATIONAL exchange such as in the
348	   following figure taken from section 2.2 of [RFC4555] and modified by
349	   adding a QCD_TOKEN notification:

351	     (IP_I2:4500 -> IP_R1:4500)
352	     HDR, SK { N(UPDATE_SA_ADDRESSES),
353	               N(NAT_DETECTION_SOURCE_IP),
354	               N(NAT_DETECTION_DESTINATION_IP) }  -->

356	                           <-- (IP_R1:4500 -> IP_I2:4500)
357	                               HDR, SK { N(NAT_DETECTION_SOURCE_IP),
358	                                    N(NAT_DETECTION_DESTINATION_IP) }

360	                           <-- (IP_R1:4500 -> IP_I2:4500)
361	                               HDR, SK { N(COOKIE2), [N(QCD_TOKEN)] }

363	     (IP_I2:4500 -> IP_R1:4500)
364	     HDR, SK { N(COOKIE2), [N(QCD_TOKEN)] }  -->

366	   A token taker MUST accept such gratuitous QCD_TOKEN notifications as
367	   long as they are carried in protected exchanges.  A token maker
368	   SHOULD NOT generate them unless it is no longer able to generate the
369	   old QCD_TOKEN.

371	4.5.  Presenting the Token in an INFORMATIONAL Exchange

373	   This QCD_TOKEN notification is unprotected, and is sent as a response
374	   to a protected IKE request, which uses an IKE SA that is unknown.

376	            request             --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+

378	   If child SPIs are persistently mapped to IKE SPIs as described in
379	   Section 9.2, a token taker may get the following unprotected message
380	   in response to an ESP or AH packet.

382	            request             --> N(INVALID_SPI), N(QCD_TOKEN)+

384	   The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to
385	   support both implementations that conform to this specification and
386	   implementations that don't.  Similar to the description in section
387	   2.21 of [IKEv2bis], the IKE SPI and message ID fields in the packet
388	   headers are taken from the protected IKE request.

390	   To support a periodic rollover of the secret used for token
391	   generation, the token taker MUST support at least four QCD_TOKEN
392	   notifications in a single packet.  The token is considered verified
393	   if any of the QCD_TOKEN notifications matches.  The token maker MAY
394	   generate up to four QCD_TOKEN notifications, based on several
395	   generations of keys.

397	   If the QCD_TOKEN verifies OK, an empty response MUST be sent.  If the
398	   QCD_TOKEN cannot be validated, a response MUST NOT be sent.
399	   Section 5 defines token verification.
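
The rule that any one of several QCD_TOKEN notifications may match, and that a response is sent only on success, might be sketched as below. The names any_token_matches and response_for are hypothetical, the cap of four is an assumption (the text requires support for at least four), and returning an empty byte string to stand for "send an empty response" is a convention of this example only.

```python
import hmac
from typing import Optional, Sequence

# Assumption: process at most four notifications per packet.
MAX_TOKENS = 4


def any_token_matches(stored: bytes, received: Sequence[bytes]) -> bool:
    """True if any of the first MAX_TOKENS received tokens equals stored."""
    matched = False
    for candidate in received[:MAX_TOKENS]:
        # Compare every candidate in constant time rather than
        # stopping at the first match.
        matched |= hmac.compare_digest(stored, candidate)
    return matched


def response_for(stored: bytes, received: Sequence[bytes]) -> Optional[bytes]:
    """Return b"" (empty response) on success, None (no response) otherwise."""
    if any_token_matches(stored, received):
        return b""
    return None
```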

401	5.  Token Generation and Verification

403	   No token generation method is mandated by this document.  Two methods
404	   are documented in the following sub-sections, but they only serve as
405	   examples.

407	   The following lists the requirements from a token generation
408	   mechanism:
409	   o  Tokens MUST be at least 16 octets long, and no more than 128
410	      octets long, to facilitate storage and transmission.  Tokens
411	      SHOULD be indistinguishable from random data.
412	   o  It should not be possible for an external attacker to guess the
413	      QCD token generated by an implementation.  Cryptographic
414	      mechanisms such as PRNG and hash functions are RECOMMENDED.
415	   o  The token maker MUST be able to re-generate or retrieve the token
416	      based on the IKE SPIs even after it reboots.
417	   o  The method of token generation MUST be such that a collision of
418	      QCD tokens between different pairs of IKE SPIs will be highly
419	      unlikely.

421	5.1.  A Stateless Method of Token Generation

423	   This describes a stateless method of generating a token:
424	   o  At installation or immediately after the first boot of the token
425	      maker, 32 random octets are generated using a secure random number
426	      generator or a PRNG.
427	   o  Those 32 bytes, called the "QCD_SECRET", are stored in non-
428	      volatile storage on the machine, and kept indefinitely.

430	   o  If key rollover is required by policy, the implementation MAY
431	      periodically generate a new QCD_SECRET and keep up to 3 previous
432	      generations.  When sending an unprotected QCD_TOKEN, as many as 4
433	      notification payloads may be sent, each from a different
434	      QCD_SECRET.
435	   o  The TOKEN_SECRET_DATA is calculated as follows:

437	            TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R)
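
The steps above can be sketched in Python as follows. This is only an illustration: the draft does not name a hash function, so SHA-256 is an assumption, as are the function names. The "|" in the formula is read as concatenation, and each IKE SPI is taken to be the 8-octet value from the IKE header.

```python
import hashlib
import secrets


def new_qcd_secret() -> bytes:
    """Generate the 32 random octets kept in non-volatile storage."""
    return secrets.token_bytes(32)


def make_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    """TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R), with SHA-256."""
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    return hashlib.sha256(qcd_secret + spi_i + spi_r).digest()


def tokens_for_generations(secrets_newest_first, spi_i: bytes, spi_r: bytes):
    """One token per retained QCD_SECRET generation, at most four."""
    return [make_token(s, spi_i, spi_r) for s in secrets_newest_first[:4]]
```

Because the token depends only on the stored secret and the SPIs, a rebooted token maker, or any other machine holding the same QCD_SECRET, reproduces the same 32-octet token without keeping per-SA state.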

439	5.2.  A Stateless Method with IP addresses

441	   This method is similar to the one in the previous section, except
442	   that the IP address of the token taker is also added to the block
443	   being hashed.  This has the disadvantage that the token needs to be
444	   replaced (as described in Section 4.4) whenever the token taker
445	   changes its address.

447	   The reason to use this method is described in Section 9.3.  When
448	   using this method, the TOKEN_SECRET_DATA field is calculated as
449	   follows:

451	         TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T)

453	   The IPaddr-T field specifies the IP address of the token taker.
454	   Secret rollover considerations are similar to those in the previous
455	   section.
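
A sketch of this variant is given below, again assuming SHA-256 for HASH. The function name and the choice to hash the address in its packed binary form (4 octets for IPv4, 16 for IPv6) are assumptions of the example; the draft does not specify an encoding for IPaddr-T.

```python
import hashlib
import ipaddress


def make_token_with_ip(qcd_secret: bytes, spi_i: bytes, spi_r: bytes,
                       taker_ip: str) -> bytes:
    """TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T)."""
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    # Assumption: the address is hashed in packed network-order form.
    addr = ipaddress.ip_address(taker_ip).packed
    return hashlib.sha256(qcd_secret + spi_i + spi_r + addr).digest()
```

With this construction a change of the token taker's address yields a different token for the same SPIs, which is why the replacement procedure of Section 4.4 is needed.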

457	5.3.  Token Lifetime

459	   The token is associated with a single IKE SA, and SHOULD be deleted
460	   by the token taker when the SA is deleted or expires.  More formally,
461	   the token is associated with the pair (SPI-I, SPI-R).

463	6.  Backup Gateways

465	   Making crash detection and recovery quick is a worthy goal, but since
466	   rebooting a gateway takes a non-zero amount of time, many
467	   implementations choose to have a stand-by gateway ready to take over
468	   as soon as the primary gateway fails for any reason. [cluster]
469	   describes considerations for such clusters of gateways with
470	   synchronized state, but the rest of this section is relevant even
471	   when there is no synchronized state.

473	   If such a configuration is available, it is RECOMMENDED that the
474	   stand-by gateway be able to generate the same token as the active
475	   gateway.  If the method described in Section 5.1 is used, this means
476	   that the QCD_SECRET field is identical in both gateways.  This has
477	   the effect of having the crash recovery available immediately.

479	   Note that this refers to "high availability" configurations, where
480	   only one gateway is active at any given moment.  This is different
481	   from "load sharing" configurations where more than one gateway is
482	   active at the same time.  For load sharing configurations, please see
483	   Section 10.2 for security considerations.

485	7.  Alternative Solutions

487	7.1.  Initiating a new IKE SA

489	   Instead of sending a QCD token, we could have the rebooted
490	   implementation start an Initial exchange with the peer, including the
491	   INITIAL_CONTACT notification.  This would have the same effect,
492	   instructing the peer to erase the old IKE SA, as well as establishing
493	   a new IKE SA with fewer rounds.

495	   The disadvantage here is that in IKEv2 an authentication exchange
496	   MUST have a piggy-backed Child SA set up.  Since our use case is such
497	   that the rebooted implementation does not have traffic flowing to the
498	   peer, there are no good selectors for such a Child SA.

500	   Additionally, when authentication is asymmetric, such as when EAP is
501	   used, it is not possible for the rebooted implementation to initiate
502	   IKE.

504	7.2.  SIR

506	   Another proposal that was considered for this work item is the SIR
507	   extension, which is described in [recovery].  Under that proposal,
508	   the non-rebooted peer sends a non-protected query to the possibly
509	   rebooted peer, asking whether the IKE SA exists.  The peer replies
510	   with either a positive or negative response, and the absence of a
511	   positive response, along with the existence of a negative response, is
512	   taken as proof that the IKE SA has really been lost.

514	   The working group preferred the QCD proposal to this one.

516	7.3.  Birth Certificates

518	   Birth Certificates is a method of crash detection that has never been
519	   formally defined.  Bill Sommerfeld suggested this idea in a mail to
520	   the IPsec mailing list on August 7, 2000, in a thread discussing
521	   methods of crash detection:

523	       If we have the system sign a "birth certificate" when it
524	       reboots (including a reboot time or boot sequence number),
525	       we could include that with a "bad spi" ICMP error and in
526	       the negotiation of the IKE SA.

528	   We believe that this method would have some problems.  First, it
529	   requires Alice to store the certificate, so as to be able to compare
530	   the public keys.  That requires more storage than does a QCD token.
531	   Additionally, the public-key operations needed to verify the self-
532	   signed certificates are more expensive for Alice.

534	   We believe that a symmetric-key operation such as proposed here is
535	   more light-weight and simple than that implied by the Birth
536	   Certificate idea.

538	7.4.  Reducing Liveness Check Length

540	   Some have suggested that the RFC 4306 procedure described in
541	   Section 2 can be tweaked by requiring fewer retransmissions over a
542	   shorter period of time for cases of liveness check started because of
543	   an INVALID_SPI or INVALID_IKE_SPI notification.

545	   We believe that the default retransmission policy should represent a
546	   good balance between the need for a timely discovery of a dead peer,
547	   and a low probability of false detection.  We expect the policy to be
548	   set to take the shortest time such that this probability achieves a
549	   certain target.  Therefore, reducing elapsed time and retransmission
550	   count will create an unacceptably high probability of false
551	   detection, and this can be triggered by a single INVALID_IKE_SPI
552	   notification.

554	   Additionally, even if the retransmission policy is reduced to, say,
555	   one minute, it is still a very noticeable delay from a human
556	   perspective, from the time that the gateway has come up until the
557	   tunnels are active, or from the time the backup gateway has taken
558	   over until the tunnels are active.

560	8.  Interaction with Session Resumption

562	   Session Resumption, specified in [RFC5723], proposes to make setting
563	   up a new IKE SA consume less computing resources.  This is
564	   particularly useful in the case of a remote access gateway that has
565	   many tunnels.  A failure of such a gateway would require all these
566	   many remote access clients to establish an IKE SA either with the
567	   rebooted gateway or with a backup gateway.  This tunnel re-
568	   establishment should occur within a short period of time, creating a
569	   burden on the remote access gateway.  Session Resumption addresses
570	   this problem by having the clients store an encrypted derivative of
571	   the IKE SA for quick re-establishment.

573	   What Session Resumption does not help with is the problem of detecting
574	   that the peer gateway has failed.  A failed gateway may go undetected
575	   for as long as the lifetime of a child SA, because IPsec does not
576	   have packet acknowledgement, and applications cannot signal the IPsec
577	   layer that the tunnel "does not work".  Before establishing a new IKE
578	   SA using Session Resumption, a client should ascertain that the
579	   gateway has indeed failed.  This could be done using either a
580	   liveness check (as in RFC 4306) or using the QCD tokens described in
581	   this document.

583	   A remote access client conforming to both specifications will store
584	   QCD tokens, as well as the Session Resumption ticket, if provided by
585	   the gateway.  A remote access gateway conforming to both
586	   specifications will generate a QCD token for the client.  When the
587	   gateway reboots, the client will discover this in either of two ways:
588	   1.  The client does regular liveness checks, or else the time for
589	       some other IKE exchange has come.  Since the gateway is still
590	       down, the IKE exchange times out after several minutes.  In this
591	       case QCD does not help.
592	   2.  Either the primary gateway or a backup gateway (see Section 6) is
593	       ready and sends a QCD token to the client.  In that case the
594	       client will quickly re-establish the IPsec tunnel, either with
595	       the rebooted primary gateway or the backup gateway as described
596	       in this document.

598	   The full combined protocol looks like this:

600	        Initiator                Responder
601	        -----------              -----------
602	       HDR, SAi1, KEi, Ni  -->

604	                           <--    HDR, SAr1, KEr, Nr, [CERTREQ]

606	       HDR, SK {IDi, [CERT,]
607	       [CERTREQ,] [IDr,]
608	       AUTH, N(QCD_TOKEN)
609	       SAi2, TSi, TSr,
610	       N(TICKET_REQUEST)}  -->
611	                           <--    HDR, SK {IDr, [CERT,] AUTH,
612	                                  N(QCD_TOKEN), SAr2, TSi, TSr,
613	                                  N(TICKET_LT_OPAQUE) }

615	                ---- Reboot -----

617	       HDR, {}             -->
618	                           <--  HDR, N(QCD_TOKEN)

620	       HDR, [N(COOKIE),]
621	       Ni, N(TICKET_OPAQUE)
622	       [,N+]               -->
623	                           <--  HDR, Nr [,N+]
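
The client's choice after the reboot in the exchange above can be sketched as follows.  This is a non-normative illustration: the ClientState type, the recovery_action function, and the returned action names are hypothetical and are not defined by this document or by [RFC5723].

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientState:
    """State a client keeps for one IKE SA (names are illustrative)."""
    qcd_token: bytes                  # token received in the IKE_AUTH response
    ticket: Optional[bytes] = None    # Session Resumption ticket, if provided


def recovery_action(state: ClientState, received_token: Optional[bytes]) -> str:
    """Pick the client's next step after probing a suspect gateway."""
    if received_token is None:
        # No unprotected QCD_TOKEN arrived: the gateway may still be down,
        # so the client keeps retransmitting until the exchange times out.
        return "wait_for_timeout"
    if not hmac.compare_digest(state.qcd_token, received_token):
        # A token that does not match proves nothing; keep the IKE SA.
        return "ignore"
    # The gateway has lost the SA.  Resume if a ticket is stored,
    # otherwise fall back to a full initial exchange.
    return "resume_with_ticket" if state.ticket is not None else "full_exchange"
```

A client that holds no ticket still benefits from QCD; it simply runs the full initial exchange once the token has been verified.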

625	9.  Operational Considerations

627	9.1.  Who should implement this specification

629	   Throughout this document, we have referred to reboot time
630	   alternately as the time that the implementation crashes and the
631	   time when it is ready to process IPsec packets and IKE exchanges.
632	   Depending on the hardware and software platforms and the cause of the
633	   reboot, rebooting may take anywhere from a few seconds to several
634	   minutes.  If the implementation is down for a long time, the benefit
635	   of this protocol extension is reduced.  For this reason, critical
636	   systems should implement backup gateways as described in Section 6.

638	   Implementing the "token maker" side of QCD makes sense for IKE
639	   implementations where protected connections originate from the peer,
640	   such as inter-domain VPNs and remote access gateways.  Implementing
641	   the "token taker" side of QCD makes sense for IKE implementations
642	   where protected connections originate, such as inter-domain VPNs and
643	   remote access clients.

645	   To clarify the requirements:

647	   o  A remote-access client MUST be a token taker and MAY be a token
648	      maker.
649	   o  A remote-access gateway MAY be a token taker and MUST be a token
650	      maker.
651	   o  An inter-domain VPN gateway MUST be both token maker and token
652	      taker.
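
The three requirements above can be restated as a small lookup table, which may be convenient for a conformance self-check.  The table and the conforms function are illustrative only; their names are assumptions of this sketch.

```python
# Requirement level per deployment role, copied from the list above.
ROLE_REQUIREMENTS = {
    "remote-access client":     {"taker": "MUST", "maker": "MAY"},
    "remote-access gateway":    {"taker": "MAY",  "maker": "MUST"},
    "inter-domain VPN gateway": {"taker": "MUST", "maker": "MUST"},
}


def conforms(role: str, is_taker: bool, is_maker: bool) -> bool:
    """Return True if the implemented sides satisfy every MUST for the role."""
    implemented = {"taker": is_taker, "maker": is_maker}
    return all(
        implemented[side]
        for side, level in ROLE_REQUIREMENTS[role].items()
        if level == "MUST"
    )
```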

654	   In order to limit the effects of DoS attacks, a token taker SHOULD
655	   limit the rate of QCD_TOKENs verified from a particular source.
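
One possible way to implement such a limit is a token bucket kept per source address.  The sketch below is an assumption of this example, not a mechanism required by this document; the class name, the default rate, and the burst size are arbitrary.

```python
import time


class TokenVerificationLimiter:
    """Per-source token bucket for QCD_TOKEN verification attempts."""

    def __init__(self, rate_per_sec=1.0, burst=5, clock=time.monotonic):
        self.rate = rate_per_sec      # refill rate, verifications per second
        self.burst = float(burst)     # maximum bucket size
        self.clock = clock            # injectable clock, eases testing
        self._buckets = {}            # source -> (tokens left, last timestamp)

    def allow(self, source: str) -> bool:
        """Return True if a QCD_TOKEN from this source may be verified now."""
        now = self.clock()
        tokens, last = self._buckets.get(source, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[source] = (tokens - 1.0, now)
            return True
        self._buckets[source] = (tokens, now)
        return False
```

A taker would call allow() with the packet's source address before comparing the received token with the stored one, and silently drop the notification when the call returns False.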

657	   If an excessive number of IKE requests protected with unknown IKE
658	   SPIs arrive at a token maker, the IKE module SHOULD revert to the
659	   behavior described in Section 2.21 of [IKEv2bis] and either send an
660	   INVALID_IKE_SPI notification or ignore the requests entirely.

662	9.2.  Response to unknown child SPI

664	   After a reboot, it is more likely that an implementation receives
665	   IPsec packets than IKE packets.  In that case, the rebooted
666	   implementation will send an INVALID_SPI notification, triggering a
667	   liveness check.  The token will only be sent in a response to the
668	   liveness check, thus requiring an extra round-trip.

670	   To avoid this, an implementation that has access to enough non-
671	   volatile storage MAY store a mapping of child SPIs to owning IKE
672	   SPIs, or to generated tokens.  If such a mapping is available and
673	   persistent across reboots, the rebooted implementation SHOULD respond
674	   to the IPsec packet with an INVALID_SPI notification, along with the
675	   appropriate QCD_TOKEN notification.  A token taker SHOULD verify the
676	   QCD token that arrives with an INVALID_SPI notification the same as
677	   if it arrived with the IKE SPIs of the parent IKE SA.
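
A persistent mapping of this kind could be kept in any non-volatile store.  The sketch below uses SQLite purely as an example; the table layout, the ChildSpiMap class, and the make_token callback (standing in for whichever token generation method the maker uses) are all assumptions of this illustration.

```python
import sqlite3


class ChildSpiMap:
    """Persistent map from a child SA SPI to the owning IKE SPI pair."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive a reboot.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS child_spi ("
            "spi INTEGER PRIMARY KEY, spi_i BLOB NOT NULL, spi_r BLOB NOT NULL)"
        )

    def record(self, child_spi: int, spi_i: bytes, spi_r: bytes) -> None:
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO child_spi VALUES (?, ?, ?)",
                (child_spi, spi_i, spi_r),
            )

    def lookup(self, child_spi: int):
        row = self.db.execute(
            "SELECT spi_i, spi_r FROM child_spi WHERE spi = ?", (child_spi,)
        ).fetchone()
        return None if row is None else (bytes(row[0]), bytes(row[1]))


def notifications_for_unknown_child_spi(spi_map, child_spi, make_token):
    """Build the notifications sent in reply to an unknown child SPI."""
    notifications = [("INVALID_SPI", child_spi)]
    owner = spi_map.lookup(child_spi)
    if owner is not None:
        # Mapping survived the reboot: add the token and save a round trip.
        notifications.append(("QCD_TOKEN", make_token(*owner)))
    return notifications
```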

679	   However, a persistent storage module might not be updated in a timely
680	   manner, and could be populated with tokens relating to IKE SPIs that
681	   have already been rekeyed.  A token taker MUST NOT take an invalid
682	   QCD Token sent along with an INVALID_SPI notification as evidence
683	   that the peer is either malfunctioning or attacking, but it SHOULD
684	   limit the rate at which such notifications are processed.

686	9.3.  Using Tokens that Depend on IP Addresses

688	   This section describes the rationale for token generation methods
689	   such as the one described in Section 5.2.  Note that this section
690	   merely provides a possible rationale, and does not specify or
691	   recommend any kind of configuration.

693	   Some security gateway configurations use a load-sharing cluster of
694	   hosts, all sharing the same IP addresses, where the SAs (IKE and
695	   child) are not synchronized between the cluster members.  In such a
696	   configuration, a single member does not know about all the IKE SAs
697	   that are active for the configuration.  A load balancer (usually a
698	   networking switch) sends IKE and IPsec packets to the several members
699	   based on source IP address.

701	   In such a configuration, an attacker can send a forged protected IKE
702	   packet with the IKE SPIs of an existing IKE SA, but from a different
703	   IP address.  This packet will likely be processed by a different
704	   cluster member from the one that owns the IKE SA.  Since no IKE SA
705	   state is stored on this member, it will send a QCD token to the
706	   attacker.  If the QCD token does not depend on IP address, this token
707	   can immediately be used to tell the token taker to tear down the IKE
708	   SA using an unprotected QCD_TOKEN notification.

710	   To thwart this possible attack, such configurations should use a
711	   method that considers the taker's IP address, such as the method
712	   described in Section 5.2.
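
To illustrate why binding the token to the taker's address defeats this attack, the sketch below derives a token from the QCD_SECRET, the IKE SPIs, and the taker's IP address.  The exact construction (SHA-256 over the concatenation) is an assumption of this example; Section 5.2 remains the authoritative description of the method.

```python
import hashlib
import ipaddress


def qcd_token_with_ip(qcd_secret: bytes, spi_i: bytes, spi_r: bytes,
                      taker_ip: str) -> bytes:
    """Derive a token bound to the taker's IP address.

    Assumed construction: SHA-256 over QCD_SECRET | SPI-I | SPI-R | IP.
    """
    packed_ip = ipaddress.ip_address(taker_ip).packed  # 4 or 16 bytes
    return hashlib.sha256(qcd_secret + spi_i + spi_r + packed_ip).digest()
```

A cluster member that answers a forged packet computes the token over the attacker's source address, so the value the attacker obtains differs from the one stored by the legitimate taker and cannot be used to tear down the IKE SA.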

714	10.  Security Considerations

716	10.1.  QCD Token Generation and Handling

718	   Tokens MUST be hard to guess.  This is critical, because if an
719	   attacker can guess the token associated with an IKE SA, she can tear
720	   down the IKE SA and associated tunnels at will.  When the token is
721	   delivered in the IKE_AUTH exchange, it is encrypted.  When it is sent
722	   again in an unprotected notification, it is not, but that is the last
723	   time this token is ever used.

725	   An aggregation of some tokens generated by one maker together with
726	   the related IKE SPIs MUST NOT give an attacker the ability to guess
727	   other tokens.  Specifically, if one taker does not properly secure
728	   the QCD tokens and an attacker gains access to them, this attacker
729	   MUST NOT be able to guess other tokens generated by the same maker.
730	   This is the reason that the QCD_SECRET in Section 5.1 needs to be
731	   sufficiently long.

733	   The token taker MUST store the token in a secure manner.  No attacker
734	   should be able to gain access to a stored token.

736	   The QCD_SECRET MUST be protected from access by other parties.
737	   Anyone gaining access to this value will be able to delete all the
738	   IKE SAs for this token maker.

740	   The QCD token is sent by the rebooted peer in an unprotected message.
741	   A message like that is subject to modification, deletion and replay
742	   by an attacker.  However, these attacks will not compromise the
743	   security of either side.  Modification is meaningless because a
744	   modified token is simply an invalid token.  Deletion will only cause
745	   the protocol not to work, resulting in a delay in tunnel re-
746	   establishment as described in Section 2.  Replay is also meaningless,
747	   because the IKE SA has been deleted after the first transmission.
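
The handling of modified and replayed tokens described above can be sketched from the taker's side as follows.  The TokenTaker class and its method names are hypothetical; deleting the stored entry stands in for deleting the IKE SA.

```python
import hmac


class TokenTaker:
    """Minimal taker-side store of QCD tokens, keyed by IKE SPI pair."""

    def __init__(self):
        self._tokens = {}  # (spi_i, spi_r) -> token

    def store(self, spi_i: bytes, spi_r: bytes, token: bytes) -> None:
        self._tokens[(spi_i, spi_r)] = token

    def on_unprotected_token(self, spi_i: bytes, spi_r: bytes,
                             token: bytes) -> bool:
        """Return True if the IKE SA was deleted because the token matched."""
        expected = self._tokens.get((spi_i, spi_r))
        # Constant-time comparison; a modified token simply fails to match.
        if expected is None or not hmac.compare_digest(expected, token):
            return False
        # The SA and its token are gone, so a replay finds nothing to delete.
        del self._tokens[(spi_i, spi_r)]
        return True
```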

749	10.2.  QCD Token Transmission

751	   A token maker MUST NOT send a QCD token in an unprotected message for
752	   an existing IKE SA.  This implies that a conforming QCD token maker
753	   MUST be able to tell whether a particular pair of IKE SPIs represent
754	   a valid IKE SA.

756	   This requirement is obvious and easy to meet in the case of a single
757	   gateway.  However, some implementations use a load balancer to
758	   divide the load between several physical gateways.  It MUST NOT be
759	   possible even in such a configuration to trick one gateway into
760	   sending a QCD token for an IKE SA which is valid on another gateway.

762	   This document does not specify how a load-sharing configuration of
763	   IPsec gateways would work, but in order to support this
764	   specification, all members MUST be able to tell whether a
765	   particular IKE SA is active anywhere in the cluster.  One way to do
766	   this is to synchronize a list of active IKE SPIs among all the
767	   cluster members.
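
As a non-normative illustration of the synchronized-list approach, a cluster member could consult the shared set of active IKE SPI pairs before answering.  The ClusterMember class is hypothetical, and the synchronization mechanism itself is out of scope; here the set is simply shared in memory.

```python
class ClusterMember:
    """One gateway in a load-sharing cluster (illustrative only)."""

    def __init__(self, cluster_active_spis: set):
        # Set of (spi_i, spi_r) pairs active anywhere in the cluster,
        # assumed to be kept in sync by some mechanism not shown here.
        self.cluster_active_spis = cluster_active_spis

    def token_for_unknown_sa(self, spi_i: bytes, spi_r: bytes, make_token):
        """Return a QCD token, or None if the SA is alive on another member."""
        if (spi_i, spi_r) in self.cluster_active_spis:
            return None  # MUST NOT send a token for an existing IKE SA
        return make_token(spi_i, spi_r)
```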

769	10.3.  QCD Token Enumeration

771	   An attacker may try to attack QCD if the generation algorithm
772	   described in Section 5.1 is used.  The attacker will send several
773	   fake IKE requests to the gateway under attack, receiving and
774	   recording the QCD Tokens in the responses.  This will allow the
775	   attacker to create a dictionary of IKE SPIs to QCD Tokens, which can
776	   later be used to tear down any IKE SA.

778	   Three factors mitigate this threat:
779	   o  The space of all possible IKE SPI pairs is huge: 2^128, so making
780	      such a dictionary is impractical.  Even if we assume that one
781	      implementation always generates predictable IKE SPIs, the space is
782	      still at least 2^64 entries, so making the dictionary is extremely
783	      hard.
784	   o  Throttling the number of QCD_TOKEN notifications sent out, as
785	      discussed in Section 9.1, especially when no crash has recently
786	      occurred, limits the attacker's ability to construct a dictionary.
787	   o  The methods in Section 5.1 and Section 5.2 allow for a periodic
788	      change of the QCD_SECRET.  Any such change invalidates the entire
789	      dictionary.
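
The size argument in the first factor can be made concrete with a short calculation.  The rate of 1000 tokens per second used below is an arbitrary figure chosen for illustration and does not come from this document.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(spi_bits: int, tokens_per_second: float) -> float:
    """Years needed to collect one token for every possible SPI value."""
    return (2 ** spi_bits) / tokens_per_second / SECONDS_PER_YEAR


# Predictable SPIs on one side leave a 2^64 space; at the assumed rate
# this still takes hundreds of millions of years.
predictable_case = years_to_enumerate(64, 1000.0)

# The full 2^128 space of SPI pairs is larger by a further factor of 2^64.
full_space_case = years_to_enumerate(128, 1000.0)
```

Throttling lowers tokens_per_second further, and a change of the QCD_SECRET resets the attacker's progress to zero.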

791	11.  IANA Considerations

793	   IANA is requested to assign a notify message type from the status
794	   types range (16406-40959) of the "IKEv2 Notify Message Types"
795	   registry with name "QUICK_CRASH_DETECTION".

797	12.  Acknowledgements

799	   We would like to thank Hannes Tschofenig and Yaron Sheffer for their
800	   comments about Session Resumption.

802	   Frederic Detienne and Pratima Sethi contributed the ideas in
803	   Section 9.3 and Section 5.2.

805	   Others who have contributed valuable comments are, in alphabetical
806	   order, Lakshminath Dondeti, Scott C Moonen and Dave Wierbowski.

808	13.  Change Log

810	   This section lists all changes in this document.

812	   NOTE TO RFC EDITOR : Please remove this section in the final RFC

814	13.1.  Changes from draft-nir-ike-qcd-07

816	   o  First WG version.
817	   o  Addressed Scott C Moonen's concern about collisions of QCD tokens.
818	   o  Updated references to point to IKEv2bis instead of RFC 4306 and
819	      4718.  Also converted draft reference for resumption to RFC 5723.
820	   o  Added Dave Wierbowski as author, and removed Pratima and Frederic.

822	13.2.  Changes from draft-nir-ike-qcd-03 and -04

824	   Mostly editorial changes and cleaning up.

826	13.3.  Changes from draft-nir-ike-qcd-02

828	   o  Described QCD token enumeration, following a question by
829	      Lakshminath Dondeti.
830	   o  Added the ability to replace the QCD token for an existing IKE SA.
831	   o  Added tokens dependent on peer IP address and their interaction
832	      with MOBIKE.

834	13.4.  Changes from draft-nir-ike-qcd-01

836	   o  Removed stateless method.
837	   o  Added discussion of rekeying and resumption.
838	   o  Added discussion of non-synchronized load-balanced clusters of
839	      gateways in the security considerations.
840	   o  Other wording fixes.

842	13.5.  Changes from draft-nir-ike-qcd-00

844	   o  Merged proposal with draft-detienne-ikev2-recovery.
845	   o  Changed the protocol so that the rebooted peer generates the
846	      token.  This has the effect that the need for persistent storage
847	      is eliminated.
848	   o  Added discussion of birth certificates.

850	13.6.  Changes from draft-nir-qcr-00

852	   o  Changed name to reflect that this relates to IKE.  Also changed
853	      from quick crash recovery to quick crash detection to avoid
854	      confusion with IFARE.
855	   o  Added more operational considerations.
856	   o  Added interaction with IFARE.
857	   o  Added discussion of backup gateways.

859	14.  References

861	14.1.  Normative References

863	   [IKEv2bis]
864	              Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
865	              "Internet Key Exchange Protocol: IKEv2",
866	              draft-ietf-ipsecme-ikev2bis-11 (work in progress),
867	              May 2010.

869	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
870	              Requirement Levels", BCP 14, RFC 2119, March 1997.

872	   [RFC4555]  Eronen, P., "IKEv2 Mobility and Multihoming Protocol
873	              (MOBIKE)", RFC 4555, June 2006.

875	14.2.  Informative References

877	   [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
878	              RFC 5723, January 2010.

880	   [cluster]  Nir, Y., Ed., "IPsec Cluster Problem Statement",
881	              draft-ietf-ipsecme-ipsec-ha (work in progress), July 2010.

883	   [recovery]
884	              Detienne, F., Sethi, P., and Y. Nir, "Safe IKE Recovery",
885	              draft-detienne-ikev2-recovery (work in progress),
886	              January 2010.

888	Authors' Addresses

890	   Yoav Nir (editor)
891	   Check Point Software Technologies Ltd.
892	   5 Hasolelim st.
893	   Tel Aviv  67897
894	   Israel

896	   Email: ynir@checkpoint.com

898	   David Wierbowski
899	   International Business Machines
900	   1701 North Street
901	   Endicott, New York  13760
902	   United States

904	   Email: wierbows@us.ibm.com