idnits 2.17.1 draft-nir-ike-qcd-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 17. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 832. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 843. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 850. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 856. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 12, 2008) is 5675 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'IDr' is mentioned on line 231, but not defined == Missing Reference: 'KEi' is mentioned on line 292, but not defined == Missing Reference: 'KEr' is mentioned on line 294, but not defined == Missing Reference: 'CERTREQ' is mentioned on line 528, but not defined == Missing Reference: 'TSi' is mentioned on line 548, but not defined == Missing Reference: 'TSr' is mentioned on line 548, but not defined ** Obsolete normative reference: RFC 4306 (Obsoleted by RFC 5996) ** Obsolete normative reference: RFC 4718 (Obsoleted by RFC 5996) Summary: 3 errors (**), 0 flaws (~~), 7 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Y. Nir 3 Internet-Draft Check Point 4 Intended status: Standards Track F. Detienne 5 Expires: April 15, 2009 P. Sethi 6 Cisco 7 October 12, 2008 9 A Quick Crash Detection Method for IKE 10 draft-nir-ike-qcd-03 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. 
It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on April 15, 2009. 37 Abstract 39 This document describes an extension to the IKEv2 protocol that 40 allows for faster detection of SA desynchronization using a saved 41 token. 43 When an IPsec tunnel between two IKEv2 peers is disconnected due to a 44 restart of one peer, it can take as much as several minutes for the 45 other peer to discover that the reboot has occurred, thus delaying 46 recovery. In this text we propose an extension to the protocol that 47 allows for recovery immediately following the restart.
49 Table of Contents
51 1. Introduction
52 1.1. Conventions Used in This Document
53 2. RFC 4306 Crash Recovery
54 3. Protocol Outline
55 4. Formats and Exchanges
56 4.1. Notification Format
57 4.2. Passing a Token in the AUTH Exchange
58 4.3. Replacing Tokens After Rekey or Resumption
59 4.4. Replacing the Token for an Existing SA
60 4.5. Presenting the Token in an INFORMATIONAL Exchange
61 5. Token Generation and Verification
62 5.1. A Stateless Method of Token Generation
63 5.2. A Stateless Method with IP addresses
64 5.3. Token Lifetime
65 6. Backup Gateways
66 7. Alternative Solutions
67 7.1. Initiating a new IKE SA
68 7.2. Birth Certificates
69 8. Interaction with Session Resumption
70 9. Operational Considerations
71 9.1. Who should implement this specification
72 9.2. Response to unknown child SPI
73 9.3. Using Tokens that Depend on IP Addresses
74 10. Security Considerations
75 10.1. QCD Token Handling
76 10.2. QCD Token Transmission
77 10.3. QCD Token Enumeration
78 11. IANA Considerations
79 12. Acknowledgements
80 13. Change Log
81 13.1. Changes from draft-nir-ike-qcd-02
82 13.2. Changes from draft-nir-ike-qcd-01
83 13.3. Changes from draft-nir-ike-qcd-00
84 13.4. Changes from draft-nir-qcr-00
85 14. References
86 14.1. Normative References
87 14.2. Informative References
88 Authors' Addresses
89 Intellectual Property and Copyright Statements
91 1. Introduction 93 IKEv2, as described in [RFC4306], has a method for recovering from a 94 reboot of one peer. As long as traffic flows in both directions, the 95 rebooted peer should re-establish the tunnels immediately.
However, 96 in many cases the rebooted peer is a VPN gateway that protects only 97 servers, or else the non-rebooted peer has a dynamic IP address. In 98 such cases, the rebooted peer will not be able to re-establish the 99 tunnels. Section 2 describes how recovery works under RFC 4306, and 100 explains why it may take several minutes. 102 The method proposed here is to send a so-called "token" in the 103 IKE_AUTH exchange that establishes the tunnel. That token can be 104 stored on the peer as part of the IKE SA. After a reboot, the 105 rebooted implementation can re-generate the token, and send it to the 106 non-rebooted peer so as to delete the IKE SA. Deleting the IKE SA 107 results in a quick re-establishment of the IPsec tunnels. This is 108 described in Section 3. 110 1.1. Conventions Used in This Document 112 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 113 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 114 document are to be interpreted as described in [RFC2119]. 116 The term "token" refers to an octet string that an implementation can 117 generate using only the properties of a protected IKE message (such 118 as IKE SPIs) as input. A conforming implementation MUST be able to 119 generate the same token from the same input even after rebooting. 121 The term "token maker" refers to an implementation that generates a 122 token and sends it to the peer as specified in this document. 124 The term "token taker" refers to an implementation that stores such a 125 token or a digest thereof, in order to verify that a new token it 126 receives is identical to the old token it has stored. 128 2. RFC 4306 Crash Recovery 130 When one peer loses state or reboots, the other peer does not get any 131 notification, so unidirectional IPsec traffic can still flow. 
The 132 rebooted peer will not be able to decrypt it, however, and the only 133 remedy is to send an unprotected INVALID_SPI notification as 134 described in section 3.10.1 of [RFC4306]. That section also 135 describes the processing of such a notification: "If this 136 Informational Message is sent outside the context of an IKE_SA, it 137 should be used by the recipient only as a "hint" that something might 138 be wrong (because it could easily be forged)." 140 Since the INVALID_SPI can only be used as a hint, the non-rebooted 141 peer has to determine whether the IPsec SA, and indeed the parent IKE 142 SA are still valid. The method of doing this is described in section 143 2.4 of [RFC4306]. This method, called "liveness check" involves 144 sending a protected empty INFORMATIONAL message, and awaiting a 145 response. This procedure is sometimes referred to as "Dead Peer 146 Detection" or DPD. 148 Section 2.4 does not mandate how many times the liveness check 149 message should be retransmitted, or for how long, but does recommend 150 the following: "It is suggested that messages be retransmitted at 151 least a dozen times over a period of at least several minutes before 152 giving up on an SA". Clearly, implementations differ, but all will 153 take a significant amount of time. 155 3. Protocol Outline 157 Supporting implementations will send a notification, called a "QCD 158 token", as described in Section 4.1 in the last packets of the 159 IKE_AUTH exchange. These are the final request and final response 160 that contain the AUTH payloads. The generation of these tokens is a 161 local matter for implementations, but considerations are described in 162 Section 5. Implementations that send such a token will be called 163 "token makers". 165 A supporting implementation receiving such a token SHOULD store it 166 (or a digest thereof) as part of the IKE SA. Implementations that 167 support this part of the protocol will be called "token takers". 
168 Section 9.1 has considerations for which implementations need to be 169 token takers, and which should be token makers. Implementations that 170 are not token takers will silently ignore QCD tokens. 172 When a token maker receives a protected IKE request message with 173 unknown IKE SPIs, it MUST generate a new token that is identical to 174 the previous token, and send it to the requesting peer in an 175 unprotected IKE message as described in Section 4.5. 177 When a token taker receives the QCD token in an unprotected 178 notification, it MUST verify that the TOKEN_SECRET_DATA matches the 179 token stored in the matching IKE SA. If the verification fails, 180 or if the IKE SPIs in the message do not match any existing IKE SA, 181 it SHOULD log the event. If it succeeds, it MUST delete the IKE SA 182 associated with the IKE_SPI fields, and all dependent child SAs. 183 This event MAY also be logged. The token taker MUST accept such 184 tokens from any IP address and port combination, so as to allow 185 different kinds of high-availability configurations of the token 186 maker. 188 A supporting token taker MAY immediately create new SAs using an 189 Initial exchange, or it may wait for subsequent traffic to trigger 190 the creation of new SAs. 192 There is ongoing work on IKEv2 Session Resumption ([resumption] or 193 [stubs]). See Section 8 for a short discussion about this protocol's 194 interaction with session resumption. 196 4. Formats and Exchanges 198 4.1. Notification Format 200 The notification payload called "QCD token" is formatted as follows: 202 1 2 3 203 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 204 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 205 ! Next Payload !C! RESERVED ! Payload Length ! 206 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 207 ! Protocol ID ! SPI Size ! QCD Token Notify Message Type ! 208 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 209 ! ! 
210 ~ TOKEN_SECRET_DATA ~ 211 ! ! 212 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 214 o Protocol ID (1 octet) MUST contain 1, as this message is related 215 to an IKE SA. 216 o SPI Size (1 octet) MUST be zero, in conformance with [RFC4306]. 217 o QCD Token Notify Message Type (2 octets) - MUST be xxxxx, the 218 value assigned for QCD token notifications. TBA by IANA. 219 o TOKEN_SECRET_DATA (16-128 octets) contains a generated token as 220 described in Section 5. 222 4.2. Passing a Token in the AUTH Exchange 224 For brevity, only the EAP version of an AUTH exchange will be 225 presented here. The non-EAP version is very similar. The figures 226 below are based on appendix A.3 of [RFC4718]. 228 first request --> IDi, 229 [N(INITIAL_CONTACT)], 230 [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+], 231 [IDr], 232 [CP(CFG_REQUEST)], 233 [N(IPCOMP_SUPPORTED)+], 234 [N(USE_TRANSPORT_MODE)], 235 [N(ESP_TFC_PADDING_NOT_SUPPORTED)], 236 [N(NON_FIRST_FRAGMENTS_ALSO)], 237 SA, TSi, TSr, 238 [V(SIR_VID)] 239 [V+] 241 first response <-- IDr, [CERT+], AUTH, 242 EAP, 243 [V(SIR_VID)] 244 [V+] 246 / --> EAP 247 repeat 1..N times | 248 \ <-- EAP 250 last request --> AUTH 251 [N(QCD_TOKEN)] 253 last response <-- AUTH, 254 [N(QCD_TOKEN)] 255 [CP(CFG_REPLY)], 256 [N(IPCOMP_SUPPORTED)], 257 [N(USE_TRANSPORT_MODE)], 258 [N(ESP_TFC_PADDING_NOT_SUPPORTED)], 259 [N(NON_FIRST_FRAGMENTS_ALSO)], 260 SA, TSi, TSr, 261 [N(ADDITIONAL_TS_POSSIBLE)], 262 [V+] 264 Note that the QCD_TOKEN notification is marked as optional because it 265 is not required by this specification that every implementation be 266 both token maker and token taker. If only one peer sends the QCD 267 token, then a reboot of the other peer will not be recoverable by 268 this method. This may be acceptable if traffic typically originates 269 from the other peer. 271 In any case, the lack of a QCD_TOKEN notification MUST NOT be taken 272 as an indication that the peer does not support this standard. 
273 Conversely, if a peer does not understand this notification, it will 274 simply ignore it. Therefore a peer MAY send this notification 275 freely, even if it does not know whether the other side supports it. 277 The QCD_TOKEN notification is related to the IKE SA and MUST follow 278 the AUTH payload and precede the Configuration payload and all 279 payloads related to the child SA. 281 4.3. Replacing Tokens After Rekey or Resumption 283 After rekeying an IKE SA, the IKE SPIs are replaced, so the new SA 284 also needs to have a token. If only the responder in the rekey 285 exchange is the token maker, this can be done within the 286 CREATE_CHILD_SA exchange. If the initiator is a token maker, then we 287 need an extra INFORMATIONAL exchange. 289 The following figure shows the CREATE_CHILD_SA exchange for rekeying 290 the IKE SA. Only the responder sends a QCD token. 292 request --> SA, Ni, [KEi] 294 response <-- SA, Nr, [KEr], N(QCD_TOKEN) 296 If the initiator is also a token maker, it SHOULD soon initiate an 297 INFORMATIONAL exchange as follows: 299 request --> N(QCD_TOKEN) 301 response <-- 303 For session resumption, as specified in [resumption], the situation 304 is similar. The responder, which is necessarily the peer that has 305 crashed, SHOULD send a new ticket within the protected payload of the 306 IKE_SESSION_RESUME exchange. If the Initiator is also a token maker, 307 it needs to send a QCD_TOKEN in a separate INFORMATIONAL exchange. 309 4.4. Replacing the Token for an Existing SA 311 With some token generation methods, such as that described in 312 Section 5.2, a QCD token may sometimes become invalid, although the 313 IKE SA is still perfectly valid. 315 In such a case, the token maker MUST send the new token in a 316 protected message under that IKE SA. 
That exchange could be a simple 317 INFORMATIONAL, such as in the last figure in the previous section, or 318 else it can be part of a MOBIKE INFORMATIONAL exchange such as in the 319 following figure taken from section 2.2 of [RFC4555] and modified by 320 adding a QCD_TOKEN notification: 322 (IP_I2:4500 -> IP_R1:4500) 323 HDR, SK { N(UPDATE_SA_ADDRESSES), 324 N(NAT_DETECTION_SOURCE_IP), 325 N(NAT_DETECTION_DESTINATION_IP) } --> 327 <-- (IP_R1:4500 -> IP_I2:4500) 328 HDR, SK { N(NAT_DETECTION_SOURCE_IP), 329 N(NAT_DETECTION_DESTINATION_IP) } 331 <-- (IP_R1:4500 -> IP_I2:4500) 332 HDR, SK { N(COOKIE2), [N(QCD_TOKEN)] } 334 (IP_I2:4500 -> IP_R1:4500) 335 HDR, SK { N(COOKIE2), [N(QCD_TOKEN)] } --> 337 A token taker MUST accept such gratuitous QCD_TOKEN notifications as 338 long as they are carried in protected exchanges. A token maker 339 SHOULD NOT generate them unless it will not be able to generate the 340 old QCD_TOKEN after a crash. 342 4.5. Presenting the Token in an INFORMATIONAL Exchange 344 This QCD_TOKEN notification is unprotected, and is sent as a response 345 to a protected IKE request, which uses an IKE SA that is unknown. 347 request --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+ 349 If child SPIs are persistently mapped to IKE SPIs as described in 350 Section 9.2, a token taker may get the following unprotected message 351 in response to an ESP or AH packet. 353 request --> N(INVALID_SPI), N(QCD_TOKEN)+ 355 The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to 356 support both implementations that conform to this specification and 357 implementations that don't. Similar to the description in section 358 2.21 of [RFC4306], the IKE SPI and message ID fields in the packet 359 headers are taken from the protected IKE request. 361 To support a periodic rollover of the secret used for token 362 generation, the token taker MUST support at least four QCD_TOKEN 363 notifications in a single packet. 
The token is considered verified 364 if any of the QCD_TOKEN notifications matches. The token maker MAY 365 generate up to four QCD_TOKEN notifications, based on several 366 generations of keys. 368 If the QCD_TOKEN verifies OK, an empty response MUST be sent. If the 369 QCD_TOKEN cannot be validated, a response SHOULD NOT be sent. 371 Section 5 defines token verification. 373 5. Token Generation and Verification 375 No token generation method is mandated by this document. A method is 376 documented in Section 5.1, but only serves as an example. 378 The following lists the requirements for a token generation 379 mechanism: 380 o Tokens MUST be at least 16 octets long, and no more than 128 381 octets long, to facilitate storage and transmission. Tokens 382 SHOULD be indistinguishable from random data. 383 o It should not be possible for an external attacker to guess the 384 QCD token generated by an implementation. Cryptographic 385 mechanisms such as PRNG and hash functions are RECOMMENDED. 386 o The token maker MUST be able to re-generate or retrieve the token 387 based on the IKE SPIs even after it reboots. 389 5.1. A Stateless Method of Token Generation 391 This describes a stateless method of generating a token: 392 o At installation or immediately after the first boot of the IKE 393 implementation, 32 random octets are generated using a secure 394 random number generator or a PRNG. 395 o Those 32 octets, called the "QCD_SECRET", are stored in non- 396 volatile storage on the machine, and kept indefinitely. 397 o The TOKEN_SECRET_DATA is calculated as follows: 399 TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R) 401 o If key rollover is required by policy, the implementation MAY 402 periodically generate a new QCD_SECRET and keep up to 3 previous 403 generations. When sending an unprotected QCD_TOKEN, as many as 4 404 notification payloads may be sent, each from a different 405 QCD_SECRET. 407 5.2. 
A Stateless Method with IP addresses 409 This method is similar to the one in the previous section, except 410 that the IP address of the token taker is also added to the block 411 being hashed. This has the disadvantage that the token needs to be 412 replaced (as described in Section 4.4) whenever the token taker 413 changes its address. 415 The reason to use this method is described in Section 9.3. When 416 using this method, the TOKEN_SECRET_DATA field is calculated as 417 follows: 419 TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T) 421 The IPaddr-T field specifies the IP address of the token taker. 422 Secret rollover considerations are similar to those in the previous 423 section. 425 5.3. Token Lifetime 427 The token is associated with a single IKE SA, and SHOULD be deleted 428 by the token taker when the SA is deleted or expires. More formally, 429 the token is associated with the pair (SPI-I, SPI-R). 431 6. Backup Gateways 433 Making crash detection and recovery quick is a worthy goal, but since 434 rebooting a gateway takes a non-zero amount of time, many 435 implementations choose to have a stand-by gateway ready to take over 436 as soon as the primary gateway fails for any reason. 438 If such a configuration is available, it is RECOMMENDED that the 439 stand-by gateway be able to generate the same token as the active 440 gateway. If the method described in Section 5.1 is used, this means 441 that the QCD_SECRET field is identical in both gateways. This has 442 the effect of having the crash recovery available immediately. 444 7. Alternative Solutions 446 7.1. Initiating a new IKE SA 448 Instead of sending a QCD token, we could have the rebooted 449 implementation start an Initial exchange with the peer, including the 450 INITIAL_CONTACT notification. This would have the same effect, 451 instructing the peer to erase the old IKE SA, as well as establishing 452 a new IKE SA with fewer rounds. 
454 The disadvantage here is that in IKEv2 an authentication exchange 455 MUST have a piggy-backed Child SA set up. Since our use case is such 456 that the rebooted implementation does not have traffic flowing to the 457 peer, there are no good selectors for such a Child SA. 459 Additionally, when authentication is asymmetric, such as when EAP is 460 used, it is not possible for the rebooted implementation to initiate 461 IKE. 463 7.2. Birth Certificates 465 Birth Certificates is a method of crash detection that has never been 466 formally defined. Bill Sommerfeld suggested this idea in a mail to 467 the IPsec mailing list on August 7, 2000, in a thread discussing 468 methods of crash detection: 470 If we have the system sign a "birth certificate" when it 471 reboots (including a reboot time or boot sequence number), 472 we could include that with a "bad spi" ICMP error and in 473 the negotiation of the IKE SA. 475 We believe that this method would have some problems. First, it 476 requires Alice to store the certificate, so as to be able to compare 477 the public keys. That requires more storage than does a QCD token. 478 Additionally, the public-key operations needed to verify the self- 479 signed certificates are more expensive for Alice. 481 We believe that a symmetric-key operation such as proposed here is 482 lighter-weight and simpler than that implied by the Birth 483 Certificate idea. 485 8. Interaction with Session Resumption 487 Session Resumption, specified in [resumption], proposes to make 488 setting up a new IKE SA consume less computing resources. This is 489 particularly useful in the case of a remote access gateway that has 490 many tunnels. A failure of such a gateway would require all these 491 many remote access clients to establish an IKE SA either with the 492 rebooted gateway or with a backup gateway. This tunnel re- 493 establishment should occur within a short period of time, creating a 494 burden on the remote access gateway. 
Session Resumption addresses 495 this problem by having the clients store an encrypted derivative of 496 the IKE SA for quick re-establishment. 498 What Session Resumption does not help with is the problem of detecting 499 that the peer gateway has failed. A failed gateway may go undetected 500 for as long as the lifetime of a child SA, because IPsec does not 501 have packet acknowledgement, and applications cannot signal the IPsec 502 layer that the tunnel "does not work". Before establishing a new IKE 503 SA using Session Resumption, a client MUST ascertain that the gateway 504 has indeed failed. This could be done using either a liveness check 505 (as in RFC 4306) or using the QCD tokens described in this document. 507 A remote access client conforming to both specifications will store 508 QCD tokens, as well as the Session Resumption ticket, if provided by 509 the gateway. A remote access gateway conforming to both 510 specifications will generate a QCD token for the client. When the 511 gateway reboots, the client will discover this in either of two ways: 512 1. The client does regular liveness checks, or else the time for 513 some other IKE exchange has come. Since the gateway is still 514 down, the IKE exchange times out after several minutes. In this case QCD 515 does not help. 516 2. Either the primary gateway or a backup gateway (see Section 6) is 517 ready and sends a QCD token to the client. 
In that case the 518 client will quickly re-establish the IPsec tunnel, either with 519 the rebooted primary gateway, the backup gateway as described in 520 this document or another gateway as described in [resumption]. 522 The full combined protocol looks like this: 524 Initiator Responder 525 ----------- ----------- 526 HDR, SAi1, KEi, Ni --> 528 <-- HDR, SAr1, KEr, Nr, [CERTREQ] 530 HDR, SK {IDi, [CERT,] 531 [CERTREQ,] [IDr,] 532 AUTH, N(QCD_TOKEN) 533 SAi2, TSi, TSr, 534 N(TICKET_REQUEST)} --> 535 <-- HDR, SK {IDr, [CERT,] AUTH, SAr2, TSi, 536 TSr, N(TICKET_OPAQUE) 537 [,N(TICKET_GATEWAY_LIST)]} 539 ---- Reboot ----- 541 HDR, {} --> 542 <-- HDR, N(QCD_Token) 544 HDR, Ni, N(TICKET_OPAQUE), 545 [N+,], SK {IDi, [IDr,] 546 SAi2, TSi, TSr, 547 [CP(CFG_REQUEST)]} --> 548 <-- HDR, SK {IDr, Nr, SAr2, [TSi, TSr], 549 [CP(CFG_REPLY)]} 551 9. Operational Considerations 553 9.1. Who should implement this specification 555 Throughout this document, we have referred to reboot time 556 alternately as the time that the implementation crashes and the 557 time when it is ready to process IPsec packets and IKE exchanges. 558 Depending on the hardware and software platforms and the cause of the 559 reboot, rebooting may take anywhere from a few seconds to several 560 minutes. If the implementation is down for a long time, the benefit 561 of this protocol extension is reduced. For this reason critical 562 systems should implement backup gateways as described in Section 6. 563 Note that the lower-case "should" in the previous sentence is 564 intentional, as we do not specify this in the sense of RFC 2119. 566 Implementing the "token maker" side of QCD makes sense for IKE 567 implementations where protected connections originate from the peer, 568 such as inter-domain VPNs and remote access gateways. Implementing 569 the "token taker" side of QCD makes sense for IKE implementations 570 where protected connections originate, such as inter-domain VPNs and 571 remote access clients. 
573 To clarify the requirements: 574 o A remote-access client MUST be a token taker and MAY be a token 575 maker. 576 o A remote-access gateway MAY be a token taker and MUST be a token 577 maker. 578 o An inter-domain VPN gateway MUST be both token maker and token 579 taker. 581 In order to limit the effects of DoS attacks, a token taker SHOULD 582 limit the rate of QCD_TOKENs verified from a particular source. 584 If excessive amounts of IKE requests protected with unknown IKE SPIs 585 arrive at a token maker, the IKE module SHOULD revert to the behavior 586 described in section 2.21 of [RFC4306] and either send an 587 INVALID_IKE_SPI notification, or ignore it entirely. 589 9.2. Response to unknown child SPI 591 After a reboot, it is more likely that an implementation receives 592 IPsec packets than IKE packets. In that case, the rebooted 593 implementation will send an INVALID_SPI notification, triggering a 594 liveness check. The token will only be sent in a response to the 595 liveness check, thus requiring an extra round-trip. 597 To avoid this, an implementation that has access to non-volatile 598 storage MAY store a mapping of child SPIs to owning IKE SPIs, or to 599 generated tokens. If such a mapping is available and persistent 600 across reboots, the rebooted implementation SHOULD respond to the 601 IPsec packet with an INVALID_SPI notification, along with the 602 appropriate QCD_Token notifications. A token taker SHOULD verify the 603 QCD token that arrives with an INVALID_SPI notification the same as 604 if it arrived with the IKE SPIs of the parent IKE SA. 606 However, a persistent storage module might not be updated in a timely 607 manner, and could be populated with IKE SPIs that have already been 608 rekeyed. A token taker MUST NOT take an invalid QCD Token sent along 609 with an INVALID_SPI notification as evidence that the peer is either 610 malfunctioning or attacking, but it SHOULD limit the rate at which 611 such notifications are processed. 
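The token taker behavior described in Section 3, Section 4.5, Section 9.1 and Section 9.2 might be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the specification: the class name TokenTaker, its method names, the choice of SHA-256 for the stored digest, and the rate-limit parameters (5 checks per 10 seconds per source) are all hypothetical. The sketch stores a digest of the token per (SPI-I, SPI-R) pair, accepts several QCD_TOKEN notifications in one message, deletes the IKE SA entry on a match, and limits the rate of verifications per source.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque


class TokenTaker:
    """Hypothetical token taker; names and limits are illustrative only."""

    def __init__(self, max_checks=5, window=10.0, clock=time.monotonic):
        self._digests = {}                  # (spi_i, spi_r) -> digest of stored token
        self._recent = defaultdict(deque)   # source -> timestamps of recent checks
        self._max_checks = max_checks
        self._window = window
        self._clock = clock

    def store(self, spi_i, spi_r, token):
        """Store a digest of a token received in a protected exchange."""
        if not 16 <= len(token) <= 128:
            raise ValueError("TOKEN_SECRET_DATA must be 16-128 octets")
        self._digests[(spi_i, spi_r)] = hashlib.sha256(token).digest()

    def _allowed(self, source):
        """Sliding-window rate limit on verifications from one source."""
        now = self._clock()
        recent = self._recent[source]
        while recent and now - recent[0] > self._window:
            recent.popleft()
        if len(recent) >= self._max_checks:
            return False
        recent.append(now)
        return True

    def handle_unprotected(self, source, spi_i, spi_r, tokens):
        """Return True if the IKE SA was deleted because a token matched."""
        if not self._allowed(source):
            return False
        stored = self._digests.get((spi_i, spi_r))
        if stored is None:
            return False                    # unknown IKE SPIs: log, do nothing
        for token in tokens:                # verified if any notification matches
            candidate = hashlib.sha256(token).digest()
            if hmac.compare_digest(candidate, stored):
                del self._digests[(spi_i, spi_r)]   # delete IKE SA and its token
                return True
        return False                        # invalid token: not evidence of attack
```

A failed verification here simply returns False, in line with the requirement that an invalid token accompanying INVALID_SPI not be treated as evidence of malfunction or attack. A real implementation would also bound the size of the per-source table.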
613 9.3. Using Tokens that Depend on IP Addresses 615 This section will describe the rationale for token generation methods 616 such as the one described in Section 5.2. Note that this section 617 merely provides a possible rationale, and does not specify or 618 recommend any kind of configuration. 620 Some configurations of security gateway use a load-sharing cluster of 621 hosts, all sharing the same IP addresses, where the SAs (IKE and 622 child) are not synchronized between the cluster members. In such a 623 configuration, a single member does not know about all the IKE SAs 624 that are active for the configuration. A load balancer (usually a 625 networking switch) sends IKE and IPsec packets to the several members 626 based on source IP address. 628 In such a configuration, an attacker can send a forged protected IKE 629 packet with the IKE SPIs of an existing IKE SA, but from a different 630 IP address. This packet will likely be processed by a different 631 cluster member from the one that owns the IKE SA. Since no IKE SA 632 state is stored on this member, it will send a QCD token to the 633 attacker. If the QCD token does not depend on IP address, this token 634 can immediately be used to tell the token taker to tear down the IKE 635 SA using an unprotected QCD_TOKEN notification. 637 To thwart this possible attack, such configurations should use a 638 method that considers the taker's IP address, such as the method 639 described in Section 5.2. 641 10. Security Considerations 642 10.1. QCD Token Handling 644 Tokens MUST be hard to guess. This is critical, because if an 645 attacker can guess the token associated with the IKE SA, she can tear 646 down the IKE SA and associated tunnels at will. When the token is 647 delivered in the IKE_AUTH exchange, it is encrypted. When it is sent 648 again in an unprotected notification, it is not, but that is the last 649 time this token is ever used. 
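As an illustration of a token that is hard to guess yet reproducible after a reboot, the stateless methods of Section 5.1 and Section 5.2 might be sketched in Python as follows. This is a sketch under stated assumptions: the draft does not name a hash function, so SHA-256 is an assumption, and the function names new_qcd_secret, make_token and make_rollover_tokens are hypothetical.

```python
import hashlib
import ipaddress
import os
from typing import List, Optional, Sequence


def new_qcd_secret() -> bytes:
    """Generate the 32 random octets that Section 5.1 calls QCD_SECRET.

    The caller is expected to keep the result in non-volatile storage.
    """
    return os.urandom(32)


def make_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes,
               taker_ip: Optional[str] = None) -> bytes:
    """TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R [| IPaddr-T]).

    SHA-256 stands in for HASH (an assumption). Passing taker_ip selects
    the address-bound variant of Section 5.2.
    """
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    block = qcd_secret + spi_i + spi_r
    if taker_ip is not None:
        block += ipaddress.ip_address(taker_ip).packed
    return hashlib.sha256(block).digest()   # 32 octets, within the 16-128 range


def make_rollover_tokens(secrets: Sequence[bytes], spi_i: bytes, spi_r: bytes,
                         taker_ip: Optional[str] = None) -> List[bytes]:
    """One token per QCD_SECRET generation, at most four (current + 3 previous)."""
    return [make_token(s, spi_i, spi_r, taker_ip) for s in secrets[:4]]
```

Because the token depends only on the stored secret and the IKE SPIs (and optionally the taker's address), the token maker needs no per-SA state to regenerate it after a reboot.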
651 An aggregation of some tokens generated by one peer together with the 652 related IKE SPIs MUST NOT give an attacker the ability to guess other 653 tokens. Specifically, if one peer does not properly secure the QCD 654 tokens and an attacker gains access to them, this attacker MUST NOT 655 be able to guess other tokens generated by the same peer. This is 656 the reason that the QCD_SECRET in Section 5.1 needs to be 657 sufficiently long. 659 The QCD_SECRET MUST be protected from access by other parties. 660 Anyone gaining access to this value will be able to delete all the 661 IKE SAs for this token maker. 663 The QCD token is sent by the rebooted peer in an unprotected message. 664 A message like that is subject to modification, deletion and replay 665 by an attacker. However, these attacks will not compromise the 666 security of either side. Modification is meaningless because a 667 modified token is simply an invalid token. Deletion will only cause 668 the protocol not to work, resulting in a delay in tunnel re- 669 establishment as described in Section 2. Replay is also meaningless, 670 because the IKE SA has been deleted after the first transmission. 672 10.2. QCD Token Transmission 674 A token maker MUST NOT send a QCD token in an unprotected message for 675 an existing IKE SA. This implies that a conforming QCD token maker 676 MUST be able to tell whether a particular pair of IKE SPIs represent 677 a valid IKE SA. 679 This requirement is obvious and easy in the case of a single gateway. 680 However, some implementations use a load balancer to divide the load 681 between several physical gateways. It MUST NOT be possible even in 682 such a configuration to trick one gateway into sending a QCD token 683 for an IKE SA which is valid on another gateway. 685 10.3. QCD Token Enumeration 687 An attacker may try to attack QCD if the generation algorithm 688 described in Section 5.1 is used. 
The attacker will send several 689 fake IKE requests to the gateway under attack, receiving and 690 recording the QCD Tokens in the responses. This will allow the 691 attacker to create a dictionary of IKE SPIs to QCD Tokens, which can 692 later be used to tear down any IKE SA. 694 Three factors mitigate this threat: 695 o The space of all possible IKE SPI pairs is huge: 2^128, so making 696 such a dictionary is impractical. Even if we assume that one 697 implementation is faulty and always generates predictable IKE 698 SPIs, the space is still at least 2^64 entries, so making the 699 dictionary is extremely hard. 700 o Throttling the number of QCD_TOKEN notifications sent out, as 701 discussed in Section 9.1, especially when a crash has not recently 702 occurred, will limit the attacker's ability to construct a dictionary. 703 o The methods in Section 5.1 and Section 5.2 allow for a periodic 704 change of the QCD_SECRET. Any such change invalidates the entire 705 dictionary. 707 11. IANA Considerations 709 IANA is requested to assign a notify message type from the error 710 types range (43-8191) of the "IKEv2 Notify Message Types" registry 711 with name "QUICK_CRASH_DETECTION". 713 12. Acknowledgements 715 We would like to thank Hannes Tschofenig and Yaron Sheffer for their 716 comments about Session Resumption. 718 13. Change Log 720 This section lists all changes in this document. 722 NOTE TO RFC EDITOR: Please remove this section in the final RFC. 724 13.1. Changes from draft-nir-ike-qcd-02 726 o Described QCD token enumeration, following a question by 727 Lakshminath Dondeti. 728 o Added the ability to replace the QCD token for an existing IKE SA. 729 o Added tokens dependent on peer IP address and their interaction 730 with MOBIKE. 732 13.2. Changes from draft-nir-ike-qcd-01 734 o Removed stateless method. 735 o Added discussion of rekeying and resumption. 736 o Added discussion of non-synchronized load-balanced clusters of 737 gateways in the security considerations.
738 o Other wording fixes. 740 13.3. Changes from draft-nir-ike-qcd-00 742 o Merged proposal with draft-detienne-ikev2-recovery [recovery]. 743 o Changed the protocol so that the rebooted peer generates the 744 token. This eliminates the need for persistent 745 storage. 746 o Added discussion of birth certificates. 748 13.4. Changes from draft-nir-qcr-00 750 o Changed name to reflect that this relates to IKE. Also changed 751 from quick crash recovery to quick crash detection to avoid 752 confusion with IFARE. 753 o Added more operational considerations. 754 o Added interaction with IFARE. 755 o Added discussion of backup gateways. 757 14. References 759 14.1. Normative References 761 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 762 Requirement Levels", BCP 14, RFC 2119, March 1997. 764 [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", 765 RFC 4306, December 2005. 767 [RFC4555] Eronen, P., "IKEv2 Mobility and Multihoming Protocol 768 (MOBIKE)", RFC 4555, June 2006. 770 [RFC4718] Eronen, P. and P. Hoffman, "IKEv2 Clarifications and 771 Implementation Guidelines", RFC 4718, October 2006. 773 14.2. Informative References 775 [recovery] 776 Detienne, F., Sethi, P., and Y. Nir, "Safe IKE Recovery", 777 draft-detienne-ikev2-recovery (work in progress), 778 July 2008. 780 [resumption] 781 Sheffer, Y., Tschofenig, H., Dondeti, L., and V. 782 Narayanan, "IKEv2 Session Resumption", 783 draft-tschofenig-ipsecme-ikev2-resumption (work in 784 progress), September 2008. 786 [stubs] Xu, Y., Yang, P., Ma, Y., Deng, H., and K. Xu, "IKEv2 SA 787 Synchronization for session resumption", 788 draft-xu-ike-sa-sync (work in progress), October 2008. 790 Authors' Addresses 792 Yoav Nir 793 Check Point Software Technologies Ltd. 794 5 Hasolelim st. 795 Tel Aviv 67897 796 Israel 798 Email: ynir@checkpoint.com 800 Frederic Detienne 801 Cisco Systems, Inc.
802 De Kleetlaan, 7 803 Diegem B-1831 804 Belgium 806 Phone: +32 2 704 5681 807 Email: fd@cisco.com 809 Pratima Sethi 810 Cisco Systems, Inc. 811 O'Shaugnessy Road, 11 812 Bangalore, Karnataka 560027 813 India 815 Phone: +91 80 4154 1654 816 Email: psethi@cisco.com 818 Full Copyright Statement 820 Copyright (C) The IETF Trust (2008). 822 This document is subject to the rights, licenses and restrictions 823 contained in BCP 78, and except as set forth therein, the authors 824 retain all their rights. 826 This document and the information contained herein are provided on an 827 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 828 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 829 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 830 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 831 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 832 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 834 Intellectual Property 836 The IETF takes no position regarding the validity or scope of any 837 Intellectual Property Rights or other rights that might be claimed to 838 pertain to the implementation or use of the technology described in 839 this document or the extent to which any license under such rights 840 might or might not be available; nor does it represent that it has 841 made any independent effort to identify any such rights. Information 842 on the procedures with respect to rights in RFC documents can be 843 found in BCP 78 and BCP 79. 845 Copies of IPR disclosures made to the IETF Secretariat and any 846 assurances of licenses to be made available, or the result of an 847 attempt made to obtain a general license or permission for the use of 848 such proprietary rights by implementers or users of this 849 specification can be obtained from the IETF on-line IPR repository at 850 http://www.ietf.org/ipr. 
852 The IETF invites any interested party to bring to its attention any 853 copyrights, patents or patent applications, or other proprietary 854 rights that may cover technology that may be required to implement 855 this standard. Please address the information to the IETF at 856 ietf-ipr@ietf.org.