Network Working Group                                             Y. Nir
Internet-Draft                                               Check Point
Intended status: Standards Track                             F. Detienne
Expires: January 14, 2009                                       P. Sethi
                                                                   Cisco
                                                           July 13, 2008


                 A Quick Crash Detection Method for IKE
                        draft-nir-ike-qcd-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 14, 2009.

Abstract

   This document describes an extension to the IKEv2 protocol that
   allows for faster crash recovery using a saved token.

   When an IPsec tunnel between two IKEv2 implementations is
   disconnected due to a restart of one peer, it can take as much as
   several minutes for the other peer to discover that the reboot has
   occurred, thus delaying recovery. In this text we propose an
   extension to the protocol that allows for recovery immediately
   following the reboot.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  RFC 4306 Crash Recovery
   3.  Protocol Outline
   4.  Stateless Variant Outline
     4.1.  Introducing CHECK_SPI
     4.2.  Stateless Recovery
     4.3.  Wait before rekey
     4.4.  Throttling and Dampening
       4.4.1.  Invalid SPI throttling
       4.4.2.  Dampening
       4.4.3.  User controls
   5.  Formats and Exchanges
     5.1.  Notification Format
     5.2.  check_fmt
     5.3.  Stateless IKE Recovery VendorID
     5.4.  Authentication Exchange
     5.5.  Informational Exchange
   6.  Token Generation and Verification
     6.1.  A Stateless Method of Token Generation
     6.2.  Token Lifetime
   7.  Backup Gateways
   8.  Alternative Solutions
     8.1.  Initiating a new IKE SA
     8.2.  Birth Certificates
   9.  Interaction with IFARE
   10. Operational Considerations
     10.1.  Who should implement this specification
     10.2.  Response to unknown child SPI
     10.3.  Stateless IKE Recovery cookie
   11. Security Considerations
     11.1.  Security Considerations for the Stateful Method
     11.2.  Security Considerations for the Stateless Method
   12. IANA Considerations
   13. Acknowledgements
   14. Change Log
     14.1.  Changes from draft-nir-ike-qcd-00
     14.2.  Changes from draft-nir-qcr-00
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   IKEv2, as described in [RFC4306], has a method for recovering from a
   reboot of one peer.
   As long as traffic flows in both directions, the
   rebooted peer should re-establish the tunnels immediately. However,
   in many cases the rebooted peer is a VPN gateway that protects only
   servers, or else the non-rebooted peer has a dynamic IP address. In
   such cases, the rebooted peer will not be able to re-establish the
   tunnels. Section 2 describes how recovery works under RFC 4306, and
   explains why it takes several minutes.

   The method proposed here is to send a token in the IKE_AUTH exchange
   that establishes the tunnel. That token can be stored on the peer as
   part of the IKE SA. After a reboot, the rebooted implementation can
   re-generate the token, and send it to the non-rebooted peer so as to
   delete the IKE SA. Deleting the IKE SA results in a quick
   re-establishment of the IPsec tunnels. This is described in
   Section 3.

   Finally, Section 4 describes a variant that does not require storing
   state on the non-rebooted peer, but does require an extra round-trip.

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The term "token" refers to an octet string that an implementation can
   generate using only the IKE SPIs as input. A conforming
   implementation MUST be able to generate the same token from the same
   input even after rebooting.

   The term "token maker" refers to an implementation that generates a
   token and sends it to the peer in the IKE_AUTH exchange.

   The term "token taker" refers to an implementation that stores such a
   token, or a digest thereof, after receiving it in an IKE_AUTH
   exchange.

2.  RFC 4306 Crash Recovery

   When one peer reboots, the other peer does not get any notification,
   so IPsec traffic can still flow.
   The rebooted peer will not be able
   to decrypt it, however, and the only remedy is to send an unprotected
   INVALID_SPI notification as described in section 3.10.1 of [RFC4306].
   That section also describes the processing of such a notification:
   "If this Informational Message is sent outside the context of an
   IKE_SA, it should be used by the recipient only as a "hint" that
   something might be wrong (because it could easily be forged)."

   Since the INVALID_SPI can only be used as a hint, the non-rebooted
   peer has to determine whether the IPsec SA, and indeed the parent IKE
   SA, are still valid. The method of doing this is described in section
   2.4 of [RFC4306]. This method, called "liveness check", involves
   sending a protected empty INFORMATIONAL message, and awaiting a
   response. This procedure is sometimes referred to as "Dead Peer
   Detection" or DPD.

   Section 2.4 does not mandate how many times the liveness check
   message should be retransmitted, or for how long, but does recommend
   the following: "It is suggested that messages be retransmitted at
   least a dozen times over a period of at least several minutes before
   giving up on an SA". Implementations differ, but any implementation
   that follows this advice will take several minutes to detect the
   failure.

3.  Protocol Outline

   Supporting implementations will send a notification, called a "QCD
   token", as described in Section 5.1, in the last packets of the
   IKE_AUTH exchange. These are the final request and final response
   that contain the AUTH payloads. The generation of these tokens is a
   local matter for implementations, but considerations are described in
   Section 6. Implementations that send such a token will be called
   "token makers".

   A supporting implementation receiving such a token SHOULD store it as
   part of the IKE SA. Implementations that support this part of the
   protocol will be called "token takers".
   Section 10.1 has
   considerations for which implementations need to be token takers, and
   which should be token makers. Implementations that are not token
   takers will silently ignore QCD tokens.

   When a token maker receives a protected IKE request message with
   unknown IKE SPIs, it MUST generate a token that is identical to the
   one it previously sent for those IKE SPIs, and send it to the
   requesting peer in an unprotected IKE message as described in
   Section 5.5.

   When a token taker receives the QCD token in an unprotected
   notification, it MUST verify that the TOKEN_SECRET_DATA matches the
   token stored in the matching IKE SA. If the verification fails,
   or if the IKE SPIs in the message do not match any existing IKE SA,
   it SHOULD log the event. If it succeeds, it MUST delete the IKE SA
   associated with the IKE_SPI fields, and all dependent child SAs.
   This event MAY also be logged. The token taker MUST accept such
   tokens from any address, so as to allow different kinds of
   high-availability configuration of the token maker.

   A supporting token taker MAY immediately create new SAs using an
   Initial exchange, or it may wait for subsequent traffic to trigger
   the creation of new SAs.

   There is ongoing work on IKEv2 Session Resumption [resumption]. See
   Section 9 for a short discussion about this protocol's interaction
   with session resumption.

4.  Stateless Variant Outline

   Sometimes, a QCD token is not available to the non-rebooted
   implementation. This can happen for several reasons:
   o  Perhaps the rebooted peer has not implemented the "token maker"
      part of the protocol.
   o  Perhaps the non-rebooted peer is resource-constrained, and cannot
      spare the memory needed to save the token, so it did not implement
      the "token taker" part of the protocol.

   For such cases, we also define a stateless variant of the protocol
   that does not require any state on the non-rebooted peer, but does
   require an extra round-trip.

   A supporting implementation will advertise this capability with a
   special VID payload as defined in Section 5.3. When such an
   implementation reboots and sends an INVALID_SPI or INVALID_IKE_SPI
   notification to the non-rebooted peer, which has no QCD token, the
   non-rebooted peer uses a CHECK_SPI notification (see Section 4.1) to
   poll its peer about whether or not the SPI is actually invalid.

4.1.  Introducing CHECK_SPI

   In order to achieve stateless IKE recovery, this memo introduces a
   new notify type called CHECK_SPI. The CHECK_SPI payload carries an
   SPI (IKE_SA or Child SA) and one of three sub-types (QUERY, ACK,
   NACK). The semantics of the CHECK_SPI sub-types are as follows:
   o  QUERY: a peer queries the remote peer SA DB for the presence of
      the SA whose value is in the payload.
   o  ACK: a peer confirms it has the SA specified in the payload.
   o  NACK: a peer confirms it does not have the SA specified in the
      payload.

   The payload format of the CHECK_SPI notify is covered in Section 5.2.

4.2.  Stateless Recovery

   After receiving the INVALID_SPI or INVALID_IKE_SPI notifications, the
   non-rebooted peer (called Peer Y in the figure) will send an
   unprotected IKE message as follows. Note that Peer Y MUST NOT send
   this unless Peer X has advertised this capability in the IKE_AUTH
   exchange.

      Peer X                                         Peer Y

      HDR(A,B) INVALID_IKE_SPI(A,B)
      -------------------------------------------->

                  HDR(A,B) CHECK_SPI(QUERY,(A,B)), N(Cookie)
      <--------------------------------------------

      HDR(A,B) CHECK_SPI(ACK|NACK,(A,B)), N(Cookie)
      -------------------------------------------->

   In this figure, A and B represent the IKE SPIs, and the Cookie is a
   stateless cookie with similar considerations as the stateless cookie
   described in section 2.6 of RFC 4306. The cookie SHOULD depend on
   the IKE SPIs and a saved secret.

   A similar exchange happens when the peer sends an INVALID_SPI
   notification:

      Peer X                                         Peer Y

      HDR(0,0) INVALID_SPI(a)
      -------------------------------------------->

                  HDR(A,B) CHECK_SPI(QUERY,(A,B)), N(Cookie)
      <--------------------------------------------

      HDR(A,B) CHECK_SPI(ACK|NACK,(A,B)), N(Cookie)
      -------------------------------------------->

   The difference here is that Peer Y had to locate the IKE SPIs
   associated with the SPI mentioned in the INVALID_SPI notification.

4.3.  Wait before rekey

   There exists a particular attack where a man-in-the-middle can snoop
   and inject traffic but cannot block or drop packets. Such an
   attacker can spoof an INVALID_SPI (allegedly from X), forcing a
   CHECK_SPI(QUERY) from Y. The attacker would then spoof a
   CHECK_SPI(NACK) to force an undue rekey. Since the attacker cannot
   block packets, the CHECK_SPI(QUERY) will also reach X, which will
   reply with CHECK_SPI(ACK).

   Y receives CHECK_SPI(NACK) first and MAY wait for a few msec before
   creating a new SA. Y will eventually receive both a CHECK_SPI(ACK)
   and a CHECK_SPI(NACK), which is suspicious. The SIR process should
   then stop and log an error, keeping the existing SA.

   The process is illustrated below:

         X               Attacker                Y
                            Inv SPI
                            ------------------>

                       CHECK_SPI(QUERY)
         <-------------------------------------

                            CHECK_SPI(NACK)
                            ------------------>  Should rekey
                                                 but wait a few msec

                       CHECK_SPI(ACK)
         ------------------------------------->  Hint of attack
                                                 => no rekey

   Ideally, the round-trip time should be measured during the IKE
   exchange, and Y should wait for a full RTT before initiating a rekey.

   Given that IKE itself is subject to DH computation by a man-in-the-
   middle, and considering that SAs are dampened after creation (see
   Section 4.4.2), the staging complexity and limited interest of this
   attack make it rather impractical. An implementation MAY decide to
   implement this final safety wait, but this is strictly optional.

4.4.  Throttling and Dampening

   An important aspect of the security of stateless IKE recovery has to
   do with limiting CPU utilization. In order to thwart flood-type
   denial-of-service attacks, strict rate limiting and throttling
   mechanisms have to be enforced.

   All the notifications that are exchanged during IKE recovery SHOULD
   be rate limited. This section provides information on the way rate
   limiting should take place.

4.4.1.  Invalid SPI throttling

   The sending of all Invalid SPI notifies MUST be rate limited one way
   or another. The rate limiting SHOULD be performed on a per-peer
   basis, but dynamic state creation SHOULD be avoided as much as
   possible. A recommended tradeoff is to limit the number of flows
   that can undergo recovery at one point in time and avoid sending
   Invalid SPI notifies for flows that are potentially already under
   recovery.

   Invalid SPI rate limiting protects against naturally occurring
   dangling SAs: normal traffic conditions may cause unrecognized
   SPIs to be received, and this message is the most important to
   protect.
   Indeed, it is not realistic to send one notification per
   bad ESP packet received. On high speed links, this could mean
   thousands of IKE notifies sent for the same offending SPI.

   The receiving of unauthenticated Invalid SPI notifies MUST be rate
   limited as well. Again, the rate limiting SHOULD be performed on a
   per-peer basis without dynamic state creation. In normal
   circumstances, the peer receiving Invalid SPI notifies has an SA with
   the peer sending those notifies and already maintains peer-related
   data structures that can help in maintaining adequate counters.

   Authenticated Invalid SPI notifies can be accepted without
   throttling.

4.4.2.  Dampening

   Dampening applies after any one of the following conditions:
   o  the natural creation or rekey of one or more SAs
   o  the recovery of one or more SAs
   o  the failure in recovering an SA owned by the local security
      gateway
   o  the logging of an error or warning message involving an SA owned
      by the local security gateway

   The peer with which SAs were created or attempted, or against which
   a log message was emitted, SHOULD be dampened, which means that all
   the unauthenticated Invalid SPI and Check SPI messages emitted by
   that peer MUST be ignored for a chosen duration.

   This protection prevents a man-in-the-middle from forcing the fast
   recreation of SAs and potentially depleting the entropy of systems
   under attack. It also deals efficiently with race conditions that
   may occur after a rekey.

4.4.3.  User controls

   Because throttling at large is related to speed, the network
   implementation around the security gateways has a major influence on
   the pertinence of the parameters controlling rate limiting. It is
   difficult to provide good absolute values for the rate limiters,
   considering that these are implementation dependent.

   As such, for the sake of fitness in practical deployments, a system
   implementing this memo MUST provide administrative controls over the
   rate limiter parameters.

5.  Formats and Exchanges

5.1.  Notification Format

   The notification payload called "QCD token" is formatted as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   !  Protocol ID  !   SPI Size    ! QCD Token Notify Message Type !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   !                                                               !
   ~                       TOKEN_SECRET_DATA                       ~
   !                                                               !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Protocol ID (1 octet) MUST contain 1, as this message is related
      to an IKE SA.
   o  SPI Size (1 octet) MUST be zero, in conformance with [RFC4306].
   o  QCD Token Notify Message Type (2 octets) - MUST be xxxxx, the
      value assigned for QCD token notifications. TBA by IANA.
   o  TOKEN_SECRET_DATA (16-128 octets) contains a generated token as
      described in Section 6.

5.2.  check_fmt

   The notification payload called "CHECK_SPI" is formatted as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   !  Protocol ID  !   SPI Size    ! CHECK_SPI Notify Message Type !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   !   Operation   !
   +-+-+-+-+-+-+-+-+

   o  Protocol ID (1 octet) MUST contain 1, as this message is related
      to an IKE SA.
   o  SPI Size (1 octet) MUST be zero, in conformance with [RFC4306].
   o  CHECK_SPI Notify Message Type (2 octets) - MUST be xxxxx, the
      value assigned for CHECK_SPI notifications. TBA by IANA.
   o  Operation (1 octet) - This field determines the operation being
      performed (Query, Reply_ACK, Reply_NACK).

   The list of operations and their corresponding values:
   o  Query: 0
   o  Reply_ACK: 1
   o  Reply_NACK: 2

5.3.  Stateless IKE Recovery VendorID

   The stateless IKE recovery VendorID or SIR_VID is as follows:

      "SIR STATELESS"   hex: 53 49 52 20 53 54 41 54 45 4c 45 53 53

   This VendorID payload MUST be sent in the first IKE_AUTH message of
   any implementation that supports the stateless variant of this
   protocol.

5.4.  Authentication Exchange

   For clarity, only the EAP version of an AUTH exchange will be
   presented here. The non-EAP version is very similar. The figure
   below is based on appendix A.3 of [RFC4718].

    first request       --> IDi,
                            [N(INITIAL_CONTACT)],
                            [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+],
                            [IDr],
                            [CP(CFG_REQUEST)],
                            [N(IPCOMP_SUPPORTED)+],
                            [N(USE_TRANSPORT_MODE)],
                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
                            [N(NON_FIRST_FRAGMENTS_ALSO)],
                            SA, TSi, TSr,
                            [V(SIR_VID)]
                            [V+]

    first response      <-- IDr, [CERT+], AUTH,
                            EAP,
                            [V(SIR_VID)]
                            [V+]

                      / --> EAP
    repeat 1..N times |
                      \ <-- EAP

    last request        --> AUTH
                            [N(QCD_TOKEN)]

    last response       <-- AUTH,
                            [N(QCD_TOKEN)]
                            [CP(CFG_REPLY)],
                            [N(IPCOMP_SUPPORTED)],
                            [N(USE_TRANSPORT_MODE)],
                            [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
                            [N(NON_FIRST_FRAGMENTS_ALSO)],
                            SA, TSi, TSr,
                            [N(ADDITIONAL_TS_POSSIBLE)],
                            [V+]

   Note that the QCD_TOKEN notification is marked as optional because
   this specification does not require every implementation to be
   both token maker and token taker. If only one peer sends the QCD
   token, then a reboot of the other peer will not be recoverable by
   this method.
   This may be acceptable if traffic typically originates
   from the other peer.

   In any case, the lack of a QCD_TOKEN notification MUST NOT be taken
   as an indication that the peer does not support this standard.
   Conversely, if a peer does not understand this notification, it will
   simply ignore it. Therefore a peer MAY send this notification
   freely, even if it does not know whether the other side supports it.

5.5.  Informational Exchange

   This QCD_TOKEN notification is unprotected, and is sent as a response
   to a protected IKE request, which uses an IKE SA that is unknown.

    request             --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+

    response            <--

   If child SPIs are persistently mapped to IKE SPIs as described in
   Section 10.2, we may get the following exchange in response to an ESP
   or AH packet.

    request             --> N(INVALID_SPI), N(QCD_TOKEN)+

    response            <--

   The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to
   support both implementations that conform to this specification and
   implementations that don't. Similar to the description in section
   2.21 of [RFC4306], the IKE SPI and message ID fields in the packet
   headers are taken from the protected IKE request.

   To support a periodic rollover of token generation constants, the
   token taker MUST support at least four QCD_TOKEN notifications in a
   single packet. The token is considered verified if any of the
   QCD_TOKEN notifications matches. The token maker MAY generate up to
   four QCD_TOKEN notifications, based on several generations of keys.

   If the QCD_TOKEN is verified successfully, an empty response MUST be
   sent. If the QCD_TOKEN cannot be validated, a response SHOULD NOT be
   sent. Section 6 defines token verification.

6.  Token Generation and Verification

   No token generation method is mandated by this document. A method is
   documented in Section 6.1, but only serves as an example.

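   As an illustration of the verification rule in Section 5.5, a token
   taker might compare each QCD_TOKEN notification found in an
   unprotected message against the token stored with the IKE SA. The
   sketch below is not part of the protocol: the function name, the
   constant, and the use of a constant-time comparison are all
   assumptions made for the example.

```python
import hmac

# Hypothetical cap on how many QCD_TOKEN notifications are examined per
# packet; Section 5.5 only requires support for at least four.
MAX_TOKENS_PER_PACKET = 4


def verify_qcd_tokens(stored_token: bytes, received_tokens: list[bytes]) -> bool:
    """Return True if any examined token equals the one stored with the IKE SA."""
    matched = False
    for candidate in received_tokens[:MAX_TOKENS_PER_PACKET]:
        # compare_digest avoids revealing where the first mismatch occurs.
        if hmac.compare_digest(stored_token, candidate):
            matched = True
    return matched
```

   On a match the token taker would delete the IKE SA and send the
   empty response; on a mismatch it would stay silent and possibly log
   the event, as described in Section 3.
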
   The following lists the requirements for a token generation
   mechanism:
   o  Tokens MUST be at least 16 octets long, and no more than 128
      octets long, to facilitate storage and transmission. Tokens
      SHOULD be indistinguishable from random data.
   o  It should not be possible for an external attacker to guess the
      QCD token generated by an implementation. Cryptographic
      mechanisms such as PRNG and hash functions are RECOMMENDED.
   o  The token maker MUST be able to re-generate or retrieve the token
      based on the IKE SPIs even after it reboots.

6.1.  A Stateless Method of Token Generation

   This describes a stateless method of generating a token:
   o  At installation or immediately after the first boot of the IKE
      implementation, 32 random octets are generated using a secure
      random number generator or a PRNG.
   o  Those 32 octets, called the "QCD_SECRET", are stored in non-
      volatile storage on the machine, and kept indefinitely.
   o  The TOKEN_SECRET_DATA is calculated as follows:

         TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R)

   o  If key rollover is required by policy, the implementation MAY
      periodically generate a new QCD_SECRET and keep up to 3 previous
      generations. When sending an unprotected QCD_TOKEN, as many as 4
      notification payloads may be sent, each from a different
      QCD_SECRET.

6.2.  Token Lifetime

   The token is associated with a single IKE SA, and SHOULD be deleted
   by the token taker when the SA is deleted or expires. More formally,
   the token is associated with the pair (SPI-I, SPI-R).

7.  Backup Gateways

   Making crash recovery quick is important, but since rebooting a
   gateway takes a non-zero amount of time, many implementations choose
   to have a stand-by gateway ready to take over as soon as the primary
   gateway fails for any reason.

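   The example method of Section 6.1 can be sketched as follows. This
   is only an illustration under stated assumptions: SHA-256 stands in
   for HASH (the document does not name a hash function), "|" is read
   as octet-string concatenation, and all function names are
   hypothetical. Because the token depends only on QCD_SECRET and the
   IKE SPIs, any machine holding the same QCD_SECRET derives the same
   token.

```python
import hashlib
import secrets

IKE_SPI_LEN = 8  # IKEv2 SPIs are 8 octets each


def new_qcd_secret() -> bytes:
    """Generate the 32 random octets that Section 6.1 calls QCD_SECRET."""
    return secrets.token_bytes(32)


def make_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    """TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R); SHA-256 is assumed."""
    if len(spi_i) != IKE_SPI_LEN or len(spi_r) != IKE_SPI_LEN:
        raise ValueError("IKE SPIs must be 8 octets each")
    return hashlib.sha256(qcd_secret + spi_i + spi_r).digest()


def make_tokens_for_rollover(secrets_newest_first: list[bytes],
                             spi_i: bytes, spi_r: bytes) -> list[bytes]:
    """One token per retained QCD_SECRET generation, at most four in total."""
    return [make_token(s, spi_i, spi_r) for s in secrets_newest_first[:4]]
```

   With SHA-256 the token is 32 octets long, which falls within the
   16 to 128 octet range required in Section 6.
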
   If such a configuration is available, it is RECOMMENDED that the
   stand-by gateway be able to generate the same token as the active
   gateway. If the method described in Section 6.1 is used, this means
   that the QCD_SECRET field is identical in both gateways. This has
   the effect of having the crash recovery available immediately.

8.  Alternative Solutions

8.1.  Initiating a new IKE SA

   Instead of sending a QCD token, we could have the rebooted
   implementation start an Initial exchange with the peer, including the
   INITIAL_CONTACT notification. This would have the same effect,
   instructing the peer to erase the old IKE SA, as well as establishing
   a new IKE SA with fewer rounds.

   The disadvantage here is that in IKEv2 an authentication exchange
   MUST have a piggy-backed Child SA set up. Since our use case is such
   that the rebooted implementation does not have traffic flowing to the
   peer, there are no good selectors for such a Child SA.

   Additionally, when authentication is asymmetric, such as when EAP is
   used, it is not possible for the rebooted implementation to initiate
   IKE.

8.2.  Birth Certificates

   Here we should explain why not Birth Certificates.

9.  Interaction with IFARE

   IFARE, specified in [resumption], proposes to make setting up a new
   IKE SA consume less computing resources. This is particularly useful
   in the case of a remote access gateway that has many tunnels. A
   failure of such a gateway would require all these many remote access
   clients to establish an IKE SA either with the rebooted gateway or
   with a backup gateway. This tunnel re-establishment should occur
   within a short period of time, creating a burden on the remote access
   gateway. IFARE addresses this problem by having the clients store an
   encrypted derivative of the IKE SA for quick re-establishment.

626 What IFARE does not help, is the problem of detecting that the peer 627 gateway has failed. A failed gateway may go undetected for as long 628 as the lifetime of a child SA, because IPsec does not have packet 629 acknowledgement. Before establishing a new IKE SA using IFARE, a 630 client MUST ascertain that the gateway has indeed failed. This could 631 be done using either a liveness check (as in RFC 4306) or using the 632 QCD tokens described in this document. 634 A remote access client conforming to both specifications will store 635 QCD tokens, as well as the IFARE state, if provided by the gateway. 636 A remote access gateway conforming to both specifications will 637 generate a QCD token for the client. When the gateway reboots, the 638 client will discover this in either of two ways: 640 1. The client does regular liveness checks, or else the time for 641 some other IKE exchange has come. Since the gateway is still 642 down, the IKE times out after several minutes. In this case QCD 643 does not help. 644 2. Either the primary gateway or a backup gateway (see Section 7) is 645 ready and sends a QCD token to the client. In that case the 646 client will quickly re-establish the IPsec tunnel, either with 647 the rebooted primary gateway, the backup gateway as described in 648 this document or another gateway as described in [resumption] 650 The full combined protocol looks like this: 652 Initiator Responder 653 ----------- ----------- 654 HDR, SAi1, KEi, Ni --> 656 <-- HDR, SAr1, KEr, Nr, [CERTREQ] 658 HDR, SK {IDi, [CERT,] 659 [CERTREQ,] [IDr,] 660 AUTH, N(QCD_TOKEN) 661 SAi2, TSi, TSr, 662 N(TICKET_REQUEST)} --> 663 <-- HDR, SK {IDr, [CERT,] AUTH, SAr2, TSi, 664 TSr, N(TICKET_OPAQUE) 665 [,N(TICKET_GATEWAY_LIST)]} 667 ---- Reboot ----- 669 HDR, {} --> 670 <-- HDR, N(QCD_Token) 672 HDR, Ni, N(TICKET_OPAQUE), 673 [N+,], SK {IDi, [IDr,] 674 SAi2, TSi, TSr, 675 [CP(CFG_REQUEST)]} --> 676 <-- HDR, SK {IDr, Nr, SAr2, [TSi, TSr], 677 [CP(CFG_REPLY)]} 679 10. 
Operational Considerations

681 10.1. Who should implement this specification

683    Throughout this document, we have referred to reboot time
684    alternately as the time that the implementation crashes and the
685    time when it is ready to process IPsec packets and IKE exchanges.
686    Depending on the hardware and software platforms and the cause of the
687    reboot, rebooting may take anywhere from a few seconds to several
688    minutes. If the implementation is down for a long time, the benefits
689    of this protocol extension are reduced. For this reason, critical
690    systems should implement backup gateways as described in Section 7.
691    Note that the lower-case "should" in the previous sentence is
692    intentional, as we do not specify this in the sense of RFC 2119.

694    Implementing the "token maker" side of QCD makes sense for IKE
695    implementations where protected connections originate from the peer,
696    such as inter-domain VPNs and remote access gateways. Implementing
697    the "token taker" side of QCD makes sense for IKE implementations
698    where protected connections originate, such as inter-domain VPNs and
699    remote access clients.

701    To clarify the requirements:
702    o  A remote-access client MUST be a token taker and MAY be a token
703       maker.
704    o  A remote-access gateway MAY be a token taker and MUST be a token
705       maker.
706    o  An inter-domain VPN gateway MUST be both token maker and token
707       taker.

709    In order to limit the effects of DoS attacks, a token taker SHOULD
710    limit the rate of QCD_TOKENs verified from a particular source.

712    If an excessive number of IKE requests protected with unknown IKE
713    SPIs arrive at a token maker, the IKE module SHOULD revert to the
714    behavior described in Section 2.21 of [RFC4306] and either send an
715    INVALID_IKE_SPI notification or ignore the requests entirely.

717 10.2. Response to unknown child SPI

719    After a reboot, it is more likely that an implementation receives
720    IPsec packets than IKE packets.
In that case, the rebooted
721    implementation will send an INVALID_SPI notification, triggering a
722    liveness check. The token will only be sent in a response to the
723    liveness check, thus requiring an extra round-trip.

725    To avoid this, an implementation that has access to non-volatile
726    storage MAY store a mapping of child SPIs to owning IKE SPIs. If
727    such a mapping is available and persistent across reboots, the
728    rebooted implementation MAY respond to the IPsec packet with an
729    INVALID_SPI notification, along with the appropriate QCD_TOKEN
730    notifications. A token taker SHOULD verify the QCD token that
731    arrives with an INVALID_SPI notification the same as if it arrived
732    with the IKE SPIs of the parent IKE SA.

734    However, a persistent storage module might not be updated in a timely
735    manner, and could be populated with IKE SPIs that have already been
736    rekeyed. A token taker MUST NOT take an invalid QCD token sent along
737    with an INVALID_SPI notification as evidence that the peer is either
738    malfunctioning or attacking, but it SHOULD limit the rate at which
739    such notifications are processed.

741 10.3. Stateless IKE Recovery cookie

743    The cookie information is chosen by the peer that emits it. As such,
744    the cookie has strictly no meaning for the remote peer and can thus
745    be chosen as the emitting peer sees fit. This section provides
746    recommendations on how to generate and validate those cookies.

748    When an IKE endpoint X sends an unauthenticated CHECK_SPI, the cookie
749    payload following the notify is computed as follows:

751       Cookie = VersionIDofSecret
752                | H( SECRET | CHECK_SPI(..., Query)
753                     | ip.src | ip.dst
754                     | udp.src | udp.dst)

756    where
757    o  SECRET is a randomly generated secret known only to the
758       implementation and periodically changed.
759    o  VersionIDofSecret should be changed whenever SECRET is
760       regenerated.
761    o  CHECK_SPI(..., Query) is the content of the CHECK_SPI notify
762       payload where the operation subtype has been set to Query (cf.
763       Section 4.1).
764    o  ip.src is the source IP address of the IKE packet.
765    o  ip.dst is the destination IP address of the IKE packet.
766    o  udp.src is the source UDP port of the IKE packet.
767    o  udp.dst is the destination UDP port of the IKE packet.

769    Upon reception of a CHECK_SPI notify (ACK or NACK) followed by an
770    N(Cookie), a peer can verify whether this is the reply to a Query it
771    placed by recomputing the cookie and comparing it to the COOKIE in
772    the IKE message.

774    In order to minimize the range of cryptographic attacks on SECRET,
775    its value SHOULD have a limited lifetime.

777 11. Security Considerations
778 11.1. Security Considerations for the Stateful Method

780    Tokens MUST be hard to guess. This is critical, because if an
781    attacker can guess the token associated with the IKE SA, she can tear
782    down the IKE SA and associated tunnels at will. When the token is
783    delivered in the IKE_AUTH exchange, it is encrypted. When it is sent
784    again in an unprotected notification, it is not, but that is the last
785    time this token is ever used.

787    An aggregation of some tokens generated by one peer together with the
788    related IKE SPIs MUST NOT give an attacker the ability to guess other
789    tokens. Specifically, if one peer does not properly secure the QCD
790    tokens and an attacker gains access to them, this attacker MUST NOT
791    be able to guess other tokens generated by the same peer. This is
792    the reason that the QCD_SECRET in Section 6.1 needs to be
793    sufficiently long.

795    The QCD_SECRET MUST be protected from access by other parties.
796    Anyone gaining access to this value will be able to delete all the
797    IKE SAs for this token maker.

799    The QCD token is sent by the rebooted peer in an unprotected message.
800    A message like that is subject to modification, deletion and replay
801    by an attacker. However, these attacks will not compromise the
802    security of either side. Modification is meaningless because a
803    modified token is simply an invalid token. Deletion will only cause
804    the protocol not to work, resulting in a delay in tunnel
805    re-establishment as described in Section 2. Replay is also meaningless,
806    because the IKE SA has been deleted after the first transmission.

808 11.2. Security Considerations for the Stateless Method

810    IKE recovery self-protection is discussed throughout this document
811    and comprises many mechanisms to thwart denial-of-service attacks.

813    IKE recovery is subject to a man-in-the-middle attack that can let
814    the attacker trigger a renegotiation. It should be noted that an
815    attacker able to block ESP and/or IKE packets can cause IKE itself to
816    also tear down and trigger a rekey of IKE SAs. With throttling and
817    dampening enabled, IKE recovery is able to reduce the amount of
818    rekeys/negotiations to as low a rate as IKEv2.

820    Overall, IKE Recovery is no more vulnerable than IKEv2 and even
821    improves on the security of IKEv2 by resynchronizing SAs more
822    rapidly, which is important with dynamic policies.

824 12. IANA Considerations

826    IANA is requested to assign a notify message type from the error
827    types range (43-8191) of the "IKEv2 Notify Message Types" registry
828    with name "QUICK_CRASH_DETECTION".

830    IANA is requested to assign a notify message type from the status
831    types range (16406-40959) of the "IKEv2 Notify Message Types"
832    registry with name "CHECK_SPI".

834 13. Acknowledgements

836    We would like to thank Hannes Tschofenig and Yaron Sheffer for their
837    comments about IFARE.

839 14. Change Log

841    This section lists all changes in this document.

843    NOTE TO RFC EDITOR: Please remove this section in the final RFC.

845 14.1.
Changes from draft-nir-ike-qcd-00

847    o  Merged proposal with draft-detienne-ikev2-recovery [recovery].
848    o  Changed the protocol so that the rebooted peer generates the
849       token. This has the effect that the need for persistent storage
850       is eliminated.
851    o  Added discussion of birth certificates.

853 14.2. Changes from draft-nir-qcr-00

855    o  Changed name to reflect that this relates to IKE. Also changed
856       from quick crash recovery to quick crash detection to avoid
857       confusion with IFARE.
858    o  Added more operational considerations.
859    o  Added interaction with IFARE.
860    o  Added discussion of backup gateways.

862 15. References

864 15.1. Normative References

866    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
867               Requirement Levels", BCP 14, RFC 2119, March 1997.

869    [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
870               RFC 4306, December 2005.

872    [RFC4718]  Eronen, P. and P. Hoffman, "IKEv2 Clarifications and
873               Implementation Guidelines", RFC 4718, October 2006.

875 15.2. Informative References

877    [recovery]
878               Detienne, F. and P. Sethi, "Safe IKE Recovery",
879               draft-detienne-ikev2-recovery-00 (work in progress),
880               June 2008.

882    [resumption]
883               Sheffer, Y., Tschofenig, H., Dondeti, L., and V.
884               Narayanan, "IPsec Gateway Failover Protocol",
885               draft-sheffer-ipsec-failover-03 (work in progress),
886               March 2008.

888 Authors' Addresses

890    Yoav Nir
891    Check Point Software Technologies Ltd.
892    5 Hasolelim st.
893    Tel Aviv 67897
894    Israel

896    Email: ynir@checkpoint.com

898    Frederic Detienne
899    Cisco Systems, Inc.
900    De Kleetlaan, 7
901    Diegem B-1831
902    Belgium

904    Phone: +32 2 704 5681
905    Email: fd@cisco.com

906    Pratima Sethi
907    Cisco Systems, Inc.
908    O'Shaugnessy Road, 11
909    Bangalore, Karnataka 560027
910    India

912    Phone: +91 80 4154 1654
913    Email: psethi@cisco.com

915 Full Copyright Statement

917    Copyright (C) The IETF Trust (2008).

919    This document is subject to the rights, licenses and restrictions
920    contained in BCP 78, and except as set forth therein, the authors
921    retain all their rights.

923    This document and the information contained herein are provided on an
924    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
925    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
926    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
927    OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
928    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
929    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

931 Intellectual Property

933    The IETF takes no position regarding the validity or scope of any
934    Intellectual Property Rights or other rights that might be claimed to
935    pertain to the implementation or use of the technology described in
936    this document or the extent to which any license under such rights
937    might or might not be available; nor does it represent that it has
938    made any independent effort to identify any such rights. Information
939    on the procedures with respect to rights in RFC documents can be
940    found in BCP 78 and BCP 79.

942    Copies of IPR disclosures made to the IETF Secretariat and any
943    assurances of licenses to be made available, or the result of an
944    attempt made to obtain a general license or permission for the use of
945    such proprietary rights by implementers or users of this
946    specification can be obtained from the IETF on-line IPR repository at
947    http://www.ietf.org/ipr.

949    The IETF invites any interested party to bring to its attention any
950    copyrights, patents or patent applications, or other proprietary
951    rights that may cover technology that may be required to implement
952    this standard. Please address the information to the IETF at
953    ietf-ipr@ietf.org.