Network Working Group                                             Y. Nir
Internet-Draft                                               Check Point
Intended status: Standards Track                          March 16, 2008
Expires: September 17, 2008


                 A Quick Crash Recovery Method for IKE
                         draft-nir-qcr-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 17, 2008.
Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document describes an extension to the IKEv2 protocol that
   allows for faster crash recovery using a saved token method.

   When an IPsec tunnel between two IKEv2 implementations is
   disconnected due to a restart of one peer, recovery can take as long
   as several minutes. This document proposes an extension to the
   protocol that allows recovery within a few seconds of the reboot.

Table of Contents

   1. Introduction
     1.1. Conventions Used in This Document
   2. RFC 4306 Crash Recovery
   3. Protocol Outline
   4. Formats and Exchanges
     4.1. Notification Format
     4.2. Authentication Exchange
     4.3. Informational Exchange
   5. Token Generation and Verification
     5.1. A Stateful Method of Token Generation
     5.2. A Stateless Method of Token Generation
     5.3. Token Lifetime
   6. Alternative Solutions
     6.1. Why not Save the Entire IKE SA
     6.2. Initiating a new IKE SA
   7. Operational Considerations
   8. Security Considerations
   9. IANA Considerations
   10. Normative References
   Author's Address
   Intellectual Property and Copyright Statements

1. Introduction

   IKEv2, as described in [RFC4306], has a method for recovering from a
   reboot of one peer. As long as traffic flows in both directions, the
   rebooted peer should re-establish the tunnels immediately. However,
   in many cases the rebooted peer is a VPN gateway that protects only
   servers, or else the non-rebooted peers have dynamic IP addresses.
   In such cases, the rebooted peer has no outgoing traffic (or no known
   peer address) to trigger a new exchange, so it will not
   re-establish the tunnels.

   Section 2 describes the current procedure and explains why crash
   recovery can take up to several minutes. The method proposed here is
   to send a token in the IKE_AUTH exchange that establishes the tunnel.
   The receiving peer can keep that token in some kind of persistent
   storage, such as a disk or a database, and can use it after a crash
   to have the other peer delete the stale IKE SA. Deleting the IKE SA
   results in a quick re-establishment of the IPsec tunnel.

1.1. Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. RFC 4306 Crash Recovery

   When one peer reboots, the other peer does not get any notification,
   so it continues to send IPsec traffic. The rebooted peer cannot
   decrypt this traffic, however, and its only remedy is to send an
   unprotected INFORMATIONAL exchange with an INVALID_SPI notification,
   as described in section 3.10.1 of [RFC4306]. That section also
   describes the processing of such a notification: "If this
   Informational Message is sent outside the context of an IKE_SA, it
   should be used by the recipient only as a "hint" that something might
   be wrong (because it could easily be forged)."

   Since the INVALID_SPI notification can only be used as a hint, the
   non-rebooted peer has to determine whether the IPsec SA, and indeed
   the parent IKE SA, are still valid.
   The method of doing this is described in section 2.4 of [RFC4306].
   This method, called a "liveness check", involves sending a protected
   empty INFORMATIONAL message and awaiting a response. This procedure
   is sometimes referred to as "Dead Peer Detection" or DPD.

   Section 2.4 does not mandate how many times the INFORMATIONAL message
   should be retransmitted, or for how long, but does recommend the
   following: "It is suggested that messages be retransmitted at least a
   dozen times over a period of at least several minutes before giving
   up on an SA". Implementations differ, but any schedule that follows
   this suggestion takes minutes rather than seconds.

3. Protocol Outline

   Supporting implementations will send a notification, called a "QCR
   token" (described in Section 4.1), in the last messages of the
   IKE_AUTH exchange. These are the final request and the final
   response, which contain the AUTH payloads. The generation of these
   tokens is a local matter for implementations, but considerations are
   described in Section 5.

   A supporting implementation receiving such a token SHOULD store it in
   such a way that it will survive a reboot. When a supporting
   implementation receives a protected IKE request message with unknown
   IKE SPIs, it should scan its saved token store. If a token matching
   the IKE SPIs is found, it SHOULD send it to the requesting peer in an
   unprotected IKE message as described in Section 4.3.

   When a supporting implementation receives the QCR token notification
   in an unprotected INFORMATIONAL exchange, it MUST verify that the
   TOKEN_SECRET_DATA field is associated with the IKE SPIs in the
   IKE_SPI fields of the IKE packet. If the verification fails, it
   SHOULD log the event. If it succeeds, it MUST delete the IKE SA
   associated with the IKE_SPI fields, and all dependent child SAs.
   This event MAY also be logged.
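   The behavior described in this section can be illustrated with a
   minimal Python sketch of the two roles. One is the peer that
   generates a token and later verifies it, called the "maker" here.
   The other is the peer that stores the token and sends it back after
   a reboot, called the "taker" here. The sketch rests on several
   assumptions:

   o  It uses the stateful token method of Section 5.1.
   o  It models the persistent token store as an in-memory dictionary
      that simply survives a simulated reboot.
   o  Its class and method names (TokenMaker, TokenTaker,
      on_protected_request, and so on) are hypothetical and do not come
      from any real IKE implementation.

   Message encoding, logging, and child SA handling are omitted.

```python
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Optional

# (SPI-I, SPI-R): the 8-octet initiator and responder IKE SPIs.
SpiPair = tuple[bytes, bytes]


@dataclass
class TokenMaker:
    """Hypothetical peer that generates QCR tokens and later verifies them.
    Stateful method (Section 5.1): a random token is kept with each IKE SA."""
    sas: dict = field(default_factory=dict)  # SpiPair -> token

    def new_sa(self, spis: SpiPair) -> bytes:
        """Create an IKE SA and return the token to send in IKE_AUTH."""
        token = secrets.token_bytes(32)
        self.sas[spis] = token
        return token

    def on_unprotected_qcr_token(self, spis: SpiPair, token: bytes) -> bool:
        """Return True if an empty response should be sent. On success the
        IKE SA is deleted (a real stack would also delete its child SAs)."""
        stored = self.sas.get(spis)
        if stored is None or not hmac.compare_digest(stored, token):
            return False  # verification failed: log it, send no response
        del self.sas[spis]
        return True


@dataclass
class TokenTaker:
    """Hypothetical peer that stores received tokens. `store` stands in for
    persistent storage; `live_sas` models in-memory state lost on reboot."""
    store: dict = field(default_factory=dict)   # SpiPair -> token
    live_sas: set = field(default_factory=set)  # SpiPairs held in memory

    def on_ike_auth_token(self, spis: SpiPair, token: bytes) -> None:
        self.store[spis] = token
        self.live_sas.add(spis)

    def reboot(self) -> None:
        self.live_sas.clear()  # in-memory IKE SAs vanish; store survives

    def on_protected_request(self, spis: SpiPair) -> Optional[bytes]:
        """For a protected request with unknown IKE SPIs, return the saved
        token to send in an unprotected INFORMATIONAL request, if any."""
        if spis in self.live_sas:
            return None  # known SA: normal processing (not shown)
        return self.store.get(spis)


if __name__ == "__main__":
    maker, taker = TokenMaker(), TokenTaker()
    spis = (secrets.token_bytes(8), secrets.token_bytes(8))
    taker.on_ike_auth_token(spis, maker.new_sa(spis))
    taker.reboot()
    token = taker.on_protected_request(spis)  # e.g. a liveness check arrives
    print(token is not None, maker.on_unprotected_qcr_token(spis, token))
    # prints: True True
```

   Two design choices in this sketch are worth noting:

   o  The taker looks the token up with get() instead of removing it.
      This lets it answer again if the maker retransmits its request
      because the unprotected message was lost. A real implementation
      would still eventually remove the token from storage (see
      Section 5.3).
   o  The maker compares tokens with hmac.compare_digest, a
      constant-time comparison, so that verification timing does not
      help an attacker guess a token.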
   A supporting implementation MAY immediately create new SAs using an
   Initial exchange, or it may wait for subsequent traffic to trigger
   the creation of new SAs.

   There is ongoing work on IKEv2 Session Resumption [resumption]. The
   current proposal is orthogonal to Session Resumption; in fact, by
   using Session Resumption instead of a regular IKE exchange, the new
   SA can be created with minimal overhead.

4. Formats and Exchanges

4.1. Notification Format

   The notification payload called "QCR token" is formatted as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   !  Protocol ID  !   SPI Size    ! QCR Token Notify Message Type !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   !                                                               !
   ~                       TOKEN_SECRET_DATA                       ~
   !                                                               !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Protocol ID (1 octet) MUST contain 1, as this message is related
      to an IKE SA.
   o  SPI Size (1 octet) MUST be zero, in conformance with [RFC4306].
   o  QCR Token Notify Message Type (2 octets) MUST be xxxxx, the value
      assigned for QCR token notifications (TBA by IANA).
   o  TOKEN_SECRET_DATA (16-256 octets) contains a generated token as
      described in Section 5.

4.2. Authentication Exchange

   For clarity, only the EAP version of an AUTH exchange is presented
   here. The non-EAP version is very similar. The figure below is based
   on appendix A.3 of [RFC4718].
   first request       --> IDi,
                           [N(INITIAL_CONTACT)],
                           [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+],
                           [IDr],
                           [CP(CFG_REQUEST)],
                           [N(IPCOMP_SUPPORTED)+],
                           [N(USE_TRANSPORT_MODE)],
                           [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
                           [N(NON_FIRST_FRAGMENTS_ALSO)],
                           SA, TSi, TSr,
                           [V+]

   first response      <-- IDr, [CERT+], AUTH,
                           EAP,
                           [V+]

                     / --> EAP
   repeat 1..N times |
                     \ <-- EAP

   last request        --> AUTH,
                           [N(QCR_TOKEN)]

   last response       <-- AUTH,
                           [N(QCR_TOKEN)],
                           [CP(CFG_REPLY)],
                           [N(IPCOMP_SUPPORTED)],
                           [N(USE_TRANSPORT_MODE)],
                           [N(ESP_TFC_PADDING_NOT_SUPPORTED)],
                           [N(NON_FIRST_FRAGMENTS_ALSO)],
                           SA, TSi, TSr,
                           [N(ADDITIONAL_TS_POSSIBLE)],
                           [V+]

   Note that the QCR_TOKEN notification is marked as optional because
   this specification does not require both sides to send QCR tokens.
   If only one peer sends the QCR token, then a reboot of the other
   peer will not be recoverable by this method. This may be acceptable
   if traffic typically originates from the other peer.

   In any case, the lack of a QCR_TOKEN notification MUST NOT be taken
   as an indication that the peer does not support this standard.
   Conversely, if a peer does not understand this notification, it will
   simply ignore it. Therefore, a peer MAY send this notification
   freely, even if it does not know whether the other side supports it.

4.3. Informational Exchange

   This INFORMATIONAL exchange is unprotected. It is triggered by a
   protected IKE request that uses an unknown IKE SA.

   request             --> N(QCR_TOKEN)

   response            <--

   The QCR_TOKEN is the only notification in the request. Similar to
   the description in section 2.21 of [RFC4306], the IKE SPI and message
   ID fields in the packet headers are taken from the protected IKE
   request.

   If the QCR_TOKEN is successfully verified, an empty response MUST be
   sent.
   If the QCR_TOKEN cannot be validated, a response SHOULD NOT be sent.
   Section 5 defines token verification.

5. Token Generation and Verification

   No token generation method is mandated by this document. Two methods
   are documented in Section 5.1 and Section 5.2, but they serve only as
   examples.

   The following are the requirements for a token generation mechanism:

   o  Tokens should be at least 16 octets long, and no more than 256
      octets long, to facilitate storage.
   o  It should not be possible for an external attacker to guess the
      QCR token generated by an implementation. Cryptographic
      mechanisms such as PRNGs and hash functions are RECOMMENDED.
   o  The peer that generated the QCR token should be able to
      immediately verify it, provided that the IKE SPIs are given and
      that the IKE SA has not expired or been otherwise deleted.

5.1. A Stateful Method of Token Generation

   This describes a stateful method of generating a token:

   o  Before sending the QCR token, 32 random octets are generated using
      a secure random number generator or a PRNG.
   o  Those 32 octets are used as the TOKEN_SECRET_DATA field, and
      stored as part of the IKE SA.
   o  For verification, the IKE implementation simply retrieves the IKE
      SA, and compares the TOKEN_SECRET_DATA field from the notification
      to the TOKEN_SECRET_DATA field stored with the SA.

5.2. A Stateless Method of Token Generation

   This describes a stateless method of generating a token:

   o  At startup, the IKE implementation generates a 32-octet random
      buffer using a cryptographically secure PRNG. This buffer is
      called the QCR_SECRET.
   o  For each QCR token, the TOKEN_SECRET_DATA field is generated by
      calculating a SHA-256 hash over a concatenation of the QCR_SECRET
      and the IKE SPIs, as follows (where HASH is SHA-256 and "|"
      denotes concatenation):

         TOKEN_SECRET_DATA = HASH(QCR_SECRET | SPI-I | SPI-R)

   o  Verification uses the same calculation, and works even if the IKE
      SA has been deleted. Still, if the IKE SA is no longer valid, the
      notification MUST NOT be acknowledged, as this could be used in an
      attempt to guess the QCR_SECRET.

5.3. Token Lifetime

   The token is associated with a single IKE SA, and SHOULD be deleted
   when the SA is deleted or expires. More formally, the token is
   associated with the pair (SPI-I, SPI-R).

6. Alternative Solutions

6.1. Why not Save the Entire IKE SA

   IKEv2 does not assume the existence of a persistent storage module.
   If we are adding such a module, why not use it to save the entire IKE
   SA across reboots, eliminating the need for a crash recovery
   procedure?

   There are several reasons why we believe that this is not a good
   idea:

   1.  A token is only 16-256 octets, and is much more compact than all
       the data needed to store an IKE SA.
   2.  A token is valid for the life of an IKE SA, whereas the IKE SA
       state is updated whenever a message is sent, because of the
       requirement to track the sequence of message IDs. It may not be
       acceptable to update the persistent storage whenever an IKE
       message is sent.
   3.  A reboot is usually an unpredictable event, and as such, we
       cannot know how long it will last. By the time the machine has
       rebooted, the peer may have attempted some type of protected
       exchange (liveness check, create-child-SA, or delete), timed out,
       and deleted the SA. It is far better to reboot without SAs and
       with only a token for quick recovery.

6.2. Initiating a new IKE SA

   Instead of sending a QCR token, we could have the rebooted
   implementation start an Initial exchange with the peer, including the
   INITIAL_CONTACT notification. This would have the same effect,
   instructing the peer to erase the old IKE SA, as well as establishing
   a new IKE SA with fewer rounds.

   The disadvantage here is that in IKEv2 an authentication exchange
   MUST have a piggy-backed Child SA set up. Since our use case is such
   that the rebooted implementation does not have traffic flowing to the
   peer, there are no good selectors for such a Child SA.

   Additionally, when authentication is asymmetric, such as when EAP is
   used, it is not possible for the rebooted implementation to initiate
   IKE.

7. Operational Considerations

   To support this standard, an implementation needs access to a
   persistent storage module. This could be an internal hard disk, a
   local or remote database application, or any other method that
   persists across reboots. This storage module and the data links
   between the storage module and the IKE module must meet the
   performance requirements of the IKE module. The storage module MUST
   support insertion and deletion rates equal to peak IKE SA setup
   rates, and it SHOULD support query rates fast enough to answer the
   protected requests with unknown IKE SPIs that arrive after a reboot.

   See Section 8 for security considerations for this storage mechanism.

   In order to limit the effects of DoS attacks, an implementation
   SHOULD limit the rate of queries into the token storage so as not to
   overload it. If excessive amounts of IKE requests protected with
   unknown IKE SPIs arrive, the IKE module SHOULD revert to the behavior
   described in section 2.21 of [RFC4306] and either send an
   INVALID_IKE_SPI notification or ignore the request entirely.

8. Security Considerations

   Tokens MUST be hard to guess.
   This is critical, because if an attacker can guess the token
   associated with the IKE SA, she can tear down the IKE SA and
   associated tunnels at will. When the token is delivered in the
   IKE_AUTH exchange, it is encrypted. When it is sent back in an
   INFORMATIONAL exchange, it is not encrypted, but that is the last use
   of that token.

   An aggregation of some tokens generated by one peer together with the
   related IKE SPIs MUST NOT give an attacker the ability to guess other
   tokens. Specifically, if one peer does not properly secure the QCR
   tokens and an attacker gains access to them, this attacker MUST NOT
   be able to guess other tokens generated by the same peer. This is
   the reason that the QCR_SECRET in Section 5.2 needs to be long.

   The persistent storage MUST be protected from access by other
   parties. Anyone gaining access to the contents of the storage will
   be able to delete all the IKE SAs described in it.

   The tokens associated with expired and deleted IKE SAs MUST be
   deleted from the storage, so that a future compromise of the storage
   does not reveal enough tokens to facilitate an attack against the QCR
   tokens.

   The QCR token is sent by the rebooted peer in an unprotected message.
   Such a message is subject to modification, deletion, and replay by
   an attacker. However, these attacks will not compromise the security
   of either side. Modification is meaningless because a modified token
   is simply an invalid token. Deletion will only cause the protocol
   not to work, resulting in a delay in tunnel re-establishment as
   described in Section 2. Replay is also meaningless, because the IKE
   SA has been deleted after the first transmission.

9. IANA Considerations

   IANA is requested to assign a notify message type from the error
   types range (43-8191) of the "IKEv2 Notify Message Types" registry
   with name "QUICK_CRASH_RECOVERY".

10. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [RFC4718]  Eronen, P. and P. Hoffman, "IKEv2 Clarifications and
              Implementation Guidelines", RFC 4718, October 2006.

   [resumption]
              Sheffer, Y., Tschofenig, H., Dondeti, L., and V.
              Narayanan, "IPsec Gateway Failover Protocol",
              draft-sheffer-ipsec-failover-02 (work in progress),
              November 2007.

Author's Address

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim st.
   Tel Aviv 67897
   Israel

   Email: ynir@checkpoint.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC documents
   can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).