idnits 2.17.1 draft-ietf-ipsecme-ipsecha-protocol-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 11, 2010) is 4917 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC5798' is mentioned on line 166, but not defined == Missing Reference: 'RFC-5685' is mentioned on line 224, but not defined == Missing Reference: 'CERT' is mentioned on line 308, but not defined == Missing Reference: 'CERTREQ' is mentioned on line 308, but not defined == Missing Reference: 'IDr' is mentioned on line 308, but not defined == Unused Reference: 'RFC5685' is defined on line 703, but no explicit reference was found in the text == Unused Reference: 'RFC5723' is defined on line 706, but no explicit reference was found in the text ** Obsolete normative reference: RFC 5996 (Obsoleted by RFC 7296) Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Working Group R. Singh, Ed. 3 Internet-Draft G. Kalyani 4 Intended status: Standards Track Cisco 5 Expires: April 14, 2011 Y. Nir 6 Check Point 7 D. Zhang 8 Huawei 9 October 11, 2010 11 Protocol Support for High Availability IKEv2/IPsec 12 draft-ietf-ipsecme-ipsecha-protocol-01 14 Abstract 16 The IKEv2 and IPsec protocols are widely used for deploying VPNs. In 17 order to make such VPNs highly available, more scalable, and failure-resistant, 18 they are often implemented as IKEv2/IPsec Highly Available 19 (HA) clusters. However, clustering raises a number of issues for IKEv2/IPsec. 20 The draft "IPsec Cluster Problem Statement" enumerates the issues 21 encountered in an IKEv2/IPsec HA cluster environment. 23 This document proposes an extension to the IKEv2 protocol to solve the main 24 issues of the "IPsec Cluster Problem Statement" in a Hot Standby cluster 25 and gives implementation advice for the other issues. The main issues to 26 be solved are: 27 o IKEv2 Message Id synchronization: This is done by syncing up the 28 expected send and receive message Id values with the peer and 29 updating the values at the newly active cluster member after the 30 failover. 31 o IPsec Replay Counter synchronization: This is done by syncing up 32 bumped-up outgoing SA replay counter values with the peer and 33 updating the values at the newly active cluster member after the 34 failover. 36 Status of this Memo 38 This Internet-Draft is submitted in full conformance with the 39 provisions of BCP 78 and BCP 79. 41 Internet-Drafts are working documents of the Internet Engineering 42 Task Force (IETF). Note that other groups may also distribute 43 working documents as Internet-Drafts. The list of current Internet- 44 Drafts is at http://datatracker.ietf.org/drafts/current/. 46 Internet-Drafts are draft documents valid for a maximum of six months 47 and may be updated, replaced, or obsoleted by other documents at any 48 time. 
It is inappropriate to use Internet-Drafts as reference 49 material or to cite them other than as "work in progress." 51 This Internet-Draft will expire on April 14, 2011. 53 Copyright Notice 55 Copyright (c) 2010 IETF Trust and the persons identified as the 56 document authors. All rights reserved. 58 This document is subject to BCP 78 and the IETF Trust's Legal 59 Provisions Relating to IETF Documents 60 (http://trustee.ietf.org/license-info) in effect on the date of 61 publication of this document. Please review these documents 62 carefully, as they describe your rights and restrictions with respect 63 to this document. Code Components extracted from this document must 64 include Simplified BSD License text as described in Section 4.e of 65 the Trust Legal Provisions and are provided without warranty as 66 described in the Simplified BSD License. 68 Table of Contents 70 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 71 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 72 3. Issues solved from IPsec Cluster Problem Statement . . . . . . 6 73 4. IKEv2/IPsec SA Counter Synchronization Problem . . . . . . . . 6 74 5. IKEv2/IPsec SA Counter Synchronization Solution . . . . . . . 8 75 6. IKEv2/IPsec synchronization notification payloads . . . . . . 9 76 6.1. IKEV2_MESSAGE_ID_SYNC_SUPPORTED . . . . . . . . . . . . . 10 77 6.2. IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED . . . . . . . . . . . 10 78 6.3. IKEV2_MESSAGE_ID_SYNC . . . . . . . . . . . . . . . . . . 11 79 6.4. IPSEC_REPLAY_COUNTER_SYNC . . . . . . . . . . . . . . . . 11 80 7. Details of implementation . . . . . . . . . . . . . . . . . . 12 81 8. Step-by-Step details . . . . . . . . . . . . . . . . . . . . . 13 82 9. Security Considerations . . . . . . . . . . . . . . . . . . . 14 83 10. Interaction with other drafts . . . . . . . . . . . . . . . . 14 84 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 85 12. Acknowledgements . . . . . . . . . . . . . . . . . . 
. . . . . 16 86 13. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 16 87 13.1. Draft -01 . . . . . . . . . . . . . . . . . . . . . . . . 16 88 13.2. Draft -00 . . . . . . . . . . . . . . . . . . . . . . . . 16 89 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 90 14.1. Normative References . . . . . . . . . . . . . . . . . . . 17 91 14.2. Informative References . . . . . . . . . . . . . . . . . . 17 92 Appendix A. IKEv2 Message Id examples . . . . . . . . . . . . . . 17 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 95 1. Introduction 97 IKEv2 is used for deploying IPsec-based VPNs. In order to make such 98 VPNs highly available, more scalable, and failure-resistant, they are 99 often implemented as IKEv2/IPsec Highly Available (HA) clusters. However, 100 clustering raises a number of issues for IKEv2/IPsec. The draft "IPsec Cluster 101 Problem Statement" enumerates the issues encountered in an IKEv2/IPsec 102 HA cluster. 104 In a Hot Standby cluster implementation of IKEv2/IPsec-based 105 VPNs, the IKEv2/IPsec session is established between the peer and the 106 active member of the cluster. After that, the active member syncs 107 the IKE/IPsec SA state to the standby member of the cluster. 108 This primary SA state sync-up is done on SA bring-up and/or rekey. 109 Synchronizing SA state between the active and standby 110 members for each IKE and IPsec message is very costly, 111 so normally it is done periodically. So, when a "failover" event happens 112 in the cluster, the failover is first detected by the standby member, 113 which then becomes the active member; this takes considerable time. 114 During the failover, while the standby member is becoming the newly active 115 member, the peer is unaware of the failover and keeps sending IKE requests 116 and IPsec packets to the cluster, which is allowed by the IKEv2 and 117 IPsec windowing features. 
Now, the newly active member, after coming up, 118 finds a mismatch in IKE message Ids and IPsec replay counters. 119 Please see Section 4 for more details. 121 This document proposes an extension to the IKEv2 protocol to solve the main 122 issues of IKE message Id sync and IPsec SA replay counter sync and 123 gives implementation advice for the others. Here is a summary of the solutions 124 provided in this document: 126 IKEv2 Message Id synchronization: This is done by syncing up the expected 127 send and receive message Id values with the peer and updating the 128 values at the newly active cluster member after the failover. 130 IPsec Replay Counter synchronization: This is done by syncing up 131 bumped-up outgoing SA replay counter values with the peer and updating 132 the values at the newly active cluster member after the failover. 134 Though this document describes IKEv2 message Id sync and IPsec 135 replay counter synchronization in the context of an IPsec HA cluster, the 136 solution provided is generic and can be used in other scenarios where 137 IKEv2 message Id sync or IPsec SA replay counter sync is required. 139 Some IPsec HA implementations suffer from the IKEv2 message Id 140 synchronization problem, while others suffer from the IPsec 141 replay counter synchronization problem. The two problems are handled 142 separately, using a separate notify for each. This provides 143 the flexibility of implementing IKEv2 message Id synchronization, 144 IPsec replay counter synchronization, or both. 146 2. Terminology 148 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 149 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 150 document are to be interpreted as described in RFC 2119 [RFC2119]. 152 "SA Counter SYNC Request" is the INFORMATIONAL exchange request defined 153 in this document to synchronize the IKEv2/IPsec SA counter 154 information between a member of the cluster and the peer. 
156 "SA Counter SYNC Response" is the INFORMATIONAL exchange response 157 defined in this document to synchronize the IKEv2/IPsec SA counter 158 information between a member of the cluster and the peer. 160 Below are the terms taken from [IPsec Cluster Problem Statement], with 161 added information in the context of this document. 163 "Hot Standby Cluster", or "HS Cluster", is a cluster where only one of 164 the members is active at any one time. This member is also referred 165 to as the "active", whereas the other(s) are referred to as 166 "standbys". VRRP ([RFC5798]) is one method of building such a 167 cluster. The goal of a Hot Standby Cluster is to create the illusion 168 of a single virtual gateway to the peer(s). 170 "Active Member" is the primary member in the Hot Standby cluster. It 171 is responsible for forwarding packets for the virtual gateway. 173 "Standby Member" is the primary backup router. This member takes 174 control, i.e., becomes the active member, after the "failover" event. 176 "Peer" is the IKEv2/IPsec endpoint which establishes a VPN connection 177 with the Hot Standby cluster. The peer knows the Hot Standby Cluster by 178 a single cluster IP address. In case of "failover", the standby 179 member of the cluster becomes active, so the peer normally doesn't 180 notice that "failover" has occurred in the cluster. 182 "Multiple failover" is the situation where, in a cluster with three or 183 more nodes, failovers happen in rapid succession. The protocol and 184 implementation must be able to handle multiple failover, i.e., be able to 185 handle a new failover even if they are still processing the old 186 failover. 188 "Simultaneous failover" is the situation where 189 failover happens at both ends at the same time. The protocol and 190 implementation must be able to handle simultaneous failover. 192 The generic term IKEv2/IPsec SA counters is used throughout. 
Here, 193 "IKEv2 SA counter" stands for IKEv2 message Ids and "IPsec SA counter" 194 stands for IPsec SA replay counters, which are used to provide the 195 optional anti-replay feature. 197 3. Issues solved from IPsec Cluster Problem Statement 199 The IPsec Cluster Problem Statement defines the problems encountered in 200 IPsec clusters. The problems, along with their section numbers and names as 201 given in the problem statement, are as follows. 202 o 3.2. Lots of Long Lived State 203 o 3.3. IKE Counters 204 o 3.4. Outbound SA Counters 205 o 3.5. Inbound SA Counters 206 o 3.6. Missing Synch Messages 207 o 3.7. Simultaneous use of IKE and IPsec SAs by Different Members 208 * 3.7.1. Outbound SAs using counter modes 209 o 3.8. Different IP addresses for IKE and IPsec 210 o 3.9. Allocation of SPIs 212 This document solves the main issues using a protocol extension 213 and provides implementation advice for the other issues, as 214 follows. 215 o 3.2: This section mentions that there is a lot of state that needs to 216 be synchronized. If state is not synchronized, it is not really an 217 interesting cluster: failover will be just like a reboot, so the 218 issue need not be solved with protocol extensions. 219 o 3.3, 3.4, 3.5, and 3.6 are solved by this document. Please see 220 Section 4 for more details. 221 o 3.7 is a problem to be solved while building clusters. However, 222 peers should be required to accept multiple parallel SAs for 223 3.7.1. 224 o 3.8 can be solved by using the IKEv2 Redirect Mechanism [RFC-5685]. 225 o 3.9 concerns avoiding collisions of the same SPIs among 226 the cluster members. This is outside the scope of this document, 227 since it has to be solved within the cluster and 228 not with the peer. 230 4. IKEv2/IPsec SA Counter Synchronization Problem 232 The IKEv2 RFC [RFC5996] states that "An IKE endpoint MUST NOT exceed the peer's 233 stated window size for transmitted IKE requests". 
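The window rule quoted above can be illustrated with a minimal Python sketch. The class name SenderWindow and its methods are hypothetical and are not defined by IKEv2 or by this document. The sketch assumes that at most window_size requests may be outstanding and that the base of the window advances only once the oldest outstanding request has been acknowledged.

```python
class SenderWindow:
    """Hypothetical model of the request-sending side of an IKEv2 window."""

    def __init__(self, window_size, next_id=0):
        self.window_size = window_size  # peer's stated window size
        self.base = next_id             # oldest unacknowledged message Id
        self.next_id = next_id          # message Id for the next new request
        self.acked = set()              # acknowledged Ids at or above base

    def can_send(self):
        # A new request is allowed only while fewer than window_size
        # message Ids separate it from the oldest unacknowledged one.
        return self.next_id < self.base + self.window_size

    def send(self):
        if not self.can_send():
            raise RuntimeError("window full: oldest request not yet acknowledged")
        msg_id = self.next_id
        self.next_id += 1
        return msg_id

    def ack(self, msg_id):
        # Ignore responses that do not match an outstanding request.
        if not (self.base <= msg_id < self.next_id):
            return
        self.acked.add(msg_id)
        # The window moves only when the oldest request is acknowledged.
        while self.base in self.acked:
            self.acked.remove(self.base)
            self.base += 1
```

With a window size of 2 and requests 3 and 4 outstanding, a response to 4 alone does not free a slot; only a response to 3 lets the window move on to 5.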
235 As per the protocol, all IKEv2 packets follow a request-response 236 paradigm. The initiator of an IKEv2 request MUST retransmit the 237 request until it has received a response from the peer. IKEv2 238 introduces a windowing mechanism that allows multiple requests to be 239 outstanding at a given point in time, but mandates that the sender 240 window does not move until the oldest message sent from one peer to 241 another is acknowledged. Loss of even a single packet leads to 242 repeated retransmissions, followed by an IKEv2 SA teardown if the 243 retransmissions are unacknowledged. 245 An IPsec Hot Standby Cluster is required to ensure that, in case of 246 failover of the active member, the standby member becomes active 247 immediately. The standby member is expected to have the exact values 248 of the message Id fields that the active member had before failover. Even with the 249 best efforts to update the message Id values from the active to the standby 250 member, the values at the standby member can be stale for the following 251 reasons: 252 o The standby member is unaware of the last message that was received 253 and acknowledged by the previous active member, as failover could have 254 happened before the standby could be updated. 255 o The standby member does not have information about ongoing 256 unacknowledged requests of the active member before the failover 257 event. So after the failover event, when the standby member becomes 258 active, it cannot retransmit those requests. 260 When a standby member takes over as the active member, it would start 261 the message Id ranges from previously updated values. This would 262 make it reject requests from the peer, since the values would be 263 stale. As a sender, the standby member may end up reusing a stale 264 message Id, which will cause the peer to drop the request. 
Eventually, 265 there is a high probability of the IKEv2 SA and the corresponding IPsec SAs 266 being torn down simply because of a transitory message Id mismatch 267 and retransmission of requests. This is not desirable behavior for 268 HA. Even when the standby member is updated periodically, the cluster can 269 lose the IKE SA, and with it all IPsec SAs, due to a message Id (i.e., SA counter) 270 mismatch. 272 A similar issue is observed with IPsec counters if anti-replay 273 protection/ESN is implemented. Even with the best efforts at syncing 274 the ESP and AH SA counter values from the active to the standby member, 275 there is a chance that the standby member would have stale counter 276 values. The standby member would then send the stale counter 277 values. The peer would drop such packets, since with the 278 anti-replay protection feature, duplicate use of counter values is not 279 allowed. In the case of IPsec, it is acceptable to skip some counter values and 280 start with higher counter values. 282 Hence a mechanism is required in HA to ensure that the standby member 283 has correct message Id values and IPsec counters, so that 284 sessions are not torn down just because of mismatching counters. 286 5. IKEv2/IPsec SA Counter Synchronization Solution 288 When the standby member becomes the active member after a failover 289 event in the cluster, it sends an authenticated 290 IKEv2 request to the peer, asking the peer to send its values of the SA counters. 292 The standby member then updates its values of the SA counters and 293 starts sending/receiving requests. 295 First, the peer MUST negotiate its ability to support IKEv2 message 296 Id synchronization with the active member of the cluster by 297 sending the IKEV2_MESSAGE_ID_SYNC_SUPPORTED notification in the IKE_AUTH 298 exchange. 
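As a rough illustration of this negotiation step, the following Python sketch decides whether message Id synchronization may be used on an IKE SA. The function name and the representation of notifications as strings are assumptions made for readability; a real implementation would compare numeric Notify Message Type values, which this document leaves to IANA. The sketch also assumes that both ends must have announced the capability in IKE_AUTH, in line with the rule in Section 7 that a member which has not announced the capability must not use the sync notify.

```python
# Placeholder for the Notify Message Type; the real value is a number
# to be assigned by IANA, so a string is used here for illustration only.
IKEV2_MESSAGE_ID_SYNC_SUPPORTED = "IKEV2_MESSAGE_ID_SYNC_SUPPORTED"


def message_id_sync_negotiated(local_notifies, peer_notifies):
    """Return True only if both ends announced support in IKE_AUTH.

    local_notifies: notify types this end sent in its IKE_AUTH message
    peer_notifies:  notify types received in the other end's IKE_AUTH message
    """
    return (IKEV2_MESSAGE_ID_SYNC_SUPPORTED in local_notifies
            and IKEV2_MESSAGE_ID_SYNC_SUPPORTED in peer_notifies)
```

The active member would store the result with the IKE SA state that it syncs to the standby member, so that the standby knows whether it may start a sync exchange after failover.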
300 Similarly, to support IPsec replay counter synchronization, the peer 301 MUST negotiate its ability to support IPsec replay counter 302 synchronization with the active member of the cluster by sending the 303 IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED notification in the IKE_AUTH 304 exchange. 306 Peer Active Member 307 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 308 HDR, SK {IDi, [CERT], [CERTREQ], [IDr], AUTH, 309 N[IKEV2_MESSAGE_ID_SYNC_SUPPORTED], 310 N[IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED], 311 SAi2, TSi, TSr} ----------> 313 <---------- HDR, SK {IDr, [CERT+], [CERTREQ+], AUTH, 314 N[IKEV2_MESSAGE_ID_SYNC_SUPPORTED], 315 N[IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED], SAr2, TSi, TSr} 317 When the peer and the active member both support SA counter synchronization, 318 the active member MUST sync the SA counter synchronization 319 capability to the standby member after the establishment of the IKE 320 SA, so that the standby member is aware of the capability and can use 321 it when it becomes the active member after a failover event. 323 After a failover event, when the standby member becomes the active 324 member, it has to request the SA counters from the peer. The standby 325 member initiates the SYNC Request as an INFORMATIONAL exchange 326 with message Id zero, containing the notify IKEV2_MESSAGE_ID_SYNC, 327 IPSEC_REPLAY_COUNTER_SYNC, or both, depending on whether the 328 synchronization needs to be done for IKEv2 message Ids, IPsec replay 329 counters, or both. 331 The initiator of the IKEv2 message Id sync request sends its expected 332 send and receive message Id values and the "failover count" in the 333 IKEV2_MESSAGE_ID_SYNC notify. The responder compares 334 the received values with its local values. The higher 335 of the two is selected and sent in the sync response with the 336 IKEV2_MESSAGE_ID_SYNC notify. 
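The comparison performed by the responder can be sketched in Python as follows. This is an illustrative reading, not a normative algorithm. The names SyncValues and select_sync_response are hypothetical. The sketch assumes that the initiator's expected-send value is compared with the responder's own expected-receive value, and vice versa, because the next request Id one side will send is the next request Id the other side expects to receive.

```python
from collections import namedtuple

# Hypothetical container for the two counters carried in the notify,
# expressed from the point of view of the party that sends the notify.
SyncValues = namedtuple("SyncValues", ["expected_send", "expected_recv"])


def select_sync_response(received, local):
    """Pick the values the responder returns in its sync response.

    received: SyncValues from the sync request (initiator's view)
    local:    SyncValues currently held by the responder (responder's view)
    """
    # Responder's next send Id versus what the initiator expects to receive.
    send = max(local.expected_send, received.expected_recv)
    # Responder's next expected Id versus what the initiator plans to send.
    recv = max(local.expected_recv, received.expected_send)
    return SyncValues(expected_send=send, expected_recv=recv)
```

For example, if a newly active member with stale state reports SyncValues(5, 3) while the peer holds SyncValues(8, 6), the peer answers with SyncValues(8, 6). If the newly active member instead reports an expected-send value of 10, the peer raises its own expected-receive value to 10.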
The initiator now updates its send and receive 337 IKEv2 message Ids to the values received in the sync response and can 338 start normal IKEv2 message exchange. 340 The initiator of the IPsec replay counter sync sends its bumped-up outgoing 341 IPsec SA replay counter value and the "failover count" in the 342 IPSEC_REPLAY_COUNTER_SYNC notify. The responder 343 updates its incoming IPsec SA counter values and sends its own bumped-up 344 outgoing IPsec SA replay counter value in the sync response with 345 IPSEC_REPLAY_COUNTER_SYNC. The initiator now updates its incoming 346 IPsec SA counter to the value received in the sync response and can start 347 normal IPsec data traffic. 349 Both notify types, IKEV2_MESSAGE_ID_SYNC and 350 IPSEC_REPLAY_COUNTER_SYNC, contain Nonce Data in the payload to avoid 351 a DoS attack through replay of the SA counter sync request/response. The 352 nonce is defined per notify and MUST be validated. The nonce data 353 sent in the response MUST match the nonce data sent by the newly active 354 member in the request. If the nonce data received in the response does not match 355 the nonce data sent in the request, the standby (i.e., newly active) member 356 MUST discard this response, and the normal IKEv2 behavior of 357 retransmitting the request and waiting for a genuine reply from the peer 358 SHOULD follow before the SA is torn down because of retransmissions. 360 Standby [Newly Active] Member Peer 361 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 362 HDR, SK {N[IKEV2_MESSAGE_ID_SYNC ], 363 N[IPSEC_REPLAY_COUNTER_SYNC]} --------> 365 <--------- HDR, SK {N[IKEV2_MESSAGE_ID_SYNC ], 366 N[IPSEC_REPLAY_COUNTER_SYNC]} 368 6. IKEv2/IPsec synchronization notification payloads 370 The new notify and payload types are defined below. 372 6.1. IKEV2_MESSAGE_ID_SYNC_SUPPORTED 374 IKEV2_MESSAGE_ID_SYNC_SUPPORTED: This notify is included in the 375 IKE_AUTH request/response to indicate support for the IKEv2 message Id 376 synchronization mechanism described in this document. 
378 1 2 3 379 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 380 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 381 | Next Payload |C| RESERVED | Payload Length | 382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 383 |Protocol ID(=0)| SPI Size (=0) | Notify Message Type | 384 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 386 The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and 387 'Notify Message Type' fields are the same as described in Section 3 388 of [RFC5996]. The 'SPI Size' field MUST be set to 0 to indicate that 389 the SPI is not present in this message. The 'Protocol ID' MUST be 390 set to 0, since the notification is not specific to a particular 391 security association. 'Payload Length' field is set to the length in 392 octets of the entire payload, including the generic payload header. 393 The 'Notify Message Type' field is set to indicate the 394 IKEV2_MESSAGE_ID_SYNC_SUPPORTED payload. 396 6.2. IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED 398 IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED: This notify is included in the 399 IKE_AUTH request/response to indicate support for IPsec SA replay 400 counter synchronization mechanism described in this document. 402 1 2 3 403 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 405 | Next Payload |C| RESERVED | Payload Length | 406 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 407 |Protocol ID(=0)| SPI Size (=0) | Notify Message Type | 408 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 410 The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and 411 'Notify Message Type' fields are the same as described in Section 3 412 of [RFC5996]. The 'SPI Size' field MUST be set to 0 to indicate that 413 the SPI is not present in this message. 
The 'Protocol ID' MUST be 414 set to 0, since the notification is not specific to a particular 415 security association. 'Payload Length' field is set to the length in 416 octets of the entire payload, including the generic payload header. 417 The 'Notify Message Type' field is set to indicate the 418 IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED payload. 420 6.3. IKEV2_MESSAGE_ID_SYNC 422 IKEV2_MESSAGE_ID_SYNC : This payload type is defined to sync the 423 IKEv2 message Ids among newly-active [standby] member and the peer. 425 1 2 3 426 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 427 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 428 | Next Payload | RESERVED | Payload Length | 429 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 430 | Failover count | 431 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 432 | Nonce Data | 433 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 434 | EXPECTED_SEND_REQ_MESSAGE_ID | 435 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 436 | EXPECTED_RECV_REQ_MESSAGE_ID | 437 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 439 It contains the following data. 440 o Failover count (4 octets) : The failover count within the cluster, 441 it increases with each failover event in HA cluster. 442 o Nonce Data (4 octets) : The random nonce data. It should be sent 443 same in the SYNC Request and Response. The nonce data is used to 444 counter the replay of IKEV2_MESSAGE_ID_SYNC response by the 445 attacker. 446 o EXPECTED_SEND_REQ_MESSAGE_ID (4 octets) : This MUST be present 447 only if protocol ID is IKE. This field is used by the sender of 448 this notify, to indicate the message Id it will use in the next 449 request, that it will send to the other side peer. 
450 o EXPECTED_RECV_REQ_MESSAGE_ID (4 octets) : This field is used by 451 the sender of this notify to indicate the message Id it can 452 accept in the next request received from the other side. 454 6.4. IPSEC_REPLAY_COUNTER_SYNC 456 IPSEC_REPLAY_COUNTER_SYNC: This payload type is defined to sync the 457 IPsec SA replay counters between the newly active [standby] member and the 458 peer. 460 1 2 3 461 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 462 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 463 | Next Payload |ESN| RESERVED | Payload Length | 464 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 465 | Failover count | 466 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 467 | Outgoing IPsec SA counter | 468 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 470 It contains the following data. 471 o ESN (1 bit) : The ESN bit MUST be ON if the IPsec SAs were established 472 with Extended Sequence Numbers. 473 o Failover count (4 octets) : The failover count within the cluster; 474 it increases with each failover event in the HA cluster. 475 o Outgoing IPsec SA counter (4 octets or 8 octets) : The outgoing 476 IPsec SA counter is the bumped-up outgoing IPsec SA replay counter 477 value, considering ALL Child SAs under the IKEv2 SA. The size of the 478 outgoing IPsec SA counter depends on the ESN bit. If the ESN bit is ON, 479 it is 8 octets; otherwise it is 4 octets. 481 7. Details of implementation 483 The message Id used in the IKEV2_MESSAGE_ID_SYNC exchange MUST be zero so 484 that it is not validated upon receipt as per IKEv2 windowing. 485 Message Id zero MUST be permitted only for an INFORMATIONAL exchange 486 that carries a Notify of type IKEV2_MESSAGE_ID_SYNC. If any 487 INFORMATIONAL exchange uses message Id zero without carrying this 488 Notify, then such packets MUST be discarded upon decryption and an 489 INVALID_SYNTAX notify SHOULD be sent. 
No other payloads are allowed 490 in this INFORMATIONAL exchange. Whenever an IKEV2_MESSAGE_ID_SYNC or 491 IPSEC_REPLAY_COUNTER_SYNC notify is received with an invalid failover 492 count or invalid nonce data, the event SHOULD be logged. 494 The standby member can initiate the synchronization of IKEv2 Message 495 Ids 496 o When it receives a bad IKEv2/IPsec packet. A "bad" 497 IKEv2/IPsec packet means a packet outside the receive window. 498 o When it has to send an IKEv2/IPsec packet after the failover event. 499 o When it has just taken control from the active member and wants 500 to update the values beforehand, so that it need not start this 501 exchange at the time of sending/receiving a request. 503 The standby member can initiate the synchronization of IPsec SA 504 Counters 505 o If there was traffic using the IPsec SA in the recent past and 506 the replay counter at the standby member could be stale. 508 Since there can be many sessions at the standby member, and sending 509 exchanges for all of the sessions can cause throttling, the standby 510 member can choose to initiate the exchange only when it has to send or 511 receive a request. Thus the trigger to initiate this exchange 512 depends on the requirements and discretion of the standby member. 514 A member which has not announced the capability 515 IKEV2_MESSAGE_ID_SYNC_SUPPORTED MUST NOT send/receive the notify 516 IKEV2_MESSAGE_ID_SYNC. 518 A member which has not announced the capability 519 IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED MUST NOT send/receive the notify 520 IPSEC_REPLAY_COUNTER_SYNC. 522 If a peer gets an IKEV2_MESSAGE_ID_SYNC or IPSEC_REPLAY_COUNTER_SYNC 523 request even though it did not announce the capability in the IKE_AUTH 524 exchange, then it MUST ignore this message. 526 If any of the Notify payloads or the SYNC request/response is malformed, then 527 it is treated as an INVALID_SYNTAX message. 529 8. Step-by-Step details 531 The step-by-step details of the synchronization of IKE message Ids are 532 as follows. 
533 o The active member and the peer device establish the session. They 534 announce the capability to sync the counter information by sending the 535 IKEV2_MESSAGE_ID_SYNC_SUPPORTED notify in the IKE_AUTH exchange. 536 o The active member dies and the standby member takes over. The standby member 537 sends its own idea of the IKE message Ids (its side) to the peer in an 538 INFORMATIONAL message exchange with message Id zero. 539 o The peer first authenticates the message and then validates the 540 failover count. The peer compares the received values with 541 the values available locally and picks the higher value. 542 It then updates its message Ids with the higher values and also 543 proposes the same values in the response. 544 o The peer should not wait for pending responses while responding 545 with these message Id values. For example, if the window size is 5, 546 the peer window is 3-7, and the peer has sent requests 3, 4, 5, 6, 7 547 but has received responses only for 4, 5, 6, 7 and not for 3, then it should send 548 the EXPECTED_SEND_REQ_MESSAGE_ID as 8 and should not wait for the 549 response to 3 anymore. 551 o The peer should not wait for pending requests either. For example, if the 552 window size is 5, the peer window is 3-7, and the peer has received 553 requests 4, 5, 6, 7 but not 3, then it should send the 554 EXPECTED_RECV_REQ_MESSAGE_ID as 8 and should not wait for 3 555 anymore. 557 There is a corner case with the "failover count" and multiple failover. 558 What if the "failover count" is not updated on a member and the next 559 "failover" happens, so that the "failover count" is updated on the other side 560 but not on this member? [[ This needs to be discussed on the mailing list. 561 ]] 563 9. Security Considerations 565 There can be two types of DoS attacks. 566 o Replay of Message SYNC Request. This is countered by the "failover 567 count", since synchronization starts after a failover event and each 568 member of the cluster is aware of the failover event. The receiver of a 569 sync request should verify and maintain the failover count. 
If a peer 570 again receives a sync request with the same "failover count", it can 571 safely discard the request if it has received a valid 572 request/response from the other side after the sync exchange. The 573 peer can send the cached response to the sync request as long as it has not 574 received a valid request/response from the other side or the failover 575 count has not increased. 576 o Replay of Message SYNC Response. This is countered by sending the 577 nonce data along with the sync notify. The same nonce data has to 578 be returned in the response. Thus the standby member can accept a 579 reply only for the current request. After it receives a valid 580 response, it MUST NOT process the same response again and MUST discard 581 it. 583 10. Interaction with other drafts 585 The primary assumption of the IKEv2/IPsec SA Counter Synchronization 586 proposal is that an IKEv2 SA has been established between the active member of a 587 Hot Standby Cluster and the peer, that a failover event has then occurred, 588 and that the standby member has now "become" active. It also assumes that the 589 IKEv2 SA state was synced between the active and standby members of the 590 Hot Standby Cluster before the failover event. 591 o Session Resumption. Session resumption assumes that the peer (i.e., the 592 client or initiator) detects the need to re-establish the session. 593 In IKEv2/IPsec SA counter synchronization, the standby member that 594 becomes active (i.e., the gateway or responder) detects the need to 595 synchronize the SA counters after the failover event. Also, in a Hot 596 Standby Cluster, the peer establishes the IKEv2/IPsec session with a 597 single cluster IP address, so the peer normally does not detect the 598 failover in the cluster unless the standby member takes very 599 long to become active and the IKEv2 SA times out via liveness check. 600 So, session resumption and SA counter synchronization after 601 failover are mutually exclusive. 
   o  This document describes the operation of tightly coupled clusters,
      which are the common way of building IPsec clusters.  In these
      clusters, all members appear to the peer as one gateway;
      specifically, they share a single IP address.  High availability
      can also be provided by loosely coupled clusters (for lack of a
      better term), which are groups of gateways that do not share an
      IP address and do not synchronize state.  In this architecture,
      the client can use Session Resumption to fail over from one
      cluster member to another.  Specifically, this requires:
      *  Support of session resumption on peers and gateways.
      *  A common session resumption ticket format on all gateways (not
         currently standardized).
      *  Configuration on the peers of the group of gateways that
         constitute the cluster.
   o  Redirect [RFC5685].  The redirect mechanism for load balancing can
      be used during init (IKE_SA_INIT), during auth (IKE_AUTH), and
      after session establishment, whereas SA counter sync is used
      after the IKE SA has been established and a failover event has
      occurred.  It is therefore mutually exclusive with redirect
      during init and auth.  Redirect after session establishment is
      used for timed or planned shutdown/maintenance.  A failover event
      cannot be detected on the active member beforehand, so using
      redirect after session establishment is not possible in the case
      of failover.  So, Redirect and SA counter synchronization after
      failover are mutually exclusive.
   o  Crash detection.  This solves a similar problem, in which the peer
      detects that a cluster member has crashed based on a token.  It
      is mutually exclusive with HA with SA counter sync.

11.  IANA Considerations

   This document introduces four new IKEv2 Notify Message Types, as
   described in Section 6.  The new Notify Message Types must be
   assigned values between 16396 and 40959.
   o  IKEV2_MESSAGE_ID_SYNC_SUPPORTED.
   o  IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED.
   o  IKEV2_MESSAGE_ID_SYNC.
   o  IPSEC_REPLAY_COUNTER_SYNC.

12.  Acknowledgements

   We would like to thank Pratima Sethi and Frederic Detienne for their
   review comments and valuable suggestions on the initial version of
   the document.

   We would also like to thank the following people (in alphabetical
   order) for their review comments and valuable suggestions: Dan
   Harkins, Paul Hoffman, Steve Kent, Tero Kivinen, David McGrew, Pekka
   Riikonen, and Yaron Sheffer.

13.  Change Log

   This section lists all the changes in this document.

   NOTE TO RFC EDITOR: Please remove this section before publication.

13.1.  Draft -01

   Added "Multiple and Simultaneous failover" scenarios.

   The document now provides a mechanism to sync either the IKEv2
   message Id, the IPsec replay counter, or both, to cater to different
   types of implementations.

   The HA cluster's "failover count" is used to counter replay of sync
   requests by an attacker.

   The sync of the IPsec SA replay counter is optimized to have just
   one global bumped-up outgoing IPsec SA counter for ALL Child SAs
   under an IKEv2 SA.

   Examples were added for IKEv2 message Id sync to provide more
   clarity.

   Some edits as per comments on the mailing list to enhance clarity.

13.2.  Draft -00

   Version 00 is identical to
   draft-kagarigi-ipsecme-ikev2-windowsync-04, started as a WG
   document.

   Added IPSECME WG HA design team members as authors.

   Added a comment in the Introduction to discuss the window sync
   process on the WG mailing list to solve some concerns.

14.  References

14.1.  Normative References

   [IPsec Cluster Problem Statement]
              Nir, Y., "IPsec Cluster Problem Statement", July 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5996]  Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
              "Internet Key Exchange Protocol: IKEv2", RFC 5996,
              September 2010.

14.2.  Informative References

   [RFC5685]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for
              IKEv2", RFC 5685, November 2009.

   [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
              RFC 5723, January 2010.

Appendix A.  IKEv2 Message Id examples

   Below are examples that illustrate how the IKEv2 message Id values
   are synced.  The notation used to denote
   EXPECTED_SEND_REQ_MESSAGE_ID and EXPECTED_RECV_REQ_MESSAGE_ID on a
   member is (EXPECTED_SEND_REQ_MESSAGE_ID,
   EXPECTED_RECV_REQ_MESSAGE_ID).

   Normal failover - Example 1

   Standby [Newly Active] Member                            Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   Request SYNC (2, 3) -------->

                            Peer has values (4, 5), so it sends
                            <------------- (4, 5) Response SYNC

   Normal failover - Example 2

   Standby [Newly Active] Member                            Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   Request SYNC (2, 5) -------->
                            Peer has values (2, 4), so it sends
                            <------------- (5, 4) Response SYNC

   Simultaneous failover

   In the case of simultaneous failover, both sides send the SYNC
   request, and both sides eventually sync to the higher values.

   Standby [Newly Active] Member                            Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

   request SYNC (4,4) ----->

                             <-------------- request SYNC (5,5)

   response SYNC (5,5) ---->

                                  <-------- response SYNC (5,5)

Authors' Addresses

   Raj Singh (Editor)
   Cisco Systems, Inc.
   Divyashree Chambers, B Wing, O'Shaugnessy Road
   Bangalore, Karnataka 560025
   India

   Phone: +91 80 4301 3320
   Email: rsj@cisco.com


   Kalyani Garigipati
   Cisco Systems, Inc.
   Divyashree Chambers, B Wing, O'Shaugnessy Road
   Bangalore, Karnataka 560025
   India

   Phone: +91 80 4426 4831
   Email: kagarigi@cisco.com


   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim st.
   Tel Aviv 67897
   Israel

   Email: ynir@checkpoint.com


   Dacheng Zhang
   Huawei Technologies Ltd.

   Email: zhangdacheng@huawei.com
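
   Non-normative illustration.  The message Id selection rule and the
   Appendix A examples can be sketched in Python as shown below.  This
   is only one possible reading, not part of the protocol: the names
   MsgIds, sync_response, and next_expected are hypothetical.  The
   sketch assumes, as the Appendix A examples suggest, that the
   sender's EXPECTED_SEND_REQ_MESSAGE_ID is compared with the
   receiver's EXPECTED_RECV_REQ_MESSAGE_ID and vice versa.
   Authentication, failover count validation, and nonce handling are
   omitted.

```python
from typing import Iterable, NamedTuple


class MsgIds(NamedTuple):
    """(EXPECTED_SEND_REQ_MESSAGE_ID, EXPECTED_RECV_REQ_MESSAGE_ID)."""
    send: int
    recv: int


def next_expected(seen_ids: Iterable[int]) -> int:
    """Next expected message Id, ignoring gaps left by pending messages.

    Assumption: the value is one past the highest Id seen so far, which
    matches the window example (Ids up to 7 seen, so 8 is reported).
    """
    return max(seen_ids) + 1


def sync_response(local: MsgIds, received: MsgIds) -> MsgIds:
    """Values the receiver of a SYNC request adopts and returns.

    Assumption: the remote side's send counter corresponds to the local
    receive counter (and vice versa), and the higher value wins.
    """
    return MsgIds(
        send=max(local.send, received.recv),
        recv=max(local.recv, received.send),
    )


if __name__ == "__main__":
    # Appendix A, Example 2: peer holds (2, 4) and receives (2, 5).
    print(sync_response(MsgIds(2, 4), MsgIds(2, 5)))
```

   Under these assumptions, a peer holding (2, 4) that receives a SYNC
   request carrying (2, 5) returns (5, 4), which agrees with Example 2.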