idnits 2.17.1 draft-ietf-ipsecme-ipsecha-protocol-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: There can be two types of DOS attacks. o Replay of Message SYNC Request. This can be countered by rate limiting the number of such requests a peer can receive. The rate limiting can be done either by number or the time delay between which Message SYNC request can be received or both.These options are configurable. o Replay of Message SYNC Response. This can be countered by sending the NONCE data along with the SYNC_SA_COUNTER_INFO notify. The same NONCE data has to be returned in response. Thus the standby member can accept the reply only for the current request. After it receives the response, it MUST not accept the same response again and MUST drop the response. -- The document date (September 6, 2010) is 4981 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC5798' is mentioned on line 159, but not defined == Missing Reference: 'RFC-5685' is mentioned on line 206, but not defined == Missing Reference: 'CERT' is mentioned on line 283, but not defined == Missing Reference: 'CERTREQ' is mentioned on line 283, but not defined == Missing Reference: 'IDr' is mentioned on line 283, but not defined == Unused Reference: 'RFC5685' is defined on line 627, but no explicit reference was found in the text == Unused Reference: 'RFC5723' is defined on line 630, but no explicit reference was found in the text -- No information found for draft-ietf-IPsecme-ikev2bis - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'IKEv2bis' Summary: 0 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group R. Singh, Ed. 3 Internet-Draft G. Kalyani 4 Intended status: Standards Track Cisco 5 Expires: March 10, 2011 Y. Nir 6 Check Point 7 D. Zhang 8 Huawei 9 September 6, 2010 11 Protocol Support for High Availability IKEv2/IPsec 12 draft-ietf-ipsecme-ipsecha-protocol-00 14 Abstract 16 IKEv2 and IPsec protocols are widely used for deploying VPNs. In 17 order to make such VPNs highly available and fault-tolerant, they 18 are implemented as IKEv2/IPsec Highly Available (HA) clusters. But 19 there are many issues in IKEv2/IPsec HA clusters. The draft "IPsec 20 Cluster Problem Statement" enumerates the issues encountered in 21 an IKEv2/IPsec HA cluster environment. 
23 This draft proposes an extension to IKEv2 protocol to solve main 24 issues of "IPsec Cluster Problem Statement" in Hot Standby cluster 25 and gives implementation advice for other issues. The main issues to 26 be solved are: 27 o IKE Message Id synchronization : This is done by obtaining the 28 message Id values from the peer and updating the values at the 29 newly active cluster member after the failover. 30 o IPsec SA Counter synchronization : This is done by sending 31 incremented values of replay counters by the newly active cluster 32 member to the peer as expected replay counter value. 34 Status of this Memo 36 This Internet-Draft is submitted in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at http://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on March 10, 2011. 50 Copyright Notice 52 Copyright (c) 2010 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. 
Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction 68 2. Terminology 69 3. Issues solved from IPsec Cluster Problem Statement 70 4. IKEv2/IPsec SA Counter Synchronization Problem 71 5. IKEv2/IPsec SA Counter Synchronization Solution 72 6. SA counter synchronization notify and payload types 73 6.1. SYNC_SA_COUNTER_INFO_SUPPORTED 74 6.2. SYNC_SA_COUNTER_INFO 75 7. Details of implementation 76 8. Step-by-Step details 77 9. Security Considerations 78 10. Interaction with other drafts 79 11. IANA Considerations 80 12. Acknowledgements 81 13. Change Log 82 13.1. Draft -00 83 14. References 84 14.1. Normative References 85 14.2. Informative References 86 Authors' Addresses 88 1. Introduction 90 IKEv2 is used for deploying IPsec-based VPNs. In order to make such 91 VPNs highly available and fault-tolerant, they are implemented as 92 IKEv2/IPsec Highly Available (HA) clusters. But there are many issues 93 in IKEv2/IPsec HA clusters. 
The draft "IPsec Cluster Problem 94 Statement" enumerates the issues encountered in an IKEv2/IPsec HA 95 cluster. 97 In a Hot Standby cluster implementation of IKEv2/IPsec based 98 VPNs, the IKEv2/IPsec session gets established between the peer and the 99 active member of the cluster. After that, the active member syncs/ 100 updates the IKE/IPsec SA state to the standby member of the cluster. 101 This primary SA state sync-up is done on SA bring up and/or rekey. 102 Doing SA state synchronization between the active and standby 103 members of the cluster for each IKE and IPsec message is very costly, 104 so normally it is done periodically. So, when a "failover" event happens 105 in the cluster, the "failover" is first detected by the standby member 106 and then it becomes the active member, which takes considerable time. 107 During the time of failover and the standby member becoming the newly active 108 member, the peer is unaware of the failover and keeps sending IKE requests 109 and IPsec packets to the cluster, which is allowed as per the IKEv2 and 110 IPsec windowing feature. Now, the newly active member after coming up 111 finds a mismatch in IKE message ids and IPsec replay counters. 112 Please see Section 4 for more details. 114 This draft proposes an extension to the IKEv2 protocol to solve the main 115 issues of IKE message id sync and IPsec SA replay counter sync and 116 gives implementation advice for others. Here is a summary of the solutions 117 provided in this draft: 119 IKE Message Id synchronization: This is done by obtaining the 120 message Id values from the peer and updating the values at the newly 121 active cluster member after the failover. 123 IPsec SA Counter synchronization: This is done by the newly active 124 cluster member sending incremented values of its replay counters 125 to the peer as the expected replay counter values. 127 This draft describes the IKEv2/IPsec SA counter 128 synchronisation in the context of a hot standby cluster. 
This solution can also 129 be used in other scenarios where IKEv2/IPsec SA counters are mis- 130 matched and counter sync is needed. 132 There were some concerns about the current window sync process. The 133 concern was to make IKEv2 window sync optional, but we believe IKEv2 134 window sync will be mandatory. 136 [[ This topic needs to be discussed further on the WG mailing list. 137 ]] 139 2. Terminology 141 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 142 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 143 document are to be interpreted as described in RFC 2119 [RFC2119]. 145 "SA Counter SYNC Request" is the INFORMATIONAL exchange request defined 146 in this draft to synchronize the IKEv2/IPsec SA counter information 147 between a member of the cluster and the peer. 149 "SA Counter SYNC Response" is the INFORMATIONAL exchange response 150 defined in this draft to synchronize the IKEv2/IPsec SA counter 151 information between a member of the cluster and the peer. 153 Below are the terms taken from [IPsec Cluster Problem Statement] with 154 added information in the context of this draft. 156 "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of 157 the members is active at any one time. This member is also referred 158 to as the "active", whereas the other(s) are referred to as 159 "standbys". VRRP ([RFC5798]) is one method of building such a 160 cluster. The goal of a Hot Standby Cluster is to create the illusion 161 of a single virtual gateway to the peer(s). 163 "Active Member" is the primary member in the Hot Standby cluster. It 164 is responsible for forwarding packets for the virtual gateway. 166 "Standby Member" is the primary backup router. The member takes 167 control, i.e. becomes the active member, after the "failover" event. 169 "Peer" is the IKEv2/IPsec endpoint which establishes a VPN connection 170 with the Hot Standby cluster. The Peer knows the Hot Standby Cluster by 171 the cluster's single IP address. 
In case of "failover", the standby 172 member of the cluster becomes active, so the peer normally doesn't 173 notice that "failover" has occurred in the cluster. 175 The generic term IKEv2/IPsec SA counters is used throughout. 176 IKEv2 SA counter stands for IKEv2 message ids and IPsec SA counter 177 stands for IPsec SA replay counters which are used to provide 178 the optional anti-replay feature. 180 3. Issues solved from IPsec Cluster Problem Statement 182 IPsec Cluster Problem Statement defines the problems encountered in 183 IPsec Clusters. The problems along with their section names as 184 given in the statement are as follows. 185 o 3.2. Lots of Long Lived State 186 o 3.3. IKE Counters 187 o 3.4. Outbound SA Counters 188 o 3.5. Inbound SA Counters 189 o 3.6. Missing Synch Messages 190 o 3.7. Simultaneous use of IKE and IPsec SAs by Different Members 191 * 3.7.1. Outbound SAs using counter modes 192 o 3.8. Different IP addresses for IKE and IPsec 193 o 3.9. Allocation of SPIs 195 This draft solves the main issues using the protocol extension, and 196 provides implementation advice for other issues, given as follows. 197 o 3.2 This section mentions that there's lots of state that needs to 198 be synchronized. If state is not synchronized, it's not really an 199 interesting cluster - failover will be just like a reboot, so the 200 issue need not be solved with protocol extensions. 201 o 3.3, 3.4, 3.5, and 3.6 are solved by this draft. Please see 202 Section 4 for more details. 203 o 3.7 is the problem to be solved while building clusters. However, 204 the peers should be mandated to accept multiple parallel SAs for 205 3.7.1. 206 o 3.8 can be solved by using IKEv2 Redirect Mechanism [RFC-5685]. 207 o 3.9 is the problem about avoiding collision of the same SPIs among 208 the cluster members. This is outside the scope of the document 209 since this has to be solved within the context of the cluster and 210 not with the peer. 212 4. 
IKEv2/IPsec SA Counter Synchronization Problem 214 IKEv2 RFC states that "An IKE endpoint MUST NOT exceed the peer's 215 stated window size for transmitted IKE requests". 217 As per the protocol, all IKEv2 packets follow a request-response 218 paradigm. The initiator of an IKEv2 request MUST retransmit the 219 request, until it has received a response from the peer. IKEv2 220 introduces a windowing mechanism that allows multiple requests to be 221 outstanding at a given point of time, but mandates that the sender 222 window does not move until the oldest message sent from one peer to 223 another is acknowledged. Loss of even a single packet leads to 224 repeated retransmissions followed by an IKEv2 SA teardown if the 225 retransmissions are unacknowledged. 227 IPsec Hot Standby Cluster is required to ensure that in case of 228 failover of the active member, the standby member becomes active 229 immediately. The standby member is expected to have the exact values 230 of the message id fields of the active member before failover. Even with the 231 best efforts to update the message Id values from active to standby 232 member, the values at the standby member can be stale due to the following 233 reasons: 234 o Standby member is unaware of the last message that was received 235 and acknowledged by the older active member as failover could have 236 happened before the standby could be updated. 237 o Standby member does not have information about on-going 238 unacknowledged requests of the active member before the failover event. 239 So after the failover event when the standby member becomes active, it 240 cannot re-transmit those requests. 242 When a standby member takes over as the active member, it would start 243 the message id ranges from previously updated values. This would 244 make it reject requests from the peer, since the values would be 245 stale. As a sender, the standby member may end up reusing a stale 246 message id which will cause the peer to drop the request. 
Eventually 247 there is a high probability of the IKEv2 and corresponding IPsec SAs 248 getting torn down simply because of a transitory message id mismatch 249 and re-transmission of requests. This is not a desirable feature of 250 HA. Even after updating the standby member periodically, the cluster can 251 lose the IKE SA, and so all IPsec SAs, due to message id i.e. SA counter 252 mismatch. 254 A similar issue is observed in IPsec counters also if anti-replay 255 protection/ESN is implemented. Even with the best efforts of syncing 256 the ESP and AH SA counter numbers from active to standby member, 257 there is a chance that the standby member would have stale counter 258 values. The standby member would then send the stale counter 259 numbers. The peer would reject such packets since, with the anti- 260 replay protection feature, duplicate use of counters is not allowed. 261 In case of IPsec it is ok to skip some counter values and start with 262 the higher counter values. 264 Hence a mechanism is required in HA to ensure that the standby member 265 has correct values of message Ids and IPsec counters, so that 266 sessions are not torn down just because of window ranges. 268 5. IKEv2/IPsec SA Counter Synchronization Solution 270 After the standby member becomes the active member after a failover 271 event in the cluster, the standby member would send an authenticated 272 IKEv2 request to the peer to send its values of SA counters. 274 The standby member would then update its values of SA counters and 275 then start sending/receiving the requests. 277 The peer MUST negotiate its ability to support SA counter 278 synchronization information with the active member by sending the 279 SYNC_SA_COUNTER_INFO_SUPPORTED notification in the IKE_AUTH exchange. 
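As a rough illustration of the two rules above (synchronization is used only when both sides announced support during IKE_AUTH, and the newly active member then adopts the peer's view of the counters), the following Python sketch may help. The class and function names are hypothetical and are not defined by this draft. The sketch models only the IKEv2 message ids, and it assumes that the member simply overwrites its stale values with those reported by the peer.

```python
from dataclasses import dataclass


@dataclass
class IkeMessageIds:
    """One endpoint's view of the IKEv2 message id counters (hypothetical model)."""
    next_send_id: int  # message id this endpoint will use in its next request
    next_recv_id: int  # message id this endpoint expects in the next request it receives


def sync_after_failover(stale: IkeMessageIds,
                        peer_view: IkeMessageIds,
                        both_announced_support: bool) -> IkeMessageIds:
    """Return the counters the newly active member should use.

    Assumption: the member adopts the peer's values outright. What the peer
    expects to receive is what the member must send next, and vice versa.
    """
    if not both_announced_support:
        # Without SYNC_SA_COUNTER_INFO_SUPPORTED from both sides in IKE_AUTH,
        # the exchange must not be used, so the stale values stay in place.
        return stale
    return IkeMessageIds(
        next_send_id=peer_view.next_recv_id,
        next_recv_id=peer_view.next_send_id,
    )


# Last state synced from the old active member before it failed (stale).
stale = IkeMessageIds(next_send_id=5, next_recv_id=3)
# Values reported by the peer, which kept exchanging messages in the meantime.
peer_view = IkeMessageIds(next_send_id=8, next_recv_id=7)

synced = sync_after_failover(stale, peer_view, both_announced_support=True)
print(synced)
```

The mirrored assignment is the point of the sketch: the two endpoints look at the same pair of counters from opposite directions, so the peer's "next to receive" becomes the member's "next to send".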
281 Peer                                                Active Member
282 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
283 HDR, SK {IDi, [CERT], [CERTREQ], [IDr], AUTH,
284      N[SYNC_SA_COUNTER_INFO_SUPPORTED], SAi2, TSi, TSr} ---------->
286 <---------- HDR, SK {IDr, [CERT+], [CERTREQ+], AUTH,
287             N[SYNC_SA_COUNTER_INFO_SUPPORTED], SAr2, TSi, TSr}
289 When the peer and the active member both support SA counter synchronization, 290 the active member MUST sync/update the SA counter synchronization 291 capability to the standby member after the establishment of the IKE 292 SA, so that the standby member is aware of the capability and can use it 293 when it becomes the active member after a failover event. 295 After a failover event, when the standby member becomes the active 296 member, it has to request the SA counters from the peer. The standby 297 member would initiate the SYNC Request with an INFORMATIONAL exchange 298 containing the notify SYNC_SA_COUNTER_INFO. The SYNC_SA_COUNTER_INFO 299 information can be used to update IKEv2 counters i.e. message ids 300 and also IPsec SA replay counters. 302 If there are many IPsec SAs and all IPsec SA counters cannot be 303 synchronized with a single counter sync exchange, then another 304 counter sync exchange SHOULD be sent for the remaining IPsec SAs. For 305 this exchange the message id is the synced IKE message id obtained from the first 306 counter sync exchange, not zero. 308 The peer will respond back with the notify SYNC_SA_COUNTER_INFO. The 309 SYNC_SA_COUNTER_INFO request contains NONCE data to avoid a DOS attack 310 due to replay of the SA counter sync response. The nonce data sent in the 311 SYNC_SA_COUNTER_INFO response MUST match the nonce data sent by the 312 newly-active member in the SYNC_SA_COUNTER_INFO request. If the nonce data 313 received in the SYNC_SA_COUNTER_INFO response does not match the nonce 314 data sent in the SYNC_SA_COUNTER_INFO request, the standby i.e. 
newly- 315 active member MUST discard this SYNC_SA_COUNTER_INFO response, and 316 normal IKEv2 behaviour of re-transmitting the request and waiting for 317 a genuine reply from the peer SHOULD follow, before tearing down the SA 318 because of re-transmits.
320 Standby [Newly Active] Member                              Peer
321 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
322 HDR, SK {N[SYNC_SA_COUNTER_INFO]+} -------->
324                      <--------- HDR, SK {N[SYNC_SA_COUNTER_INFO]+}
326 6. SA counter synchronization notify and payload types 328 Below are the new notify and payload types that are defined. 330 6.1. SYNC_SA_COUNTER_INFO_SUPPORTED 332 SYNC_SA_COUNTER_INFO_SUPPORTED: This notify is included in the 333 IKE_AUTH request by the peer to indicate support for the IKEv2/IPsec 334 SA counter synchronization mechanism described in this document.
336                      1                   2                   3
337  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
338 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
339 | Next Payload  |C|  RESERVED   |         Payload Length        |
340 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
341 |Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
342 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
344                  SYNC_SA_COUNTER_INFO_SUPPORTED
346 The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and 347 'Notify Message Type' fields are the same as described in Section 3 348 of [IKEv2bis]. The 'SPI Size' field MUST be set to 0 to indicate 349 that the SPI is not present in this message. The 'Protocol ID' MUST 350 be set to 0, since the notification is not specific to a particular 351 security association. The 'Payload Length' field is set to the length in 352 octets of the entire payload, including the generic payload header. 353 The 'Notify Message Type' field is set to indicate the 354 SYNC_SA_COUNTER_INFO_SUPPORTED payload. 356 6.2. 
SYNC_SA_COUNTER_INFO 358 SYNC_SA_COUNTER_INFO: This payload type is defined to sync the SA 359 counter information between the newly-active [standby] member and the peer. 360 The SYNC_SA_COUNTER_INFO payload can be used to synchronize the IKE SA 361 counter and IPsec SA counters as well. So, multiple payloads of this 362 type can be used in a single exchange, where one payload is used to 363 sync the IKE SA counter information and another payload is used to 364 sync the Child SA [e.g. ESP, AH etc] information.
366                      1                   2                   3
367  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
369 | Next Payload  |M|  RESERVED   |         Payload Length        |
370 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
371 |  Protocol ID  |   SPI Size    |  # of SPI's   | Counter Size  |
372 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
373 |                                                               |
374 ~                                                               ~
375 |                                                               |
376 ~                          Nonce Data                           ~
377 |                                                               |
378 ~                                                               ~
379 |                                                               |
380 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
381 |                 EXPECTED_SEND_REQ_MESSAGE_ID                  |
382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
383 |                 EXPECTED_RECV_REQ_MESSAGE_ID                  |
384 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
385 |                              SPI                              |
386 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
387 |                                                               |
388 ~                         Last Counter                          ~
389 |                                                               |
390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
392                       SYNC_SA_COUNTER_INFO
394 It contains the following data. 395 o Protocol ID (1 octet) - MUST be 1 for an IKE SA, 2 for AH, or 3 396 for ESP. 397 o SPI Size (1 octet) - Length in octets of the SPI as defined by the 398 protocol ID. It MUST be zero for IKE or four for AH and ESP. 399 o # of SPIs (1 octet) - The number of SPIs contained in this 400 payload. The size of each SPI is defined by the SPI Size field. 401 It MUST be zero if protocol is IKE. 402 o Counter Size (1 octet) is the size of IPsec SA counter in octets. 
403 It is 4 if the Extended Sequence Numbers option is not set for the 404 SAs described in this payload, or 8 otherwise. It MUST be zero if 405 protocol is IKE. 406 o Nonce Data (16 octets) - The nonce data MUST be present if 407 protocol is IKE. The nonce data is used to counter the replay of 408 SYNC_SA_COUNTER_INFO response by the attacker. 410 o EXPECTED_SEND_REQ_MESSAGE_ID (4 octets): This MUST be present 411 only if protocol ID is IKE. This field is used by the sender of 412 this notify to indicate the message Id it will use in the next 413 request that it will send to the peer. It MUST be present only 414 in SA counter synchronization response and MUST be ignored in SA 415 counter synchronization request. 416 o EXPECTED_RECV_REQ_MESSAGE_ID (4 octets): This field is used by the 417 sender of this notify to indicate the message Id it can accept in 418 the next request received from the peer. This data MUST be present 419 only in response and MUST be ignored if present in request. This 420 MUST be present only if protocol ID is IKE. 421 o SPI (4 octets) is the Security Parameter Index of the outbound SA 422 for the sender, or the inbound SA for the receiver. 423 o Last Counter (4 or 8 octets) is the counter number of the last 424 packet sent. The receiver MUST drop any IPsec packet with replay 425 counter lower than this. 426 o M (More - 1 bit) - This flag MUST be set when some IPsec SAs 427 are left to be synced but cannot be sent due to packet size or 428 some other limitation. When the M bit is zero, it indicates that this is the last 429 SA counter sync message. 431 7. Details of implementation 433 The message Id used in this exchange MUST be zero so that it is not 434 validated upon receipt. Message Id zero MUST be permitted only for 435 informational exchange that would have NOTIFY of type 436 SYNC_SA_COUNTER_INFO. 
If any packet uses the message Id Zero, 437 without having this Notify along with the Nonce payload, then such 438 packets MUST be discarded upon decryption. No other payloads are 439 allowed in this Informational exchange. 441 The standby member can initiate the synchronization of IKEv2 Message 442 Ids 443 o When it receives a bad IKEv2/IPsec packet. A "bad" IKEv2/ 444 IPsec packet means a packet outside the receive window. 445 o When it has to send an IKEv2/IPsec packet after the failover event. 446 o When it has just got control from the active member and needs 447 to update the values beforehand, so that it need not start this 448 exchange at the time of sending/receiving the request. 450 The standby member can initiate the synchronization of IPsec SA 451 Counters 452 o If there was traffic using the IPsec SA in the recent past and 453 there could be a stale replay counter at the standby member. 455 Since there can be many sessions at the standby member, and sending 456 exchanges from all of the sessions can cause throttling, the standby 457 member can choose to initiate the exchange when it has to send or 458 receive the request. Thus the trigger to initiate this exchange 459 depends on the requirement/discretion of the standby member. 461 The member which has not announced its capability 462 SYNC_SA_COUNTER_INFO_SUPPORTED MUST NOT send/receive the notify 463 SYNC_SA_COUNTER_INFO. 465 If a peer gets a SYNC_SA_COUNTER_INFO request even though it did not 466 announce its capability in the IKE_AUTH exchange, then it MUST ignore 467 this message. 469 8. Step-by-Step details 471 The step by step details of the synchronisation of IKE message Ids are 472 as follows. 473 o Active member and peer device establish the session. They 474 announce the capability to sync the counter info by sending 475 SYNC_SA_COUNTER_INFO_SUPPORTED notify in AUTH Exchange. 476 o Active member dies and Stand-by member takes over. 
Stand-by 477 Member sends its own idea of the IKE Message ID (its side) to 478 peer. 479 o The peer will send its EXPECTED_SEND_REQ_MESSAGE_ID and 480 EXPECTED_RECV_REQ_MESSAGE_ID. Since the message Id values 481 received are higher than the values at the stand-by member, it would 482 update its local values of message Ids with the received values. 483 o The peer should not wait for a pending response while responding 484 with these message Id values. For example, if the window size is 5 and 485 the peer window is 3-7, and the peer has sent requests 3, 4, 5, 6, 7 486 but got responses only for 4, 5, 6, 7 and not 3, then it should send 487 the EXPECTED_SEND_REQ_MESSAGE_ID as 8 and should not wait for the 488 response to 3 anymore. 489 o The peer should not wait for a pending request either. For example, if 490 the window size is 5 and the peer window is 3-7, and the peer has received 491 requests 4, 5, 6, 7 but not 3, then it should send the 492 EXPECTED_RECV_REQ_MESSAGE_ID as 8 and should not wait for 3 493 anymore. 495 The step by step details of IPsec SA Counter 496 synchronization are as follows. 497 o Active member and peer device establish the session. They 498 announce the capability to sync the counter info by sending 499 SYNC_SA_COUNTER_INFO_SUPPORTED notify in AUTH Exchange. 500 o Active member dies and Stand-by member takes over. Stand-by 501 Member increments its values of Outbound SA Counters for each 502 IPsec SA and sends them to the peer. 504 o The peer will update its Inbound SA Counter corresponding to each 505 IPsec SA and send its Outbound SA Counter value for each IPsec SA 506 on it. 507 o If replay counters were bumped by a large amount, we MAY slowly do 508 child SA rekey to reset the counters when the member is less loaded after 509 the failover event. 511 9. Security Considerations 513 There can be two types of DOS attacks. 514 o Replay of Message SYNC Request. This can be countered by rate 515 limiting the number of such requests a peer can receive. 
The rate 516 limiting can be done either by number or the time delay between 517 which Message SYNC request can be received or both.These options 518 are configurable. 519 o Replay of Message SYNC Response. This can be countered by sending 520 the NONCE data along with the SYNC_SA_COUNTER_INFO notify. The 521 same NONCE data has to be returned in response. Thus the standby 522 member can accept the reply only for the current request. After 523 it receives the response, it MUST not accept the same response 524 again and MUST drop the response. 526 10. Interaction with other drafts 528 The primary assumption of the IKEv2/IPsec SA Counter Synchronization 529 proposal is that an IKEv2 SA has been established between the active member of 530 the Hot Standby Cluster and the peer, after which the failover event occurred 531 and the standby member has now "become" active. It also assumes the 532 IKEv2 SA state was synced between the active and standby members of the 533 Hot Standby Cluster before the failover event. 534 o Session Resumption. Session resumption assumes that the peer, i.e. 535 client or initiator, detects the need to re-establish the session. 536 In IKEv2/IPsec SA counter synchronization, the standby member which 537 becomes active, i.e. gateway or responder, detects the need to 538 synchronize the SA counter after the failover event. Also in a Hot 539 Standby Cluster, the peer establishes the IKEv2/IPsec session with 540 the cluster's single IP address, so the peer normally does not detect the 541 event of failover in the cluster unless the standby member takes very 542 long to become active and the IKEv2 SA times out via liveness check. 543 So, session resumption and SA counter synchronization after 544 failover are mutually exclusive. 545 o This document describes the operation of tightly coupled clusters, 546 which are the common way of building IPsec clusters. In these 547 clusters, all members appear to the peer as one gateway, 548 specifically they share a single IP address. 
High availability 549 can also be provided by loosely coupled clusters (for lack of a 550 better term), which are a group of gateways that do not share an 551 IP address and do not synchronize state. In this architecture, 552 the client can use Session Resumption to fail-over from one 553 cluster member to another. Specifically this requires: 554 * Support of session resumption on peers and gateways. 555 * A common session resumption ticket format on all gateways (not 556 currently standardized). 557 * Configuration on the peers of the group of gateways that 558 constitute the cluster. 559 o Redirect. Redirect mechanism for load-balancing can be used 560 during init (IKE_SA_INIT) and auth (IKE_AUTH) and after session 561 establishment. SA counter sync, in contrast, is used after the IKE SA has 562 been established and a failover event has occurred. So it is 563 mutually exclusive with redirect during init and auth. The 564 redirect after session establishment is used for timed or planned 565 shutdown/maintenance. The failover event cannot be detected on the 566 active member beforehand and so using redirect after session 567 establishment is not possible in case of failover. So, Redirect 568 and SA counter synchronization after failover are mutually 569 exclusive. 570 o Crash detection. Solves a similar problem, where the peer detects 571 that a cluster member has crashed based on a token. It is mutually 572 exclusive with HA with SA counter sync. 574 11. IANA Considerations 576 This document introduces two new IKEv2 Notification Message types as 577 described in Section 6. The new Notify Message Types must be assigned 578 values between 16396 and 40959. 579 o SYNC_SA_COUNTER_INFO_SUPPORTED 580 o SYNC_SA_COUNTER_INFO 582 12. Acknowledgements 584 We would like to thank Pratima Sethi and Frederic Detienne for their 585 review comments and valuable suggestions for the initial version of the 586 document. 
588 We would also like to thank the following people (in alphabetical order) 589 for their review comments and valuable suggestions: Dan Harkins, Paul 590 Hoffman, Steve Kent, Tero Kivinen, David McGrew, Pekka Riikonen, 591 Yaron Sheffer. 593 13. Change Log 595 This section lists all the changes in this document. 597 NOTE TO RFC EDITOR: Please remove this section before final RFC 598 publication. 600 13.1. Draft -00 602 Version 00 is identical to 603 draft-kagarigi-ipsecme-ikev2-windowsync-04, started as WG document. 605 Added IPSECME WG HA design team members as authors. 607 Added comment in Introduction to discuss the window sync process on 608 WG mailing list to solve some concerns. 610 14. References 612 14.1. Normative References 614 [IKEv2bis] 615 Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, 616 "Internet Key Exchange Protocol: IKEv2", 617 draft-ietf-IPsecme-ikev2bis (work in progress), May 2010. 619 [IPsec Cluster Problem Statement] 620 Nir, Y., "IPsec Cluster Problem Statement", July 2010. 622 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 623 Requirement Levels", BCP 14, RFC 2119, March 1997. 625 14.2. Informative References 627 [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for 628 IKEv2", RFC 5685, November 2009. 630 [RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", 631 RFC 5723, January 2010. 633 Authors' Addresses 635 Raj Singh (Editor) 636 Cisco Systems, Inc. 637 Divyashree Chambers, B Wing, O'Shaugnessy Road 638 Bangalore, Karnataka 560025 639 India 641 Phone: +91 80 4426 4833 642 Email: rsj@cisco.com 644 Kalyani Garigipati 645 Cisco Systems, Inc. 646 Divyashree Chambers, B Wing, O'Shaugnessy Road 647 Bangalore, Karnataka 560025 648 India 650 Phone: +91 80 4426 4831 651 Email: kagarigi@cisco.com 653 Yoav Nir 654 Check Point Software Technologies Ltd. 655 5 Hasolelim st. 656 Tel Aviv 67897 657 Israel 659 Email: ynir@checkpoint.com 661 Dacheng Zhang 662 Huawei Technologies Ltd. 
664 Email: zhangdacheng@huawei.com