idnits 2.17.1 draft-kagarigi-ipsecme-ikev2-windowsync-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: There can be two types of DOS attacks. o Replay of Message SYNC Request. This can be countered by rate limiting the number of such requests a peer can receive. The rate limiting can be done either by number or the time delay between which Message SYNC request can be received or both.These options are configurable. o Replay of Message SYNC Response. This can be countered by sending the NONCE data along with the SYNC_SA_COUNTER_INFO notify. The same NONCE data has to be returned in response. Thus the standby member can accept the reply only for the current request. After it receives the response, it MUST not accept the same response again and MUST drop the response. -- The document date (July 29, 2010) is 5020 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC5798' is mentioned on line 146, but not defined == Missing Reference: 'RFC-5685' is mentioned on line 193, but not defined == Missing Reference: 'CERT' is mentioned on line 271, but not defined == Missing Reference: 'CERTREQ' is mentioned on line 271, but not defined == Missing Reference: 'IDr' is mentioned on line 271, but not defined == Unused Reference: 'RFC5685' is defined on line 594, but no explicit reference was found in the text == Unused Reference: 'RFC5723' is defined on line 597, but no explicit reference was found in the text -- No information found for draft-ietf-IPsecme-ikev2bis - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'IKEv2bis' Summary: 0 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group G. Kalyani 3 Internet-Draft Cisco 4 Intended status: Standards Track July 29, 2010 5 Expires: January 30, 2011 7 IKEv2/IPsec SA counter synchronization 8 draft-kagarigi-ipsecme-ikev2-windowsync-04 10 Abstract 12 IKEv2 and IPsec protocols are widely used for deploying VPNs. In 13 order to make such VPNs highly available and failure-resistant, they 14 are implemented as IKEv2/IPsec Highly Available (HA) clusters. However, 15 there are many issues in IKEv2/IPsec HA clusters. The draft "IPsec 16 Cluster Problem Statement" enumerates the issues encountered in 17 an IKEv2/IPsec HA cluster environment. 
19 This draft proposes an extension to the IKEv2 protocol to solve the main 20 issues of the "IPsec Cluster Problem Statement" in a Hot Standby cluster 21 and gives implementation advice for the others. The main issues to be 22 solved are: 23 o IKE Message Id synchronization: This is done by obtaining the 24 message Id values from the peer and updating the values at the 25 newly active cluster member after the failover. 26 o IPsec SA Counter synchronization: This is done by the newly active 27 cluster member sending incremented values of its replay counters 28 to the peer as the expected replay counter values. 30 Status of this Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on January 30, 2011. 47 Copyright Notice 48 Copyright (c) 2010 IETF Trust and the persons identified as the 49 document authors. All rights reserved. 51 This document is subject to BCP 78 and the IETF Trust's Legal 52 Provisions Relating to IETF Documents 53 (http://trustee.ietf.org/license-info) in effect on the date of 54 publication of this document. Please review these documents 55 carefully, as they describe your rights and restrictions with respect 56 to this document. 
Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License. 61 Table of Contents 63 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 64 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 65 3. Issues solved from IPsec Cluster Problem Statement . . . . . . 4 66 4. IKEv2/IPsec SA Counter Synchronization Problem . . . . . . . . 5 67 5. IKEv2/IPsec SA Counter Synchronization Solution . . . . . . . 6 68 6. SA counter synchronization notify and payload types . . . . . 8 69 6.1. SYNC_SA_COUNTER_INFO_SUPPORTED . . . . . . . . . . . . . . 8 70 6.2. SYNC_SA_COUNTER_INFO . . . . . . . . . . . . . . . . . . . 8 71 7. Details of implementation . . . . . . . . . . . . . . . . . . 10 72 8. Step-by-Step details . . . . . . . . . . . . . . . . . . . . . 11 73 9. Security Considerations . . . . . . . . . . . . . . . . . . . 12 74 10. Interaction with other drafts . . . . . . . . . . . . . . . . 12 75 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 76 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 77 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 78 13.1. Normative References . . . . . . . . . . . . . . . . . . . 13 79 13.2. Informative References . . . . . . . . . . . . . . . . . . 14 80 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 14 82 1. Introduction 84 IKEv2 is used for deploying IPsec-based VPNs. In order to make such 85 VPNs highly available and failure-resistant, they are implemented as 86 IKEv2/IPsec Highly Available (HA) clusters. However, there are many issues 87 in IKEv2/IPsec HA clusters. The draft "IPsec Cluster Problem 88 Statement" enumerates the issues encountered in an IKEv2/IPsec HA 89 cluster. 
91 In a Hot Standby cluster implementation of IKEv2/IPsec-based 92 VPNs, the IKEv2/IPsec session is established between the peer and the 93 active member of the cluster. After that, the active member syncs/ 94 updates the IKE/IPsec SA state to the standby member of the cluster. 95 This primary SA state sync-up is done on SA bring-up and/or rekey. 96 Synchronizing SA state between the active and standby 97 members of the cluster for each IKE and IPsec message is very costly, 98 so normally it is done periodically. When a "failover" event happens 99 in the cluster, the "failover" is first detected by the standby member, 100 which then becomes the active member; this takes considerable time. 101 While the standby member is becoming the newly active 102 member, the peer is unaware of the failover and keeps sending IKE requests 103 and IPsec packets to the cluster, which is allowed by the IKEv2 and 104 IPsec windowing features. The newly active member, after coming up, 105 then finds a mismatch in IKE message ids and IPsec replay counters. 106 Please see Section 4 for more details. 108 This draft proposes an extension to the IKEv2 protocol to solve the main 109 issues of IKE message id sync and IPsec SA replay counter sync and 110 gives implementation advice for the others. Here is a summary of the solutions 111 provided in this draft: 113 IKE Message Id synchronization: This is done by obtaining the 114 message Id values from the peer and updating the values at the newly 115 active cluster member after the failover. 117 IPsec SA Counter synchronization: This is done by the newly active cluster 118 member sending incremented values of its replay counters 119 to the peer as the expected replay counter values. 121 Although this draft describes IKEv2/IPsec SA counter 122 synchronization in the context of a hot standby cluster, the solution can 123 be used in other scenarios where IKEv2/IPsec SA counters are mismatched 124 and counter sync is needed. 126 2. 
Terminology 128 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 129 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 130 document are to be interpreted as described in RFC 2119 [RFC2119]. 132 "SA Counter SYNC Request" is the INFORMATIONAL exchange request defined 133 in this draft to synchronize the IKEv2/IPsec SA counter information 134 between a member of the cluster and the peer. 136 "SA Counter SYNC Response" is the INFORMATIONAL exchange response 137 defined in this draft to synchronize the IKEv2/IPsec SA counter 138 information between a member of the cluster and the peer. 140 Below are the terms taken from [IPsec Cluster Problem Statement] with 141 added information in the context of this draft. 143 "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of 144 the members is active at any one time. This member is also referred 145 to as the "active", whereas the other(s) are referred to as 146 "standbys". VRRP ([RFC5798]) is one method of building such a 147 cluster. The goal of a Hot Standby Cluster is to create the illusion 148 of a single virtual gateway to the peer(s). 150 "Active Member" is the primary member in the Hot Standby cluster. It 151 is responsible for forwarding packets for the virtual gateway. 153 "Standby Member" is the primary backup router. This member takes 154 control, i.e. becomes the active member, after the "failover" event. 156 "Peer" is the IKEv2/IPsec endpoint which establishes a VPN connection 157 with the Hot Standby cluster. The Peer knows the Hot Standby Cluster by 158 the cluster's single IP address. In case of "failover", the standby 159 member of the cluster becomes active, so the peer normally doesn't 160 notice that "failover" has occurred in the cluster. 162 The generic term IKEv2/IPsec SA counters is used throughout. 163 IKEv2 SA counter stands for IKEv2 message ids and IPsec SA counter 164 stands for IPsec SA replay counters, which are used to provide the 165 optional anti-replay feature. 167 3. 
Issues solved from IPsec Cluster Problem Statement 169 The IPsec Cluster Problem Statement defines the problems encountered in 170 IPsec Clusters. The problems, along with their section names as 171 given in the statement, are as follows. 172 o 3.2. Lots of Long Lived State 173 o 3.3. IKE Counters 174 o 3.4. Outbound SA Counters 175 o 3.5. Inbound SA Counters 176 o 3.6. Missing Synch Messages 177 o 3.7. Simultaneous use of IKE and IPsec SAs by Different Members 178 * 3.7.1. Outbound SAs using counter modes 179 o 3.8. Different IP addresses for IKE and IPsec 180 o 3.9. Allocation of SPIs 182 This draft solves the main issues using the protocol extension, and 183 provides implementation advice for the other issues, as follows. 184 o 3.2 This section mentions that there's lots of state that needs to 185 be synchronized. If state is not synchronized, it's not really an 186 interesting cluster - failover will be just like a reboot, so the 187 issue need not be solved with protocol extensions. 188 o 3.3, 3.4, 3.5, and 3.6 are solved by this draft. Please see 189 Section 4 for more details. 190 o 3.7 is a problem to be solved while building clusters. However, 191 the peers should be mandated to accept multiple parallel SAs for 192 3.7.1. 193 o 3.8 can be solved by using the IKEv2 Redirect Mechanism [RFC5685]. 194 o 3.9 is the problem of avoiding collisions of the same SPIs among 195 the cluster members. This is outside the scope of the document 196 since this has to be solved within the context of the cluster and 197 not with the peer. 199 4. IKEv2/IPsec SA Counter Synchronization Problem 201 The IKEv2 RFC states that "An IKE endpoint MUST NOT exceed the peer's 202 stated window size for transmitted IKE requests". 204 As per the protocol, all IKEv2 packets follow the request-response 205 paradigm. The initiator of an IKEv2 request MUST retransmit the 206 request until it has received a response from the peer. 
IKEv2 207 introduces a windowing mechanism that allows multiple requests to be 208 outstanding at a given point of time, but mandates that the sender 209 window does not move until the oldest message sent from one peer to 210 another is acknowledged. Loss of even a single packet leads to 211 repeated retransmissions followed by an IKEv2 SA teardown if the 212 retransmissions are unacknowledged. 214 An IPsec Hot Standby Cluster is required to ensure that in case of 215 failover of the active member, the standby member becomes active 216 immediately. The standby member is expected to have the exact values 217 of the message id fields of the active member before failover. Even with the 218 best efforts to update the message Id values from the active to the standby 219 member, the values at the standby member can be stale for the following 220 reasons: 222 o The standby member is unaware of the last message that was received 223 and acknowledged by the old active member, as failover could have 224 happened before the standby could be updated. 225 o The standby member does not have information about on-going 226 unacknowledged requests of the active member before the failover event. 227 So after the failover event, when the standby member becomes active, it 228 cannot re-transmit those requests. 230 When a standby member takes over as the active member, it would start 231 the message id ranges from previously updated values. This would 232 make it reject requests from the peer, since the values would be 233 stale. As a sender, the standby member may end up reusing a stale 234 message id, which will cause the peer to drop the request. Eventually 235 there is a high probability of the IKEv2 and corresponding IPsec SAs 236 getting torn down simply because of a transitory message id mismatch 237 and re-transmission of requests. This is not a desirable feature of 238 HA. Even when the standby member is updated periodically, the cluster can 239 lose the IKE SA, and so all of its IPsec SAs, due to a message id (i.e. SA counter) 240 mismatch. 
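The mismatch described above can be illustrated with a small, non-normative Python sketch. The class MessageIdState, its field names, the simplified window check, and all numeric values are assumptions made for illustration only; they are not defined by this draft or by IKEv2.

```python
from dataclasses import dataclass


@dataclass
class MessageIdState:
    """Hypothetical per-IKE-SA message id state held by one endpoint."""
    next_send_id: int     # message id of the next request this endpoint sends
    next_recv_id: int     # lowest request message id this endpoint still expects
    window_size: int = 1  # number of requests that may be outstanding

    def accepts_request(self, msg_id: int) -> bool:
        # Simplified window check (an assumption of this sketch): accept only
        # ids inside [next_recv_id, next_recv_id + window_size).
        return self.next_recv_id <= msg_id < self.next_recv_id + self.window_size


# True state of the active member just before failover (illustrative values).
active = MessageIdState(next_send_id=12, next_recv_id=20, window_size=5)
# Last state synced to the standby member; later exchanges were never synced.
standby = MessageIdState(next_send_id=9, next_recv_id=10, window_size=5)
# The peer's state mirrors what the old active member really had.
peer = MessageIdState(next_send_id=20, next_recv_id=12, window_size=5)

# The peer's next request falls outside the standby's stale receive window.
peer_request_rejected = not standby.accepts_request(peer.next_send_id)
# The standby's next request reuses an id the peer has already moved past.
standby_request_rejected = not peer.accepts_request(standby.next_send_id)

print(peer_request_rejected, standby_request_rejected)
```

In this sketch both directions fail after failover even though the old active member would have accepted the peer's next request, which is the situation the synchronization exchange is meant to repair.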
242 A similar issue is observed with IPsec counters if anti-replay 243 protection/ESN is implemented. Even with the best efforts at syncing 244 the ESP and AH SA counter numbers from the active to the standby member, 245 there is a chance that the standby member would have stale counter 246 values. The standby member would then send the stale counter 247 numbers. The peer would reject such packets, since the 248 anti-replay protection feature does not allow duplicate use of counters. 249 In the case of IPsec it is acceptable to skip some counter values and start with 250 higher counter values. 252 Hence a mechanism is required in HA to ensure that the standby member 253 has correct values of message Ids and IPsec counters, so that 254 sessions are not torn down just because of window ranges. 256 5. IKEv2/IPsec SA Counter Synchronization Solution 258 After the standby member becomes the active member following a failover 259 event in the cluster, the standby member would send an authenticated 260 IKEv2 request asking the peer to send its values of the SA counters. 262 The standby member would then update its values of the SA counters and 263 then start sending/receiving requests. 265 The peer MUST negotiate its ability to support SA counter 266 synchronization information with the active member by sending the 267 SYNC_SA_COUNTER_INFO_SUPPORTED notification in the IKE_AUTH exchange. 269 Peer Active Member 270 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 271 HDR, SK {IDi, [CERT], [CERTREQ], [IDr], AUTH, 272 N[SYNC_SA_COUNTER_INFO_SUPPORTED], SAi2, TSi, TSr} ----------> 274 <---------- HDR, SK {IDr, [CERT+], [CERTREQ+], AUTH, 275 N[SYNC_SA_COUNTER_INFO_SUPPORTED], SAr2, TSi, TSr} 277 When the peer and the active member both support SA counter synchronization, 278 the active member MUST sync/update the SA counter synchronization 279 capability to the standby member after the establishment of the IKE 280 SA. 
This ensures that the standby member is aware of the capability and can use it 281 when it becomes the active member after a failover event. 283 After a failover event, when the standby member becomes the active 284 member, it has to request the SA counters from the peer. The standby 285 member would initiate the SYNC Request with an INFORMATIONAL exchange 286 containing the notify SYNC_SA_COUNTER_INFO. The SYNC_SA_COUNTER_INFO 287 information can be used to update IKEv2 counters, i.e. message ids, 288 and also IPsec SA replay counters. 290 If there are many IPsec SAs and all IPsec SA counters cannot be 291 synchronized with a single counter sync exchange, then another 292 counter sync exchange SHOULD be sent for the remaining IPsec SAs. For 293 this subsequent exchange, the message id is the IKE message id synced by the first 294 counter sync exchange, NOT zero. 296 The peer will respond with the notify SYNC_SA_COUNTER_INFO. The 297 SYNC_SA_COUNTER_INFO request contains NONCE data to avoid a DOS attack 298 due to replay of the SA counter sync response. The nonce data sent in the 299 SYNC_SA_COUNTER_INFO response MUST match the nonce data sent by the 300 newly-active member in the SYNC_SA_COUNTER_INFO request. If the nonce data 301 received in the SYNC_SA_COUNTER_INFO response does not match the nonce 302 data sent in the SYNC_SA_COUNTER_INFO request, the standby (i.e. 303 newly-active) member MUST discard this SYNC_SA_COUNTER_INFO response, and 304 the normal IKEv2 behaviour of re-transmitting the request and waiting for a 305 genuine reply from the peer SHOULD follow, before tearing down the SA 306 because of re-transmits. 308 Standby [Newly Active] Member Peer 309 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 310 HDR, SK {N[SYNC_SA_COUNTER_INFO]+} --------> 312 <--------- HDR, SK {N[SYNC_SA_COUNTER_INFO]+} 314 6. SA counter synchronization notify and payload types 316 Below are the new notify and payload types that are defined. 318 6.1. 
SYNC_SA_COUNTER_INFO_SUPPORTED 320 SYNC_SA_COUNTER_INFO_SUPPORTED: This notify is included in the 321 IKE_AUTH request by the peer to indicate the support for IKEv2/IPsec 322 SA counter synchronization mechanism described in this document. 324 1 2 3 325 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 326 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 327 | Next Payload |C| RESERVED | Payload Length | 328 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 329 |Protocol ID(=0)| SPI Size (=0) | Notify Message Type | 330 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 332 SYNC_SA_COUNTER_INFO_SUPPORTED 334 The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and 335 'Notify Message Type' fields are the same as described in Section 3 336 of [IKEv2bis]. The 'SPI Size' field MUST be set to 0 to indicate 337 that the SPI is not present in this message. The 'Protocol ID' MUST 338 be set to 0, since the notification is not specific to a particular 339 security association. 'Payload Length' field is set to the length in 340 octets of the entire payload, including the generic payload header. 341 The 'Notify Message Type' field is set to indicate the 342 SYNC_SA_COUNTER_INFO_SUPPORTED payload. 344 6.2. SYNC_SA_COUNTER_INFO 346 SYNC_SA_COUNTER_INFO : This payload type is defined to sync the SA 347 counter information among newly-active [standby] member and the peer. 348 The SYNC_SA_COUNTER_INFO payload can be used to synchronize IKE SA 349 counter and IPsec SA counters as well. So, multiple payloads of this 350 type can be used in the single exchange where one payload is used to 351 sync the IKE SA counter information, another payload can be used to 352 sync the Child SA [ e.g. ESP, AH etc] information. 
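As a non-normative aid to reading the figure and field list that follow, the Python sketch below packs and unpacks the IKE SA variant of this payload as it might appear in a response. The function names, the Next Payload value of 0, and the choice of the most significant bit of the second octet for the M flag (taken from its position in the figure) are assumptions of this sketch, not definitions made by this draft.

```python
import os
import struct

PROTO_IKE = 1  # Protocol ID value for an IKE SA, per the field list in this section


def pack_ike_counter_info(nonce: bytes, send_id: int, recv_id: int,
                          more: bool = False, next_payload: int = 0) -> bytes:
    """Hypothetical encoder for an IKE SA SYNC_SA_COUNTER_INFO response payload."""
    if len(nonce) != 16:
        raise ValueError("Nonce Data is 16 octets")
    # Protocol ID, SPI Size, # of SPIs, Counter Size: the last three are zero for IKE.
    body = struct.pack("!BBBB", PROTO_IKE, 0, 0, 0)
    body += nonce
    # EXPECTED_SEND_REQ_MESSAGE_ID and EXPECTED_RECV_REQ_MESSAGE_ID, 4 octets each.
    body += struct.pack("!II", send_id, recv_id)
    flags = 0x80 if more else 0x00  # assumed bit position of the M flag
    # Payload Length counts the whole payload, including the 4-octet header.
    header = struct.pack("!BBH", next_payload, flags, 4 + len(body))
    return header + body


def unpack_ike_counter_info(payload: bytes) -> dict:
    """Hypothetical decoder matching pack_ike_counter_info()."""
    next_payload, flags, length = struct.unpack("!BBH", payload[:4])
    proto, spi_size, num_spis, counter_size = struct.unpack("!BBBB", payload[4:8])
    send_id, recv_id = struct.unpack("!II", payload[24:32])
    return {
        "length": length,
        "more": bool(flags & 0x80),
        "protocol_id": proto,
        "spi_size": spi_size,
        "num_spis": num_spis,
        "counter_size": counter_size,
        "nonce": payload[8:24],
        "expected_send": send_id,
        "expected_recv": recv_id,
    }


example = pack_ike_counter_info(os.urandom(16), send_id=8, recv_id=8)
print(len(example), unpack_ike_counter_info(example)["expected_send"])
```

Child SA payloads would additionally carry the SPI and Last Counter entries, whose sizes depend on the SPI Size and Counter Size fields; they are left out of this sketch.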
354 1 2 3 355 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 356 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 357 | Next Payload |M| RESERVED | Payload Length | 358 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 359 |Protocol ID | SPI Size | # of SPI's |Counter Size | 360 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 361 | | 362 ~ ~ 363 | | 364 ~ Nonce Data ~ 365 | | 366 ~ ~ 367 | | 368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 369 | EXPECTED_SEND_REQ_MESSAGE_ID | 370 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 371 | EXPECTED_RECV_REQ_MESSAGE_ID | 372 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 373 | SPI | 374 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 375 | | 376 ~ Last Counter ~ 377 | | 378 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 380 SYNC_SA_COUNTER_INFO 382 It contains the following data. 383 o Protocol ID (1 octet) - Must be 1 for an IKE SA, 2 for AH, or 3 384 for ESP. 385 o SPI Size (1 octet) - Length in octets of the SPI as defined by the 386 protocol ID. It MUST be zero for IKE or four for AH and ESP. 387 o # of SPIs (1 octet) - The number of SPIs contained in this 388 payload. The size of each SPI is defined by the SPI Size field. 389 It MUST be zero if protocol is IKE. 390 o Counter Size (1 octet) is the size of IPsec SA counter in octets. 391 It is 4 if the Extended Sequence Numbers option is not set for the 392 SAs described in this payload, or 8 otherwise. It MUST be zero if 393 protocol is IKE. 394 o Nonce Data (16 octets) - The nonce data MUST be present if 395 protocol is IKE. The nonce data is used to counter the replay of 396 SYNC_SA_COUNTER_INFO response by the attacker. 397 o EXPECTED_SEND_REQ_MESSAGE_ID (4 octets) : This MUST be present 398 only if protocol ID is IKE. 
This field is used by the sender of 399 this notify to indicate the message Id it will use in the next 400 request that it will send to the peer. It MUST be present only 401 in an SA counter synchronization response and MUST be ignored if present in an SA 402 counter synchronization request. 403 o EXPECTED_RECV_REQ_MESSAGE_ID (4 octets) : This field is used by the 404 sender of this notify to indicate the message Id it can accept in 405 the next request received from the peer. This data MUST be present 406 only in a response and MUST be ignored if present in a request. This 407 MUST be present only if protocol ID is IKE. 408 o SPI (4 octets) is the Security Parameter Index of the outbound SA 409 for the sender, or the inbound SA for the receiver. 410 o Last Counter (4 or 8 octets) is the counter number of the last 411 packet sent. The receiver MUST drop any IPsec packet with a replay 412 counter lower than this. 413 o M (More - 1 bit) - This flag MUST be set when some IPsec SAs 414 are left to be synced but cannot be sent due to packet size or 415 some other limitation. When the M bit is zero, it indicates that this is the last 416 SA counter sync message. 418 7. Details of implementation 420 The message Id used in this exchange MUST be zero so that it is not 421 validated upon receipt. Message Id zero MUST be permitted only for 422 an INFORMATIONAL exchange that has a NOTIFY of type 423 SYNC_SA_COUNTER_INFO. If any packet uses message Id zero 424 without having this Notify along with the Nonce payload, then such 425 packets MUST be discarded upon decryption. No other payloads are 426 allowed in this INFORMATIONAL exchange. 428 The standby member can initiate the synchronization of IKEv2 Message 429 Ids 430 o When it receives a bad IKEv2/IPsec packet. A "bad" IKEv2/ 431 IPsec packet means a packet outside the receive window. 432 o When it has to send an IKEv2/IPsec packet after a failover event. 
433 o When it has just taken control from the active member and needs 434 to update the values beforehand, so that it need not start this 435 exchange at the time of sending/receiving a request. 437 The standby member can initiate the synchronization of IPsec SA 438 Counters 439 o If there was traffic using the IPsec SA in the recent past and 440 there could be a stale replay counter at the standby member. 442 Since there can be many sessions at the standby member, and sending 443 exchanges from all of the sessions can cause throttling, the standby 444 member can choose to initiate the exchange when it has to send or 445 receive a request. Thus the trigger to initiate this exchange 446 depends on the requirement/discretion of the standby member. 448 A member which has not announced its capability 449 SYNC_SA_COUNTER_INFO_SUPPORTED MUST NOT send/receive the notify 450 SYNC_SA_COUNTER_INFO. 452 If a peer gets a SYNC_SA_COUNTER_INFO request even though it did not 453 announce its capability in the IKE_AUTH exchange, then it MUST ignore 454 this message. 456 8. Step-by-Step details 458 The step by step details of the synchronization of the IKE message Id are 459 as follows. 460 o The active member and the peer device establish the session. They 461 announce the capability to sync the counter info by sending the 462 SYNC_SA_COUNTER_INFO_SUPPORTED notify in the IKE_AUTH exchange. 463 o The active member dies and the standby member takes over. The standby 464 member sends its own idea of the IKE Message ID (its side) to the 465 peer. 466 o The peer will send its EXPECTED_SEND_REQ_MESSAGE_ID and 467 EXPECTED_RECV_REQ_MESSAGE_ID. Since the message Id values 468 received are higher than the values at the standby member, it would 469 update its local values of the message Ids with the received values. 470 o The peer should not wait for a pending response while responding 471 with these message Id values. 
For example, if the window size is 5 and the 472 peer window is 3-7, and the peer has sent requests 3, 4, 5, 6, 7 473 but got responses only for 4, 5, 6, 7 and not 3, then it should send 474 the EXPECTED_SEND_REQ_MESSAGE_ID as 8 and should not wait for the 475 response to 3 anymore. 476 o The peer should not wait for a pending request either. For example, if the 477 window size is 5 and the peer window is 3-7, and the peer has received 478 requests 4, 5, 6, 7 but not 3, then it should send the 479 EXPECTED_RECV_REQ_MESSAGE_ID as 8 and should not wait for 3 480 anymore. 482 The step by step details of the IPsec SA Counter 483 synchronization are as follows. 484 o The active member and the peer device establish the session. They 485 announce the capability to sync the counter info by sending the 486 SYNC_SA_COUNTER_INFO_SUPPORTED notify in the IKE_AUTH exchange. 487 o The active member dies and the standby member takes over. The standby 488 member increments its values of the Outbound SA Counters for each 489 IPsec SA and sends them to the peer. 490 o The peer will update its Inbound SA Counter corresponding to each 491 IPsec SA and send its Outbound SA Counter value for each IPsec SA 492 on it. 493 o If the replay counters were bumped by a large amount, the member MAY gradually 494 rekey the child SAs to reset the counters when it is less loaded after the 495 failover event. 497 9. Security Considerations 499 There can be two types of DOS attacks. 500 o Replay of Message SYNC Request. This can be countered by rate 501 limiting the number of such requests a peer can receive. The rate 502 limiting can be done by the number of requests, by the time delay between 503 Message SYNC requests that can be received, or both. These options 504 are configurable. 505 o Replay of Message SYNC Response. This can be countered by sending 506 the NONCE data along with the SYNC_SA_COUNTER_INFO notify. The 507 same NONCE data has to be returned in the response. Thus the standby 508 member can accept the reply only for the current request. 
After 509 it receives the response, it MUST NOT accept the same response 510 again and MUST drop the response. 512 10. Interaction with other drafts 514 The primary assumption of the IKEv2/IPsec SA Counter Synchronization 515 proposal is that an IKEv2 SA has been established between the active member of the 516 Hot Standby Cluster and the peer, that the failover event has then occurred, 517 and that the standby member has now "become" active. It also assumes the 518 IKEv2 SA state was synced between the active and standby members of the 519 Hot Standby Cluster before the failover event. 520 o Session Resumption [RFC5723]. Session resumption assumes that the peer, i.e. 521 client or initiator, detects the need to re-establish the session. 522 In IKEv2/IPsec SA counter synchronization, the standby member which 523 becomes active, i.e. gateway or responder, detects the need to 524 synchronize the SA counter after the failover event. Also, in a Hot 525 Standby Cluster, the peer establishes the IKEv2/IPsec session with 526 the cluster's single IP address, so the peer normally does not detect the 527 event of failover in the cluster unless the standby member takes very 528 long to become active and the IKEv2 SA times out via liveness check. 529 So, session resumption and SA counter synchronization after 530 failover are mutually exclusive. 531 o This document describes the operation of tightly coupled clusters, 532 which are the common way of building IPsec clusters. In these 533 clusters, all members appear to the peer as one gateway; 534 specifically, they share a single IP address. High availability 535 can also be provided by loosely coupled clusters (for lack of a 536 better term), which are a group of gateways that do not share an 537 IP address and do not synchronize state. In this architecture, 538 the client can use Session Resumption to fail-over from one 539 cluster member to another. Specifically this requires: 540 * Support of session resumption on peers and gateways. 
541 * A common session resumption ticket format on all gateways (not 542 currently standardized). 544 * Configuration on the peers of the group of gateways that 545 constitute the cluster. 546 o Redirect [RFC5685]. The redirect mechanism for load-balancing can be used 547 during init (IKE_SA_INIT) and auth (IKE_AUTH) and after session 548 establishment, whereas SA counter sync is used after the IKE SA has 549 been established and a failover event has occurred. So it is 550 mutually exclusive with redirect during init and auth. 551 Redirect after session establishment is used for timed or planned 552 shutdown/maintenance. The failover event cannot be detected on the 553 active member beforehand, and so using redirect after session 554 establishment is not possible in case of failover. So, Redirect 555 and SA counter synchronization after failover are mutually 556 exclusive. 557 o Crash detection. Either the SA counter information sync or the crash 558 detection approach can be taken by the standby member on a failover 559 event. 561 11. IANA Considerations 563 This document introduces two new IKEv2 Notification Message types as 564 described in Section 6. The new Notify Message Types must be assigned 565 values between 16396 and 40959. 566 o SYNC_SA_COUNTER_INFO_SUPPORTED 567 o SYNC_SA_COUNTER_INFO 569 12. Acknowledgements 571 This draft is the combined effort of the IPSECME WG assigned HA Design 572 team, which consists of the following members (in alphabetical order): 573 Dacheng Zhang, Min Huang, Raj Singh, Yaron Sheffer and Yoav Nir. I 574 would like to thank Pratima Sethi and Frederic Detienne for their 575 valuable reviews and suggestions. 577 13. References 579 13.1. Normative References 581 [IKEv2bis] 582 Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, 583 "Internet Key Exchange Protocol: IKEv2", 584 draft-ietf-IPsecme-ikev2bis (work in progress), May 2010. 586 [IPsec Cluster Problem Statement] 587 Nir, Y., "IPsec Cluster Problem Statement", July 2010. 
589 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 590 Requirement Levels", BCP 14, RFC 2119, March 1997. 592 13.2. Informative References 594 [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for 595 IKEv2", RFC 5685, November 2009. 597 [RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", 598 RFC 5723, January 2010. 600 Author's Address 602 Kalyani Garigipati 603 Cisco Systems, Inc. 604 SEZ Unit, Cessna Business Park 605 Bangalore, Karnataka 560025 606 India 608 Phone: +91 80 4426 4831 609 Email: kagarigi@cisco.com