Network Working Group                                             Y. Nir
Internet-Draft                                               Check Point
Intended status: Informational                              June 1, 2010
Expires: December 3, 2010


       IPsec High Availability and Load Sharing Problem Statement
                     draft-ietf-ipsecme-ipsec-ha-04

Abstract

   This document describes a requirement from IKE and IPsec to allow for
   more scalable and available deployments for VPNs.  It defines
   terminology for high availability and load sharing clusters
   implementing IKE and IPsec, and describes gaps in the existing
   standards.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 3, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  Terminology
   3.  The Problem Statement
     3.1.  Scope
     3.2.  Lots of Long Lived State
     3.3.  IKE Counters
     3.4.  Outbound SA Counters
     3.5.  Inbound SA Counters
     3.6.  Missing Synch Messages
     3.7.  Simultaneous use of IKE and IPsec SAs by Different Members
       3.7.1.  Outbound SAs using counter modes
     3.8.  Different IP addresses for IKE and IPsec
     3.9.  Allocation of SPIs
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Change Log
   8.  Informative References
   Author's Address

1.  Introduction

   IKEv2, as described in [RFC4306] and [IKEv2bis], and IPsec, as
   described in [RFC4301] and others, allow deployment of VPNs between
   different sites as well as from VPN clients to protected networks.

   As VPNs become increasingly important to the organizations deploying
   them, there is a demand to make IPsec solutions more scalable and
   less prone to down time, by using more than one physical gateway to
   either share the load or back each other up.  Similar demands have
   been made in the past for other critical pieces of an organization's
   infrastructure, such as DHCP and DNS servers, web servers, databases
   and others.

   IKE and IPsec are less friendly to clustering than these other
   protocols, because they store more state, and that state is more
   volatile.  Section 2 defines terminology for use in this document,
   and in the envisioned solution documents.

   In general, deploying IKE and IPsec in a cluster requires such a
   large amount of information to be synchronized among the members of
   the cluster that it becomes impractical.
   Alternatively, if less information is synchronized, failover would
   mean a prolonged and intensive recovery phase, which negates the
   scalability and availability promises of using clusters.  Section 3
   describes this in more detail.

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   "Single Gateway" is an implementation of IKE and IPsec enforcing a
   certain policy, as described in [RFC4301].

   "Cluster" is a set of two or more gateways, implementing the same
   security policy, and protecting the same domain.  Clusters exist to
   provide both high availability through redundancy, and scalability
   through load sharing.

   "Member" is one gateway in a cluster.

   "High Availability" is a condition of a system, not a configuration
   type.  A system is said to have high availability if its expected
   down time is low.  High availability can be achieved in various ways,
   one of which is clustering.  All the clusters described in this
   document achieve high availability.

   "Fault Tolerance" is a condition related to high availability, where
   a system maintains service availability, even when a specified set of
   fault conditions occur.  In clusters, we expect the system to
   maintain service availability when one or more of the cluster
   members fail.

   "Completely Transparent Cluster" is a cluster where the occurrence of
   a fault is never visible to the peers.

   "Partially Transparent Cluster" is a cluster where the occurrence of
   a fault may be visible to the peers.

   "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
   the members is active at any one time.  This member is also referred
   to as the "active", whereas the others are referred to as
   "stand-bys".
   [VRRP] is one method of building such a cluster.

   "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
   one of the members may be active at the same time.  The term "load
   balancing" is also common, but it implies that the load is actually
   balanced between the members, and we don't want to even imply that
   this is a requirement.

   "Failover" is the event where one member takes over some load from
   some other member.  In a hot standby cluster, this happens when a
   standby member becomes active due to a failure of the former active
   member, or because of an administrator command.  In a load sharing
   cluster this usually happens because of a failure of one of the
   members, but certain load-balancing technologies may allow a
   particular load (such as all the flows associated with a particular
   child SA) to move from one member to another to even out the load,
   even without any failures.

   "Tight Cluster" is a cluster where all the members share an IP
   address.  This could be accomplished using configured interfaces with
   specialized protocols or hardware, such as VRRP, or through the use
   of multicast addresses, but in any case, peers need only be
   configured with one IP address in the PAD.

   "Loose Cluster" is a cluster where each member has a different IP
   address.  Peers find the correct member using some method such as DNS
   queries or [REDIRECT].  In some cases, members' IP addresses may be
   allocated to other members at failover.

   "Synch Channel" is a communications channel among the cluster
   members, used to transfer state information.  The synch channel may
   or may not be IP based, may or may not be encrypted, and may work
   over short or long distances.  The security and physical
   characteristics of this channel are out of scope for this document,
   but it is a requirement that its use be minimized for scalability.

3.  The Problem Statement

   This section starts by scoping the problem, and goes on to list each
   of the issues encountered while setting up a cluster of IPsec VPN
   gateways.

3.1.  Scope

   This document will make no attempt to describe the problems in
   setting up a generic cluster.  It describes only problems related to
   the IKE/IPsec protocols.

   The problem of synchronizing the policy between cluster members is
   out of scope, as this is an administrative issue that is not
   particular to either clusters or to IPsec.

   The interesting scenario here is VPN, whether tunneled site-to-site
   or remote access.  Host-to-host transport mode is not expected to
   benefit from this work.

   We do not describe in full the problems of the communication channel
   between cluster members (the Synch Channel), nor do we intend to
   specify anything in this space later.  Specifically, mixed-vendor
   clusters are out of scope.

   The problem statement anticipates possible protocol-level solutions
   between IKE/IPsec peers, in order to improve the availability and/or
   performance of VPN clusters.  One vendor's IPsec endpoint should be
   able to work, optimally, with another vendor's cluster.

3.2.  Lots of Long Lived State

   IKE and IPsec have a lot of long lived state:

   o  IKE SAs last for minutes, hours, or days, and carry keys and other
      information.  Some gateways may carry thousands to hundreds of
      thousands of IKE SAs.

   o  IPsec SAs last for minutes or hours, and carry keys, selectors and
      other information.  Some gateways may carry hundreds of thousands
      of such IPsec SAs.

   o  SPD Cache entries.  While the SPD is unchanging, the SPD cache
      changes on the fly due to narrowing.  Entries last at least as
      long as the SAD entries, but tend to last even longer than that.
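
   As an illustration only, the three kinds of state listed above could
   be modelled as in the following minimal Python sketch.  All class
   and field names (IpsecSa, IkeSa, SpdCacheEntry, MemberState,
   record_count) are hypothetical; they are not taken from any IKE or
   IPsec specification or implementation, and a real SA carries more
   fields than are shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A selector is shown here as a (source range, destination range) pair.
Selector = Tuple[str, str]


@dataclass
class IpsecSa:
    """One unidirectional IPsec (child) SA; fields are illustrative."""
    spi: int
    keys: bytes
    selectors: List[Selector]
    replay_counter: int = 0      # changes with every protected packet


@dataclass
class IkeSa:
    """One IKE SA together with the child SAs created under it."""
    spi_initiator: int
    spi_responder: int
    keys: bytes
    next_request_id: int = 0     # message ID of the next request to send
    next_expected_id: int = 0    # message ID of the next request to accept
    child_sas: List[IpsecSa] = field(default_factory=list)


@dataclass
class SpdCacheEntry:
    """An SPD cache entry holding selectors after narrowing."""
    narrowed_selectors: List[Selector]


@dataclass
class MemberState:
    """Everything one member holds that a peer member might need."""
    ike_sas: List[IkeSa] = field(default_factory=list)
    spd_cache: List[SpdCacheEntry] = field(default_factory=list)

    def record_count(self) -> int:
        """Number of separate records that would have to be replicated."""
        child_total = sum(len(sa.child_sas) for sa in self.ike_sas)
        return len(self.ike_sas) + child_total + len(self.spd_cache)
```

   In this sketch the keys and selectors are fixed once an SA exists,
   while the counter fields keep changing for as long as the SA is in
   use.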

   A naive implementation of a high availability cluster would have no
   synchronized state, and a failover would produce an effect similar to
   that of a rebooted gateway.  [resumption] describes how new IKE and
   IPsec SAs can be recreated in such a case.

3.3.  IKE Counters

   We can overcome the first problem described in Section 3.2 by
   synchronizing state: whenever an SA is created, we can synch this
   new state to all other members.  However, that state is not only
   long-lived, it is also ever changing.

   IKE has message counters.  A peer may not process message n until
   after it has processed message n-1.  Skipping message IDs is not
   allowed.  So a newly-active member needs to know the last message IDs
   both received and transmitted.

   Often, it is feasible to synchronize the IKE message counters for
   every IKE exchange.  This way, the newly-active member knows what
   messages it is allowed to process, and what message IDs to use on IKE
   requests, so that peers process them.

3.4.  Outbound SA Counters

   ESP and AH have an optional anti-replay feature, where every
   protected packet carries a counter number.  Repeating counter numbers
   is considered an attack, so the newly-active member must not use a
   replay counter number that has already been used.  The peer will drop
   those packets as duplicates and/or warn of an attack.

   Though it may be feasible to synchronize the IKE message counters, it
   is almost never feasible to synchronize the IPsec packet counters for
   every IPsec packet transmitted.  So we have to assume that at least
   for IPsec, the replay counter will not be up-to-date on the
   newly-active member, and the newly-active member may repeat a
   counter.

   A possible solution is to synch replay counter information, not for
   each packet emitted, but only at regular intervals, say, every 10,000
   packets or every 0.5 seconds.
   After a failover, the newly-active member advances the counters for
   outbound SAs by 10,000.  To the peer this looks like up to 10,000
   packets were lost, but this should be acceptable, as neither ESP nor
   AH guarantees reliable delivery.

3.5.  Inbound SA Counters

   An even tougher issue is the synchronization of packet counters for
   inbound SAs.  If a packet arrives at a newly-active member, there is
   no way to determine whether this packet is a replay or not.  The
   periodic synch does not solve the problem.  Suppose we synchronize
   every 10,000 packets, and the last synch before the failover had the
   counter at 170,000.  It is probable, though not certain, that packet
   number 180,000 has not yet been processed, but if packet 175,000
   arrives at the newly-active member, it has no way of determining
   whether that packet has already been processed.  The synchronization
   does prevent the processing of really old packets, such as those with
   counter number 165,000.  Ignoring all counters below 180,000 won't
   work either, because that's up to 10,000 dropped packets, which may
   be very noticeable.

   The easiest solution is to learn the replay counter from the incoming
   traffic.  This is allowed by the standards, because replay counter
   verification is an optional feature.  The case can even be made that
   it is relatively secure, because non-attack traffic will reset the
   counters to what they should be, so an attacker faces the dual
   challenge of a very narrow window for attack, and the need to time
   the attack to a failover event.  Unless the attacker can actually
   cause the failover, this would be very difficult.  It should be
   noted, though, that although this solution is acceptable as far as
   RFC 4301 goes, it is a matter of policy whether this is acceptable.

   Another possible solution to the inbound SA problem is to rekey all
   child SAs following a failover.
   This may or may not be feasible depending on the implementation and
   the configuration.

3.6.  Missing Synch Messages

   The synch channel is very likely not to be infallible.  Before
   failover is detected, some synchronization messages may have been
   missed.  For example, the active member may have created a new Child
   SA using message n.  The new information (entry in the SAD and update
   to counters of the IKE SA) is sent on the synch channel.  Still, with
   every possible technology, the update may be missed before the
   failover.

   This is a bad situation, because the IKE SA is doomed.  The
   newly-active member has two problems:

   o  It does not have the new IPsec SA pair.  It will drop all incoming
      packets protected with such an SA.  This could be fixed by sending
      some DELETEs and INVALID_SPI notifications, if it weren't for the
      other problem...

   o  The counters for the IKE SA show that only request n-1 has been
      sent.  The next request will get the message ID n, but that will
      be rejected by the peer.  After a sufficient number of
      retransmissions and rejections, the whole IKE SA with all
      associated IPsec SAs will get dropped.

   The above scenario may be rare enough that it is acceptable that on a
   configuration with thousands of IKE SAs, a few will need to be
   recreated from scratch or using session resumption techniques.
   However, detecting this may take a long time (several minutes) and
   this negates the goal of creating a high availability cluster in the
   first place.

3.7.  Simultaneous use of IKE and IPsec SAs by Different Members

   For load sharing clusters, all active members may need to use the
   same SAs, both IKE and IPsec.  This is an even greater problem than
   in the case of HA, because consecutive packets may need to be sent by
   different members to the same peer gateway.

   The solution to the IKE SA issue is up to the application.
   It's possible to create some locking mechanism over the synch
   channel, or else have one member "own" the IKE SA and manage the
   child SAs for all other members.  For IPsec, solutions fall into two
   broad categories.

   The first is the "sticky" category, where all communications with a
   single peer, or all communications involving a certain SPD cache
   entry, go through a single member.  In this case, all packets that
   match any particular SA go through the same member, so no
   synchronization of the replay counter needs to be done.  Inbound
   processing is a "sticky" issue, because the packets have to be
   processed by the correct member based on peer and SPI.  Another issue
   is that commodity load balancers will not be able to match the SPIs
   of the encrypted side to the clear traffic, and so the wrong member
   may get the other half of the flow.

   The other way is to duplicate the child SAs, and have a pair of
   IPsec SAs for each active member.  Different packets for the same
   peer go through different members, and get protected using different
   SAs with the same selectors and matching the same entries in the SPD
   cache.  This has some shortcomings:

   o  It requires multiple parallel SAs, which the peer has no use for.
      Section 2.8 of [RFC4306] specifically allows this, but some
      implementations might have a policy against long term maintenance
      of redundant SAs.

   o  Different packets that belong to the same flow may be protected by
      different SAs, which may seem "weird" to the peer gateway,
      especially if it is integrated with some deep inspection
      middleware such as a firewall.  It is not known whether this will
      cause problems with current gateways.  It is also impossible to
      mandate against this, because the definition of "flow" varies from
      one implementation to another.

   o  Reply packets may arrive with an IPsec SA that is not "matched" to
      the one used for the outgoing packets.
      Also, they might arrive at a different member.  This problem is
      beyond the scope of this document and should be solved by the
      application, perhaps by forwarding misdirected packets to the
      correct gateway for deep inspection.

3.7.1.  Outbound SAs using counter modes

   For SAs involving counter mode ciphers such as [CTR] or [GCM] there
   is yet another complication.  The initial vector for such modes must
   never be repeated, and senders use methods such as counters or LFSRs
   to ensure this.  When an SA is shared between more than one active
   member, or even fails over from one member to another, the members
   need to make sure that they do not generate the same initial vector.
   See [COUNTER_MODES] for a discussion of this problem in another
   context.

3.8.  Different IP addresses for IKE and IPsec

   In many implementations there are separate IP addresses for the
   cluster, and for each member.  While the packets protected by tunnel
   mode child SAs are encapsulated in IP headers with the cluster IP
   address, the IKE packets originate from a specific member, and carry
   that member's IP address.  For the peer, this looks weird, as the
   usual thing is for the IPsec packets to come from the same IP address
   as the IKE packets.

   One obvious solution is to use some fancy capability of the IKE host
   to change things so that IKE packets also come out of the cluster IP
   address.  This can be achieved through NAT or through assigning
   multiple addresses to interfaces.  This is not, however, possible for
   all implementations.

   [ARORA] discusses this problem in greater depth, and proposes another
   solution, that does involve protocol changes.

3.9.  Allocation of SPIs

   The SPI associated with each child SA, and with each IKE SA, MUST be
   unique relative to the peer of the SA.
   Thus, in the context of a cluster, each cluster member MUST generate
   SPIs in a fashion that avoids collisions (with other cluster members)
   for these SPI values.  The means by which cluster members achieve
   this requirement is a local matter, outside the scope of this
   document.

4.  Security Considerations

   Implementations running on clusters MUST be as secure as
   implementations running on single gateways.  In other words, no
   extension or interpretation used to allow operation in a cluster may
   facilitate attacks that are not possible for single gateways.

   Moreover, thought must be given to the synching requirements of any
   protocol extension, to make sure that it does not create an
   opportunity for denial of service attacks on the cluster.

   As mentioned in Section 3.5, allowing an inbound child SA to fail
   over to another member has the effect of disabling replay counter
   protection for a short time.  Though the threat is arguably low, it
   is a policy decision whether this is acceptable.

5.  IANA Considerations

   This document has no actions for IANA.

6.  Acknowledgements

   This document is a collective work, and includes contributions from
   many people who participate in the IPsecME working group.

   The editor would particularly like to acknowledge the extensive
   contribution of the following people (in alphabetical order):
   Jitender Arora, Jean-Michel Combes, Dan Harkins, Steve Kent, Tero
   Kivinen, Yaron Sheffer, Melinda Shore, and Rodney Van Meter.

7.  Change Log

   NOTE TO RFC EDITOR: REMOVE THIS SECTION BEFORE PUBLICATION

   Version 00 was identical to draft-nir-ipsecme-ipsecha-ps-00, re-spun
   as a WG document.

   Version 01 included closing issues 177, 178 and 180, with updates to
   terminology, and added discussion of inbound SAs and the CTR issue.

   Version 02 includes comments by Yaron Sheffer and the acknowledgement
   section.

   Version 03 fixes some ID-nits, and adds the problem presented by
   Jitender Arora in [ARORA].

   Version 04 fixes a spelling mistake, moves the scope discussion to a
   subsection of its own (Section 3.1), and adds a short discussion of
   the duplicate SPI problem, presented by Jean-Michel Combes.

8.  Informative References

   [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
              IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses
              (work in progress), April 2010.

   [COUNTER_MODES]
              McGrew, D. and B. Weis, "Using Counter Modes with
              Encapsulating Security Payload (ESP) and Authentication
              Header (AH) to Protect Group Traffic",
              draft-ietf-msec-ipsec-group-counter-modes (work in
              progress), March 2010.

   [CTR]      Housley, R., "Using Advanced Encryption Standard (AES)
              Counter Mode With IPsec Encapsulating Security Payload
              (ESP)", RFC 3686, January 2004.

   [GCM]      Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
              (GCM) in IPsec Encapsulating Security Payload (ESP)",
              RFC 4106, June 2005.

   [IKEv2bis]
              Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
              "Internet Key Exchange Protocol: IKEv2",
              draft-ietf-ipsecme-ikev2bis (work in progress), May 2010.

   [REDIRECT]
              Devarapalli, V. and K. Weniger, "Redirect Mechanism for
              IKEv2", RFC 5685, November 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [VRRP]     Nadas, S., "Virtual Router Redundancy Protocol (VRRP)",
              RFC 5798, March 2010.

   [resumption]
              Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
              RFC 5723, January 2010.

Author's Address

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim st.
   Tel Aviv  67897
   Israel

   Email: ynir@checkpoint.com