idnits 2.17.1

draft-ietf-ipsecme-ipsec-ha-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 4, 2010) is 5042 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 4306 (Obsoleted by RFC 5996)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                             Y. Nir
Internet-Draft                                               Check Point
Intended status: Informational                              July 4, 2010
Expires: January 5, 2011


                    IPsec Cluster Problem Statement
                     draft-ietf-ipsecme-ipsec-ha-09

Abstract

   This document defines terminology, problem statement and requirements
   for implementing IKE and IPsec on clusters.  It also describes gaps
   in existing standards and their implementation that need to be
   filled, in order to allow peers to interoperate with clusters from
   different vendors.
   An agreed terminology, problem statement and
   requirements will allow IETF working groups to consider development
   of IPsec/IKEv2 mechanisms to simplify cluster implementations.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 5, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  Terminology
   3.  The Problem Statement
     3.1.  Scope
     3.2.  Lots of Long Lived State
     3.3.  IKE Counters
     3.4.  Outbound SA Counters
     3.5.  Inbound SA Counters
     3.6.  Missing Synch Messages
     3.7.  Simultaneous use of IKE and IPsec SAs by Different
           Members
       3.7.1.  Outbound SAs using counter modes
     3.8.  Different IP addresses for IKE and IPsec
     3.9.  Allocation of SPIs
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Change Log
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   IKEv2, as described in [RFC4306] and [IKEv2bis], and IPsec, as
   described in [RFC4301] and others, allow deployment of VPNs between
   different sites as well as from VPN clients to protected networks.

   As VPNs become increasingly important to the organizations deploying
   them, there is a demand to make IPsec solutions more scalable and
   less prone to down time, by using more than one physical gateway to
   either share the load or back each other up, forming a "cluster" (see
   Section 2).  Similar demands have been made in the past for other
   critical pieces of an organization's infrastructure, such as DHCP and
   DNS servers, web servers, databases and others.

   IKE and IPsec are, in particular, less friendly to clustering than
   these other protocols, because they store more state, and that state
   is more volatile.  Section 2 defines terminology for use in this
   document, and in the envisioned solution documents.

   In general, deploying IKE and IPsec in a cluster requires such a
   large amount of information to be synchronized among the members of
   the cluster that it becomes impractical.  Alternatively, if less
   information is synchronized, failover would mean a prolonged and
   intensive recovery phase, which negates the scalability and
   availability promises of using clusters.  In Section 3 we will
   describe this in more detail.

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   "Single Gateway" is an implementation of IKE and IPsec enforcing a
   certain policy, as described in [RFC4301].

   "Cluster" is a set of two or more gateways, implementing the same
   security policy, and protecting the same domain.  Clusters exist to
   provide both high availability through redundancy, and scalability
   through load sharing.

   "Member" is one gateway in a cluster.

   "Availability" is a measure of a system's ability to perform the
   service for which it was designed.  It is measured as the percentage
   of time a service is available, out of the time it is supposed to be
   available.  Colloquially, availability is sometimes expressed in
   "nines" rather than percentage, with 3 "nines" meaning 99.9%
   availability, 4 "nines" meaning 99.99% availability, etc.

   "High Availability" is a property of a system, not a configuration
   type.  A system is said to have high availability if its expected
   down time is low.
   High availability can be achieved in various ways,
   one of which is clustering.  All the clusters described in this
   document achieve high availability.  What "high" means depends on the
   application, but usually is 4 to 6 "nines" (at most about 0.5 to 53
   minutes of down time per year in a system that is supposed to be
   available all the time).

   "Fault Tolerance" is a property related to high availability, where a
   system maintains service availability, even when a specified set of
   fault conditions occurs.  In clusters, we expect the system to
   maintain service availability when one or more of the cluster
   members fail.

   "Completely Transparent Cluster" is a cluster where the occurrence of
   a fault is never visible to the peers.

   "Partially Transparent Cluster" is a cluster where the occurrence of
   a fault may be visible to the peers.

   "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
   the members is active at any one time.  This member is also referred
   to as the "active", whereas the other(s) are referred to as
   "standbys".  VRRP ([RFC5798]) is one method of building such a
   cluster.

   "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
   one of the members may be active at the same time.  The term "load
   balancing" is also common, but it implies that the load is actually
   balanced between the members, and this is not a requirement.

   "Failover" is the event where one member takes over some load from
   some other member.  In a hot standby cluster, this happens when a
   standby member becomes active due to a failure of the former active
   member, or because of an administrator command.
   In a load sharing
   cluster, this usually happens because of a failure of one of the
   members, but certain load-balancing technologies may allow a
   particular load (such as all the flows associated with a particular
   child SA) to move from one member to another to even out the load,
   even without any failures.

   "Tight Cluster" is a cluster where all the members share an IP
   address.  This could be accomplished using configured interfaces with
   specialized protocols or hardware, such as VRRP, or through the use
   of multicast addresses, but in any case, peers need only be
   configured with one IP address in the Peer Authentication Database.

   "Loose Cluster" is a cluster where each member has a different IP
   address.  Peers find the correct member using some method such as DNS
   queries or the IKEv2 redirect mechanism ([RFC5685]).  In some cases,
   a member's IP address(es) may be allocated to another member at
   failover.

   "Synch Channel" is a communications channel among the cluster
   members, used to transfer state information.  The synch channel may
   or may not be IP based, may or may not be encrypted, and may work
   over short or long distances.  The security and physical
   characteristics of this channel are out of scope for this document,
   but it is a requirement that its use be minimized for scalability.

3.  The Problem Statement

   This section starts by scoping the problem, and goes on to list each
   of the issues encountered while setting up a cluster of IPsec VPN
   gateways.

3.1.  Scope

   This document will make no attempt to describe the problems in
   setting up a generic cluster.  It describes only problems related to
   the IKE/IPsec protocols.

   The problem of synchronizing the policy between cluster members is
   out of scope, as this is an administrative issue that is not
   particular to either clusters or to IPsec.

   The interesting scenario here is VPN, whether tunneled site-to-site
   or remote access.  Host-to-host transport mode is not expected to
   benefit from this work.

   We do not describe in full the problems of the communication channel
   between cluster members (the Synch Channel), nor do we intend to
   specify anything in this space later.  Specifically, mixed-vendor
   clusters are out of scope.

   The problem statement anticipates possible protocol-level solutions
   between IKE/IPsec peers, in order to improve the availability and/or
   performance of VPN clusters.  One vendor's IPsec endpoint should be
   able to work, optimally, with another vendor's cluster.

3.2.  Lots of Long Lived State

   IKE and IPsec have a lot of long lived state:
   o  IKE SAs last for minutes, hours, or days, and carry keys and other
      information.  Some gateways may carry thousands to hundreds of
      thousands of IKE SAs.
   o  IPsec SAs last for minutes or hours, and carry keys, selectors and
      other information.  Some gateways may carry hundreds of thousands
      of such IPsec SAs.
   o  SPD (Security Policy Database) Cache entries.  While the SPD is
      unchanging, the SPD cache changes on the fly due to narrowing.
      Entries last at least as long as the SAD (Security Association
      Database) entries, but tend to last even longer than that.

   A naive implementation of a cluster would have no synchronized state,
   and a failover would produce an effect similar to that of a rebooted
   gateway.  [RFC5723] describes how new IKE and IPsec SAs can be
   recreated in such a case.

3.3.  IKE Counters

   We can overcome the first problem described in Section 3.2 by
   synchronizing states: whenever an SA is created, we can synch this
   new state to all other members.  However, those states are not only
   long-lived, they are also ever changing.

   IKE has message counters.  A peer MUST NOT process message n until
   after it has processed message n-1.
   Skipping message IDs is not
   allowed.  So a newly-active member needs to know the last message IDs
   both received and transmitted.

   One possible solution is to synchronize information about the IKE
   message counters after every IKE exchange.  This way, the
   newly-active member knows what messages it is allowed to process, and
   what message IDs to use on IKE requests, so that peers process them.
   This solution may be appropriate in some cases, but may be too
   onerous in systems with lots of SAs.  It also has the drawback that
   it never recovers from the missing synch message problem, which is
   described in Section 3.6.

3.4.  Outbound SA Counters

   ESP and AH have an optional anti-replay feature, where every
   protected packet carries a counter number.  Repeating counter numbers
   is considered an attack, so the newly-active member MUST NOT use a
   replay counter number that has already been used.  The peer will drop
   such packets as duplicates and/or warn of an attack.

   Though it may be feasible to synchronize the IKE message counters, it
   is almost never feasible to synchronize the IPsec packet counters for
   every IPsec packet transmitted.  So we have to assume that at least
   for IPsec, the replay counter will not be up-to-date on the
   newly-active member, and the newly-active member may repeat a
   counter.

   A possible solution is to synch replay counter information, not for
   each packet emitted, but only at regular intervals, say, every 10,000
   packets or every 0.5 seconds.  After a failover, the newly-active
   member advances the counters for outbound IPsec SAs by 10,000.  To
   the peer this looks like up to 10,000 packets were lost, but this
   should be acceptable, as neither ESP nor AH guarantees reliable
   delivery.

3.5.  Inbound SA Counters

   An even tougher issue is the synchronization of packet counters for
   inbound IPsec SAs.
   If a packet arrives at a newly-active member,
   there is no way to determine whether this packet is a replay or not.
   The periodic synch does not solve the problem at all.  Suppose we
   synchronize every 10,000 packets, and the last synch before the
   failover had the counter at 170,000.  It is probable, though not
   certain, that packet number 180,000 has not yet been processed, but
   if packet 175,000 arrives at the newly-active member, it has no way
   of determining whether that packet has already been processed.  The
   synchronization does prevent the processing of really old packets,
   such as those with counter number 165,000.  Ignoring all counters
   below 180,000 won't work either, because that's up to 10,000 dropped
   packets, which may be very noticeable.

   The easiest solution is to learn the replay counter from the incoming
   traffic.  This is allowed by the standards, because replay counter
   verification is an optional feature (see section 3.2 in [RFC4301]).
   The case can even be made that it is relatively secure, because
   non-attack traffic will reset the counters to what they should be, so
   an attacker faces the dual challenge of a very narrow window for
   attack, and the need to time the attack to a failover event.  Unless
   the attacker can actually cause the failover, this would be very
   difficult.  It should be noted, though, that although this solution
   is acceptable as far as RFC 4301 goes, it is a matter of policy
   whether this is acceptable.

   Another possible solution to the inbound IPsec SA problem is to rekey
   all child SAs following a failover.  This may or may not be feasible
   depending on the implementation and the configuration.

3.6.  Missing Synch Messages

   The synch channel is unlikely to be infallible.  Before
   failover is detected, some synchronization messages may have been
   missed.
   For example, the active member may have created a new Child
   SA using message n.  The new information (entry in the SAD and update
   to counters of the IKE SA) is sent on the synch channel.  Still,
   whatever technology is used, the update may be missed before the
   failover.

   This is a bad situation, because the IKE SA is doomed.  The
   newly-active member has two problems:
   o  It does not have the new IPsec SA pair.  It will drop all incoming
      packets protected with such an SA.  This could be fixed by sending
      some DELETEs and INVALID_SPI notifications, if it weren't for the
      other problem...
   o  The counters for the IKE SA show that only request n-1 has been
      sent.  The next request will get the message ID n, but that will
      be rejected by the peer.  After a sufficient number of
      retransmissions and rejections, the whole IKE SA with all
      associated IPsec SAs will get dropped.

   The above scenario may be rare enough that it is acceptable that on a
   configuration with thousands of IKE SAs, a few will need to be
   recreated from scratch or using session resumption techniques.
   However, detecting this may take a long time (several minutes) and
   this negates the goal of creating a cluster in the first place.

3.7.  Simultaneous use of IKE and IPsec SAs by Different Members

   For load sharing clusters, all active members may need to use the
   same SAs, both IKE and IPsec.  This is an even greater problem than
   in the case of hot-standby clusters, because consecutive packets may
   need to be sent by different members to the same peer gateway.

   The solution to the IKE SA issue is up to the implementation.  It's
   possible to create some locking mechanism over the synch channel, or
   else have one member "own" the IKE SA and manage the child SAs for
   all other members.  For IPsec, solutions fall into two broad
   categories.
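
   As an illustration only, the "owner" option for IKE SAs mentioned
   above can be sketched in a few lines of Python.  This is a minimal
   sketch and not a mechanism defined by this document: the names
   OwnedIkeSa, Cluster, create_ike_sa, send_request and fail_over are
   hypothetical, ownership is assumed to go to the member that
   negotiated the SA, and the starting value of the counter is
   arbitrary.  It shows why a single owner avoids concurrent use of the
   IKE message counters: whichever member needs an exchange, the owner
   sends it, so only one counter advances.

```python
from dataclasses import dataclass, field


@dataclass
class OwnedIkeSa:
    owner: str                # hypothetical: name of the owning member
    next_request_id: int = 0  # next IKE message ID (start value arbitrary)


@dataclass
class Cluster:
    sas: dict = field(default_factory=dict)  # IKE SA id -> OwnedIkeSa

    def create_ike_sa(self, sa_id: str, member: str) -> None:
        # Assumption: the member that negotiated the IKE SA owns it.
        self.sas[sa_id] = OwnedIkeSa(owner=member)

    def send_request(self, sa_id: str, asking_member: str) -> tuple:
        """Return (sending member, message ID) for a new IKE request.

        asking_member is deliberately ignored when choosing the sender:
        whoever asks, the owner sends, so one counter is advanced.
        """
        sa = self.sas[sa_id]
        msg_id = sa.next_request_id
        sa.next_request_id += 1
        return sa.owner, msg_id

    def fail_over(self, failed: str, successor: str) -> list:
        """Hand every IKE SA owned by a failed member to a successor."""
        moved = [sa_id for sa_id, sa in self.sas.items() if sa.owner == failed]
        for sa_id in moved:
            self.sas[sa_id].owner = successor
        return moved


cluster = Cluster()
cluster.create_ike_sa("peer-a", "member-1")
print(cluster.send_request("peer-a", "member-2"))  # sent by member-1
print(cluster.send_request("peer-a", "member-1"))  # sent by member-1
print(cluster.fail_over("member-1", "member-2"))   # peer-a changes owner
```

   In this sketch the counter lives in one shared object, which hides
   the hard part: in a real cluster the owner's counter would still have
   to reach the successor over the synch channel before a failover, so
   the issues of Sections 3.3 and 3.6 remain.
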

   The first is the "sticky" category, where all communications with a
   single peer, or all communications involving a certain SPD cache
   entry, go through a single member.  In this case, all packets that
   match any particular SA go through the same member, so no
   synchronization of the replay counter needs to be done.  Inbound
   processing is a "sticky" issue (no pun intended), because the packets
   have to be processed by the correct member based on peer and SPI, and
   most load balancers will not be able to match the SPIs to the correct
   member, unless stickiness extends to all traffic with a particular
   peer.  Another disadvantage of sticky solutions is that the load
   tends not to distribute evenly, especially if one SA covers a
   significant portion of IPsec traffic.

   The second is the "duplicate" category, where the child SA (a pair of
   IPsec SAs) is duplicated, once for each active member.
   Different packets for the same peer go through different members, and
   get protected using different SAs with the same selectors and
   matching the same entries in the SPD cache.  This has some
   shortcomings:
   o  It requires multiple parallel SAs, which the peer has no use for.
      Section 2.8 of [RFC4306] specifically allows this, but some
      implementations might have a policy against long term maintenance
      of redundant SAs.
   o  Different packets that belong to the same flow may be protected by
      different SAs, which may seem "weird" to the peer gateway,
      especially if it is integrated with some deep inspection
      middleware such as a firewall.  It is not known whether this will
      cause problems with current gateways.  It is also impossible to
      mandate against this, because the definition of "flow" varies from
      one implementation to another.
   o  Reply packets may arrive with an IPsec SA that is not "matched" to
      the one used for the outgoing packets.  Also, they might arrive at
      a different member.
      This problem is beyond the scope of this
      document and should be solved by the application, perhaps by
      forwarding misdirected packets to the correct gateway for deep
      inspection.

3.7.1.  Outbound SAs using counter modes

   For SAs involving counter mode ciphers such as CTR ([RFC3686]) or GCM
   ([RFC4106]) there is yet another complication.  The initial vector
   for such modes MUST NOT be repeated, and senders use methods such as
   counters or LFSRs to ensure this.  When an SA is shared between more
   than one active member, or even fails over from one member to
   another, the members need to make sure that they do not generate the
   same initial vector.  See [COUNTER_MODES] for a discussion of this
   problem in another context.

3.8.  Different IP addresses for IKE and IPsec

   In many implementations there are separate IP addresses for the
   cluster, and for each member.  While the packets protected by tunnel
   mode child SAs are encapsulated in IP headers with the cluster IP
   address, the IKE packets originate from a specific member, and carry
   that member's IP address.  This may be done so that IPsec traffic
   bypasses the load balancer for greater scalability.  For the peer,
   this looks weird, as the usual thing is for the IPsec packets to come
   from the same IP address as the IKE packets.  Unmodified peers may
   drop such packets.

   One obvious solution is to use some fancy capability of the IKE host
   to change things so that IKE packets also come out of the cluster IP
   address.  This can be achieved through NAT or through assigning
   multiple addresses to interfaces.  This is not, however, possible for
   all implementations, and will not reduce load on the balancer.

   [ARORA] discusses this problem in greater depth, and proposes another
   solution, one that does involve protocol changes.

3.9.  Allocation of SPIs

   The SPI associated with each child SA, and with each IKE SA, MUST be
   unique relative to the peer of the SA.
   Thus, in the context of a
   cluster, each cluster member MUST generate SPIs in a fashion that
   avoids collisions (with other cluster members) for these SPI values.
   The means by which cluster members achieve this requirement is a
   local matter, outside the scope of this document.

4.  Security Considerations

   Implementations running on clusters MUST be as secure as
   implementations running on single gateways.  In other words, no
   extension or interpretation used to allow operation in a cluster may
   facilitate attacks that are not possible for single gateways.

   Moreover, thought must be given to the synching requirements of any
   protocol extension, to make sure that it does not create an
   opportunity for denial of service attacks on the cluster.

   As mentioned in Section 3.5, allowing an inbound child SA to fail
   over to another member has the effect of disabling replay counter
   protection for a short time.  Though the threat is arguably low, it
   is a policy decision whether this is acceptable.

   Section 3.7 describes the problem of the two directions of a flow
   being protected by two SAs which are not part of a matched pair, or
   even not being processed by the same cluster member.  This is not a
   security problem as far as IPsec is concerned, because IPsec has
   policy at the IP, protocol and port level only.  However, many IPsec
   implementations are integrated with stateful firewalls, which need to
   see both sides of a flow.  Such implementations may have to forward
   packets to other members for the firewall to properly inspect the
   traffic.

5.  IANA Considerations

   This document has no actions for IANA.

6.  Acknowledgements

   This document is a collective work, and includes contributions from
   many people who participate in the IPsecME working group.

   The editor would particularly like to acknowledge the extensive
   contribution of the following people (in alphabetical order):
   Jitender Arora, Jean-Michel Combes, Dan Harkins, David Harrington,
   Steve Kent, Tero Kivinen, Alexey Melnikov, Yaron Sheffer, Melinda
   Shore, and Rodney Van Meter.

7.  Change Log

   NOTE TO RFC EDITOR: REMOVE THIS SECTION BEFORE PUBLICATION

   Version 00 was identical to draft-nir-ipsecme-ipsecha-ps-00, re-spun
   as a WG document.

   Version 01 included closing issues 177, 178 and 180, with updates to
   terminology, and added discussion of inbound SAs and the CTR issue.

   Version 02 includes comments by Yaron Sheffer and the acknowledgement
   section.

   Version 03 fixes some ID-nits, and adds the problem presented by
   Jitender Arora in [ARORA].

   Version 04 fixes a spelling mistake, moves the scope discussion to a
   subsection of its own (Section 3.1), and adds a short discussion of
   the duplicate SPI problem, presented by Jean-Michel Combes.

   Versions 05, 06 and 07 just corrected nits and notation.

   Versions 08 and 09 include suggestions from the IETF last call.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

8.2.  Informative References

   [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
              IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses
              (work in progress), April 2010.

   [COUNTER_MODES]
              McGrew, D. and B. Weis, "Using Counter Modes with
              Encapsulating Security Payload (ESP) and Authentication
              Header (AH) to Protect Group Traffic",
              draft-ietf-msec-ipsec-group-counter-modes (work in
              progress), March 2010.

   [IKEv2bis]
              Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
              "Internet Key Exchange Protocol: IKEv2",
              draft-ietf-ipsecme-ikev2bis (work in progress), May 2010.

   [RFC3686]  Housley, R., "Using Advanced Encryption Standard (AES)
              Counter Mode With IPsec Encapsulating Security Payload
              (ESP)", RFC 3686, January 2004.

   [RFC4106]  Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
              (GCM) in IPsec Encapsulating Security Payload (ESP)",
              RFC 4106, June 2005.

   [RFC5685]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for
              IKEv2", RFC 5685, November 2009.

   [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
              RFC 5723, January 2010.

   [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)",
              RFC 5798, March 2010.

Author's Address

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim st.
   Tel Aviv  67897
   Israel

   Email: ynir@checkpoint.com