Network Working Group                                             Y. Nir
Internet-Draft                                               Check Point
Intended status: Informational                             June 24, 2010
Expires: December 26, 2010

                    IPsec Cluster Problem Statement
                      draft-ietf-ipsecme-ipsec-ha-07

Abstract

   This document defines terminology, problem statement, and
   requirements for implementing IKE and IPsec on clusters.  It also
   describes gaps in existing standards and their implementations that
   need to be filled in order to allow peers to interoperate with
   clusters from different vendors.
   An agreed terminology, problem statement, and requirements will
   allow the IPSECME WG to consider development of IPsec/IKEv2
   mechanisms to simplify cluster implementations.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 26, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  Terminology
   3.  The Problem Statement
     3.1.  Scope
     3.2.  Lots of Long-Lived State
     3.3.  IKE Counters
     3.4.  Outbound SA Counters
     3.5.  Inbound SA Counters
     3.6.  Missing Synch Messages
     3.7.  Simultaneous Use of IKE and IPsec SAs by Different Members
       3.7.1.  Outbound SAs Using Counter Modes
     3.8.  Different IP Addresses for IKE and IPsec
     3.9.  Allocation of SPIs
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Change Log
   8.  Informative References
   Author's Address

1.  Introduction

   IKEv2, as described in [RFC4306] and [IKEv2bis], and IPsec, as
   described in [RFC4301] and others, allow deployment of VPNs between
   different sites as well as from VPN clients to protected networks.

   As VPNs become increasingly important to the organizations deploying
   them, there is a demand to make IPsec solutions more scalable and
   less prone to down time, by using more than one physical gateway to
   either share the load or back each other up, forming a "cluster"
   (see Section 2).  Similar demands have been made in the past for
   other critical pieces of an organization's infrastructure, such as
   DHCP and DNS servers, web servers, and databases.

   IKE and IPsec are, in particular, less friendly to clustering than
   these other protocols, because they store more state, and that state
   is more volatile.
   Section 2 defines terminology for use in this document and in the
   envisioned solution documents.

   In general, deploying IKE and IPsec in a cluster requires such a
   large amount of information to be synchronized among the members of
   the cluster that it becomes impractical.  Alternatively, if less
   information is synchronized, failover would mean a prolonged and
   intensive recovery phase, which negates the scalability and
   availability promises of using clusters.  This is described in more
   detail in Section 3.

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   "Single Gateway" is an implementation of IKE and IPsec enforcing a
   certain policy, as described in [RFC4301].

   "Cluster" is a set of two or more gateways, implementing the same
   security policy and protecting the same domain.  Clusters exist to
   provide both high availability through redundancy and scalability
   through load sharing.

   "Member" is one gateway in a cluster.

   "Availability" is a measure of a system's ability to perform the
   service for which it was designed.  It is measured as the percentage
   of time a service is available, out of the time it is supposed to be
   available.  Colloquially, availability is sometimes expressed in
   "nines" rather than as a percentage, with 3 "nines" meaning 99.9%
   availability, 4 "nines" meaning 99.99% availability, etc.

   "High Availability" is a property of a system, not a configuration
   type.  A system is said to have high availability if its expected
   down time is low.  High availability can be achieved in various
   ways, one of which is clustering.  All the clusters described in
   this document achieve high availability.
   What "high" means depends on the application, but it is usually 4 to
   6 "nines" (at most 0.5-50 minutes of down time per year in a system
   that is supposed to be available all the time).

   "Fault Tolerance" is a property related to high availability, where
   a system maintains service availability even when a specified set of
   fault conditions occurs.  In clusters, we expect the system to
   maintain service availability when one or more of the cluster
   members fails.

   "Completely Transparent Cluster" is a cluster where the occurrence
   of a fault is never visible to the peers.

   "Partially Transparent Cluster" is a cluster where the occurrence of
   a fault may be visible to the peers.

   "Hot Standby Cluster", or "HS Cluster", is a cluster where only one
   of the members is active at any one time.  This member is also
   referred to as the "active", whereas the others are referred to as
   "standbys".  VRRP ([RFC5798]) is one method of building such a
   cluster.

   "Load Sharing Cluster", or "LS Cluster", is a cluster where more
   than one of the members may be active at the same time.  The term
   "load balancing" is also common, but it implies that the load is
   actually balanced between the members, which is not a requirement.

   "Failover" is the event where one member takes over some load from
   another member.  In a hot standby cluster, this happens when a
   standby member becomes active due to a failure of the former active
   member, or because of an administrator command.  In a load sharing
   cluster, this usually happens because of a failure of one of the
   members, but certain load-balancing technologies may allow a
   particular load (such as all the flows associated with a particular
   child SA) to move from one member to another to even out the load,
   even without any failures.

   "Tight Cluster" is a cluster where all the members share an IP
   address.
   This could be accomplished using configured interfaces with
   specialized protocols or hardware, such as VRRP, or through the use
   of multicast addresses; in any case, peers need only be configured
   with one IP address in the PAD.

   "Loose Cluster" is a cluster where each member has a different IP
   address.  Peers find the correct member using some method such as
   DNS queries or the IKEv2 redirect mechanism ([RFC5685]).  In some
   cases, a member's IP address(es) may be allocated to another member
   at failover.

   "Synch Channel" is a communications channel among the cluster
   members, used to transfer state information.  The synch channel may
   or may not be IP based, may or may not be encrypted, and may work
   over short or long distances.  The security and physical
   characteristics of this channel are out of scope for this document,
   but it is a requirement that its use be minimized for scalability.

3.  The Problem Statement

   This section starts by scoping the problem, and goes on to list each
   of the issues encountered while setting up a cluster of IPsec VPN
   gateways.

3.1.  Scope

   This document makes no attempt to describe the problems in setting
   up a generic cluster.  It describes only problems related to the
   IKE/IPsec protocols.

   The problem of synchronizing the policy between cluster members is
   out of scope, as this is an administrative issue that is not
   particular to either clusters or IPsec.

   The interesting scenario here is VPN, whether tunneled site-to-site
   or remote access.  Host-to-host transport mode is not expected to
   benefit from this work.

   We do not describe in full the problems of the communication channel
   between cluster members (the Synch Channel), nor do we intend to
   specify anything in this space later.  Specifically, mixed-vendor
   clusters are out of scope.
   The problem statement anticipates possible protocol-level solutions
   between IKE/IPsec peers in order to improve the availability and/or
   performance of VPN clusters.  One vendor's IPsec endpoint should be
   able to work optimally with another vendor's cluster.

3.2.  Lots of Long-Lived State

   IKE and IPsec have a lot of long-lived state:

   o  IKE SAs last for minutes, hours, or days, and carry keys and
      other information.  Some gateways may carry thousands to hundreds
      of thousands of IKE SAs.

   o  IPsec SAs last for minutes or hours, and carry keys, selectors,
      and other information.  Some gateways may carry hundreds of
      thousands of such IPsec SAs.

   o  SPD (Security Policy Database) cache entries.  While the SPD is
      unchanging, the SPD cache changes on the fly due to narrowing.
      Entries last at least as long as the SAD (Security Association
      Database) entries, but tend to last even longer than that.

   A naive implementation of a cluster would have no synchronized
   state, and a failover would produce an effect similar to that of a
   rebooted gateway.  [RFC5723] describes how new IKE and IPsec SAs can
   be recreated in such a case.

3.3.  IKE Counters

   We can overcome the first problem described in Section 3.2 by
   synchronizing states: whenever an SA is created, we can synch this
   new state to all other members.  However, those states are not only
   long lived, they are also ever changing.

   IKE has message counters.  A peer MUST NOT process message n until
   after it has processed message n-1.  Skipping message IDs is not
   allowed.  So a newly active member needs to know the last message
   IDs both received and transmitted.

   One possible solution is to synchronize information about the IKE
   message counters after every IKE exchange.  This way, the newly
   active member knows what messages it is allowed to process, and what
   message IDs to use on IKE requests, so that peers process them.
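   As an illustration only (this sketch is not part of the draft, and
   all class and field names in it are invented), the per-exchange
   counter synchronization described above amounts to mirroring two
   integers per IKE SA to the standby member:

   ```python
   # Illustrative sketch of per-exchange IKE message-ID mirroring.
   # Real implementations carry this state over their own synch
   # channel; the names and data shapes here are assumptions.

   class IkeMessageIds:
       """Message-ID state for one IKE SA."""
       def __init__(self):
           self.next_request_id = 0      # ID for our next outgoing request
           self.expected_request_id = 0  # lowest incoming request ID we accept

   class StandbyMember:
       """Mirrors counters received over the synch channel, keyed by IKE SPI."""
       def __init__(self):
           self.mirrored = {}

       def on_synch_update(self, ike_spi, next_request_id, expected_request_id):
           self.mirrored[ike_spi] = (next_request_id, expected_request_id)

       def take_over(self, ike_spi):
           # On failover, resume the IKE SA with the mirrored counters so
           # the next request carries a message ID the peer will accept.
           return self.mirrored[ike_spi]

   # The active member completes one exchange as initiator (request 0
   # answered), then pushes the updated counters to the standby:
   active = IkeMessageIds()
   active.next_request_id += 1

   standby = StandbyMember()
   standby.on_synch_update("ike-spi-1", active.next_request_id,
                           active.expected_request_id)

   next_id, expected_id = standby.take_over("ike-spi-1")
   # next_id == 1: the newly active member's next request uses message
   # ID 1, which is exactly what the peer expects after request 0.
   ```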
   This solution may be appropriate in some cases, but may be too
   onerous in systems with lots of SAs.  It also has the drawback that
   it never recovers from the missing synch message problem, which is
   described in Section 3.6.

3.4.  Outbound SA Counters

   ESP and AH have an optional anti-replay feature, where every
   protected packet carries a counter number.  Repeating counter
   numbers is considered an attack, so the newly active member MUST NOT
   use a replay counter number that has already been used.  The peer
   will drop those packets as duplicates and/or warn of an attack.

   Though it may be feasible to synchronize the IKE message counters,
   it is almost never feasible to synchronize the IPsec packet counters
   for every IPsec packet transmitted.  So we have to assume that, at
   least for IPsec, the replay counter will not be up to date on the
   newly active member, and the newly active member may repeat a
   counter.

   A possible solution is to synch replay counter information not for
   each packet emitted, but only at regular intervals, say, every
   10,000 packets or every 0.5 seconds.  After a failover, the newly
   active member advances the counters for outbound IPsec SAs by
   10,000.  To the peer this looks like up to 10,000 packets were lost,
   but this should be acceptable, as neither ESP nor AH guarantees
   reliable delivery.

3.5.  Inbound SA Counters

   An even tougher issue is the synchronization of packet counters for
   inbound IPsec SAs.  If a packet arrives at a newly active member,
   there is no way to determine whether this packet is a replay.  The
   periodic synch does not solve this problem at all.  Suppose we
   synchronize every 10,000 packets, and the last synch before the
   failover had the counter at 170,000.
   It is probable, though not certain, that packet number 180,000 has
   not yet been processed, but if packet 175,000 arrives at the newly
   active member, it has no way of determining whether that packet has
   already been processed.  The synchronization does prevent the
   processing of really old packets, such as those with counter number
   165,000.  Ignoring all counters below 180,000 won't work either,
   because that would mean up to 10,000 dropped packets, which may be
   very noticeable.

   The easiest solution is to learn the replay counter from the
   incoming traffic.  This is allowed by the standards, because replay
   counter verification is an optional feature (see Section 3.2 of
   [RFC4301]).  The case can even be made that it is relatively secure,
   because non-attack traffic will reset the counters to what they
   should be, so an attacker faces the dual challenge of a very narrow
   window for attack, and the need to time the attack to a failover
   event.  Unless the attacker can actually cause the failover, this
   would be very difficult.  It should be noted, though, that although
   this solution is acceptable as far as RFC 4301 goes, it is a matter
   of policy whether it is acceptable.

   Another possible solution to the inbound IPsec SA problem is to
   rekey all child SAs following a failover.  This may or may not be
   feasible, depending on the implementation and the configuration.

3.6.  Missing Synch Messages

   The synch channel is very likely not to be infallible.  Before a
   failover is detected, some synchronization messages may have been
   missed.  For example, the active member may have created a new child
   SA using message n.  The new information (the entry in the SAD and
   the update to the counters of the IKE SA) is sent on the synch
   channel.  Still, with every possible technology, the update may be
   missed before the failover.

   This is a bad situation, because the IKE SA is doomed.
   The newly active member has two problems:

   o  It does not have the new IPsec SA pair.  It will drop all
      incoming packets protected with such an SA.  This could be fixed
      by sending some DELETE and INVALID_SPI notifications, if it
      weren't for the other problem:

   o  The counters for the IKE SA show that only request n-1 has been
      sent.  The next request will get the message ID n, but that will
      be rejected by the peer.  After a sufficient number of
      retransmissions and rejections, the whole IKE SA, with all
      associated IPsec SAs, will get dropped.

   The above scenario may be rare enough that it is acceptable that, in
   a configuration with thousands of IKE SAs, a few will need to be
   recreated from scratch or using session resumption techniques.
   However, detecting this situation may take a long time (several
   minutes), and this negates the goal of creating a cluster in the
   first place.

3.7.  Simultaneous Use of IKE and IPsec SAs by Different Members

   For LS clusters, all active members may need to use the same SAs,
   both IKE and IPsec.  This is an even greater problem than in the
   case of HS clusters, because consecutive packets may need to be sent
   by different members to the same peer gateway.

   The solution to the IKE SA issue is up to the application.  It is
   possible to create some locking mechanism over the synch channel, or
   else have one member "own" the IKE SA and manage the child SAs for
   all other members.  For IPsec, solutions fall into two broad
   categories.

   The first is the "sticky" category, where all communications with a
   single peer, or all communications involving a certain SPD cache
   entry, go through a single member.  In this case, all packets that
   match any particular SA go through the same member, so no
   synchronization of the replay counter needs to be done.
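   One way to picture a "sticky" rule, as a sketch under assumptions
   the document does not make (a fixed three-member cluster and a
   SHA-256-based assignment), is a deterministic mapping from
   (peer, SPI) to one member:

   ```python
   # Illustrative "sticky" assignment: every packet of a given SA maps
   # to one member, so replay counters need no cross-member
   # synchronization.  The member list and hashing scheme are
   # assumptions for this sketch, not part of the draft.

   import hashlib

   MEMBERS = ["member-a", "member-b", "member-c"]

   def owner_of_sa(peer_ip: str, spi: int) -> str:
       """Deterministically pick the member that handles this SA."""
       key = "{}/{:08x}".format(peer_ip, spi).encode()
       digest = hashlib.sha256(key).digest()
       return MEMBERS[int.from_bytes(digest[:4], "big") % len(MEMBERS)]

   # The same (peer, SPI) pair always lands on the same member:
   first = owner_of_sa("192.0.2.1", 0x1234ABCD)
   second = owner_of_sa("192.0.2.1", 0x1234ABCD)
   assert first == second
   ```

   Such a rule only helps where traffic can actually be steered to the
   chosen member per peer and SPI.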
   Inbound processing is a "sticky" issue, because the packets have to
   be processed by the correct member based on peer and SPI.  Another
   issue is that most load balancers will not be able to match the SPIs
   of the encrypted side to the clear traffic, and so the wrong member
   may get the other half of the flow.

   The second is the "duplicate" category, where the child SA is
   duplicated for each pair of IPsec SAs for each active member.
   Different packets for the same peer go through different members,
   and get protected using different SAs with the same selectors and
   matching the same entries in the SPD cache.  This has some
   shortcomings:

   o  It requires multiple parallel SAs, which the peer has no use for.
      Section 2.8 of [RFC4306] specifically allows this, but some
      implementations might have a policy against long-term maintenance
      of redundant SAs.

   o  Different packets that belong to the same flow may be protected
      by different SAs, which may seem "weird" to the peer gateway,
      especially if it is integrated with some deep inspection
      middleware such as a firewall.  It is not known whether this will
      cause problems with current gateways.  It is also impossible to
      mandate against this, because the definition of "flow" varies
      from one implementation to another.

   o  Reply packets may arrive with an IPsec SA that is not "matched"
      to the one used for the outgoing packets.  Also, they might
      arrive at a different member.  This problem is beyond the scope
      of this document and should be solved by the application, perhaps
      by forwarding misdirected packets to the correct gateway for deep
      inspection.

3.7.1.  Outbound SAs Using Counter Modes

   For SAs involving counter mode ciphers such as CTR ([RFC3686]) or
   GCM ([RFC4106]), there is yet another complication.
   The initial vector for such modes MUST NOT be repeated, and senders
   use methods such as counters or linear feedback shift registers
   (LFSRs) to ensure this.  Members sharing an SA, or even an SA
   failing over from one member to another, need to make sure that they
   do not generate the same initial vector.  See [COUNTER_MODES] for a
   discussion of this problem in another context.

3.8.  Different IP Addresses for IKE and IPsec

   In many implementations, there are separate IP addresses for the
   cluster and for each member.  While the packets protected by tunnel
   mode child SAs are encapsulated in IP headers with the cluster IP
   address, the IKE packets originate from a specific member and carry
   that member's IP address.  To the peer, this looks odd, as the usual
   arrangement is for the IPsec packets to come from the same IP
   address as the IKE packets.

   One obvious solution is to use some capability of the IKE host to
   change things so that IKE packets also come from the cluster IP
   address.  This can be achieved through NAT or through assigning
   multiple addresses to interfaces.  This is not, however, possible
   for all implementations.

   [ARORA] discusses this problem in greater depth and proposes another
   solution that does involve protocol changes.

3.9.  Allocation of SPIs

   The SPI associated with each child SA, and with each IKE SA, MUST be
   unique relative to the peer of the SA.  Thus, in the context of a
   cluster, each cluster member MUST generate SPIs in a fashion that
   avoids collisions (with other cluster members) for these SPI values.
   The means by which cluster members achieve this requirement is a
   local matter, outside the scope of this document.

4.  Security Considerations

   Implementations running on clusters MUST be as secure as
   implementations running on single gateways.
   In other words, no extension or interpretation used to allow
   operation in a cluster may facilitate attacks that are not possible
   for single gateways.

   Moreover, thought must be given to the synching requirements of any
   protocol extension, to make sure that it does not create an
   opportunity for denial-of-service attacks on the cluster.

   As mentioned in Section 3.5, allowing an inbound child SA to fail
   over to another member has the effect of disabling replay counter
   protection for a short time.  Though the threat is arguably low, it
   is a policy decision whether this is acceptable.

5.  IANA Considerations

   This document has no actions for IANA.

6.  Acknowledgements

   This document is the collective work of, and includes contributions
   from, many people who participate in the IPsecME working group.

   The editor would particularly like to acknowledge the extensive
   contributions of the following people (in alphabetical order):

   Jitender Arora, Jean-Michel Combes, Dan Harkins, Steve Kent, Tero
   Kivinen, Yaron Sheffer, Melinda Shore, and Rodney Van Meter.

7.  Change Log

   NOTE TO RFC EDITOR: REMOVE THIS SECTION BEFORE PUBLICATION

   Version 00 was identical to draft-nir-ipsecme-ipsecha-ps-00, re-spun
   as a WG document.

   Version 01 closed issues 177, 178, and 180, with updates to
   terminology, and added discussion of inbound SAs and the CTR issue.

   Version 02 includes comments by Yaron Sheffer and the
   acknowledgements section.

   Version 03 fixes some ID nits and adds the problem presented by
   Jitender Arora in [ARORA].

   Version 04 fixes a spelling mistake, moves the scope discussion to a
   subsection of its own (Section 3.1), and adds a short discussion of
   the duplicate SPI problem, presented by Jean-Michel Combes.

   Versions 05, 06, and 07 just corrected nits and notation.

8.  Informative References

   [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
              IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses
              (work in progress), April 2010.

   [COUNTER_MODES]
              McGrew, D. and B. Weis, "Using Counter Modes with
              Encapsulating Security Payload (ESP) and Authentication
              Header (AH) to Protect Group Traffic",
              draft-ietf-msec-ipsec-group-counter-modes (work in
              progress), March 2010.

   [IKEv2bis] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
              "Internet Key Exchange Protocol: IKEv2",
              draft-ietf-ipsecme-ikev2bis (work in progress), May 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3686]  Housley, R., "Using Advanced Encryption Standard (AES)
              Counter Mode With IPsec Encapsulating Security Payload
              (ESP)", RFC 3686, January 2004.

   [RFC4106]  Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
              (GCM) in IPsec Encapsulating Security Payload (ESP)",
              RFC 4106, June 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [RFC5685]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for
              the Internet Key Exchange Protocol Version 2 (IKEv2)",
              RFC 5685, November 2009.

   [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session
              Resumption", RFC 5723, January 2010.

   [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)
              Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

Author's Address

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim St.
   Tel Aviv  67897
   Israel

   Email: ynir@checkpoint.com