Network Working Group                                            Y. Nir
Internet-Draft                                              Check Point
Intended status: Informational                           April 14, 2010
Expires: October 16, 2010

IPsec High Availability and Load Sharing Problem Statement
draft-ietf-ipsecme-ipsec-ha-01

Abstract

This document describes a requirement from IKE and IPsec to allow for more scalable and available deployments for VPNs. It defines terminology for high availability and load sharing clusters implementing IKE and IPsec, and describes gaps in the existing standards.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on October 16, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  Terminology
   3.  The Problem Statement
     3.1.  Lots of Long Lived State
     3.2.  IKE Counters
     3.3.  Outbound SA Counters
     3.4.  Inbound SA Counters
     3.5.  Missing Synch Messages
     3.6.  Simultaneous use of IKE and IPsec SAs by Different Members
       3.6.1.  Outbound SAs using counter modes
   4.  Security Considerations
   5.  Change Log
   6.  Informative References
   Author's Address

1. Introduction

IKEv2, as described in [RFC4306] and [RFC4718], and IPsec, as described in [RFC4301] and others, allow deployment of VPNs between different sites as well as from VPN clients to protected networks.

As VPNs become increasingly important to the organizations deploying them, there is a demand to make IPsec solutions more scalable and less prone to down time, by using more than one physical gateway to either share the load or back each other up. Similar demands have been made in the past for other critical pieces of an organization's infrastructure, such as DHCP and DNS servers, web servers, databases and others.

IKE and IPsec are in particular less friendly to clustering than these other protocols, because they store more state, and that state is more volatile. Section 2 defines terminology for use in this document, and in the envisioned solution documents.

In general, deploying IKE and IPsec in a cluster requires such a large amount of information to be synchronized among the members of the cluster, that it becomes impractical. Alternatively, if less information is synchronized, failover would mean a prolonged and intensive recovery phase, which negates the scalability and availability promises of using clusters.
In Section 3 we will describe this in more detail.

1.1. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Terminology

"Single Gateway" is an implementation of IKE and IPsec enforcing a certain policy, as described in [RFC4301].

"Cluster" is a set of two or more gateways, implementing the same security policy, and protecting the same domain. Clusters exist to provide both high availability through redundancy, and scalability through load sharing.

"Member" is one gateway in a cluster.

"High Availability" is a condition of a system, not a configuration type. A system is said to have high availability if its expected down time is low. High availability can be achieved in various ways, one of which is clustering. All the clusters described in this document achieve high availability.

"Fault Tolerance" is a condition related to high availability, where a system maintains service availability, even when a specified set of fault conditions occur. In clusters, we expect the system to maintain service availability when one or more of the cluster members fail.

"Completely Transparent Cluster" is a cluster where the occurrence of a fault is never visible to the peers.

"Partially Transparent Cluster" is a cluster where the occurrence of a fault may be visible to the peers.

"Hot Standby Cluster", or "HS Cluster" is a cluster where only one of the members is active at any one time. This member is also referred to as the "active", whereas the others are referred to as "stand-bys". [VRRP] is one method of building such a cluster.

"Load Sharing Cluster", or "LS Cluster" is a cluster where more than one of the members may be active at the same time. The term "load
The term "load 157 balancing" is also common, but it implies that the load is actually 158 balanced between the members, and we don't want to even imply that 159 this is a requirement. 161 "Failover" is the event where a one member takes over some load from 162 some other member. In a hot standby cluster, this hapens when a 163 standby memeber becomes active due to a failure of the former active 164 member, or because of an administrator command. In a load sharing 165 cluster this usually happens because of a failure of one of the 166 members, but certain load-balancing technologies may allow a 167 particular load (an SA) to move from one member to another to even 168 out the load, even without any failures. 170 "Tight Cluster" is a cluster where all the members share an IP 171 address. This could be accomplished using configured interfaces with 172 specialized protocols or hardware, such as VRRP, or through the use 173 of multicast addresses, but in any case, peers need only be 174 configured with one IP address in the PAD. 176 "Loose Cluster" is a cluster where each member has a different IP 177 address. Peers find the correct member using some method such as DNS 178 queries or [REDIRECT]. 180 "Synch Channel" is a communications channel among the cluster 181 members, used to transfer state information. The synch channel may 182 or may not be IP based, may or may not be encrypted, and may work 183 over short or long distances. The security and physical 184 characteristics of this channel are out of scope for this document, 185 but it is a requirement that its use be minimized for scalability. 187 3. The Problem Statement 189 This document will make no attempt to describe the problems in 190 setting up a cluster. The following subsections describe the 191 problems related to the protocol itself. 
193 We also ignore the problem of synchronizing the policy between 194 cluster members, as this is an administrative issue that is not 195 particular to either clusters or to IPsec. 197 Note that the interesting scenario here is VPN, whether tunneled 198 site-to-site or remote access. host-to-host transport mode is not 199 expected to benefit from this work. 201 3.1. Lots of Long Lived State 203 IKE and IPsec have a lot of long lived state: 204 o IKE SAs last for minutes, hours, or days, and carry keys and other 205 information. Some gateways may carry thousands to hundreds of 206 thousands of IKE SAs. 207 o IPsec SAs last for minutes or hours, and carry keys, selectors and 208 other information. Some gateways may carry hundreds of thousands 209 such IPsec SAs. 210 o SPD Cache entries. While the SPD is unchanging, the SPD cache 211 changes on the fly due to narrowing. Entries last at least as 212 long as the SAD entries, but tend to last even longer than that. 214 A naive implementation of a high availability cluster would have no 215 synchronized state, and a failover would produce an effect similar to 216 that of a rebooted gateway. [resumption] describes how new IKE and 217 IPsec SAs can be recreated in such a case. 219 3.2. IKE Counters 221 We can overcome the first problem described in Section 3.1, by 222 synchronizing states - whenever an SA is created, we can synch this 223 new state to all other members. However, those states are not only 224 long-lived, they are also ever changing. 226 IKE has message counters. A peer may not process message n until 227 after it has processed message n-1. Skipping message IDs is not 228 allowed. So a newly-active member needs to know the last message IDs 229 both received and transmitted. 231 Often, it is feasible to synchronize the IKE message counters for 232 every IKE exchange. 
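As an illustration only, per-exchange synchronization of the message counters could be sketched as below. The class names, the in-memory "synch channel" (a direct method call), and the window size of one are assumptions made for this sketch, not part of IKEv2 or of any implementation.

```python
from dataclasses import dataclass


@dataclass
class IkeCounters:
    """Message ID state for one IKE SA (hypothetical structure)."""
    next_request_to_send: int = 0    # message ID for our next request
    next_request_expected: int = 0   # message ID the peer must use next


class Member:
    """A cluster member holding a replica of the per-SA counters."""

    def __init__(self):
        self.sas = {}  # maps an SA identifier to its IkeCounters

    def apply_sync(self, sa_id, counters):
        # Store a copy so later changes on the active member do not leak in.
        self.sas[sa_id] = IkeCounters(counters.next_request_to_send,
                                      counters.next_request_expected)

    def may_process_request(self, sa_id, message_id):
        # Message IDs may not be skipped: only the expected one is accepted.
        return message_id == self.sas[sa_id].next_request_expected


def complete_exchange(active, standby, sa_id, initiated_by_us):
    """Advance the relevant counter, then synch it to the standby."""
    counters = active.sas.setdefault(sa_id, IkeCounters())
    if initiated_by_us:
        counters.next_request_to_send += 1
    else:
        counters.next_request_expected += 1
    standby.apply_sync(sa_id, counters)  # stands in for the synch channel
```

After each call to complete_exchange(), the standby holds the same two counters as the active member, which is the state a newly-active member needs.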
This way, the newly-active member knows what messages it is allowed to process, and what message IDs to use on IKE requests, so that peers process them.

3.3. Outbound SA Counters

ESP and AH have an optional anti-replay feature, where every protected packet carries a counter number. Repeating counter numbers is considered an attack, so the newly-active member must not use a replay counter number that has already been used. The peer will drop those packets as duplicates and/or warn of an attack.

Though it may be feasible to synchronize the IKE message counters, it is almost never feasible to synchronize the IPsec packet counters for every IPsec packet transmitted. So we have to assume that at least for IPsec, the replay counter will not be up-to-date on the newly-active member, and the newly-active member may repeat a counter.

A possible solution is to synch replay counter information, not for each packet emitted, but only at regular intervals, say, every 10,000 packets or every 0.5 seconds. After a failover, the newly-active member advances the counters for outbound SAs by 10,000. To the peer this looks like up to 10,000 packets were lost, but this should be acceptable, as neither ESP nor AH guarantees reliable delivery.

3.4. Inbound SA Counters

An even tougher issue is the synchronization of packet counters for inbound SAs. If a packet arrives at a newly-active member, there is no way to determine whether this packet is a replay or not. The periodic synch does not solve the problem at all. Suppose we synchronize every 10,000 packets, and the last synch before the failover had the counter at 170,000.
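The counter arithmetic behind periodic synch can be sketched as follows, using the 10,000-packet interval and the 170,000 checkpoint from the text. The function names and the three labels are hypothetical, and the sketch ignores the replay window bitmap that a real implementation would also keep.

```python
SYNC_INTERVAL = 10_000  # packets between synch messages, as in the text


def resume_outbound_counter(last_synced, interval=SYNC_INTERVAL):
    """Outbound counter after failover: skip the whole unsynched interval."""
    return last_synced + interval


def classify_inbound(seq, last_synced, interval=SYNC_INTERVAL):
    """Classify an inbound sequence number seen right after a failover."""
    if seq <= last_synced:
        # Covered by the last synch: known to be old.
        return "old"
    if seq < last_synced + interval:
        # May or may not have been processed by the failed member.
        return "ambiguous"
    # Beyond the unsynched interval: probably, not certainly, unseen.
    return "probably-new"
```

With a checkpoint of 170,000, the outbound counter resumes at 180,000, while inbound sequence numbers 165,000, 175,000 and 180,000 fall into the "old", "ambiguous" and "probably-new" ranges respectively.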
It is probable, though not certain, that packet number 180,000 has not yet been processed, but if packet 175,000 arrives at the newly-active member, it has no way of determining whether that packet has already been processed. The synchronization does prevent the processing of really old packets, such as those with counter number 165,000. Ignoring all counters below 180,000 won't work either, because that's up to 10,000 dropped packets, which may be very noticeable.

The easiest solution is to learn the replay counter from the incoming traffic. This is allowed by the standards, because replay counter verification is an optional feature. The case can even be made that it is relatively secure, because non-attack traffic will reset the counters to what they should be, so an attacker faces the dual challenge of a very narrow window for attack, and the need to time the attack to a failover event. Unless the attacker can actually cause the failover, this would be very difficult. It should be noted, though, that although this solution is acceptable as far as RFC 4301 goes, it is a matter of policy whether this is acceptable.

Another possible solution to the inbound SA problem is to rekey all child SAs following a failover. This may or may not be feasible depending on the implementation and the configuration.

3.5. Missing Synch Messages

The synch channel is unlikely to be infallible. Before failover is detected, some synchronization messages may have been missed. For example, the active member may have created a new Child SA using message n. The new information (entry in the SAD and update to counters of the IKE SA) is sent on the synch channel. Still, with every possible technology, the update may be missed before the failover.

This is a bad situation, because the IKE SA is doomed. The newly-active member has two problems:

   o  It does not have the new IPsec SA pair. It will drop all incoming packets protected with such an SA. This could be fixed by sending some DELETEs and INVALID_SPI notifications, if it weren't for the other problem...

   o  The counters for the IKE SA show that only request n-1 has been sent. The next request will get the message ID n, but that will be rejected by the peer. After a sufficient number of retransmissions and rejections, the whole IKE SA with all associated IPsec SAs will get dropped.

The above scenario may be rare enough that it is acceptable that on a configuration with thousands of IKE SAs, a few will need to be recreated from scratch or using session resumption techniques. However, detecting this may take a long time (several minutes) and this negates the goal of creating a high availability cluster in the first place.

3.6. Simultaneous use of IKE and IPsec SAs by Different Members

For load sharing clusters, all active members may need to use the same SAs, both IKE and IPsec. This is an even greater problem than in the case of HA, because consecutive packets may need to be sent by different members to the same peer gateway.

The solution to the IKE SA issue is up to the application. It's possible to create some locking mechanism over the synch channel, or else have one member "own" the IKE SA and manage the child SAs for all other members. For IPsec, solutions fall into two broad categories.

The first is the "sticky" category, where all communications with a single peer, or all communications involving a certain SPD cache entry, go through a single member. In this case, all packets that match any particular SA go through the same member, so no synchronization of the replay counter needs to be done.
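One way to picture the "sticky" approach is a deterministic mapping from a peer to a member, as in the sketch below. The hash-based selection, the function name and the member names are assumptions chosen for illustration; the text does not prescribe any particular assignment method.

```python
import hashlib


def sticky_member(peer_id, members):
    """Pick one member for a peer, deterministically (hypothetical helper)."""
    if not members:
        raise ValueError("cluster has no active members")
    # Hash the peer identity so that every member computes the same answer
    # without consulting the synch channel.
    digest = hashlib.sha256(peer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(members)
    # Sort so the result does not depend on the order of the member list.
    return sorted(members)[index]
```

For example, sticky_member("peer.example.net", ["gw-a", "gw-b", "gw-c"]) returns the same member on every call. Note that this simple modulo scheme reassigns many peers whenever the member list changes, so a real cluster would likely want something more stable.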
Inbound processing is a "sticky" issue, because the packets have to be processed by the correct member based on peer and SPI. Another issue is that commodity load balancers will not be able to match the SPIs of the encrypted side to the clear traffic, and so the wrong member may get the other half of the flow.

The other way is to duplicate the child SAs, and have a pair of IPsec SAs for each active member. Different packets for the same peer go through different members, and get protected using different SAs with the same selectors and matching the same entries in the SPD cache. This has some shortcomings:

   o  It requires multiple parallel SAs, which the peer has no use for. Section 2.8 of [RFC4306] specifically allows this, but some implementations might have a policy against long term maintenance of redundant SAs.

   o  Different packets that belong to the same flow may be protected by different SAs, which may seem "weird" to the peer gateway, especially if it is integrated with some deep inspection middleware such as a firewall. It is not known whether this will cause problems with current gateways. It is also impossible to mandate against this, because the definition of "flow" varies from one implementation to another.

   o  Reply packets may arrive with an IPsec SA that is not "matched" to the one used for the outgoing packets. Also, they might arrive at a different member. This problem is beyond the scope of this document and should be solved by the application, perhaps by forwarding misdirected packets to the correct gateway for deep inspection.

3.6.1. Outbound SAs using counter modes

For SAs involving counter mode ciphers such as [CTR] or [GCM] there is yet another complication. The initial vector for such modes must never be repeated, and senders use methods such as counters or LFSRs to ensure this.
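A counter-based generator of the kind mentioned here might look like the following sketch. The member-ID prefix is an assumption added for illustration, not something taken from the cited specifications: it shows one way several senders could draw from disjoint parts of an 8-octet IV space. The class name and the 8-bit prefix width are hypothetical.

```python
class MemberIvGenerator:
    """Counter-based 8-octet IV generator with a per-member prefix."""

    def __init__(self, member_id, member_bits=8):
        if not 0 < member_bits < 64:
            raise ValueError("member_bits must be between 1 and 63")
        if not 0 <= member_id < (1 << member_bits):
            raise ValueError("member_id does not fit in member_bits")
        self._counter_bits = 64 - member_bits
        # The member ID occupies the most significant bits of the IV.
        self._prefix = member_id << self._counter_bits
        self._counter = 0

    def next_iv(self):
        if self._counter >= (1 << self._counter_bits):
            # Never wrap around: a repeated IV breaks counter modes.
            raise OverflowError("IV space exhausted; the SA must be rekeyed")
        iv = self._prefix | self._counter
        self._counter += 1
        return iv.to_bytes(8, "big")
```

Two generators created with different member IDs can never emit the same IV, because their values differ in the prefix bits. A member taking over after a failover would keep using its own prefix, so under this assumption it would not need the failed member's counter value.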
An SA shared between more than one active member, or even failing over from one member to another, needs to ensure that its senders never generate the same initial vector. See [COUNTER_MODES] for a discussion of this problem in another context.

4. Security Considerations

Implementations running on clusters MUST be as secure as implementations running on single gateways. In other words, no extension or interpretation used to allow operation in a cluster may facilitate attacks that are not possible for single gateways.

Moreover, thought must be given to the synching requirements of any protocol extension, to make sure that it does not create an opportunity for denial of service attacks on the cluster.

As mentioned in Section 3.4, allowing an inbound child SA to fail over to another member has the effect of disabling replay counter protection for a short time. Though the threat is arguably low, it is a policy decision whether this is acceptable.

5. Change Log

This is the first version, re-spun as a WG document.

6. Informative References

[COUNTER_MODES]  McGrew, D. and B. Weis, "Using Counter Modes with Encapsulating Security Payload (ESP) and Authentication Header (AH) to Protect Group Traffic", draft-ietf-msec-ipsec-group-counter-modes (work in progress), March 2010.

[CTR]  Housley, R., "Using Advanced Encryption Standard (AES) Counter Mode With IPsec Encapsulating Security Payload (ESP)", RFC 3686, January 2004.

[GCM]  Viega, J. and D. McGrew, "The Use of Galois/Counter Mode (GCM) in IPsec Encapsulating Security Payload (ESP)", RFC 4106, June 2005.

[REDIRECT]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for IKEv2", RFC 5685, November 2009.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC 4306, December 2005.

[RFC4718]  Eronen, P. and P. Hoffman, "IKEv2 Clarifications and Implementation Guidelines", RFC 4718, October 2006.

[VRRP]  Hinden, R., "Virtual Router Redundancy Protocol (VRRP)", RFC 3768, April 2004.

[resumption]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", RFC 5723, January 2010.

Author's Address

Yoav Nir
Check Point Software Technologies Ltd.
5 Hasolelim st.
Tel Aviv 67897
Israel

Email: ynir@checkpoint.com