2	Network Working Group                                             Y. Nir
3	Internet-Draft                                               Check Point
4	Intended status: Informational                              June 1, 2010
5	Expires: December 3, 2010

7	       IPsec High Availability and Load Sharing Problem Statement
8	                     draft-ietf-ipsecme-ipsec-ha-04

10	Abstract

12	   This document describes a requirement from IKE and IPsec to allow for
13	   more scalable and available deployments for VPNs.  It defines
14	   terminology for high availability and load sharing clusters
15	   implementing IKE and IPsec, and describes gaps in the existing
16	   standards.

18	Status of this Memo

20	   This Internet-Draft is submitted in full conformance with the
21	   provisions of BCP 78 and BCP 79.

23	   Internet-Drafts are working documents of the Internet Engineering
24	   Task Force (IETF).  Note that other groups may also distribute
25	   working documents as Internet-Drafts.  The list of current Internet-
26	   Drafts is at http://datatracker.ietf.org/drafts/current/.

28	   Internet-Drafts are draft documents valid for a maximum of six months
29	   and may be updated, replaced, or obsoleted by other documents at any
30	   time.  It is inappropriate to use Internet-Drafts as reference
31	   material or to cite them other than as "work in progress."

33	   This Internet-Draft will expire on December 3, 2010.

35	Copyright Notice

37	   Copyright (c) 2010 IETF Trust and the persons identified as the
38	   document authors.  All rights reserved.

40	   This document is subject to BCP 78 and the IETF Trust's Legal
41	   Provisions Relating to IETF Documents
42	   (http://trustee.ietf.org/license-info) in effect on the date of
43	   publication of this document.  Please review these documents
44	   carefully, as they describe your rights and restrictions with respect
45	   to this document.  Code Components extracted from this document must
46	   include Simplified BSD License text as described in Section 4.e of
47	   the Trust Legal Provisions and are provided without warranty as
48	   described in the Simplified BSD License.

50	Table of Contents

52	   1.  Introduction
53	     1.1.  Conventions Used in This Document
54	   2.  Terminology
55	   3.  The Problem Statement
56	     3.1.  Scope
57	     3.2.  Lots of Long Lived State
58	     3.3.  IKE Counters
59	     3.4.  Outbound SA Counters
60	     3.5.  Inbound SA Counters
61	     3.6.  Missing Synch Messages
62	     3.7.  Simultaneous use of IKE and IPsec SAs by Different
63	           Members
64	       3.7.1.  Outbound SAs using counter modes
65	     3.8.  Different IP addresses for IKE and IPsec
66	     3.9.  Allocation of SPIs
67	   4.  Security Considerations
68	   5.  IANA Considerations
69	   6.  Acknowledgements
70	   7.  Change Log
71	   8.  Informative References
72	   Author's Address

74	1.  Introduction

76	   IKEv2, as described in [RFC4306] and [IKEv2bis], and IPsec, as
77	   described in [RFC4301] and others, allows deployment of VPNs between
78	   different sites as well as from VPN clients to protected networks.

80	   As VPNs become increasingly important to the organizations deploying
81	   them, there is a demand to make IPsec solutions more scalable and
82	   less prone to down time, by using more than one physical gateway to
83	   either share the load or back each other up.  Similar demands have
84	   been made in the past for other critical pieces of an organization's
85	   infrastructure, such as DHCP and DNS servers, web servers, databases
86	   and others.

88	   IKE and IPsec are in particular less friendly to clustering than
89	   these other protocols, because they store more state, and that state
90	   is more volatile.  Section 2 defines terminology for use in this
91	   document, and in the envisioned solution documents.

93	   In general, deploying IKE and IPsec in a cluster requires such a
94	   large amount of information to be synchronized among the members of
95	   the cluster, that it becomes impractical.  Alternatively, if less
96	   information is synchronized, failover would mean a prolonged and
97	   intensive recovery phase, which negates the scalability and
98	   availability promises of using clusters.  In Section 3 we will
99	   describe this in more detail.

101	1.1.  Conventions Used in This Document

103	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
104	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
105	   document are to be interpreted as described in [RFC2119].

107	2.  Terminology

109	   "Single Gateway" is an implementation of IKE and IPsec enforcing a
110	   certain policy, as described in [RFC4301].

112	   "Cluster" is a set of two or more gateways, implementing the same
113	   security policy, and protecting the same domain.  Clusters exist to
114	   provide both high availability through redundancy, and scalability
115	   through load sharing.

117	   "Member" is one gateway in a cluster.

119	   "High Availability" is a condition of a system, not a configuration
120	   type.  A system is said to have high availability if its expected
121	   down time is low.  High availability can be achieved in various ways,
122	   one of which is clustering.  All the clusters described in this
123	   document achieve high availability.

125	   "Fault Tolerance" is a condition related to high availability, where
126	   a system maintains service availability, even when a specified set of
127	   fault conditions occur.  In clusters, we expect the system to
128	   maintain service availability, when one or more of the cluster
129	   members fails.

131	   "Completely Transparent Cluster" is a cluster where the occurrence of
132	   a fault is never visible to the peers.

134	   "Partially Transparent Cluster" is a cluster where the occurrence of a
135	   fault may be visible to the peers.

137	   "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
138	   the members is active at any one time.  This member is also referred
139	   to as the "active", whereas the others are referred to as
140	   "standbys".  [VRRP] is one method of building such a cluster.

142	   "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
143	   one of the members may be active at the same time.  The term "load
144	   balancing" is also common, but it implies that the load is actually
145	   balanced between the members, and we don't want to even imply that
146	   this is a requirement.

148	   "Failover" is the event where one member takes over some load from
149	   some other member.  In a hot standby cluster, this happens when a
150	   standby member becomes active due to a failure of the former active
151	   member, or because of an administrator command.  In a load sharing
152	   cluster this usually happens because of a failure of one of the
153	   members, but certain load-balancing technologies may allow a
154	   particular load (such as all the flows associated with a particular
155	   child SA) to move from one member to another to even out the load,
156	   even without any failures.

158	   "Tight Cluster" is a cluster where all the members share an IP
159	   address.  This could be accomplished using configured interfaces with
160	   specialized protocols or hardware, such as VRRP, or through the use
161	   of multicast addresses, but in any case, peers need only be
162	   configured with one IP address in the PAD.

164	   "Loose Cluster" is a cluster where each member has a different IP
165	   address.  Peers find the correct member using some method such as DNS
166	   queries or [REDIRECT].  In some cases, members' IP addresses may be
167	   allocated to other members at failover.

169	   "Synch Channel" is a communications channel among the cluster
170	   members, used to transfer state information.  The synch channel may
171	   or may not be IP based, may or may not be encrypted, and may work
172	   over short or long distances.  The security and physical
173	   characteristics of this channel are out of scope for this document,
174	   but it is a requirement that its use be minimized for scalability.

176	3.  The Problem Statement

178	   This section starts by scoping the problem, and goes on to list each
179	   of the issues encountered while setting up a cluster of IPsec VPN
180	   gateways.

182	3.1.  Scope

184	   This document will make no attempt to describe the problems in
185	   setting up a generic cluster.  It describes only problems related to
186	   the IKE/IPsec protocols.

188	   The problem of synchronizing the policy between cluster members is
189	   out of scope, as this is an administrative issue that is not
190	   particular to either clusters or to IPsec.

192	   The interesting scenario here is VPN, whether tunneled site-to-site
193	   or remote access.  Host-to-host transport mode is not expected to
194	   benefit from this work.

196	   We do not describe in full the problems of the communication channel
197	   between cluster members (the Synch Channel), nor do we intend to
198	   specify anything in this space later.  Specifically, mixed-vendor
199	   clusters are out of scope.

201	   The problem statement anticipates possible protocol-level solutions
202	   between IKE/IPsec peers, in order to improve the availability and/or
203	   performance of VPN clusters.  One vendor's IPsec endpoint should be
204	   able to work, optimally, with another vendor's cluster.

206	3.2.  Lots of Long Lived State

208	   IKE and IPsec have a lot of long lived state:
209	   o  IKE SAs last for minutes, hours, or days, and carry keys and other
210	      information.  Some gateways may carry thousands to hundreds of
211	      thousands of IKE SAs.
212	   o  IPsec SAs last for minutes or hours, and carry keys, selectors and
213	      other information.  Some gateways may carry hundreds of thousands
214	      of such IPsec SAs.

216	   o  SPD Cache entries.  While the SPD is unchanging, the SPD cache
217	      changes on the fly due to narrowing.  Entries last at least as
218	      long as the SAD entries, but tend to last even longer than that.

220	   A naive implementation of a high availability cluster would have no
221	   synchronized state, and a failover would produce an effect similar to
222	   that of a rebooted gateway. [resumption] describes how new IKE and
223	   IPsec SAs can be recreated in such a case.

225	3.3.  IKE Counters

227	   We can overcome the first problem described in Section 3.2, by
228	   synchronizing states - whenever an SA is created, we can synch this
229	   new state to all other members.  However, those states are not only
230	   long-lived, they are also ever changing.

232	   IKE has message counters.  A peer may not process message n until
233	   after it has processed message n-1.  Skipping message IDs is not
234	   allowed.  So a newly-active member needs to know the last message IDs
235	   both received and transmitted.

237	   Often, it is feasible to synchronize the IKE message counters for
238	   every IKE exchange.  This way, the newly active member knows what
239	   messages it is allowed to process, and what message IDs to use on IKE
240	   requests, so that peers process them.
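
   As a minimal sketch of the bookkeeping described above, the
   following Python fragment tracks the two message IDs of one IKE SA
   and copies them to a standby after an exchange.  The class and
   function names, the dictionary used as a synch payload, and the
   simplification to a window size of one with no retransmission
   handling are illustrative assumptions, not part of any
   specification.

```python
from dataclasses import dataclass


@dataclass
class IkeSaCounters:
    """Message ID state a newly-active member needs for one IKE SA."""
    next_request_to_send: int = 0     # message ID for our next request
    next_request_expected: int = 0    # message ID of the peer's next request

    def send_request(self) -> int:
        """Return the message ID to use and advance the outbound counter."""
        msg_id = self.next_request_to_send
        self.next_request_to_send += 1
        return msg_id

    def accept_request(self, msg_id: int) -> bool:
        """Accept a peer request only if it is the next one in sequence."""
        if msg_id != self.next_request_expected:
            return False
        self.next_request_expected += 1
        return True


def sync_snapshot(counters: IkeSaCounters) -> dict:
    """Hypothetical synch-channel payload, sent after every IKE exchange."""
    return {"send": counters.next_request_to_send,
            "expect": counters.next_request_expected}


def restore(snapshot: dict) -> IkeSaCounters:
    """Rebuild the counter state on a standby from the last snapshot."""
    return IkeSaCounters(snapshot["send"], snapshot["expect"])
```

   A standby that restores the latest snapshot continues with the same
   message IDs the failed member would have used next.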

242	3.4.  Outbound SA Counters

244	   ESP and AH have an optional anti-replay feature, where every
245	   protected packet carries a counter number.  Repeating counter numbers
246	   is considered an attack, so the newly-active member must not use a
247	   replay counter number that has already been used.  The peer will drop
248	   those packets as duplicates and/or warn of an attack.

250	   Though it may be feasible to synchronize the IKE message counters, it
251	   is almost never feasible to synchronize the IPsec packet counters for
252	   every IPsec packet transmitted.  So we have to assume that at least
253	   for IPsec, the replay counter will not be up-to-date on the newly-
254	   active member, and the newly-active member may repeat a counter.

256	   A possible solution is to synch replay counter information, not for
257	   each packet emitted, but only at regular intervals, say, every 10,000
258	   packets or every 0.5 seconds.  After a failover, the newly-active
259	   member advances the counters for outbound SAs by 10,000.  To the peer
260	   this looks like up to 10,000 packets were lost, but this should be
261	   acceptable, as neither ESP nor AH guarantees reliable delivery.
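
   The periodic synch described above can be sketched as follows.  This
   is an illustration only: the names OutboundSa and
   resume_after_failover are hypothetical, the synch message is
   represented by a simple assignment, and only the packet-count
   trigger (not the 0.5 second timer) is modeled.

```python
SYNC_INTERVAL = 10_000  # packets between synch messages (value from the text)


class OutboundSa:
    """Outbound SA on the active member, with periodic counter synch."""

    def __init__(self) -> None:
        self.seq = 0           # last sequence number placed on the wire
        self.last_synced = 0   # last value sent on the synch channel

    def next_seq(self) -> int:
        """Return the sequence number for the next outbound packet."""
        self.seq += 1
        if self.seq - self.last_synced >= SYNC_INTERVAL:
            self.last_synced = self.seq   # stands in for a synch message
        return self.seq


def resume_after_failover(last_synced: int) -> int:
    """First sequence number the newly-active member may safely use."""
    return last_synced + SYNC_INTERVAL + 1
```

   Because the failed member can never have sent more than
   SYNC_INTERVAL packets past the last synched value, skipping ahead
   by that amount avoids reuse at the cost of an apparent gap.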

263	3.5.  Inbound SA Counters

265	   An even tougher issue is the synchronization of packet counters for
266	   inbound SAs.  If a packet arrives at a newly-active member, there is
267	   no way to determine whether this packet is a replay or not.  The
268	   periodic synch does not solve this problem.  Suppose we synchronize
269	   every 10,000 packets, and the last synch before the failover had
270	   the counter at 170,000.  It is probable, though not certain, that
271	   packet number 180,000 has not yet been processed, but if packet
272	   175,000 arrives at the newly-active member, the member has no way
273	   of determining whether that packet has already been processed.
274	   The synchronization does prevent the processing of really old
275	   packets, such as those with counter number 165,000.  Ignoring all
276	   counters below 180,000 won't work either, because that's up to 10,000
277	   dropped packets, which may be very noticeable.
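
   The three cases in this example can be written out as a small
   function.  This is a sketch under simplifying assumptions: the
   function name and its return strings are hypothetical, and the
   per-packet replay window bitmap is ignored.

```python
def classify_inbound(seq: int, last_synced: int, interval: int = 10_000) -> str:
    """What a newly-active member can conclude about an inbound packet."""
    if seq <= last_synced:
        return "old"            # at or below the synched counter
    if seq < last_synced + interval:
        return "unknown"        # the failed member may or may not have seen it
    return "probably new"       # beyond what the failed member likely saw
```

   With the last synch at 170,000, packet 165,000 is classified as
   "old", packet 175,000 as "unknown", and packet 180,000 as "probably
   new".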

279	   The easiest solution is to learn the replay counter from the incoming
280	   traffic.  This is allowed by the standards, because replay counter
281	   verification is an optional feature.  The case can even be made that
282	   it is relatively secure, because non-attack traffic will reset the
283	   counters to what they should be, so an attacker faces the dual
284	   challenge of a very narrow window for attack, and the need to time
285	   the attack to a failover event.  Unless the attacker can actually
286	   cause the failover, this would be very difficult.  It should be
287	   noted, though, that although this solution is acceptable as far as
288	   RFC 4301 goes, it is a matter of policy whether this is acceptable.
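
   One way to picture learning the counter from traffic is a replay
   window that starts in an "unknown" state after failover and adopts
   the first authenticated sequence number it sees.  The class below is
   a hypothetical sketch, not an implementation taken from any gateway;
   it assumes that integrity checking has already succeeded and uses a
   set in place of the usual bitmap.

```python
class LearningReplayWindow:
    """Anti-replay window that learns its position after a failover."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.top = None      # highest sequence number seen; None = unknown
        self.seen = set()    # sequence numbers seen inside the window

    def accept(self, seq: int) -> bool:
        """Return True if an already-authenticated packet is accepted."""
        if self.top is None:             # first packet after failover
            self.top = seq
            self.seen = {seq}
            return True
        if seq > self.top:               # window slides forward
            self.top = seq
            self.seen = {s for s in self.seen if s > self.top - self.size}
            self.seen.add(seq)
            return True
        if seq <= self.top - self.size or seq in self.seen:
            return False                 # too old, or a replay
        self.seen.add(seq)
        return True
```

   The first packet after failover is accepted unconditionally, which
   is exactly the narrow window of exposure discussed above.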

290	   Another possible solution to the inbound SA problem is to rekey all
291	   child SAs following a failover.  This may or may not be feasible
292	   depending on the implementation and the configuration.

294	3.6.  Missing Synch Messages

296	   The synch channel is unlikely to be infallible.  Before
297	   failover is detected, some synchronization messages may have been
298	   missed.  For example, the active member may have created a new Child
299	   SA using message n.  The new information (entry in the SAD and update
300	   to counters of the IKE SA) is sent on the synch channel.  Still, with
301	   every possible technology, the update may be missed before the
302	   failover.

304	   This is a bad situation, because the IKE SA is doomed.  The
305	   newly-active member has two problems:
306	   o  It does not have the new IPsec SA pair.  It will drop all incoming
307	      packets protected with such an SA.  This could be fixed by sending
308	      some DELETEs and INVALID_SPI notifications, if it weren't for the
309	      other problem...

311	   o  The counters for the IKE SA show that only request n-1 has been
312	      sent.  The next request will get the message ID n, but that will
313	      be rejected by the peer.  After a sufficient number of
314	      retransmissions and rejections, the whole IKE SA with all
315	      associated IPsec SAs will get dropped.
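
   The message ID mismatch in the second item can be illustrated with a
   short model.  It is a sketch only: both function names are
   hypothetical, and the peer is reduced to the single rule stated in
   Section 3.3, namely that it processes only the request ID it expects
   next.

```python
def peer_accepts(request_id: int, peer_expected: int) -> bool:
    """The peer only processes the request whose ID it expects next."""
    return request_id == peer_expected


def failover_outcome(last_sent_by_active: int, last_synced: int) -> str:
    """Describe what happens to the IKE SA after a failover."""
    peer_expected = last_sent_by_active + 1   # peer already processed request n
    next_id = last_synced + 1                 # what the new member will send
    if peer_accepts(next_id, peer_expected):
        return "IKE SA survives"
    return "request %d rejected; IKE SA eventually dropped" % next_id
```

   If the synch message for request n was lost, last_synced is n-1 and
   the newly-active member reuses message ID n.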

317	   The above scenario may be rare enough that it is acceptable that on a
318	   configuration with thousands of IKE SAs, a few will need to be
319	   recreated from scratch or using session resumption techniques.
320	   However, detecting this may take a long time (several minutes) and
321	   this negates the goal of creating a high availability cluster in the
322	   first place.

324	3.7.  Simultaneous use of IKE and IPsec SAs by Different Members

326	   For load sharing clusters, all active members may need to use the
327	   same SAs, both IKE and IPsec.  This is an even greater problem than
328	   in the case of HA, because consecutive packets may need to be sent by
329	   different members to the same peer gateway.

331	   The solution to the IKE SA issue is up to the application.  It's
332	   possible to create some locking mechanism over the synch channel, or
333	   else have one member "own" the IKE SA and manage the child SAs for
334	   all other members.  For IPsec, solutions fall into two broad
335	   categories.

337	   The first is the "sticky" category, where all communications with a
338	   single peer, or all communications involving a certain SPD cache
339	   entry, go through a single member.  In this case, all packets that
340	   match any particular SA go through the same member, so no
341	   synchronization of the replay counter needs to be done.  Inbound
342	   processing is a "sticky" issue, because the packets have to be
343	   processed by the correct member based on peer and SPI.  Another
344	   issue is that commodity load balancers will not be able to match the
345	   SPIs of the encrypted side to the clear traffic, and so the wrong
346	   member may get the other half of the flow.
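
   A "sticky" assignment keyed on the peer could, for example, be
   derived from a hash of the peer address.  The function below is a
   hypothetical sketch; the choice of SHA-256, the member names, and
   the use of the peer address alone as the key are assumptions made
   for illustration.

```python
import hashlib
from typing import Sequence


def owning_member(peer_addr: str, members: Sequence[str]) -> str:
    """Pick the member that handles all traffic for one peer."""
    digest = hashlib.sha256(peer_addr.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(members)
    return members[index]
```

   Every member that evaluates this function for the same peer and the
   same member list reaches the same answer without using the synch
   channel.  A plain modulo reassigns many peers when the member list
   changes, so a real design would need to handle failover explicitly.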

348	   The other way is to duplicate the child SAs, and have a pair of
349	   IPsec SAs for each active member.  Different packets for the same
350	   peer go through different members, and get protected using different
351	   SAs with the same selectors and matching the same entries in the SPD
352	   cache.  This has some shortcomings:
353	   o  It requires multiple parallel SAs, which the peer has no use for.
354	      Section 2.8 of [RFC4306] specifically allows this, but some
355	      implementations might have a policy against long term maintenance
356	      of redundant SAs.

358	   o  Different packets that belong to the same flow may be protected by
359	      different SAs, which may seem "weird" to the peer gateway,
360	      especially if it is integrated with some deep inspection
361	      middleware such as a firewall.  It is not known whether this will
362	      cause problems with current gateways.  It is also impossible to
363	      mandate against this, because the definition of "flow" varies from
364	      one implementation to another.
365	   o  Reply packets may arrive with an IPsec SA that is not "matched" to
366	      the one used for the outgoing packets.  Also, they might arrive at
367	      a different member.  This problem is beyond the scope of this
368	      document and should be solved by the application, perhaps by
369	      forwarding misdirected packets to the correct gateway for deep
370	      inspection.

372	3.7.1.  Outbound SAs using counter modes

374	   For SAs involving counter mode ciphers such as [CTR] or [GCM] there
375	   is yet another complication.  The initial vector for such modes must
376	   never be repeated, and senders use methods such as counters or LFSRs
377	   to ensure this.  When an SA is shared between more than one active
378	   member, or fails over from one member to another, the members need
379	   to make sure that they do not generate the same initial vector.  See
380	   [COUNTER_MODES] for a discussion of this problem in another context.
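
   One way to keep initial vectors distinct across members is to
   reserve part of the IV for a member identifier and use the rest as a
   per-member counter.  The sketch below assumes an 8-octet IV and an
   8-bit member identifier; the class name and both field widths are
   illustrative choices, not requirements of this document.

```python
MEMBER_ID_BITS = 8                    # assumption: at most 256 members
COUNTER_BITS = 64 - MEMBER_ID_BITS    # remaining bits of the 8-octet IV


class IvGenerator:
    """Per-member IV source for an SA shared by several members."""

    def __init__(self, member_id: int) -> None:
        if not 0 <= member_id < (1 << MEMBER_ID_BITS):
            raise ValueError("member_id out of range")
        self.member_id = member_id
        self.counter = 0

    def next_iv(self) -> bytes:
        """Return the next 8-octet IV; never repeats for this member."""
        if self.counter >= (1 << COUNTER_BITS):
            raise OverflowError("IV space exhausted; the SA must be rekeyed")
        value = (self.member_id << COUNTER_BITS) | self.counter
        self.counter += 1
        return value.to_bytes(8, "big")
```

   With this layout, a member taking over an SA uses its own identifier
   and does not need to know how far the failed member's counter had
   advanced.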

382	3.8.  Different IP addresses for IKE and IPsec

384	   In many implementations there are separate IP addresses for the
385	   cluster, and for each member.  While the packets protected by tunnel
386	   mode child SAs are encapsulated in IP headers with the cluster IP
387	   address, the IKE packets originate from a specific member, and carry
388	   that member's IP address.  For the peer, this looks weird, as the
389	   usual thing is for the IPsec packets to come from the same IP address
390	   as the IKE packets.

392	   One obvious solution is to use some fancy capability of the IKE host
393	   to change things so that IKE packets also come out of the cluster IP
394	   address.  This can be achieved through NAT or through assigning
395	   multiple addresses to interfaces.  This is not, however, possible for
396	   all implementations.

398	   [ARORA] discusses this problem in greater depth, and proposes another
399	   solution that does involve protocol changes.

401	3.9.  Allocation of SPIs

403	   The SPI associated with each child SA, and with each IKE SA, MUST be
404	   unique relative to the peer of the SA.  Thus, in the context of a
405	   cluster, each cluster member MUST generate SPIs in a fashion that
406	   avoids collisions (with other cluster members) for these SPI values.
407	   The means by which cluster members achieve this requirement is a
408	   local matter, outside the scope of this document.
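
   Although the method is a local matter, one simple possibility is to
   give each member its own slice of the SPI space.  The sketch below
   covers 32-bit child SA SPIs only and is purely illustrative: the
   function name, the 4-bit member index, and the caller-supplied set
   of SPIs in use are assumptions.

```python
import secrets

MEMBER_BITS = 4                   # assumption: at most 16 members
RANDOM_BITS = 32 - MEMBER_BITS    # remaining bits are chosen at random


def allocate_spi(member_index: int, in_use: set) -> int:
    """Return a fresh 32-bit SPI from this member's slice of the space."""
    if not 0 <= member_index < (1 << MEMBER_BITS):
        raise ValueError("member_index out of range")
    while True:
        spi = (member_index << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)
        # SPI values 0 through 255 are reserved, so skip them.
        if spi > 255 and spi not in in_use:
            in_use.add(spi)
            return spi
```

   Because the top bits differ per member, two members can never
   allocate the same SPI, and each member only has to check for
   collisions against its own SAs.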

410	4.  Security Considerations

412	   Implementations running on clusters MUST be as secure as
413	   implementations running on single gateways.  In other words, no
414	   extension or interpretation used to allow operation in a cluster may
415	   facilitate attacks that are not possible for single gateways.

417	   Moreover, thought must be given to the synching requirements of any
418	   protocol extension, to make sure that it does not create an
419	   opportunity for denial of service attacks on the cluster.

421	   As mentioned in Section 3.5, allowing an inbound child SA to fail
422	   over to another member has the effect of disabling replay counter
423	   protection for a short time.  Though the threat is arguably low, it
424	   is a policy decision whether this is acceptable.

426	5.  IANA Considerations

428	   This document has no actions for IANA.

430	6.  Acknowledgements

432	   This document is a collective work, and includes contributions from
433	   many people who participate in the IPsecME working group.

435	   The editor would particularly like to acknowledge the extensive
436	   contribution of the following people (in alphabetical order):
437	   Jitender Arora, Jean-Michel Combes, Dan Harkins, Steve Kent, Tero
438	   Kivinen, Yaron Sheffer, Melinda Shore, and Rodney Van Meter.

440	7.  Change Log

442	   NOTE TO RFC EDITOR: REMOVE THIS SECTION BEFORE PUBLICATION

444	   Version 00 was identical to draft-nir-ipsecme-ipsecha-ps-00, re-spun
445	   as a WG document.

447	   Version 01 included closing issues 177, 178 and 180, with updates to
448	   terminology, and added discussion of inbound SAs and the CTR issue.

450	   Version 02 includes comments by Yaron Sheffer and the acknowledgement
451	   section.

453	   Version 03 fixes some ID-nits, and adds the problem presented by
454	   Jitender Arora in [ARORA].

456	   Version 04 fixes a spelling mistake, moves the scope discussion to a
457	   subsection of its own (Section 3.1), and adds a short discussion of
458	   the duplicate SPI problem, presented by Jean-Michel Combes.

460	8.  Informative References

462	   [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
463	              IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses
464	              (work in progress), April 2010.

466	   [COUNTER_MODES]
467	              McGrew, D. and B. Weis, "Using Counter Modes with
468	              Encapsulating Security Payload (ESP) and Authentication
469	              Header (AH) to Protect Group Traffic",
470	              draft-ietf-msec-ipsec-group-counter-modes (work in
471	              progress), March 2010.

473	   [CTR]      Housley, R., "Using Advanced Encryption Standard (AES)
474	              Counter Mode", RFC 3686, January 2004.

476	   [GCM]      Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
477	              (GCM) in IPsec Encapsulating Security Payload (ESP)",
478	              RFC 4106, June 2005.

480	   [IKEv2bis]
481	              Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
482	              "Internet Key Exchange Protocol: IKEv2",
483	              draft-ietf-ipsecme-ikev2bis (work in progress), May 2010.

485	   [REDIRECT]
486	              Devarapalli, V. and K. Weniger, "Redirect Mechanism for
487	              IKEv2", RFC 5685, November 2009.

489	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
490	              Requirement Levels", BCP 14, RFC 2119, March 1997.

492	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
493	              Internet Protocol", RFC 4301, December 2005.

495	   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
496	              RFC 4306, December 2005.

498	   [VRRP]     Nadas, S., "Virtual Router Redundancy Protocol (VRRP)",
499	              RFC 5798, March 2010.

501	   [resumption]
502	              Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
503	              RFC 5723, January 2010.

505	Author's Address

507	   Yoav Nir
508	   Check Point Software Technologies Ltd.
509	   5 Hasolelim st.
510	   Tel Aviv  67897
511	   Israel

513	   Email: ynir@checkpoint.com