Network Working Group                                             Y. Nir
Internet-Draft                                               Check Point
Intended status: Informational                             June 10, 2010
Expires: December 12, 2010


                     IPsec Cluster Problem Statement
                      draft-ietf-ipsecme-ipsec-ha-06

Abstract

   This document defines the terminology, problem statement, and
   requirements for implementing IKE and IPsec on clusters.  It also
   describes gaps in existing standards and their implementation that
   need to be filled in order to allow peers to interoperate with
   clusters from different vendors.  An agreed terminology, problem
   statement, and requirements will allow the IPSECME WG to consider
   development of IPsec/IKEv2 mechanisms to simplify cluster
   implementations.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 12, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  Terminology
   3.  The Problem Statement
     3.1.  Scope
     3.2.  Lots of Long Lived State
     3.3.  IKE Counters
     3.4.  Outbound SA Counters
     3.5.  Inbound SA Counters
     3.6.  Missing Synch Messages
     3.7.  Simultaneous use of IKE and IPsec SAs by Different Members
       3.7.1.  Outbound SAs using counter modes
     3.8.  Different IP addresses for IKE and IPsec
     3.9.  Allocation of SPIs
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Change Log
   8.  Informative References
   Author's Address

1.  Introduction

   IKEv2, as described in [RFC4306] and [IKEv2bis], and IPsec, as
   described in [RFC4301] and others, allow deployment of VPNs between
   different sites as well as from VPN clients to protected networks.

   As VPNs become increasingly important to the organizations deploying
   them, there is a demand to make IPsec solutions more scalable and
   less prone to down time, by using more than one physical gateway to
   either share the load or back each other up, forming a "cluster"
   (see Section 2).  Similar demands have been made in the past for
   other critical pieces of an organization's infrastructure, such as
   DHCP and DNS servers, web servers, databases, and others.

   IKE and IPsec are, in particular, less friendly to clustering than
   these other protocols, because they store more state, and that state
   is more volatile.  Section 2 defines terminology for use in this
   document and in the envisioned solution documents.

   In general, deploying IKE and IPsec in a cluster requires such a
   large amount of information to be synchronized among the members of
   the cluster that it becomes impractical.  Alternatively, if less
   information is synchronized, failover would mean a prolonged and
   intensive recovery phase, which negates the scalability and
   availability promises of using clusters.  In Section 3 we describe
   this in more detail.

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   "Single Gateway" is an implementation of IKE and IPsec enforcing a
   certain policy, as described in [RFC4301].

   "Cluster" is a set of two or more gateways, implementing the same
   security policy and protecting the same domain.  Clusters exist to
   provide both high availability through redundancy and scalability
   through load sharing.

   "Member" is one gateway in a cluster.

   "Availability" is a measure of a system's ability to perform the
   service for which it was designed.
   It is measured as the
   percentage of time a service is available out of the time it is
   supposed to be available.  Colloquially, availability is sometimes
   expressed in "nines" rather than as a percentage, with 3 "nines"
   meaning 99.9% availability, 4 "nines" meaning 99.99% availability,
   etc.

   "High Availability" is a condition of a system, not a configuration
   type.  A system is said to have high availability if its expected
   down time is low.  High availability can be achieved in various
   ways, one of which is clustering.  All the clusters described in
   this document achieve high availability.  What "high" means depends
   on the application, but it is usually 4 to 6 "nines" (at most
   0.5-50 minutes of down time per year in a system that is supposed to
   be available all the time); see the short calculation at the end of
   this section.

   "Fault Tolerance" is a condition related to high availability, where
   a system maintains service availability even when a specified set of
   fault conditions occur.  In clusters, we expect the system to
   maintain service availability when one or more of the cluster
   members fails.

   "Completely Transparent Cluster" is a cluster where the occurrence
   of a fault is never visible to the peers.

   "Partially Transparent Cluster" is a cluster where the occurrence of
   a fault may be visible to the peers.

   "Hot Standby Cluster", or "HS Cluster", is a cluster where only one
   of the members is active at any one time.  This member is also
   referred to as the "active", whereas the other(s) are referred to as
   "stand-bys".  [VRRP] is one method of building such a cluster.

   "Load Sharing Cluster", or "LS Cluster", is a cluster where more
   than one of the members may be active at the same time.  The term
   "load balancing" is also common, but it implies that the load is
   actually balanced between the members, and this is not a
   requirement.

   "Failover" is the event where one member takes over some load from
   another member.  In a hot standby cluster, this happens when a
   standby member becomes active due to a failure of the former active
   member, or because of an administrator command.  In a load sharing
   cluster, this usually happens because of a failure of one of the
   members, but certain load-balancing technologies may allow a
   particular load (such as all the flows associated with a particular
   child SA) to move from one member to another to even out the load,
   even without any failures.

   "Tight Cluster" is a cluster where all the members share an IP
   address.  This could be accomplished using configured interfaces
   with specialized protocols or hardware, such as VRRP, or through the
   use of multicast addresses, but in any case, peers need only be
   configured with one IP address in the PAD.

   "Loose Cluster" is a cluster where each member has a different IP
   address.  Peers find the correct member using some method such as
   DNS queries or [REDIRECT].  In some cases, a member's IP address(es)
   may be allocated to another member at failover.

   "Synch Channel" is a communications channel among the cluster
   members, used to transfer state information.  The synch channel may
   or may not be IP based, may or may not be encrypted, and may work
   over short or long distances.  The security and physical
   characteristics of this channel are out of scope for this document,
   but it is a requirement that its use be minimized for scalability.
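   To make the "nines" terminology concrete, the following minimal
   Python sketch (illustrative only, and not part of the terminology
   itself) computes the yearly down time that a given number of "nines"
   allows:

      MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

      def max_downtime_minutes(nines):
          """Yearly down time allowed by a given number of 'nines',
          for a system that is supposed to be available all the time."""
          unavailability = 10 ** (-nines)  # e.g., 4 nines -> 0.0001
          return MINUTES_PER_YEAR * unavailability

      # 4 "nines" allow about 53 minutes of down time per year;
      # 6 "nines" allow about 0.5 minutes per year.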
3.  The Problem Statement

   This section starts by scoping the problem, and goes on to list each
   of the issues encountered while setting up a cluster of IPsec VPN
   gateways.

3.1.  Scope

   This document makes no attempt to describe the problems in setting
   up a generic cluster.  It describes only problems related to the
   IKE/IPsec protocols.

   The problem of synchronizing the policy between cluster members is
   out of scope, as this is an administrative issue that is not
   particular to clusters or to IPsec.

   The interesting scenario here is VPN, whether tunneled site-to-site
   or remote access.  Host-to-host transport mode is not expected to
   benefit from this work.

   We do not describe in full the problems of the communication channel
   between cluster members (the Synch Channel), nor do we intend to
   specify anything in this space later.  Specifically, mixed-vendor
   clusters are out of scope.

   The problem statement anticipates possible protocol-level solutions
   between IKE/IPsec peers, in order to improve the availability and/or
   performance of VPN clusters.  One vendor's IPsec endpoint should be
   able to work, optimally, with another vendor's cluster.

3.2.  Lots of Long Lived State

   IKE and IPsec have a lot of long lived state:

   o  IKE SAs last for minutes, hours, or days, and carry keys and
      other information.  Some gateways may carry thousands to hundreds
      of thousands of IKE SAs.

   o  IPsec SAs last for minutes or hours, and carry keys, selectors,
      and other information.  Some gateways may carry hundreds of
      thousands of such IPsec SAs.

   o  SPD (Security Policy Database) cache entries.  While the SPD is
      unchanging, the SPD cache changes on the fly due to narrowing.
      Entries last at least as long as the SAD (Security Association
      Database) entries, but tend to last even longer than that.

   A naive implementation of a cluster would have no synchronized
   state, and a failover would produce an effect similar to that of a
   rebooted gateway.  [resumption] describes how new IKE and IPsec SAs
   can be recreated in such a case.

3.3.  IKE Counters

   We can overcome the first problem described in Section 3.2 by
   synchronizing states: whenever an SA is created, we can synch this
   new state to all other members.  However, those states are not only
   long-lived, they are also ever changing.

   IKE has message counters.  A peer MUST NOT process message n until
   after it has processed message n-1.  Skipping message IDs is not
   allowed.  So a newly-active member needs to know the last message
   IDs both received and transmitted.

   One possible solution is to synchronize information about the IKE
   message counters after every IKE exchange.  This way, the newly
   active member knows what messages it is allowed to process, and what
   message IDs to use on IKE requests, so that peers process them.
   This solution may be appropriate in some cases, but may be too
   onerous in systems with lots of SAs.  It also has the drawback that
   it never recovers from the missing synch message problem, which is
   described in Section 3.6.

3.4.  Outbound SA Counters

   ESP and AH have an optional anti-replay feature, where every
   protected packet carries a counter number.  Repeating counter
   numbers is considered an attack, so the newly-active member MUST NOT
   use a replay counter number that has already been used.  The peer
   will drop those packets as duplicates and/or warn of an attack.

   Though it may be feasible to synchronize the IKE message counters,
   it is almost never feasible to synchronize the IPsec packet counters
   for every IPsec packet transmitted.  So we have to assume that at
   least for IPsec, the replay counter will not be up-to-date on the
   newly-active member, and the newly-active member may repeat a
   counter.

   A possible solution is to synch replay counter information, not for
   each packet emitted, but only at regular intervals, say, every
   10,000 packets or every 0.5 seconds.  After a failover, the newly-
   active member advances the counters for outbound IPsec SAs by
   10,000.  To the peer this looks like up to 10,000 packets were lost,
   but this should be acceptable, as neither ESP nor AH guarantees
   reliable delivery.
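   The following Python sketch illustrates this periodic-synch
   approach.  It is a minimal model, not a normative mechanism: the
   OutboundSA class and the synch-channel send() call are hypothetical,
   and a real gateway would keep such a counter per SA in its SAD.

      SYNC_INTERVAL = 10000  # packets between synch messages

      class OutboundSA:
          """Minimal model of an outbound IPsec SA's replay counter."""

          def __init__(self, spi, sync_channel):
              self.spi = spi
              self.seq = 0                      # last counter used
              self.sync_channel = sync_channel  # hypothetical API

          def next_counter(self):
              """Assign the next replay counter, synching periodically."""
              self.seq += 1
              if self.seq % SYNC_INTERVAL == 0:
                  # Tell the other members the highest counter used.
                  self.sync_channel.send(("outbound", self.spi, self.seq))
              return self.seq

          def take_over(self, last_synched):
              """On the newly-active member: jump past any counter the
              failed member could have used.  The peer sees up to
              SYNC_INTERVAL lost packets, which is acceptable since
              neither ESP nor AH guarantees reliable delivery."""
              self.seq = last_synched + SYNC_INTERVAL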
3.5.  Inbound SA Counters

   An even tougher issue is the synchronization of packet counters for
   inbound IPsec SAs.  If a packet arrives at a newly-active member,
   there is no way to determine whether this packet is a replay.  The
   periodic synch does not solve this problem at all.  Suppose we
   synchronize every 10,000 packets, and the last synch before the
   failover had the counter at 170,000.  It is probable, though not
   certain, that packet number 180,000 has not yet been processed, but
   if packet 175,000 arrives at the newly-active member, the member has
   no way of determining whether that packet has already been
   processed.  The synchronization does prevent the processing of
   really old packets, such as those with counter number 165,000.
   Ignoring all counters below 180,000 won't work either, because that
   amounts to up to 10,000 dropped packets, which may be very
   noticeable.

   The easiest solution is to learn the replay counter from the
   incoming traffic.  This is allowed by the standards, because replay
   counter verification is an optional feature (see Section 3.2 of
   [RFC4301]).  The case can even be made that it is relatively secure,
   because non-attack traffic will reset the counters to what they
   should be, so an attacker faces the dual challenge of a very narrow
   window for attack, and the need to time the attack to a failover
   event.  Unless the attacker can actually cause the failover, this
   would be very difficult.  It should be noted, though, that although
   this solution is acceptable as far as RFC 4301 goes, it is a matter
   of policy whether it is acceptable.

   Another possible solution to the inbound IPsec SA problem is to
   rekey all child SAs following a failover.  This may or may not be
   feasible depending on the implementation and the configuration.
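   The "learn the counter from traffic" approach above can be sketched
   as follows.  This is a deliberately simplified model (a real
   anti-replay window is a fixed-size bitmap); the essential point is
   that the newly-active member accepts the first packet that passes
   integrity verification and re-anchors its window on it.

      WINDOW = 64  # anti-replay window size

      class InboundReplayState:
          """Simplified model of inbound anti-replay checking."""

          def __init__(self):
              self.top = 0           # highest counter seen so far
              self.seen = set()      # counters accepted inside the window
              self.learning = False  # True right after a failover

          def accept(self, seq):
              """Return True if a packet that has already passed
              integrity verification should be accepted."""
              if self.learning:
                  # Newly-active member: re-anchor the window on the
                  # first authenticated packet.  Replay protection is
                  # effectively suspended until this point (Section 3.5).
                  self.top, self.seen, self.learning = seq, {seq}, False
                  return True
              if seq + WINDOW <= self.top or seq in self.seen:
                  return False       # too old, or a replay
              self.seen.add(seq)
              self.top = max(self.top, seq)
              # Discard entries that fell out of the window (a real
              # implementation slides a bitmap instead).
              self.seen = {s for s in self.seen if s + WINDOW > self.top}
              return True

          def on_failover(self):
              self.learning = True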
3.6.  Missing Synch Messages

   The synch channel is very likely not to be infallible.  Before a
   failover is detected, some synchronization messages may have been
   missed.  For example, the active member may have created a new Child
   SA using message n.  The new information (an entry in the SAD and an
   update to the counters of the IKE SA) is sent on the synch channel.
   Still, with every possible technology, the update may be missed
   before the failover.

   This is a bad situation, because the IKE SA is doomed.  The newly-
   active member has two problems:

   o  It does not have the new IPsec SA pair.  It will drop all
      incoming packets protected with such an SA.  This could be fixed
      by sending some DELETEs and INVALID_SPI notifications, if it
      wasn't for the other problem...

   o  The counters for the IKE SA show that only request n-1 has been
      sent.  The next request will get the message ID n, but that will
      be rejected by the peer.  After a sufficient number of
      retransmissions and rejections, the whole IKE SA with all
      associated IPsec SAs will get dropped.

   The above scenario may be rare enough that it is acceptable that on
   a configuration with thousands of IKE SAs, a few will need to be
   recreated from scratch or using session resumption techniques.
   However, detecting this may take a long time (several minutes), and
   this negates the goal of creating a cluster in the first place.

3.7.  Simultaneous use of IKE and IPsec SAs by Different Members

   For LS clusters, all active members may need to use the same SAs,
   both IKE and IPsec.  This is an even greater problem than in the
   case of HS clusters, because consecutive packets may need to be sent
   by different members to the same peer gateway.

   The solution to the IKE SA issue is up to the application.  It is
   possible to create some locking mechanism over the synch channel, or
   else have one member "own" the IKE SA and manage the child SAs for
   all other members.  For IPsec, solutions fall into two broad
   categories.

   The first is the "sticky" category, where all communications with a
   single peer, or all communications involving a certain SPD cache
   entry, go through a single member (a small sketch of one such
   assignment appears at the end of this subsection).  In this case,
   all packets that match any particular SA go through the same member,
   so no synchronization of the replay counter needs to be done.
   Inbound processing is a "sticky" issue, because the packets have to
   be processed by the correct member based on peer and SPI.  Another
   issue is that most load balancers will not be able to match the SPIs
   of the encrypted side to the clear traffic, and so the wrong member
   may get the other half of the flow.

   The second is the "duplicate" category, where the child SA is
   duplicated, so that there is a pair of IPsec SAs for each active
   member.  Different packets for the same peer go through different
   members, and get protected using different SAs with the same
   selectors and matching the same entries in the SPD cache.  This has
   some shortcomings:

   o  It requires multiple parallel SAs, which the peer has no use for.
      Section 2.8 of [RFC4306] specifically allows this, but some
      implementations might have a policy against long term maintenance
      of redundant SAs.

   o  Different packets that belong to the same flow may be protected
      by different SAs, which may seem "weird" to the peer gateway,
      especially if it is integrated with some deep inspection
      middleware such as a firewall.  It is not known whether this will
      cause problems with current gateways.  It is also impossible to
      mandate against this, because the definition of "flow" varies
      from one implementation to another.

   o  Reply packets may arrive with an IPsec SA that is not "matched"
      to the one used for the outgoing packets.  Also, they might
      arrive at a different member.  This problem is beyond the scope
      of this document and should be solved by the application, perhaps
      by forwarding misdirected packets to the correct gateway for deep
      inspection.
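   A minimal sketch of the "sticky" assignment, assuming the cluster
   can see the peer address and SPI of every packet (the function name
   is hypothetical):

      import hashlib

      def owning_member(peer_ip, spi, num_members):
          """Map an SA to a cluster member by hashing the peer address
          and SPI.  All packets of that SA then land on one member, so
          its replay counters never need cross-member synchronization."""
          key = "%s/%08x" % (peer_ip, spi)
          digest = hashlib.sha256(key.encode()).digest()
          return int.from_bytes(digest[:4], "big") % num_members

   Note that such a mapping sidesteps, rather than solves, the load-
   balancer problem described above: a device that cannot parse the SPI
   on the encrypted side still cannot compute the same mapping for both
   halves of the flow.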
3.7.1.  Outbound SAs using counter modes

   For SAs involving counter mode ciphers such as [CTR] or [GCM], there
   is yet another complication.  The initial vector for such modes MUST
   NOT be repeated, and senders use methods such as counters or LFSRs
   to ensure this.  For an SA shared between more than one active
   member, or even one failing over from one member to another, the
   members need to make sure that they do not generate the same initial
   vector.  See [COUNTER_MODES] for a discussion of this problem in
   another context.

3.8.  Different IP addresses for IKE and IPsec

   In many implementations there are separate IP addresses for the
   cluster and for each member.  While the packets protected by tunnel
   mode child SAs are encapsulated in IP headers with the cluster IP
   address, the IKE packets originate from a specific member, and carry
   that member's IP address.  To the peer this looks weird, as the
   usual thing is for the IPsec packets to come from the same IP
   address as the IKE packets.

   One obvious solution is to use some capability of the IKE host to
   change things so that IKE packets also come out of the cluster IP
   address.  This can be achieved through NAT or through assigning
   multiple addresses to interfaces.  This is not, however, possible
   for all implementations.

   [ARORA] discusses this problem in greater depth, and proposes
   another solution that does involve protocol changes.

3.9.  Allocation of SPIs

   The SPI associated with each child SA, and with each IKE SA, MUST be
   unique relative to the peer of the SA.  Thus, in the context of a
   cluster, each cluster member MUST generate SPIs in a fashion that
   avoids collisions (with other cluster members) for these SPI values.
   The means by which cluster members achieve this requirement is a
   local matter, outside the scope of this document.
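   Although the means are a local matter, one common approach is to
   partition the SPI space among members, for example by fixing the top
   bits of every locally chosen SPI to the member's identifier; the
   same partitioning idea can serve for the counter-mode IV space of
   Section 3.7.1.  A hedged sketch, assuming at most 16 members:

      import os

      MEMBER_ID_BITS = 4  # assumption: at most 16 cluster members

      def allocate_spi(member_id):
          """Choose a 32-bit SPI whose top bits encode the member ID,
          so that two members can never pick colliding SPIs.  Low SPI
          values (0-255) are reserved, so small results are retried."""
          rand_bits = 32 - MEMBER_ID_BITS
          while True:
              rand = int.from_bytes(os.urandom(4), "big") >> MEMBER_ID_BITS
              spi = (member_id << rand_bits) | rand
              if spi > 255:
                  return spi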
4.  Security Considerations

   Implementations running on clusters MUST be as secure as
   implementations running on single gateways.  In other words, no
   extension or interpretation used to allow operation in a cluster may
   facilitate attacks that are not possible for single gateways.

   Moreover, thought must be given to the synching requirements of any
   protocol extension, to make sure that it does not create an
   opportunity for denial of service attacks on the cluster.

   As mentioned in Section 3.5, allowing an inbound child SA to fail
   over to another member has the effect of disabling replay counter
   protection for a short time.  Though the threat is arguably low, it
   is a policy decision whether this is acceptable.

5.  IANA Considerations

   This document has no actions for IANA.

6.  Acknowledgements

   This document is a collective work, and includes contributions from
   many people who participate in the IPsecME working group.

   The editor would particularly like to acknowledge the extensive
   contribution of the following people (in alphabetical order):
   Jitender Arora, Jean-Michel Combes, Dan Harkins, Steve Kent, Tero
   Kivinen, Yaron Sheffer, Melinda Shore, and Rodney Van Meter.

7.  Change Log

   NOTE TO RFC EDITOR: REMOVE THIS SECTION BEFORE PUBLICATION

   Version 00 was identical to draft-nir-ipsecme-ipsecha-ps-00, re-spun
   as a WG document.

   Version 01 included closing issues 177, 178, and 180, with updates
   to terminology, and added discussion of inbound SAs and the CTR
   issue.

   Version 02 includes comments by Yaron Sheffer and the
   acknowledgements section.

   Version 03 fixes some ID-nits, and adds the problem presented by
   Jitender Arora in [ARORA].

   Version 04 fixes a spelling mistake, moves the scope discussion to a
   subsection of its own (Section 3.1), and adds a short discussion of
   the duplicate SPI problem, presented by Jean-Michel Combes.

8.  Informative References

   [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
              IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses
              (work in progress), April 2010.

   [COUNTER_MODES]
              McGrew, D. and B. Weis, "Using Counter Modes with
              Encapsulating Security Payload (ESP) and Authentication
              Header (AH) to Protect Group Traffic",
              draft-ietf-msec-ipsec-group-counter-modes (work in
              progress), March 2010.

   [CTR]      Housley, R., "Using Advanced Encryption Standard (AES)
              Counter Mode With IPsec Encapsulating Security Payload
              (ESP)", RFC 3686, January 2004.

   [GCM]      Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
              (GCM) in IPsec Encapsulating Security Payload (ESP)",
              RFC 4106, June 2005.

   [IKEv2bis] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
              "Internet Key Exchange Protocol: IKEv2",
              draft-ietf-ipsecme-ikev2bis (work in progress), May 2010.

   [REDIRECT] Devarapalli, V. and K. Weniger, "Redirect Mechanism for
              IKEv2", RFC 5685, November 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [VRRP]     Nadas, S., "Virtual Router Redundancy Protocol (VRRP)
              Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

   [resumption]
              Sheffer, Y. and H. Tschofenig, "IKEv2 Session
              Resumption", RFC 5723, January 2010.

Author's Address

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim St.
   Tel Aviv  67897
   Israel

   Email: ynir@checkpoint.com