idnits 2.17.1 

draft-ietf-ipsecme-ipsec-ha-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 14, 2010) is 5118 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 4306
     (Obsoleted by RFC 5996)

  -- Obsolete informational reference (is this intentional?): RFC 4718
     (Obsoleted by RFC 5996)

  -- Obsolete informational reference (is this intentional?): RFC 3768 (ref.
     'VRRP') (Obsoleted by RFC 5798)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                             Y. Nir
3	Internet-Draft                                               Check Point
4	Intended status: Informational                            April 14, 2010
5	Expires: October 16, 2010

7	       IPsec High Availability and Load Sharing Problem Statement
8	                     draft-ietf-ipsecme-ipsec-ha-01

10	Abstract

12	   This document describes a requirement from IKE and IPsec to allow for
13	   more scalable and available deployments for VPNs.  It defines
14	   terminology for high availability and load sharing clusters
15	   implementing IKE and IPsec, and describes gaps in the existing
16	   standards.

18	Status of this Memo

20	   This Internet-Draft is submitted to IETF in full conformance with the
21	   provisions of BCP 78 and BCP 79.

23	   Internet-Drafts are working documents of the Internet Engineering
24	   Task Force (IETF), its areas, and its working groups.  Note that
25	   other groups may also distribute working documents as Internet-
26	   Drafts.

28	   Internet-Drafts are draft documents valid for a maximum of six months
29	   and may be updated, replaced, or obsoleted by other documents at any
30	   time.  It is inappropriate to use Internet-Drafts as reference
31	   material or to cite them other than as "work in progress."

33	   The list of current Internet-Drafts can be accessed at
34	   http://www.ietf.org/ietf/1id-abstracts.txt.

36	   The list of Internet-Draft Shadow Directories can be accessed at
37	   http://www.ietf.org/shadow.html.

39	   This Internet-Draft will expire on October 16, 2010.

41	Copyright Notice

43	   Copyright (c) 2010 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the BSD License.

56	   This document may contain material from IETF Documents or IETF
57	   Contributions published or made publicly available before November
58	   10, 2008.  The person(s) controlling the copyright in some of this
59	   material may not have granted the IETF Trust the right to allow
60	   modifications of such material outside the IETF Standards Process.
61	   Without obtaining an adequate license from the person(s) controlling
62	   the copyright in such materials, this document may not be modified
63	   outside the IETF Standards Process, and derivative works of it may
64	   not be created outside the IETF Standards Process, except to format
65	   it for publication as an RFC or to translate it into languages other
66	   than English.

68	Table of Contents

70	   1.  Introduction
71	     1.1.  Conventions Used in This Document
72	   2.  Terminology
73	   3.  The Problem Statement
74	     3.1.  Lots of Long Lived State
75	     3.2.  IKE Counters
76	     3.3.  Outbound SA Counters
77	     3.4.  Inbound SA Counters
78	     3.5.  Missing Synch Messages
79	     3.6.  Simultaneous use of IKE and IPsec SAs by Different
80	           Members
81	       3.6.1.  Outbound SAs using counter modes
82	   4.  Security Considerations
83	   5.  Change Log
84	   6.  Informative References
85	   Author's Address

87	1.  Introduction

89	   IKEv2, as described in [RFC4306] and [RFC4718], and IPsec, as
90	   described in [RFC4301] and others, allow deployment of VPNs between
91	   different sites as well as from VPN clients to protected networks.

93	   As VPNs become increasingly important to the organizations deploying
94	   them, there is a demand to make IPsec solutions more scalable and
95	   less prone to down time, by using more than one physical gateway to
96	   either share the load or back each other up.  Similar demands have
97	   been made in the past for other critical pieces of an organization's
98	   infrastructure, such as DHCP and DNS servers, web servers, databases
99	   and others.

101	   IKE and IPsec are in particular less friendly to clustering than
102	   these other protocols, because they store more state, and that state
103	   is more volatile.  Section 2 defines terminology for use in this
104	   document, and in the envisioned solution documents.

106	   In general, deploying IKE and IPsec in a cluster requires such a
107	   large amount of information to be synchronized among the members of
108	   the cluster, that it becomes impractical.  Alternatively, if less
109	   information is synchronized, failover would mean a prolonged and
110	   intensive recovery phase, which negates the scalability and
111	   availability promises of using clusters.  In Section 3 we will
112	   describe this in more detail.

114	1.1.  Conventions Used in This Document

116	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
117	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
118	   document are to be interpreted as described in [RFC2119].

120	2.  Terminology

122	   "Single Gateway" is an implementation of IKE and IPsec enforcing a
123	   certain policy, as described in [RFC4301].

125	   "Cluster" is a set of two or more gateways, implementing the same
126	   security policy, and protecting the same domain.  Clusters exist to
127	   provide both high availability through redundancy, and scalability
128	   through load sharing.

130	   "Member" is one gateway in a cluster.

132	   "High Availability" is a condition of a system, not a configuration
133	   type.  A system is said to have high availability if its expected
134	   down time is low.  High availability can be achieved in various ways,
135	   one of which is clustering.  All the clusters described in this
136	   document achieve high availability.

138	   "Fault Tolerance" is a condition related to high availability, where
139	   a system maintains service availability, even when a specified set of
140	   fault conditions occurs.  In clusters, we expect the system to
141	   maintain service availability, when one or more of the cluster
142	   members fails.

144	   "Completely Transparent Cluster" is a cluster where the occurrence of
145	   a fault is never visible to the peers.

147	   "Partially Transparent Cluster" is a cluster where the occurrence of a
148	   fault may be visible to the peers.

150	   "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
151	   the members is active at any one time.  This member is also referred
152	   to as the "active", whereas the others are referred to as
153	   "standbys".  [VRRP] is one method of building such a cluster.

155	   "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
156	   one of the members may be active at the same time.  The term "load
157	   balancing" is also common, but it implies that the load is actually
158	   balanced between the members, and we don't want to even imply that
159	   this is a requirement.

161	   "Failover" is the event where one member takes over some load from
162	   some other member.  In a hot standby cluster, this happens when a
163	   standby member becomes active due to a failure of the former active
164	   member, or because of an administrator command.  In a load sharing
165	   cluster this usually happens because of a failure of one of the
166	   members, but certain load-balancing technologies may allow a
167	   particular load (an SA) to move from one member to another to even
168	   out the load, even without any failures.

170	   "Tight Cluster" is a cluster where all the members share an IP
171	   address.  This could be accomplished using configured interfaces with
172	   specialized protocols or hardware, such as VRRP, or through the use
173	   of multicast addresses, but in any case, peers need only be
174	   configured with one IP address in the PAD.

176	   "Loose Cluster" is a cluster where each member has a different IP
177	   address.  Peers find the correct member using some method such as DNS
178	   queries or [REDIRECT].

180	   "Synch Channel" is a communications channel among the cluster
181	   members, used to transfer state information.  The synch channel may
182	   or may not be IP based, may or may not be encrypted, and may work
183	   over short or long distances.  The security and physical
184	   characteristics of this channel are out of scope for this document,
185	   but it is a requirement that its use be minimized for scalability.

187	3.  The Problem Statement

189	   This document will make no attempt to describe the problems in
190	   setting up a cluster.  The following subsections describe the
191	   problems related to the protocol itself.

193	   We also ignore the problem of synchronizing the policy between
194	   cluster members, as this is an administrative issue that is not
195	   particular to either clusters or to IPsec.

197	   Note that the interesting scenario here is VPN, whether tunneled
198	   site-to-site or remote access.  Host-to-host transport mode is not
199	   expected to benefit from this work.

201	3.1.  Lots of Long Lived State

203	   IKE and IPsec have a lot of long lived state:
204	   o  IKE SAs last for minutes, hours, or days, and carry keys and other
205	      information.  Some gateways may carry thousands to hundreds of
206	      thousands of IKE SAs.
207	   o  IPsec SAs last for minutes or hours, and carry keys, selectors and
208	      other information.  Some gateways may carry hundreds of thousands
209	      of such IPsec SAs.
210	   o  SPD Cache entries.  While the SPD is unchanging, the SPD cache
211	      changes on the fly due to narrowing.  Entries last at least as
212	      long as the SAD entries, but tend to last even longer than that.

214	   A naive implementation of a high availability cluster would have no
215	   synchronized state, and a failover would produce an effect similar to
216	   that of a rebooted gateway. [resumption] describes how new IKE and
217	   IPsec SAs can be recreated in such a case.

219	3.2.  IKE Counters

221	   We can overcome the first problem described in Section 3.1, by
222	   synchronizing states - whenever an SA is created, we can synch this
223	   new state to all other members.  However, those states are not only
224	   long-lived, they are also ever changing.

226	   IKE has message counters.  A peer may not process message n until
227	   after it has processed message n-1.  Skipping message IDs is not
228	   allowed.  So a newly-active member needs to know the last message IDs
229	   both received and transmitted.

231	   Often, it is feasible to synchronize the IKE message counters for
232	   every IKE exchange.  This way, the newly active member knows what
233	   messages it is allowed to process, and what message IDs to use on IKE
234	   requests, so that peers process them.
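
   The counter state involved can be sketched as follows.  This is a
   minimal illustration only: it assumes an IKE window size of 1,
   ignores retransmissions, and the class and field names are
   hypothetical rather than taken from any implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IkeSaCounters:
    """Message-ID state for one IKE SA (hypothetical structure)."""
    next_request_to_send: int = 0    # ID for our next outgoing request
    next_request_expected: int = 0   # ID of the next request we accept

    def send_request(self) -> int:
        """Return the message ID for a new outgoing request and advance."""
        msg_id = self.next_request_to_send
        self.next_request_to_send += 1
        return msg_id

    def accept_request(self, msg_id: int) -> bool:
        """Accept only the next ID in sequence; skipping is not allowed."""
        if msg_id != self.next_request_expected:
            return False
        self.next_request_expected += 1
        return True

    def snapshot(self) -> Tuple[int, int]:
        """State that would be sent on the synch channel after an exchange."""
        return (self.next_request_to_send, self.next_request_expected)


active = IkeSaCounters()
active.send_request()       # our request 0
active.accept_request(0)    # the peer's request 0

# The standby is built from the state synched after the last exchange.
standby = IkeSaCounters(*active.snapshot())
```

   With the snapshot applied, the standby would use message ID 1 for
   its next request and would accept only request 1 from the peer.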

236	3.3.  Outbound SA Counters

238	   ESP and AH have an optional anti-replay feature, where every
239	   protected packet carries a counter number.  Repeating counter numbers
240	   is considered an attack, so the newly-active member must not use a
241	   replay counter number that has already been used.  The peer will drop
242	   those packets as duplicates and/or warn of an attack.

244	   Though it may be feasible to synchronize the IKE message counters, it
245	   is almost never feasible to synchronize the IPsec packet counters for
246	   every IPsec packet transmitted.  So we have to assume that at least
247	   for IPsec, the replay counter will not be up-to-date on the newly-
248	   active member, and the newly-active member may repeat a counter.

250	   A possible solution is to synch replay counter information, not for
251	   each packet emitted, but only at regular intervals, say, every 10,000
252	   packets or every 0.5 seconds.  After a failover, the newly-active
253	   member advances the counters for outbound SAs by 10,000.  To the peer
254	   this looks like up to 10,000 packets were lost, but this should be
255	   acceptable, as neither ESP nor AH guarantee reliable delivery.
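
   The interval-based approach can be sketched as follows.  The
   interval of 10,000 packets comes from the example above; the class
   and function names are hypothetical, and the synch message itself
   is reduced to updating a field.

```python
SYNC_INTERVAL = 10_000  # packets between synch messages (from the text)


class OutboundSa:
    def __init__(self, seq: int = 0):
        self.seq = seq            # last sequence number used
        self.last_synced = seq    # last value sent on the synch channel

    def next_seq(self) -> int:
        """Return the sequence number for the next outbound packet."""
        self.seq += 1
        if self.seq - self.last_synced >= SYNC_INTERVAL:
            self.last_synced = self.seq   # stand-in for a synch message
        return self.seq


def take_over(last_synced: int) -> OutboundSa:
    """Newly-active member: skip a full interval so no number is reused."""
    return OutboundSa(last_synced + SYNC_INTERVAL)


active = OutboundSa()
for _ in range(27_500):
    active.next_seq()

# The active member fails; the standby only knows the last synched value.
standby = take_over(active.last_synced)
first_after_failover = standby.next_seq()
```

   In this run the last synched value is 20,000, so the newly-active
   member resumes above 30,000 even though only 27,500 packets were
   actually sent.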

257	3.4.  Inbound SA Counters

259	   An even tougher issue is the synchronization of packet counters for
260	   inbound SAs.  If a packet arrives at a newly-active member, there is
261	   no way to determine whether this packet is a replay or not.  The
262	   periodic synch does not solve the problem at all.  Suppose we
263	   synchronize every 10,000 packets, and the last synch before the
264	   failover had the counter at 170,000.  It is probable, though not
265	   certain, that packet number 180,000 has not yet been processed, but
266	   if packet 175,000 arrives at the newly-active member, it has no way
267	   of determining whether that packet has already been processed.  The
268	   synchronization does prevent the processing of really old packets,
269	   such as those with counter number 165,000.  Ignoring all counters
270	   below 180,000 won't work either, because that's up to 10,000
271	   dropped packets, which may be very noticeable.
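
   The three cases in this example can be made explicit with a small
   sketch.  The function name and the labels are hypothetical, and the
   sliding replay window is ignored for simplicity.

```python
SYNC_INTERVAL = 10_000  # packets between synch messages (from the text)


def classify(seq: int, last_synced: int) -> str:
    """What a newly-active member can say about an inbound counter."""
    if seq <= last_synced:
        return "old"            # at or below the synched value: reject
    if seq < last_synced + SYNC_INTERVAL:
        return "unknown"        # the failed member may have processed it
    return "probably new"       # beyond what was likely processed


LAST_SYNCED = 170_000
results = {seq: classify(seq, LAST_SYNCED)
           for seq in (165_000, 175_000, 180_000)}
```

   The "unknown" band is the problem: accepting those packets risks a
   replay, and dropping them loses up to 10,000 legitimate packets.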

273	   The easiest solution is to learn the replay counter from the incoming
274	   traffic.  This is allowed by the standards, because replay counter
275	   verification is an optional feature.  The case can even be made that
276	   it is relatively secure, because non-attack traffic will reset the
277	   counters to what they should be, so an attacker faces the dual
278	   challenge of a very narrow window for attack, and the need to time
279	   the attack to a failover event.  Unless the attacker can actually
280	   cause the failover, this would be very difficult.  It should be
281	   noted, though, that although this solution is acceptable as far as
282	   RFC 4301 goes, it is a matter of policy whether this is acceptable.
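
   A minimal sketch of learning the counter from incoming traffic is
   shown below.  It assumes that packets reach accept() only after
   passing integrity checking, uses an illustrative window of 64, and
   tracks seen counters in a set rather than a bitmap.  All names are
   hypothetical.

```python
class InboundSa:
    WINDOW = 64  # illustrative anti-replay window size

    def __init__(self):
        self.highest = None   # None: not yet learned after failover
        self.seen = set()     # counters accepted within the window

    def accept(self, seq: int) -> bool:
        """Replay check for a packet that already passed authentication."""
        if self.highest is None:
            # First packet after failover: learn the counter from it.
            self.highest = seq
            self.seen = {seq}
            return True
        if seq <= self.highest - self.WINDOW or seq in self.seen:
            return False      # too old, or a duplicate
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # Forget counters that fell out of the window.
            self.seen = {s for s in self.seen if s > seq - self.WINDOW}
        return True


sa = InboundSa()
learned = sa.accept(175_000)      # accepted and used as the baseline
duplicate = sa.accept(175_000)    # rejected from now on
```

   Note that the very first packet is accepted even if it is a replay.
   This is the narrow window of attack described above.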

284	   Another possible solution to the inbound SA problem is to rekey all
285	   child SAs following a failover.  This may or may not be feasible
286	   depending on the implementation and the configuration.

288	3.5.  Missing Synch Messages

290	   The synch channel is unlikely to be infallible.  Before
291	   failover is detected, some synchronization messages may have been
292	   missed.  For example, the active member may have created a new Child
293	   SA using message n.  The new information (entry in the SAD and update
294	   to counters of the IKE SA) is sent on the synch channel.  Still, with
295	   every possible technology, the update may be missed before the
296	   failover.

298	   This is a bad situation, because the IKE SA is doomed.  The newly-
299	   active member has two problems:
300	   o  It does not have the new IPsec SA pair.  It will drop all incoming
301	      packets protected with such an SA.  This could be fixed by sending
302	      some DELETEs and INVALID_SPI notifications, if it weren't for the
303	      other problem...
304	   o  The counters for the IKE SA show that only request n-1 has been
305	      sent.  The next request will get the message ID n, but that will
306	      be rejected by the peer.  After a sufficient number of
307	      retransmissions and rejections, the whole IKE SA with all
308	      associated IPsec SAs will get dropped.
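
   The second problem can be illustrated with a toy model.  The retry
   limit and all names are assumptions made for the example, and the
   peer is reduced to a single comparison.

```python
MAX_RETRANSMISSIONS = 5  # assumed retry limit, not from any standard


def peer_accepts(msg_id: int, peer_next_expected: int) -> bool:
    """The peer only processes the request ID it expects next."""
    return msg_id == peer_next_expected


def try_request(member_next_to_send: int, peer_next_expected: int) -> bool:
    """True if the request gets through, False if the IKE SA is dropped."""
    for _ in range(MAX_RETRANSMISSIONS):
        if peer_accepts(member_next_to_send, peer_next_expected):
            return True
    return False   # every retransmission was rejected


n = 7                         # request that created the Child SA
peer_next_expected = n + 1    # the peer has already processed request n

in_sync = try_request(n + 1, peer_next_expected)   # synch update arrived
stale = try_request(n, peer_next_expected)         # synch update was lost
```

   Retransmitting does not help in the stale case, because the same
   message ID is rejected every time.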

310	   The above scenario may be rare enough that it is acceptable that on a
311	   configuration with thousands of IKE SAs, a few will need to be
312	   recreated from scratch or using session resumption techniques.
313	   However, detecting this may take a long time (several minutes) and
314	   this negates the goal of creating a high availability cluster in the
315	   first place.

317	3.6.  Simultaneous use of IKE and IPsec SAs by Different Members

319	   For load sharing clusters, all active members may need to use the
320	   same SAs, both IKE and IPsec.  This is an even greater problem than
321	   in the case of HA, because consecutive packets may need to be sent by
322	   different members to the same peer gateway.

324	   The solution to the IKE SA issue is up to the application.  It's
325	   possible to create some locking mechanism over the synch channel, or
326	   else have one member "own" the IKE SA and manage the child SAs for
327	   all other members.  For IPsec, solutions fall into two broad
328	   categories.

330	   The first is the "sticky" category, where all communications with a
331	   single peer, or all communications involving a certain SPD cache
332	   entry go through a single member.  In this case, all packets that match
333	   any particular SA go through the same member, so no synchronization
334	   of the replay counter needs to be done.  Inbound processing is a
335	   "sticky" issue, because the packets have to be processed by the
336	   correct member based on peer and SPI.  Another issue is that
337	   commodity load balancers will not be able to match the SPIs of the
338	   encrypted side to the clear traffic, and so the wrong member may get
339	   the other half of the flow.
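
   One way to picture the "sticky" approach for inbound traffic is a
   deterministic mapping from peer and SPI to a member.  The sketch
   below is an assumption made for illustration, not a mechanism
   defined in this document; the member names and the address are
   made up.

```python
import hashlib


def member_for(peer_ip: str, spi: int, members: list) -> str:
    """Pick the member that owns a given inbound SA (peer address + SPI)."""
    key = f"{peer_ip}/{spi:08x}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(members)
    return members[index]


members = ["member-a", "member-b", "member-c"]
owner = member_for("192.0.2.10", 0x1234ABCD, members)
```

   Every packet of that SA maps to the same member, so its replay
   counter stays local.  The mapping does nothing for the clear side
   of the flow, which carries no SPI, and a plain modulo reassigns
   many SAs whenever the member list changes.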

341	   The other way is to duplicate the child SAs, and have a pair of
342	   IPsec SAs for each active member.  Different packets for the same
343	   peer go through different members, and get protected using different
344	   SAs with the same selectors and matching the same entries in the SPD
345	   cache.  This has some shortcomings:
346	   o  It requires multiple parallel SAs, which the peer has no use for.
347	      Section 2.8 of [RFC4306] specifically allows this, but some
348	      implementations might have a policy against long term maintenance
349	      of redundant SAs.
350	   o  Different packets that belong to the same flow may be protected by
351	      different SAs, which may seem "weird" to the peer gateway,
352	      especially if it is integrated with some deep inspection
353	      middleware such as a firewall.  It is not known whether this will
354	      cause problems with current gateways.  It is also impossible to
355	      mandate against this, because the definition of "flow" varies from
356	      one implementation to another.
357	   o  Reply packets may arrive with an IPsec SA that is not "matched" to
358	      the one used for the outgoing packets.  Also, they might arrive at
359	      a different member.  This problem is beyond the scope of this
360	      document and should be solved by the application, perhaps by
361	      forwarding misdirected packets to the correct gateway for deep
362	      inspection.

364	3.6.1.  Outbound SAs using counter modes

366	   For SAs involving counter mode ciphers such as [CTR] or [GCM] there
367	   is yet another complication.  The initial vector for such modes must
368	   never be repeated, and senders use methods such as counters or LFSRs
369	   to ensure this.  Members sharing an SA, or a member taking over an
370	   SA after a failover, need to make sure that they never generate the
371	   same initial vector.  See [COUNTER_MODES]
372	   for a discussion of this problem in another context.
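
   One possible way to keep members from colliding is to partition
   the space of initial vectors, reserving some high-order bits for a
   member identifier.  The sketch below assumes an 8-octet initial
   vector and an 8-bit member identifier; the split and all names are
   illustrative assumptions.

```python
MEMBER_BITS = 8                   # assumed size of the member identifier
COUNTER_BITS = 64 - MEMBER_BITS   # remaining bits of the 8-octet IV


class IvGenerator:
    def __init__(self, member_id: int):
        if not 0 <= member_id < (1 << MEMBER_BITS):
            raise ValueError("member_id out of range")
        self.member_id = member_id
        self.counter = 0

    def next_iv(self) -> bytes:
        """Return an IV that no other member identifier can produce."""
        if self.counter >= (1 << COUNTER_BITS):
            raise OverflowError("IV space exhausted; rekey the SA")
        value = (self.member_id << COUNTER_BITS) | self.counter
        self.counter += 1
        return value.to_bytes(8, "big")


gen_a = IvGenerator(member_id=1)
gen_b = IvGenerator(member_id=2)
ivs_a = {gen_a.next_iv() for _ in range(1000)}
ivs_b = {gen_b.next_iv() for _ in range(1000)}
```

   A member that takes over an SA after a failover would generate
   vectors under its own identifier, so it does not need to know how
   far the failed member's counter had advanced.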

374	4.  Security Considerations

376	   Implementations running on clusters MUST be as secure as
377	   implementations running on single gateways.  In other words, no
378	   extension or interpretation used to allow operation in a cluster may
379	   facilitate attacks that are not possible for single gateways.

381	   Moreover, thought must be given to the synching requirements of any
382	   protocol extension, to make sure that it does not create an
383	   opportunity for denial of service attacks on the cluster.

385	   As mentioned in Section 3.4, allowing an inbound child SA to fail
386	   over to another member has the effect of disabling replay counter
387	   protection for a short time.  Though the threat is arguably low, it
388	   is a policy decision whether this is acceptable.

390	5.  Change Log

392	   This is the first version, re-spun as a WG document.

394	6.  Informative References

396	   [COUNTER_MODES]
397	              McGrew, D. and B. Weis, "Using Counter Modes with
398	              Encapsulating Security Payload (ESP) and Authentication
399	              Header (AH) to Protect Group Traffic",
400	              draft-ietf-msec-ipsec-group-counter-modes (work in
401	              progress), March 2010.

403	   [CTR]      Housley, R., "Using Advanced Encryption Standard (AES)
404	              Counter Mode", RFC 3686, January 2004.

406	   [GCM]      Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
407	              (GCM) in IPsec Encapsulating Security Payload (ESP)",
408	              RFC 4106, June 2005.

410	   [REDIRECT]
411	              Devarapalli, V. and K. Weniger, "Redirect Mechanism for
412	              IKEv2", RFC 5685, November 2009.

414	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
415	              Requirement Levels", BCP 14, RFC 2119, March 1997.

417	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
418	              Internet Protocol", RFC 4301, December 2005.

420	   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
421	              RFC 4306, December 2005.

423	   [RFC4718]  Eronen, P. and P. Hoffman, "IKEv2 Clarifications and
424	              Implementation Guidelines", RFC 4718, October 2006.

426	   [VRRP]     Hinden, R., "Virtual Router Redundancy Protocol (VRRP)",
427	              RFC 3768, April 2004.

429	   [resumption]
430	              Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
431	              RFC 5723, January 2010.

433	Author's Address

435	   Yoav Nir
436	   Check Point Software Technologies Ltd.
437	   5 Hasolelim st.
438	   Tel Aviv  67897
439	   Israel

441	   Email: ynir@checkpoint.com