idnits 2.17.1 

draft-arkko-multi6dt-failure-detection-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1.a on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 837.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 814.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 821.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 827.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement. 

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate.  After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document. 
     That text should be removed or replaced:

        This document is an Internet-Draft and is subject to all provisions of
        Section 3 of RFC 3667.

        By submitting this Internet-Draft, each author represents that any
        applicable patent or other IPR claims of which he or she is aware
        have been or will be disclosed, and any of which he or she
        becomes aware will be disclosed, in accordance with Section 6 of
        BCP 79.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 18
     longer pages, the longest (page 15) being 75 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 23 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 18, 2004) is 7130 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 702, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2461 (ref. '2') (Obsoleted by RFC 4861)

  ** Obsolete normative reference: RFC 2462 (ref. '3') (Obsoleted by RFC 4862)

  ** Obsolete normative reference: RFC 3315 (ref. '4') (Obsoleted by RFC 8415)

  ** Obsolete normative reference: RFC 3484 (ref. '5') (Obsoleted by RFC 6724)

  == Outdated reference: A later version (-18) exists of
     draft-ietf-dhc-dna-ipv4-08

  == Outdated reference: A later version (-07) exists of
     draft-ietf-ipv6-optimistic-dad-01

  == Outdated reference: A later version (-09) exists of
     draft-ietf-ipv6-unique-local-addr-05

  -- Obsolete informational reference (is this intentional?): RFC 2960 (ref.
     '9') (Obsoleted by RFC 4960)

  == Outdated reference: A later version (-08) exists of
     draft-ietf-mobike-design-00

  == Outdated reference: A later version (-08) exists of
     draft-dupont-ikev2-addrmgmt-05

  == Outdated reference: A later version (-02) exists of
     draft-eronen-mobike-mopo-00


     Summary: 11 errors (**), 0 flaws (~~), 12 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                           J. Arkko
2	Internet-Draft                                                  Ericsson
3	Expires: April 18, 2005                                 October 18, 2004

5	           Failure Detection and Locator Selection in Multi6
6	               draft-arkko-multi6dt-failure-detection-00

8	Status of this Memo

10	   This document is an Internet-Draft and is subject to all provisions
11	   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
12	   author represents that any applicable patent or other IPR claims of
13	   which he or she is aware have been or will be disclosed, and any of
14	   which he or she becomes aware will be disclosed, in accordance with
15	   RFC 3668.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as
20	   Internet-Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on April 18, 2005.

35	Copyright Notice

37	   Copyright (C) The Internet Society (2004).

39	Abstract

41	   This draft discusses locator pair selection and failure detection
42	   mechanisms for the IPv6 multihoming feature being developed in the
43	   Multi6 working group.  Elements of this document may also be useful
44	   for developing the details of the MOBIKE or HIP multihoming
45	   mechanisms.  The draft also discusses the roles of a multihoming
46	   protocol versus network attachment functions at IP and link layers.

48	Table of Contents

50	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
51	   2.  Related Work . . . . . . . . . . . . . . . . . . . . . . . . .  4
52	   3.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  6
53	         3.1   Available Addresses  . . . . . . . . . . . . . . . . .  6
54	         3.2   Locally Operational Addresses  . . . . . . . . . . . .  6
55	         3.3   Operational Address Pairs  . . . . . . . . . . . . . .  7
56	         3.4   Primary Address Pair . . . . . . . . . . . . . . . . .  8
57	         3.5   Miscellaneous  . . . . . . . . . . . . . . . . . . . .  8
58	   4.  Architectural Considerations . . . . . . . . . . . . . . . . . 10
59	   5.  An Approach  . . . . . . . . . . . . . . . . . . . . . . . . . 12
60	         5.1   State Machine for Addresses  . . . . . . . . . . . . . 12
61	         5.2   State Machine for Address Pair Selection . . . . . . . 13
62	         5.3   Pair Selection Algorithm . . . . . . . . . . . . . . . 16
63	         5.4   Protocol for Testing Unidirectional Reachability . . . 18
64	   6.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
65	       6.1   Normative References . . . . . . . . . . . . . . . . . . 20
66	       6.2   Informative References . . . . . . . . . . . . . . . . . 20
67	       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 21
68	   A.  Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 22
69	       Intellectual Property and Copyright Statements . . . . . . . . 23

71	1.  Introduction

73	   The Multi6 working group is extending IPv6 to support multihoming.  A
74	   number of possible approaches exist in this space, but the current
75	   focus of the group is to look at an IP layer (or layer 3.5) mechanism
76	   that hides multihoming from applications.  Different variants of the
77	   IP layer mechanism have been suggested in [17, 18, 19, 21] and other
78	   references.

80	   All these mechanisms have a common need to detect when a switch to
81	   another address or addresses becomes necessary.  We call this failure
82	   detection, because the multi6 protocol works primarily as a failover
83	   rather than a load balancing scheme.

85	   This draft discusses what requirements such a component of the multi6
86	   protocol has, and how these requirements can be achieved.  The draft
87	   is structured as follows: Section 2 discusses what kind of solutions
88	   have been used in other similar protocols.  Section 3 defines a set
89	   of useful terms and discusses them, and Section 4
90	   discusses the architectural implications of multihoming at IP layer.
91	   Finally, Section 5 describes one possible solution involving two
92	   state machines, a failure testing protocol, and an address pair
93	   selection algorithm.

95	   For the purposes of this draft, we consider an address to be
96	   synonymous with a locator.  There may be other, higher level
97	   identifiers such as security associations, FQDNs, CGA public keys, or
98	   HITs that tie the different locators used by a node together.

100	2.  Related Work

102	   In SCTP [9], the addresses of the endpoints are learned in the
103	   connection setup phase either through listing them explicitly or via
104	   giving a DNS name that points to them.  In order to provide a
105	   failover mechanism between multihomed hosts, SCTP has the following
106	   functions:

108	   o  One of the peer's addresses is selected as the primary address by
109	      the application running on top of SCTP.  All data packets are sent
110	      to this address until there is a reason to choose another address,
111	      such as the failure of the primary address.

113	   o  Testing the reachability of the peer endpoint's addresses.  This
114	      is done either by observing the data packets sent to the peer or
115	      via a periodic heartbeat when there are no data packets to send.

117	      Each time data packet retransmission is initiated (or when a
118	      heartbeat is not answered within the estimated round-trip time) an
119	      error counter is incremented.  When a configured error limit is
120	      reached, the particular destination address is marked as inactive.
121	      The reception of an acknowledgement or heartbeat response clears
122	      the counter.

124	   o  Retransmission: When retransmitting, the endpoint attempts to pick
125	      the most "divergent" source-destination pair from the original
126	      source-destination pair to which the packet was transmitted.
127	      Rules for such selection are, however, left as implementation
128	      decisions in SCTP.
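
	   The error-counting behaviour described in the list above can be
	   illustrated with a minimal sketch.  This is not SCTP code: the class
	   name, the method names, and the default limit are assumptions made
	   for illustration only, and the text above does not say whether a
	   later response re-activates an inactive address, so the sketch
	   leaves that open.

```python
from dataclasses import dataclass


@dataclass
class DestinationState:
    """Per-destination reachability bookkeeping (hypothetical names)."""
    error_limit: int = 5   # assumed value; the text only says "configured"
    error_count: int = 0
    active: bool = True

    def on_retransmission_or_heartbeat_timeout(self) -> None:
        # A data retransmission or an unanswered heartbeat counts as one error.
        self.error_count += 1
        if self.error_count >= self.error_limit:
            # The configured limit is reached: mark the destination inactive.
            self.active = False

    def on_ack_or_heartbeat_response(self) -> None:
        # An acknowledgement or heartbeat response clears the counter.
        # Whether this also re-activates the address is not stated above,
        # so the 'active' flag is deliberately left unchanged here.
        self.error_count = 0
```

	   A caller would keep one such object per peer address and consult the
	   active flag when choosing where to send.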

130	   SCTP does not define how local knowledge (such as information learned
131	   from the link layer) should be used.  SCTP also has no mechanism to
132	   deal with dynamic changes to the set of available addresses.

134	   The MOBIKE protocol is currently being designed, and some proposals
135	   for the protocol exist [12, 13, 14, 15].  No official decision about
136	   the protocol has been made yet, but there has been a lot of
137	   discussion around the failure detection mechanisms in the context of
138	   MOBIKE, and reference [10] records some of the current thoughts of
139	   the WG on this issue.

141	   Some of the issues that have been discussed include the following:

143	   o  Single address vs.  multiple peer addresses.  A simple approach is
144	      to have the peers be aware of just the current address of the
145	      other side instead of all possible ones.  Assuming that one of the
146	      peers will request the other to start sending to a new address
147	      this works well.  However, this approach is unable to deal with
148	      problems that affect both nodes.  For instance, two nodes
149	      connected by two separate point-to-point links will be unable to
150	      switch to the other link if a failure occurs on the first one.

152	   o  Addresses vs.  address pairs.  Are tests and current paths
153	      individual peer addresses, or pairs of peer and own addresses
154	      (paths)?  It seems that some failure scenarios require the use of
155	      a path rather than a single address.  A network failure may make
156	      it impossible to communicate between a particular pair of
157	      addresses, even if those addresses have some other connectivity.

159	   o  Where the connectivity information comes from.  Does it come from
160	      the local stack (such as interface up/down, router
161	      advertisement), from reception of ESP packets, from IKEv2
162	      keepalives, or through some MOBIKE-defined mechanism?

164	   The mobility and multihoming specification for the HIP protocol [16]
165	   leaves the determination of when address updates are sent to a local
166	   policy, but suggests the use of local information and ICMP error
167	   messages.

169	   Network attachment procedures are also relevant for multihoming.  The
170	   IPv6 and MIP6 working groups have standardized mechanisms to
171	   dynamically learn about new networks that a node has attached to, and
172	   enhanced or optimized mechanisms are being designed in the DHC and
173	   DNA working groups.  Network attachment detection has turned out to
174	   be a relatively complex procedure for mobile hosts, and it was not
175	   fully anticipated at the time IPv6 Neighbor Discovery or DHCP were
176	   being designed.

178	3.  Definitions

180	   This section defines terms useful in discussing the failure detection
181	   problem space.

183	3.1  Available Addresses

185	   Multi6 nodes need to be aware of what addresses they themselves have.
186	   If a node loses the address it is currently using for communications,
187	   another address must replace this address.  And if a node loses an
188	   address that the node's peer knows about, the peer must be informed.
189	   Similarly, when a node acquires a new address it may generally wish
190	   the peer to know about it.

192	   Definition.  Available address.  An address is said to be available
193	   if the following conditions are fulfilled:

195	   o  The address has been assigned to an interface of the node.

197	   o  If the address is an IPv6 address, we additionally require that
198	      (a) the address is valid in the sense of RFC 2461 [2], and that
199	      (b) the address is not tentative in the sense of RFC 2462 [3].  In
200	      other words, the address assignment is complete so that
201	      communications can be started.

203	      Note this explicitly allows an address to be optimistic in the
204	      sense of [7] even though implementations are probably better off
205	      using other addresses as long as there is an alternative.

207	   o  The address is a global unicast or unique local address [8].
208	      That is, it is not an IPv6 link-local or site-local address.
209	      Where IPv4 is considered, it is not an RFC 1918 address.

211	   o  The address and interface are acceptable for use according to a
212	      local policy.
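
	   The conditions above can be read as a single predicate.  The sketch
	   below is one possible reading, not a normative algorithm.  The
	   function name and its boolean parameters (assigned, valid, tentative,
	   policy_ok) are hypothetical stand-ins for state that a real stack
	   would obtain from Neighbor Discovery, address autoconfiguration, and
	   local policy.  Treating 2000::/3 as "global unicast" and fc00::/7 as
	   the unique local range is also an assumption of the sketch.

```python
import ipaddress

# Assumed prefixes for this sketch.
GLOBAL_UNICAST_V6 = ipaddress.ip_network("2000::/3")
UNIQUE_LOCAL_V6 = ipaddress.ip_network("fc00::/7")
RFC1918_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_available(addr, assigned=True, valid=True, tentative=False,
                 policy_ok=True):
    """Return True if addr meets the 'available address' conditions."""
    ip = ipaddress.ip_address(addr)
    # Must be assigned to an interface and acceptable to local policy.
    if not (assigned and policy_ok):
        return False
    if ip.version == 6:
        # IPv6 only: valid and no longer tentative.
        if not valid or tentative:
            return False
        # Link-local and (deprecated) site-local addresses are excluded.
        if ip.is_link_local or ip.is_site_local:
            return False
        return ip in GLOBAL_UNICAST_V6 or ip in UNIQUE_LOCAL_V6
    # IPv4: the only exclusion named in the text is RFC 1918 space.
    return not any(ip in net for net in RFC1918_NETS)
```

	   For example, is_available("fe80::1") is False because the address is
	   link-local, while is_available("fd00::1") is True under the unique
	   local assumption.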

214	   Available addresses are discovered and monitored through mechanisms
215	   outside the scope of MULTI6 (and HIP or MOBIKE).  These mechanisms
216	   include IPv6 Neighbor Discovery and Address Autoconfiguration [2, 3],
217	   DHCP [4], enhanced network detection mechanisms developed by the DNA
218	   working group, and corresponding IPv4 mechanisms, such as [6].

220	3.2  Locally Operational Addresses

222	   Two different granularity levels are needed for failure detection.
223	   The coarser granularity is for individual addresses:

225	   Definition.  Locally Operational Address.  An available address is
226	   said to be locally operational when its use is known to be possible
227	   locally: the interface is up and the relevant default router (if
228	   applicable) is known to be reachable.

230	   Locally operational addresses are discovered and monitored through
231	   mechanisms outside MULTI6 (and HIP or MOBIKE).  These mechanisms
232	   include IPv6 Neighbor Discovery [2], corresponding IPv4 mechanisms,
233	   and link layer specific mechanisms.  Theoretically, it is also
234	   possible for hosts to learn about routing failures for a particular
235	   selected source prefix, even if no protocol exists today to
236	   distribute this information in a convenient manner.

238	3.3  Operational Address Pairs

240	   The existence of locally operational addresses is not, however, a
241	   guarantee that communications can be established with the peer.  A
242	   failure in the routing infrastructure can prevent the sent packets
243	   from reaching their destination.  For this reason we need the
244	   definition of a second level of granularity, for pairs of addresses:

246	   Definition.  Bidirectionally operational address pair.  A pair of
247	   locally operational addresses is said to be a bidirectionally
248	   operational address pair, iff bidirectional connectivity can be
249	   shown between the addresses.  That is, a packet sent with one of the
250	   addresses in the source field and the other in the destination field
251	   reaches the destination, and vice versa.

253	   Unfortunately, there are scenarios where bidirectionally operational
254	   address pairs do not exist.  For instance, ingress filtering or
255	   network failures may result in one address pair being operational in
256	   one direction while another one is operational from the other
257	   direction.  The following definition captures this general situation:

259	   Definition.  Unidirectionally operational address pair.  A pair of
260	   locally operational addresses is said to be a unidirectionally
261	   operational address pair, iff packets sent with the first address as
262	   the source and the second address as the destination can be shown to
263	   reach the destination.
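
	   The relationship between the two definitions can be made concrete
	   with a small sketch.  It assumes, purely for illustration, that each
	   direction's unidirectionally operational pairs are already known as
	   sets of (source, destination) tuples.  The function name and the
	   address labels are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source address, destination address)


def bidirectional_pairs(a_to_b: Set[Pair], b_to_a: Set[Pair]) -> Set[Pair]:
    """Pairs usable from A to B whose reverse is also usable from B to A."""
    return {(src, dst) for (src, dst) in a_to_b if (dst, src) in b_to_a}


# Asymmetric case from the text: A1->B1 works, but the only working
# return path is B2->A2, so no bidirectionally operational pair exists.
forward = {("A1", "B1")}
reverse = {("B2", "A2")}
no_common = bidirectional_pairs(forward, reverse)

# Symmetric case: the reverse of (A1, B1) is also operational.
common = bidirectional_pairs(forward, {("B1", "A1")})
```

	   In the asymmetric case the result is the empty set, which is why
	   the unidirectional definition is needed.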

265	   Both types of operational pairs are discovered and monitored through
266	   the following mechanisms:

268	   o  Positive feedback from upper layer protocols.  For instance, TCP
269	      can indicate to the IP layer that it is making progress.  This is
270	      similar to how IPv6 Neighbor Unreachability Detection can in some
271	      cases be avoided when upper layers provide information about
272	      bidirectional connectivity [2].  In the case of unidirectional
273	      connectivity, the upper layer protocol responses come back using
274	      another address pair, but show that the messages sent using the
275	      first address pair have been received.

277	   o  Negative feedback from upper layer protocols.  It is conceivable
278	      that upper layer protocols give an indication of a problem to the
279	      MULTI6 layer.  For instance, TCP could indicate that there's
280	      either congestion or lack of connectivity in the path because it
281	      is not getting ACKs.

283	   o  Explicit reachability tests.  For instance, the IKEv2 keepalive
284	      mechanism can be used to test that the current pair of addresses
285	      is operational.

287	   o  ICMP error messages.  Given the ease of spoofing ICMP messages,
288	      one should be careful to not trust these blindly, however.  Our
289	      suggestion is to use ICMP error messages only as a hint to perform
290	      an explicit reachability test, but not as a reason to disrupt
291	      ongoing communications without other indications of problems.

293	   Note that some protocols, such as HIP [16], perform a return
294	   routability test of an address before it is taken into use.  The
295	   purpose of this test is to ensure that fraudulent peers do not trick
296	   others into redirecting traffic streams onto innocent victims [22].
297	   Such tests can at the same time work as a means to ensure that an
298	   address pair is operational.  Note, however, that some advanced
299	   optimizations attempt to postpone the reachability tests so that they
300	   do not increase movement-related latency [20].

302	3.4  Primary Address Pair

304	   Unlike SCTP, which has a specific congestion avoidance design
305	   suitable for multi-homing, IP-layer solutions need to avoid sending
306	   packets concurrently over multiple paths; TCP behaves rather poorly
307	   in such circumstances.  For this reason it is necessary to choose a
308	   particular pair of addresses as the primary address pair which is
309	   used until problems occur, at least for the same session.

311	   A primary address pair need not be operational at all times.  If
312	   there is no traffic to send, we may not know if the primary address
313	   pair is operational.  Nevertheless, it makes sense to assume that the
314	   address pair that worked some time ago continues to work for new
315	   communications as well.

317	3.5  Miscellaneous

319	   Addresses can become deprecated [2].  When other operational
320	   addresses exist, nodes generally wish to move their communications
321	   away from the deprecated addresses.

323	   Similarly, IPv6 source address selection [5] may guide the selection
324	   of a particular source address - destination address pair.

326	4.  Architectural Considerations

328	   Architecturally, a number of questions arise.  One simple question
329	   is whether there needs to be communication between a multihoming
330	   solution residing at the IP layer and upper layer protocols.  Upon
331	   changing to a new address pair, the transport layer protocol SHOULD
332	   be notified so that it can perform a slow start.  This is necessary,
333	   for instance, when switching from a high-bandwidth LAN interface to a
334	   low-bandwidth cellular interface.  (Note that this notification
335	   cannot be done in protocol designs where the end points are not the
336	   final hosts, such as where a gateway is used.)

338	   A more fundamental question is which protocols should be responsible
339	   for which parts of the problem.  It seems clear that no multihoming
340	   solution should take on the task of lower layers and other IP
341	   functions for discovering its own addresses or testing local
342	   connectivity.  Protocols such as DHCP or Neighbor and Router
343	   Discovery do this already.

345	   But it is less clear which protocol(s) should discover end-to-end
346	   connectivity problems or recover from them.  One answer is that this
347	   is clearly within the domain of the multihoming protocol.  By
348	   performing testing and failure detection of the used path and
349	   switching to a new path if necessary, the multihoming protocol lets
350	   the transport and application protocols work unchanged.

352	   On the other hand, one could argue that transport and application
353	   protocols would have more knowledge about the situation, and have a
354	   better ability to decide when a move is required.  For instance, they
355	   know what the required throughput and congestion status is.  Also, it
356	   would be unfortunate if both the IP layer and transport/application
357	   layer took action for the same problem, for instance by switching to
358	   a new address at the IP layer and throttling back due to "congestion"
359	   at the transport layer.

361	   Generally speaking, we can divide information that a host has into
362	   three categories: local information from "lower layers" such as IPv6
363	   Neighbor Discovery, transit and congestion condition information
364	   either from the multihoming protocol itself or from transport layer
365	   protocols and (where available) ECN, and application layer policies
366	   that dictate what the requirements are for acceptable connections.

368	   The division of work is largely left as an open issue as far as this
369	   document is concerned, but our description works from a point of view
370	   of a multihoming protocol at the IP layer.  We also note that in the
371	   CELP proposal [11], IP, transport, and application layer
372	   entities could all share their connectivity status in a common
373	   information pool.  This may also be a useful approach.

375	   Finally, the last architectural question is about the difference
376	   between mobility and multihoming.  Given our definitions above,
377	   there's no fundamental difference with respect to how the
378	   multihoming/mobility protocol learns the addresses it has available.
379	   However, a practical difference is that in a multihoming scenario
380	   there are alternative addresses, whereas in mobility changes to a new
381	   address are forced due to the old address no longer being available.

383	5.  An Approach

385	   One suggested approach consists of a mechanism for keeping track of
386	   the host's own available addresses, operational addresses, and
387	   operational address pairs.

389	5.1  State Machine for Addresses

391	   Addresses can be in the AVAILABLE and OPERATIONAL states.  The state
392	   transitions relating to this are shown in Figure 1.

394	                     +--------------+
395	     Address becomes |              |
396	     available       |              |
397	   ----------------->|              |
398	                     |  AVAILABLE   |
399	   <-----------------|              |
400	     Address is no   |              |
401	    longer available |              |
402	                     +--------------+
403	                        |       / \
404	                Address |        | Address
405	                becomes |        | is no longer
406	            operational |        | operational
407	                        |        |
408	                       \ /       |
409	                     +--------------+
410	                     |              |
411	     Address is no   |              |
412	    longer available |              |
413	   <-----------------| OPERATIONAL  |
414	                     |              |
415	                     |              |
416	                     |              |
417	                     +--------------+

419	          Figure 1. Address state machine.

421	   When an address becomes operational, it SHOULD be reported as a new
422	   address to the peer.  Similarly, when an address is no longer
423	   operational or available, the peer SHOULD be informed.

425	   In addition, a particular address can be either preferred or
426	   deprecated.  This is not shown in the state machine.
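
	   The transitions in Figure 1 can be written down as a small lookup
	   table.  The sketch below is one reading of the figure, not a
	   definitive implementation.  The event strings, the AddressTracker
	   class, and the peer_updates list (a stand-in for real multihoming
	   signaling) are hypothetical names.  The sketch also assumes that the
	   peer only needs to be told about the loss of an address it was
	   previously told about, that is, one that had reached OPERATIONAL.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class AddrState(Enum):
    AVAILABLE = "AVAILABLE"
    OPERATIONAL = "OPERATIONAL"


# (current state, event) -> next state; None means "address not tracked".
TRANSITIONS: Dict[Tuple[Optional[AddrState], str], Optional[AddrState]] = {
    (None, "becomes_available"): AddrState.AVAILABLE,
    (AddrState.AVAILABLE, "no_longer_available"): None,
    (AddrState.AVAILABLE, "becomes_operational"): AddrState.OPERATIONAL,
    (AddrState.OPERATIONAL, "no_longer_operational"): AddrState.AVAILABLE,
    (AddrState.OPERATIONAL, "no_longer_available"): None,
}


class AddressTracker:
    def __init__(self) -> None:
        self.states: Dict[str, AddrState] = {}
        # Stand-in for signaling: a log of what the peer would be told.
        self.peer_updates: List[Tuple[str, str]] = []

    def handle(self, addr: str, event: str) -> None:
        old = self.states.get(addr)
        if (old, event) not in TRANSITIONS:
            return  # event has no meaning in this state; ignore it
        new = TRANSITIONS[(old, event)]
        if new is None:
            del self.states[addr]
        else:
            self.states[addr] = new
        if new is AddrState.OPERATIONAL:
            # Newly operational: report as a new address to the peer.
            self.peer_updates.append(("add", addr))
        elif old is AddrState.OPERATIONAL:
            # Left OPERATIONAL (either downward arrow): inform the peer.
            self.peer_updates.append(("remove", addr))
```

	   The preferred/deprecated attribute mentioned above is not modelled
	   here, matching the figure.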

428	5.2  State Machine for Address Pair Selection

430	   A node runs the address pair selection state machine to choose the
431	   currently used primary address pair, the one which is used for
432	   sending outgoing packets.  A node runs one of these state machines
433	   towards each different peer, tracking the known address pairs and
434	   their status.  Each peer also has its own state machine for talking
435	   back to the node; there is no guarantee that the same address pairs
436	   (in reverse order) have the same state.  For instance, the lack of a
437	   bidirectionally operational pair would result in a different state
438	   on each side.

440	   The state machine can be in the NO PRIMARY, TESTING PRIMARY, and
441	   OPERATIONAL PRIMARY states.  The chosen address pair is known to be
442	   operational in the OPERATIONAL PRIMARY state, and is either
443	   unverified or non-operational in the other states.

445	   Figure 2 shows the state machine:

447	                         +----------------+
448	                         |                |
449	                         |                |
450	                         |                |
451	                         |                |
452	                         |       NO       |
453	                         |     PRIMARY    |
454	                         |                |
455	                   +-----|                |<---------------+
456	                   |     |                |                |
457	                   |     +----------------+                |
458	                   |         / \    / \                    |
459	               Add |          |      |                     |
460	             pair: |   Delete |      | Test         Delete |
461	              Send |   pair & |      | fail &       pair & |
462	              test |     Last |      | Last           Last |
463	                   |          |      |                     |
464	                   |     +----------------+                |
465	                   |     |                |                |
466	                   +---->|                |<----+          |
467	                         |                |     | Test     |
468	    Connect: Send test   |                |     | fail &   |
469	   --------------------->|     TESTING    |     | !Last    |
470	                         |     PRIMARY    |-----+          |
471	          +------------->|                |                |
472	          |              |                |<----+          |
473	          |        +---->|                |     |          |
474	          |        |     +----------------+     |          |
475	   Policy | ICMP | |          |      |          |          |
476	   change | Timer: |      ULP |      | Test     | Delete   |
477	          |   Send | feedback:|      | OK:      | pair &   |
478	          |   test |    Reset |      | Reset    | !Last    |
479	          |        |    timer |      | timer    |          |
480	          |        |         \ /    \ /         |          |
481	          |        |     +----------------+     |          |
482	          |        +-----|                |     |          |
483	          |              |                |-----+          |
484	          +--------------|                |                |
485	                         |                |                |
486	                   +-----|   OPERATIONAL  |                |
487	     ULP feedback: |     |     PRIMARY    |                |
488	       Reset timer |     |                |----------------+
489	                   +---->|                |
490	                         |                |
491	                         +----------------+

493	          Figure 2. Pair selection state machine.

495	   The notation used in Figure 2 is explained below:

497	   Connect

499	      An event representing the desire of the application to send a
500	      packet to a new peer, or an indication from a peer wishing to
501	      connect to us.

503	   Test OK

505	      An event representing a successful completion of the reachability
506	      test.

508	   Test fail

510	      An event representing failure to complete the reachability test.

512	   ULP feedback

514	      An event representing positive indication from an upper layer
515	      protocol that the packets we have sent to the peer are getting
516	      through.

518	   ICMP

520	      An event representing the reception of an ICMP error message.

522	   Timer

524	      An event representing a timer elapsing.

526	   Add pair

528	      An event representing the addition of a new possible address pair,
529	      either through learning a new local address or being told of a new
530	      remote address.

532	   Delete pair

534	      An event representing the deletion of the currently chosen primary
535	      address pair.

537	   Policy change

539	      An event representing the desire of the local or remote end to
540	      change to a different address pair, despite the current one being
541	      operational.  This can be due to the availability of a
542	      higher-bandwidth connection, cost, or other issues.

544	   Last

546	      A condition that tells whether or not the currently chosen primary
547	      pair is the only known address pair.

549	   Send test

551	      An action to initiate the reachability test for a particular pair.
552	      This test is typically embedded in the Multi6 connection setup
553	      exchange when run initially, and as a separate exchange later.

555	      Note that due to potentially asymmetric connectivity, both sides
556	      have to perform their own tests, and make their own primary pair
557	      selections.

	   Reset timer

559	      An action to reset a timer so that it will send an event after a
560	      specified time.

562	   The state machine also assumes an underlying multihoming signaling
563	   capability, consisting of the following abstract message exchanges:

565	   Open

567	      Establishes a connection between the peers.  May also exchange
568	      locator sets and test reachability at the same time.

570	   Test

572	      Verifies reachability using a specific address pair.

574	   Add

576	      Informs the peer about new locators.

578	   Delete

580	      Informs the peer about losing some locators.

582	   Note that the above state machine leaves open how specific address
583	   pairs are chosen, as this will be discussed in the next section.  We
584	   have also, on purpose, decided to avoid attaching functional labels
585	   such as "backup" to other address pairs beyond the primary pair.  It
586	   is our belief that a general design does not need these labels.
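
   As an illustration only, the transitions that are legible in the
   part of Figure 2 shown above can be written as a small transition
   function.  This is one reading of the figure, not a normative
   definition: the function name "step" and the string event names are
   assumptions of this sketch, and transitions that are not visible in
   this part of the figure are reported as None rather than guessed.

```python
from enum import Enum, auto


class State(Enum):
    TESTING_PRIMARY = auto()
    OPERATIONAL_PRIMARY = auto()


def step(state, event, last=False):
    """Return (next_state, action), or None if the transition is not modelled.

    `last` is the "Last" condition: True when the current primary pair is
    the only known address pair.  An action of None means the figure shows
    no action for that transition.
    """
    if state is State.TESTING_PRIMARY:
        if event in ("Test OK", "ULP feedback"):
            return State.OPERATIONAL_PRIMARY, "Reset timer"
        if event == "Test fail" and not last:
            # Self-loop: stay in TESTING PRIMARY.
            return State.TESTING_PRIMARY, None
    elif state is State.OPERATIONAL_PRIMARY:
        if event == "ULP feedback":
            return State.OPERATIONAL_PRIMARY, "Reset timer"
        if event in ("ICMP", "Timer"):
            return State.TESTING_PRIMARY, "Send test"
        if event == "Policy change":
            return State.TESTING_PRIMARY, None
        if event == "Delete pair" and not last:
            return State.TESTING_PRIMARY, None
    return None
```

   For example, step(State.OPERATIONAL_PRIMARY, "Timer") moves back to
   TESTING PRIMARY with the action "Send test".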

588	5.3  Pair Selection Algorithm

590	   The pair selection state machine assumes an ability to pick primary
591	   and alternative address pairs.

593	   This process results in a combinatorial explosion when there are many
594	   addresses on both sides.  Do both sides track all possible
595	   combinations of addresses? If a failure occurs, shall all
596	   combinations be tested before giving up? Are such tests performed in
597	   parallel or in sequence, and what kind of backoff procedures should
598	   be applied?

600	   Our suggestion is that nodes MUST first consult RFC 3484 [5] policy
601	   tables to determine what combinations of addresses are legal from a
602	   local point of view, as this reduces the search space.  Nodes SHOULD
603	   also use local information, such as known quality of service
604	   parameters or interface types to determine what addresses are
605	   preferred over others, and try pairs containing such addresses first.
606	   In some cases we can also learn the peer's preferences through the
607	   multihoming protocol [16].

609	      Discussion note 1: It may also be possible to simulate preferences
610	      by choosing to not tell the peer about some (non-preferred)
611	      addresses.

613	      Discussion note 2: The preferences may either be learned
614	      dynamically or be configured.  It is believed, however, that
615	      dynamic learning based purely on the MULTI6 protocol is too hard
616	      and not the task this layer should do.  Solutions where multiple
617	      protocols share their information in a common pool of locators
618	      could provide this information from transport protocols, however
619	      [11].
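
   The suggestion above (filter by local policy first, then order by
   local preference) could be sketched as follows.  The callbacks
   "is_allowed" and "preference" are hypothetical stand-ins for an
   RFC 3484 policy table lookup and for local knowledge such as
   interface type; this document does not define either of them.

```python
from itertools import product


def candidate_pairs(local_addrs, remote_addrs, is_allowed, preference):
    """Return the permitted (local, remote) pairs, most preferred first.

    is_allowed(local, remote) -> bool and preference(local, remote) -> number
    are hypothetical callbacks supplied by the caller.
    """
    # Build every combination, then drop the ones local policy rules out.
    pairs = [p for p in product(local_addrs, remote_addrs) if is_allowed(*p)]
    # sorted() is stable, so equally preferred pairs keep their input order.
    return sorted(pairs, key=lambda p: preference(*p), reverse=True)
```

   Filtering before sorting keeps the search space small, which matters
   because the number of combinations grows as the product of the two
   locator set sizes.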

621	   The reception of packets from the peer with a given address pair is a
622	   good hint that the address pair works, particularly when these
623	   packets are authenticated multihoming protocol packets.  However, the
624	   reception of these packets alone is an insufficient reason to switch
625	   to a new address, as in a unidirectional connectivity case the
626	   return path may not work.

628	   One suggested implementation strategy is to record the reachability
629	   test result (an on/off value) and to weight it by the age of the
630	   information, so that a fresh result counts for more than a stale
631	   one.  This allows recently tested address pairs to be chosen first.
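
   One possible way to combine the on/off result with the age of the
   information is shown below.  The exponential decay and the
   "half_life" parameter are assumptions of this sketch; the document
   only says that the result and its age are combined so that recent
   results are favoured.

```python
def pair_score(reachable, age_seconds, half_life=60.0):
    """Score in [0, 1]: 0 for a failed test, decaying with age otherwise."""
    if age_seconds < 0:
        raise ValueError("age_seconds must be non-negative")
    # Freshness halves every `half_life` seconds (an assumed decay rule).
    freshness = 0.5 ** (age_seconds / half_life)
    return (1.0 if reachable else 0.0) * freshness
```

   Candidate pairs can then be tried in descending order of score, so
   that a pair that tested well ten seconds ago is preferred over one
   that tested well ten minutes ago.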

633	   Out of the set of possible candidate address pairs, nodes SHOULD
634	   attempt a test through all of them, but MUST do this sequentially
635	   (based on an implementation-dependent priority order) and using an
636	   exponential back-off procedure.
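
   A minimal sketch of such a sequential procedure is given below.  The
   callable "probe" is hypothetical: it is assumed to send one test on
   the given pair and to return True if a reply arrives within the
   timeout.  The sketch stops at the first working pair, which is one
   possible policy rather than a requirement.

```python
def probe_sequentially(pairs, probe, initial_timeout=0.1, factor=2.0):
    """Probe pairs one at a time; return the first that answers, else None.

    `pairs` is assumed to be already sorted in priority order.
    """
    timeout = initial_timeout
    for pair in pairs:
        if probe(pair, timeout):
            return pair
        timeout *= factor  # back off before trying the next pair
    return None
```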

638	   This sequential process is necessary in order to avoid a "signaling
639	   storm" when an outage occurs (particularly for a complete site).
640	   However, it also limits the number of addresses that can in practice
641	   be used for multihoming, considering that transport and application
642	   layer protocols will fail if the switch to a new address pair takes
643	   too long.  For instance, we can assume that an initial timeout value
644	   is 0.1 seconds and there are four addresses on both sides.  Going
645	   through all sixteen address pairs and doubling the timeout value at
646	   every trial would take about 6550 seconds in total!
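
   The arithmetic for this example can be checked with a few lines of
   code.  Assuming that every trial fails and that the timeout doubles
   after each one, the sixteenth timeout alone is about 3277 seconds
   and the timeouts sum to about 6554 seconds.  The helper name
   "backoff_schedule" is an assumption of this sketch.

```python
def backoff_schedule(initial_timeout, trials, factor=2):
    """Return the per-trial timeouts of an exponential back-off."""
    return [initial_timeout * factor ** i for i in range(trials)]


timeouts = backoff_schedule(0.1, 4 * 4)  # four addresses on each side
last_timeout = timeouts[-1]              # 0.1 * 2**15 seconds
total_wait = sum(timeouts)               # 0.1 * (2**16 - 1) seconds
```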

648	   Finally, as has been noted in the context of MOBIKE, the existence of
649	   NATs can require that peers continuously monitor the operational
650	   status of address pairs, as otherwise NAT state related to a
651	   particular communication is lost, and the peer on the outer side of
652	   the NAT can no longer reach the peer inside the NAT.

654	5.4  Protocol for Testing Unidirectional Reachability

656	   Testing for reachability is not easy in an environment where
657	   unidirectional reachability is a possibility.  This is because the
658	   test of a single pair may not result in working paths to send both
659	   the request and response packets.  The following protocol could be
660	   used to avoid this problem:

662	    Peer A                                        Peer B
663	      |                                             |
664	      |  Poll 1 (src=A1, dst=B1)                    |
665	      |-------------------------------------------->|
666	      |                                             |
667	      |               Poll 2 (src=B1, dst=A1) OK: 1 |
668	      |        X------------------------------------|
669	      |                                             |
670	      |  Poll 3 (src=A2, dst=B1)                    |
671	      |------------------------------X              |
672	      |                                             |
673	      |          Poll 4 (src=B2, dst=A1) OK: 1      |
674	      |<--------------------------------------------|
675	      |                                             |
676	      |  Poll 5 (src=A1, dst=B1) OK: 4              |
677	      |-------------------------------------------->|
678	      |                                             |

680	   When B receives the first Poll message, it records that it has
681	   received it.  The Poll message from B, however, is lost, so A tries
682	   again with another pair.  This is lost too, but B continues its own
683	   testing process by sending its second Poll message, which is received
684	   by A.  Each message carries an identifier and a list of the
685	   identifiers of the messages that the sender has itself successfully
686	   received earlier.

688	   At the end of the example, A and B know that they have a working
689	   path from A to B using (A1, B1) and from B to A using (B2, A1).

691	   More generally, when A decides that it needs to test for
692	   connectivity, it will initiate a set of Poll messages, in sequence,
693	   until it gets a Poll message from B indicating that (a) B has
694	   received one of A's Poll messages and, obviously, (b) that B's Poll
695	   message is getting through.  B uses the same algorithm, but starts
696	   the process from the reception of the first Poll message from A.
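
   The example exchange can be replayed with a small simulation.  This
   is a sketch under stated assumptions, not a protocol definition: the
   class and field names (Poll, Peer, "ok", "inbound", "outbound") are
   invented for illustration, and losses are scripted to match the
   figure rather than produced by a real network.

```python
from dataclasses import dataclass, field


@dataclass
class Poll:
    ident: int
    src: str
    dst: str
    ok: tuple  # identifiers of Polls the sender has received so far


@dataclass
class Peer:
    name: str
    sent: dict = field(default_factory=dict)      # ident -> (src, dst)
    received: dict = field(default_factory=dict)  # ident -> (src, dst)
    outbound: tuple = None  # pair confirmed to work towards the other peer
    inbound: tuple = None   # pair on which the other peer's Poll arrived

    def make_poll(self, ident, src, dst):
        self.sent[ident] = (src, dst)
        return Poll(ident, src, dst, tuple(self.received))

    def on_poll(self, poll):
        self.received[poll.ident] = (poll.src, poll.dst)
        if self.inbound is None:
            self.inbound = (poll.src, poll.dst)
        # An acknowledged identifier tells us which of our own Polls arrived.
        for ident in poll.ok:
            if ident in self.sent and self.outbound is None:
                self.outbound = self.sent[ident]


a, b = Peer("A"), Peer("B")
# (sender, receiver, ident, src, dst, delivered), as drawn in the figure
trace = [
    (a, b, 1, "A1", "B1", True),
    (b, a, 2, "B1", "A1", False),
    (a, b, 3, "A2", "B1", False),
    (b, a, 4, "B2", "A1", True),
    (a, b, 5, "A1", "B1", True),
]
for sender, receiver, ident, src, dst, delivered in trace:
    poll = sender.make_poll(ident, src, dst)
    if delivered:
        receiver.on_poll(poll)
```

   After the replay, A has recorded (A1, B1) as its working outbound
   pair and B has recorded (B2, A1), which matches the outcome
   described for the example.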

698	6.  References

700	6.1  Normative References

702	   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
703	        Levels", BCP 14, RFC 2119, March 1997.

705	   [2]  Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery for
706	        IP Version 6 (IPv6)", RFC 2461, December 1998.

708	   [3]  Thomson, S. and T. Narten, "IPv6 Stateless Address
709	        Autoconfiguration", RFC 2462, December 1998.

711	   [4]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C. and M.
712	        Carney, "Dynamic Host Configuration Protocol for IPv6 (DHCPv6)",
713	        RFC 3315, July 2003.

715	   [5]  Draves, R., "Default Address Selection for Internet Protocol
716	        version 6 (IPv6)", RFC 3484, February 2003.

718	   [6]  Aboba, B., "Detection of Network Attachment (DNA) in IPv4",
719	        draft-ietf-dhc-dna-ipv4-08 (work in progress), July 2004.

721	   [7]  Moore, N., "Optimistic Duplicate Address Detection for IPv6",
722	        draft-ietf-ipv6-optimistic-dad-01 (work in progress), June 2004.

724	   [8]  Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
725	        Addresses", draft-ietf-ipv6-unique-local-addr-05 (work in
726	        progress), June 2004.

728	6.2  Informative References

730	   [9]   Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
731	         H., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and V. Paxson,
732	         "Stream Control Transmission Protocol", RFC 2960, October 2000.

734	   [10]  Kivinen, T., "Design of the MOBIKE protocol",
735	         draft-ietf-mobike-design-00 (work in progress), June 2004.

737	   [11]  Crocker, D., "Framework for Common Endpoint Locator Pools",
738	         draft-crocker-celp-00 (work in progress), February 2004.

740	   [12]  Dupont, F., "Address Management for IKE version 2",
741	         draft-dupont-ikev2-addrmgmt-05 (work in progress), June 2004.

743	   [13]  Eronen, P., "Mobility Protocol Options for IKEv2 (MOPO-IKE)",
744	         draft-eronen-mobike-mopo-00 (work in progress), July 2004.

746	   [14]  Eronen, P. and H. Tschofenig, "Simple Mobility and Multihoming
747	         Extensions for IKEv2 (SMOBIKE)", draft-eronen-mobike-simple-00
748	         (work in progress), March 2004.

750	   [15]  Kivinen, T., "MOBIKE protocol",
751	         draft-kivinen-mobike-protocol-00 (work in progress), March
752	         2004.

754	   [16]  Nikander, P., "End-Host Mobility and Multi-Homing with Host
755	         Identity Protocol", draft-nikander-hip-mm-02 (work in
756	         progress), July 2004.

758	   [17]  Nordmark, E., "Multihoming without IP Identifiers",
759	         draft-nordmark-multi6-noid-02 (work in progress), July 2004.

761	   [18]  Nordmark, E., "Multihoming using 64-bit Crypto-based IDs",
762	         draft-nordmark-multi6-cb64-00 (work in progress), November
763	         2003.

765	   [19]  Nordmark, E., "Strong Identity Multihoming using 128 bit
766	         Identifiers (SIM/CBID128)", draft-nordmark-multi6-sim-01 (work
767	         in progress), October 2003.

769	   [20]  Vogt, C., Arkko, J., Bless, R., Doll, M. and T. Kuefner,
770	         "Credit-Based Authorization for Mobile IPv6 Early Binding
771	         Updates", draft-vogt-mipv6-credit-based-authorization-00 (work
772	         in progress), May 2004.

774	   [21]  Ylitalo, J., "Weak Identifier Multihoming Protocol (WIMP)",
775	         draft-ylitalo-multi6-wimp-01 (work in progress), July 2004.

777	   [22]  Aura, T., Roe, M. and J. Arkko, "Security of Internet Location
778	         Management", In Proceedings of the 18th Annual Computer
779	         Security Applications Conference, Las Vegas, Nevada, USA.,
780	         December 2002.

782	Author's Address

784	   Jari Arkko
785	   Ericsson
786	   Jorvas  02420
787	   Finland

789	   EMail: jari.arkko@ericsson.com

791	Appendix A.  Contributors

793	   This draft attempts to summarize the thoughts and unpublished
794	   contributions of many people, including the MULTI6 WG design team
795	   members Marcelo Bagnulo Braun, Iljitsch van Beijnum, Erik Nordmark,
796	   Geoff Huston, Margaret Wasserman, and Jukka Ylitalo, the MOBIKE WG
797	   contributors Pasi Eronen, Tero Kivinen, Francis Dupont, Spencer
798	   Dawkins, and James Kempf, and my colleague Pekka Nikander at
799	   Ericsson.  This draft is also indebted to work done in the context of
800	   SCTP [9].

802	   The protocol design in Section 5.4 is due to Erik, Marcelo, and
803	   Iljitsch.

805	Intellectual Property Statement

807	   The IETF takes no position regarding the validity or scope of any
808	   Intellectual Property Rights or other rights that might be claimed to
809	   pertain to the implementation or use of the technology described in
810	   this document or the extent to which any license under such rights
811	   might or might not be available; nor does it represent that it has
812	   made any independent effort to identify any such rights.  Information
813	   on the procedures with respect to rights in RFC documents can be
814	   found in BCP 78 and BCP 79.

816	   Copies of IPR disclosures made to the IETF Secretariat and any
817	   assurances of licenses to be made available, or the result of an
818	   attempt made to obtain a general license or permission for the use of
819	   such proprietary rights by implementers or users of this
820	   specification can be obtained from the IETF on-line IPR repository at
821	   http://www.ietf.org/ipr.

823	   The IETF invites any interested party to bring to its attention any
824	   copyrights, patents or patent applications, or other proprietary
825	   rights that may cover technology that may be required to implement
826	   this standard.  Please address the information to the IETF at
827	   ietf-ipr@ietf.org.

829	Disclaimer of Validity

831	   This document and the information contained herein are provided on an
832	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
833	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
834	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
835	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
836	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
837	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

839	Copyright Statement

841	   Copyright (C) The Internet Society (2004).  This document is subject
842	   to the rights, licenses and restrictions contained in BCP 78, and
843	   except as set forth therein, the authors retain all their rights.

845	Acknowledgment

847	   Funding for the RFC Editor function is currently provided by the
848	   Internet Society.