idnits 2.17.1 

draft-ietf-shim6-reach-detect-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 12.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 399.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 376.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 383.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 389.

  ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line
     405), which is fine, but *also* found old RFC 2026, Section 10.4C,
     paragraph 1 text on line 34.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 6) being 421 lines

  == It seems as if not all pages are separated by form feeds - found 1 form
     feeds but 6 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 110: '...e, a value of 10 for ShimKeepT MUST be...'
     RFC 2119 keyword, line 215: '...implementations SHOULD try, within rea...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (Jul 11, 2005) is 6858 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 7 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET-DRAFT                                      Iljitsch van Beijnum
2	Jul 11, 2005

4	                        Shim6 Reachability Detection
5	                    draft-ietf-shim6-reach-detect-01.txt

7	   Status of this Memo

9	   By submitting this Internet-Draft, each author represents that any
10	   applicable patent or other IPR claims of which he or she is aware
11	   have been or will be disclosed, and any of which he or she becomes
12	   aware will be disclosed, in accordance with Section 6 of BCP 79.

14	   Internet-Drafts are working documents of the Internet Engineering
15	   Task Force (IETF), its areas, and its working groups.  Note that
16	   other groups may also distribute working documents as Internet-
17	   Drafts.

19	   Internet-Drafts are draft documents valid for a maximum of six months
20	   and may be updated, replaced, or obsoleted by other documents at any
21	   time.  It is inappropriate to use Internet-Drafts as reference
22	   material or to cite them other than as "work in progress."

24	   The list of current Internet-Drafts can be accessed at
25	   http://www.ietf.org/ietf/1id-abstracts.txt

27	   The list of Internet-Draft Shadow Directories can be accessed at
28	   http://www.ietf.org/shadow.html.

30	   This Internet Draft expires April 24, 2006.

32	   Copyright Notice

34	      Copyright (C) The Internet Society (2005).  All Rights Reserved.

36	Abstract

38	The shim6 working group is developing a mechanism that allows
39	multihoming by using multiple addresses. When communication between
40	the initially chosen addresses for a transport session is no longer
41	possible, a "shim" layer makes it possible to switch to a different
42	set of addresses without breaking current transport protocol
43	assumptions. This draft discusses the issues of detecting failures
44	in a currently used address pair between two hosts and picking a
45	new address pair to be used when a failure occurs. The inputs for
46	these processes are ordered lists of local and remote addresses
47	that are reasonably likely to work (i.e., not including addresses
48	that are known to be unreachable for local reasons). These lists
49	must be available at both ends of the communication, although the
50	ordering may differ. Building these address lists from locally
51	available information and synchronizing them with the remote end
52	are outside the scope of this document.

54	This text is for the most part based on discussions on the multi6
55	list, several multi6 design team lists and the shim6 list, with
56	notable contributions from Erik Nordmark, Marcelo Bagnulo and Jari
57	Arkko. Suggestions and additions are more than welcome.

59	1 Introduction

61	A naive implementation of an (un)reachability detection mechanism
62	could just probe all possible paths between two hosts periodically.
63	A "path" is defined as a combination of a source address for host A
64	and a destination address for host B. In hop-by-hop forwarding the
65	source address doesn't have any effect on reachability, but in the
66	presence of filters or source address based routing, it may. And
67	although links almost always work in two directions, routing
68	protocols and filters only work in one direction so unidirectional
69	reachability can happen. Without additional mechanisms, the
70	practice of ingress filtering by ISPs makes unidirectional
71	connectivity likely. Being able to use the working leg in a
72	unidirectional path is useful, but it's not an essential requirement.
73	It is essential, however, to avoid assuming bidirectional
74	connectivity when there is in fact a unidirectional failure.

76	Exploring the full set of communication options between two hosts
77	that both have two or more addresses is an expensive operation as
78	the number of combinations to be explored increases very quickly
79	with the number of addresses. For instance, with two addresses on
80	both sides, there are four possible address pairs. Since we can't
81	assume that reachability in one direction automatically means
82	reachability for the complement pair in the other direction, the
83	total number of two-way combinations is eight. (Combinations = nA *
84	nB * 2.)
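
The count above can be checked with a few lines of Python. This is
only an illustrative sketch: the function name and the address
labels are made up for the example and do not come from the draft.

```python
from itertools import product

def one_way_paths(local_addrs, remote_addrs):
    """List every (source, destination, direction) combination."""
    pairs = list(product(local_addrs, remote_addrs))
    # Reachability is not assumed to be symmetric, so each address
    # pair is counted once per direction.
    return [(a, b, d) for (a, b) in pairs for d in ("A->B", "B->A")]

paths = one_way_paths(["A1", "A2"], ["B1", "B2"])
print(len(paths))  # 8
```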

86	An important observation in multihoming is that failures are
87	relatively infrequent, so that a path that worked a few seconds ago
88	is very likely to work now as well. So it makes sense to have a
89	light-weight protocol that confirms existing reachability, and only
90	invoke the much heavier protocol that can determine full
91	reachability when there is a suspected failure.

93	2 Determining reachability for the current pair

95	Reachability for the currently used address pair in a shim context
96	is determined by making sure that whenever there is data traffic in
97	one direction, there is also traffic in the other direction. This
98	can be data traffic as well, but also transport layer
99	acknowledgments or a shim reachability keepalive if there is no
100	other traffic. This way, a working path never carries traffic in
101	only one direction. So whenever there is data traffic going out
102	but there are no return packets, there must be a failure, and the
103	full path exploration mechanism is started.

105	A more detailed description of the current pair reachability
106	evaluation mechanism:

108	1. The base timing unit for this mechanism is named ShimKeepT.
109	   Until a negotiation mechanism to negotiate different values for
110	   ShimKeepT becomes available, a value of 10 for ShimKeepT MUST be
111	   used.

113	2. Whenever outgoing packets are generated that are part of a shim
114	   context, one of two timestamps belonging to the shim context is
115	   updated: the timestamp for outgoing data packets, or the timestamp
116	   for outgoing non-data packets. The difference between the two is
117	   that data packets are packets that should generate return traffic.
118	   The host should use the information available to it to determine
119	   whether a packet is a data or a non-data packet. Examples of
120	   non-data packets are TCP ACKs and shim keepalive packets. If there
121	   is any doubt, a packet should be considered a data packet.

123	3. Whenever incoming packets are received that are part of a shim
124	   context, one of two timestamps belonging to the shim context is
125	   updated: the timestamp for incoming data packets, or the timestamp
126	   of incoming non-data packets. For incoming packets, it's less
127	   critical that packets are labeled as data or non-data correctly. In
128	   the absence of better information, hosts may assume that any IPv6
129	   packet with a total length field with a value of 20 or lower is a
130	   non-data packet.

132	4. ShimKeepT seconds after the last data packet has been received
133	   for a context, and if no other packet has been sent within this
134	   context since the data packet has been received, a shim keepalive
135	   packet is generated for the context in question and transmitted to
136	   the correspondent. The shim keepalive packet consists of an IPv6
137	   header and a shim header containing the context tag, but no
138	   subsequent headers. Intermediate headers may be present between the
139	   IPv6 and shim headers. A host may send the shim keepalive after
140	   fewer than ShimKeepT seconds if implementation considerations
141	   warrant this. The average time after which shim keepalives are sent
142	   must be at least ShimKeepT / 2 seconds. After potentially sending a
143	   single shim keepalive, no additional shim keepalives are sent until
144	   a data packet is received within this shim context. If the shim
145	   keepalive wasn't sent because a data or non-data packet was sent
146	   since the last received data packet, no shim keepalives are sent.

148	5. When no packets have been received from the correspondent within
149	   this context for a timeout period after the last transmission of
150	   a data packet, a full reachability exploration is started. The
151	   timeout period is ShimKeepT seconds plus additional time to
152	   accommodate a round trip and regular variations in
153	   network-related functions. In the absence of better information, a
154	   timeout of at least ShimKeepT + 2 seconds but no more than
155	   ShimKeepT + 5 seconds is recommended.
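
Steps 2 through 5 above could be tracked per context roughly as
follows. This is a minimal sketch, not a definitive implementation:
the class and method names are hypothetical, the timeout margin of 2
seconds is simply the low end of the recommended range, and step 5
is read literally, so the timer restarts with every outgoing data
packet.

```python
from dataclasses import dataclass
from typing import Optional

SHIM_KEEP_T = 10.0    # seconds; the only value allowed for now (step 1)
TIMEOUT_SLACK = 2.0   # assumed: low end of the recommended 2..5 s margin

@dataclass
class ContextTimers:
    """Hypothetical per-context timestamps, in seconds."""
    last_data_sent: Optional[float] = None
    last_any_sent: Optional[float] = None
    last_data_received: Optional[float] = None
    last_any_received: Optional[float] = None
    keepalive_sent: bool = False

    def on_send(self, now, is_data=True):
        # Step 2: when in doubt, treat an outgoing packet as data.
        self.last_any_sent = now
        if is_data:
            self.last_data_sent = now

    def on_receive(self, now, is_data=True):
        # Step 3: record incoming traffic for this context.
        self.last_any_received = now
        if is_data:
            self.last_data_received = now
            self.keepalive_sent = False  # new data re-arms the keepalive

    def keepalive_due(self, now):
        # Step 4: one keepalive per received data packet, and only
        # if nothing else was sent since that packet arrived.
        if self.last_data_received is None or self.keepalive_sent:
            return False
        if (self.last_any_sent is not None
                and self.last_any_sent >= self.last_data_received):
            return False
        return now - self.last_data_received >= SHIM_KEEP_T

    def send_keepalive(self, now):
        self.keepalive_sent = True
        self.on_send(now, is_data=False)

    def exploration_due(self, now):
        # Step 5: data went out, nothing at all came back in time.
        if self.last_data_sent is None:
            return False
        if (self.last_any_received is not None
                and self.last_any_received >= self.last_data_sent):
            return False
        return now - self.last_data_sent >= SHIM_KEEP_T + TIMEOUT_SLACK
```

A caller would invoke on_send() and on_receive() from the packet
path and poll keepalive_due() and exploration_due() from a timer.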

157	3 Address pair exploration

159	In its essence, address pair exploration is very simple: just send
160	probes using every possible address pair, wait for something to
161	come back and possibly consider the round trip time. In practice,
162	testing the full combination of all source addresses and all
163	destination addresses is very undesirable because of the large
164	number of packets involved. This can be especially harmful when a
165	lot of hosts on a link start doing this for many of their
166	correspondents at the same time when there is a failure further
167	upstream.

169	In order to arrive at a desired outcome more quickly and with fewer
170	packets, and also to accommodate traffic engineering needs, we'll
171	assume a model where each address (source or destination) has two
172	preference values: p1 and p2. Addresses within the same set (source
173	or destination) are ranked by their p1 value, where a higher p1
174	means that the address is more preferred. When there are multiple
175	addresses with the same p1 value, an address is selected at random
176	from the group with the same p1 value, where the likelihood of
177	selecting any given address is relative to its p2 value compared to
178	the sum of all p2 values. So if addresses A, B and C have the same
179	p1 value and p2 values of 10, 30 and 60 for a total of 100, the
180	chance that A is selected is 10%, the chance that B is selected is
181	30% and the chance that C is selected is 60%.
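
The selection rule above can be sketched in Python as shown here.
The function name and the tuple layout are assumptions made for the
example; random.choices from the standard library does the weighted
draw. The sketch does not handle the case where all p2 values in
the group are zero.

```python
import random

def pick_address(addresses, rng=random):
    """Pick one address name from a list of (name, p1, p2) tuples."""
    # Only addresses with the highest p1 value are candidates.
    best_p1 = max(p1 for _, p1, _ in addresses)
    group = [(name, p2) for name, p1, p2 in addresses if p1 == best_p1]
    names = [name for name, _ in group]
    weights = [p2 for _, p2 in group]
    # Each candidate is chosen with probability p2 / sum of p2 values.
    return rng.choices(names, weights=weights, k=1)[0]

addrs = [("A", 5, 10), ("B", 5, 30), ("C", 5, 60), ("D", 1, 100)]
print(pick_address(addrs))  # "A", "B" or "C", but never "D"
```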

183	Note that preference information may be related to type of service.
184	So different contexts with different type of service requirements
185	may see different p1 and p2 values for a given address.

187	When a host suspects that there is a failure for a context, it
188	gathers the set of possible source addresses and the set of
189	possible destination addresses. Both sets are ordered such that
190	each next address has an equal or lower p1 value. Addresses with
191	the same p1 value are further ordered as per any heuristics that
192	the host may employ, such as longest prefix matches on known
193	working and/or known not working addresses along with the p2 value.
194	The p2 value is considered relatively weak, and breaking p2
195	ordering is allowed if there is a sufficient reason for this.
196	However, in the absence of other information, p2 ordering should be
197	used. P1 ordering overrules any other information except a recent
198	reachability failure for the address in question. In addition to
199	this, the most recently used address is put in front of the list.

201	From the lists of eligible source and destination addresses, the
202	host creates a list of source/destination address pairs, along with
203	a combined preference value for this address pair. The calculation
204	of the preference value is implementation specific, with the only
205	requirement being that when one address pair has a higher p1 for
206	both the source and destination address than another pair, the pair
207	with the higher p1 values also has a higher combined pair
208	preference value.

210	The list of address pairs from different contexts is combined into
211	a host-wide list of address pairs. The preference values are
212	updated to take into consideration the number of contexts that are
213	interested in the pair. The specifics of calculating the resulting
214	host-wide preference value are left up to the implementation, but
215	implementations SHOULD try, within reason, to avoid using address
216	pairs with lower p1 values when pairs with higher p1 values are
217	available for a context. Context-specific address pair preferences
218	may be normalized prior to calculating host-wide address pair
219	preference values. (So when context A has pairs P and Q with p1
220	values 10 and 1, while context B has pairs R and S with p1 values 7
221	and 4, the values for P and R are changed to 2 and the values for Q
222	and S to 1.)
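
The text does not say how the normalization is done. One method
that reproduces the example is to replace each value by its rank
within the context, with the lowest value becoming 1. The sketch
below assumes that method; the function name is hypothetical.

```python
def normalize_p1(pair_p1):
    """Map a context's preference values to ranks 1, 2, 3, ..."""
    distinct = sorted(set(pair_p1.values()))
    ranks = {value: rank for rank, value in enumerate(distinct, start=1)}
    return {pair: ranks[value] for pair, value in pair_p1.items()}

print(normalize_p1({"P": 10, "Q": 1}))  # {'P': 2, 'Q': 1}
print(normalize_p1({"R": 7, "S": 4}))   # {'R': 2, 'S': 1}
```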

224	The host now starts probing address pairs, in order from the pair
225	with the highest pair preference to the pair with the lowest pair
226	preference. When all address pairs have been tested, testing
227	restarts from the pair with the highest preference. New pairs that
228	become available are put in the list before pairs that have been
229	probed already, regardless of the preference values. However, both
230	the group of address pairs that haven't been probed and the group
231	of address pairs that have may be reordered to reflect the
232	preference values, as long as reordering is done such that
233	starvation doesn't occur.

235	When a probe is answered by the correspondent, the contexts that use
236	the address pair in question are informed so they can start
237	remapping addresses in outgoing packets to the pair in question. (All
238	of this also happens when there is a working pair but an address
239	pair with at least one address with a higher preference is
240	determined to work.) At this point, the context updates its list of
241	address pairs to probe by removing all pairs where either the
242	source address has a lower p1 value than the p1 value of the now
243	working source address, or the destination address has a lower p1
244	value than the p1 value of the now working destination address.
245	Additionally, all address pairs where the p1 values for the source
246	and destination addresses match the respective p1 values of the
247	source and destination addresses in the now working pair are
248	removed from the list. The host-wide list of address pairs to probe
249	is updated to reflect the removal of lower or equal priority
250	addresses, so probing will only continue for pairs where at least
251	one address has a higher p1 than the currently working pair.
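
The pruning rule in the paragraph above amounts to a simple filter
over the context's list. A minimal sketch, with a hypothetical
function name and tuple layout:

```python
def prune_pairs(candidates, working):
    """Drop pairs that cannot improve on the working pair.

    candidates: list of (src_p1, dst_p1, label) tuples
    working:    (src_p1, dst_p1) of the pair that just answered
    """
    work_src, work_dst = working
    kept = []
    for src_p1, dst_p1, label in candidates:
        # Remove pairs where either address is less preferred.
        if src_p1 < work_src or dst_p1 < work_dst:
            continue
        # Remove pairs that merely match the working pair.
        if src_p1 == work_src and dst_p1 == work_dst:
            continue
        kept.append((src_p1, dst_p1, label))
    return kept

pairs = [(3, 3, "a"), (2, 3, "b"), (2, 2, "c"), (1, 3, "d"), (3, 1, "e")]
print(prune_pairs(pairs, working=(2, 2)))
# [(3, 3, 'a'), (2, 3, 'b')]
```

Every pair that survives has at least one address with a higher p1
than the working pair and none with a lower p1.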

253	The time between probes (ShimProbeT) must be chosen such that the
254	number of probes is limited to 60 per 300 second period. When no
255	probes have been sent for some time, an implementation may send the
256	initial group of probes at a fairly aggressive rate. For instance,
257	when no probes have been sent for 60 seconds, a host may send a
258	second probe 200 ms after the first one, and increase the
259	ShimProbeT by a factor of 1.25 after every probe, until ShimProbeT
260	reaches 5 seconds. This results in sending 5 probes in the first 2
261	seconds and 14 probes within the first 20 seconds after a
262	failure. After that, there is one probe every 5 seconds.
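
The example schedule can be reproduced with the sketch below. The
function and parameter names are assumptions; the numbers (200 ms,
factor 1.25, 5 second cap) are the ones from the example. Counting
only the probes that follow the initial one gives 5 within 2
seconds and 14 within 20 seconds.

```python
def probe_times(horizon, first_interval=0.2, factor=1.25, max_interval=5.0):
    """Return probe send times in seconds, relative to the first probe."""
    times = []
    t, interval = 0.0, first_interval
    while t <= horizon:
        times.append(t)
        t += interval
        # ShimProbeT grows after every probe until it reaches the cap.
        interval = min(interval * factor, max_interval)
    return times

times = probe_times(60.0)
print(len([t for t in times if 0 < t <= 2]))   # 5
print(len([t for t in times if 0 < t <= 20]))  # 14
```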

264	When a context hasn't seen any outgoing data packets (see section 2)
265	for four minutes, it removes all its address pairs from the
266	host-wide list of address pairs.

268	4 Address pair exploration packet format

270	The address pair exploration packet may be encapsulated in
271	different ways. An obvious way is inside a shim header. The address
272	pair exploration packet contains the following information:

274	- A type field that is at least 8 bits long
275	- An 8 bit "number of probes sent" field
276	- An 8 bit "number of probes received" field
277	- An 8 bit "options length" field
278	- One or more sent probes (see below)
279	- Zero or more received probes (see below)
280	- Zero or more bytes of option data

282	There is currently one bit in the type field defined: the reply
283	requested bit. If this bit is set, the other side should send a
284	probe in reply to this probe.

286	The option data contains zero or more options in the following
287	format:

289	- An 8 bit option type
290	- An 8 bit option length
291	- Zero or more bytes of data in this option

293	Sent and received probes contain data in the following format:

295	- Source locator/address (128 bits)
296	- Destination locator/address (128 bits)
297	- Sent timestamp (32 bits in ms resolution relative to private epoch)
298	- Time between reception and retransmission (32 bits in ms resolution,
299	  0 on first transmission)
300	- Nonce (32 bits)
301	- Sequence number (32 bits)

303	The first and only mandatory sent probe structure contains the
304	addresses that are present in the current IPv6 packet along with a
305	timestamp for the current time. Additional probe structures contain
306	copies of earlier probes, presumably toward different addresses,
307	with the appropriate field indicating how long ago the probe in
308	question was sent. The received probes are copies of the last seen
309	probes from the other side.
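
The field list above could be serialized as in the following
sketch. Several details are assumptions that the text does not
fix: network byte order, an 8 bit type field, the reply requested
bit being the lowest bit, the options length being counted in
bytes, and the function names themselves.

```python
import ipaddress
import struct

HEADER_FMT = "!BBBB"       # type, probes sent, probes received, options length
PROBE_FMT = "!16s16sIIII"  # src, dst, sent timestamp, delay, nonce, sequence
REPLY_REQUESTED = 0x01     # assumed bit position, not assigned by the text

def pack_probe(src, dst, sent_ms, delay_ms, nonce, seq):
    """Pack one sent or received probe structure (48 bytes)."""
    return struct.pack(PROBE_FMT,
                       ipaddress.IPv6Address(src).packed,
                       ipaddress.IPv6Address(dst).packed,
                       sent_ms, delay_ms, nonce, seq)

def pack_packet(sent, received=(), options=b"", reply_requested=False):
    """Pack a full exploration packet from already packed probes."""
    if not sent:
        raise ValueError("at least one sent probe is required")
    ptype = REPLY_REQUESTED if reply_requested else 0
    header = struct.pack(HEADER_FMT, ptype, len(sent), len(received),
                         len(options))
    return header + b"".join(sent) + b"".join(received) + options

probe = pack_probe("2001:db8::1", "2001:db8:1::2", 1000, 0, 0xCAFE, 1)
packet = pack_packet([probe], reply_requested=True)
print(len(probe), len(packet))  # 48 52
```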

311	Note that an application must be able to infer which addresses
312	belong to the same host in order to perform this probing correctly.

314	5 NAT and firewall considerations

316	Since shim6 is chartered for IPv6 solutions only, and NAT
317	compatibility is not expected (and, by most people, not desired) in
318	IPv6, there is no requirement for this protocol to pass through
319	Network Address Translation devices. However, the protocol may be
320	applicable outside shim6, making NAT compatibility desirable.

322	It is absolutely essential that the shim6 negotiations and the
323	reachability detection packets are passed through filters or
324	firewalls wherever application packets are passed through. If the
325	shim6 negotiation and reachability detection packets are filtered
326	out, shim6 can't be used.

328	A more complex situation arises when the shim6 negotiation packets
329	pass through a firewall, but the reachability detection packets are
330	blocked. To avoid this complexity, it's highly desirable to make
331	the shim6 negotiation and reachability detection part of the same
332	protocol, so either both are allowed through or both are blocked.
333	However, the same is true if this reachability detection mechanism
334	is used in other protocols. This makes it desirable to define the
335	reachability detection protocol such that it can be embedded in
336	other protocols.

338	Since firewalls are in wide use, it's important to consider whether
339	a new protocol will be able to pass through most firewalls without
340	requiring changes to the filter configuration. On the other hand,
341	it may not be possible to come up with a protocol that would be
342	allowed through a large percentage of all firewalls without
343	changes, so extra effort in this area may produce limited results.
344	Also, in the long run firewall configuration will presumably be
345	changed, so any compromises would only have short term benefits but
346	long term downsides.

348	6 Security considerations

350	To avoid exposing information (even if it's just the fact that an
351	address is reachable), hosts will probably want to limit themselves
352	to taking part in reachability detection with known correspondents.
353	This means that there must be identifying information and a nonce
354	that is at least hard to guess but easy to check in all
355	reachability detection packets.

357	7 Document and author information

359	This document expires April, 2006. The latest version will always
360	be available at http://www.muada.com/drafts/. Comments are welcome
361	at:

363	    Iljitsch van Beijnum

365	    Email: iljitsch@muada.com

367	Intellectual Property Statement

369	   The IETF takes no position regarding the validity or scope of any
370	   Intellectual Property Rights or other rights that might be claimed to
371	   pertain to the implementation or use of the technology described in
372	   this document or the extent to which any license under such rights
373	   might or might not be available; nor does it represent that it has
374	   made any independent effort to identify any such rights.  Information
375	   on the procedures with respect to rights in RFC documents can be
376	   found in BCP 78 and BCP 79.

378	   Copies of IPR disclosures made to the IETF Secretariat and any
379	   assurances of licenses to be made available, or the result of an
380	   attempt made to obtain a general license or permission for the use of
381	   such proprietary rights by implementers or users of this
382	   specification can be obtained from the IETF on-line IPR repository at
383	   http://www.ietf.org/ipr.

385	   The IETF invites any interested party to bring to its attention any
386	   copyrights, patents or patent applications, or other proprietary
387	   rights that may cover technology that may be required to implement
388	   this standard.  Please address the information to the IETF at
389	   ietf-ipr@ietf.org.

391	Disclaimer of Validity

393	   This document and the information contained herein are provided on an
394	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
395	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
396	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
397	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
398	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
399	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

401	Copyright Statement

403	   Copyright (C) The Internet Society (2005).  This document is subject
404	   to the rights, licenses and restrictions contained in BCP 78, and
405	   except as set forth therein, the authors retain all their rights.

407	Acknowledgment

409	   Funding for the RFC Editor function is currently provided by the
410	   Internet Society.