idnits 2.17.1 

draft-thaler-multipath-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 9 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Abstract section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (21 June 1999) is 9076 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2178 (ref. '1') (Obsoleted by RFC 2328)

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  ** Obsolete normative reference: RFC 2362 (ref. '5') (Obsoleted by RFC
     4601, RFC 5059)

  ** Obsolete normative reference: RFC 2581 (ref. '6') (Obsoleted by RFC 5681)


     Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                               D. Thaler
2	INTERNET-DRAFT                                                Microsoft
3	Expires December 1999                                          C. Hopps
4	                                                          Merit Network
5	                                                           21 June 1999

7	      Multipath Issues in Unicast and Multicast Next-Hop Selection
8	                    <draft-thaler-multipath-04.txt>

10	Status of this Memo

12	This document is an Internet-Draft and is in full conformance with all
13	provisions of Section 10 of RFC2026.

15	This document is an Internet Draft.  Internet Drafts are working
16	documents of the Internet Engineering Task Force (IETF), its Areas, and
17	its Working Groups.  Note that other groups may also distribute working
18	documents as Internet Drafts.

20	Internet Drafts are valid for a maximum of six months and may be
21	updated, replaced, or obsoleted by other documents at any time.  It is
22	inappropriate to use Internet Drafts as reference material or to cite
23	them other than as a "work in progress".

25	The list of current Internet-Drafts can be accessed at
26	http://www.ietf.org/ietf/1id-abstracts.txt

28	The list of Internet-Draft Shadow Directories can be accessed at
29	http://www.ietf.org/shadow.html.

31	Copyright Notice

33	Copyright (C) The Internet Society (1999).  All Rights Reserved.

35	1.  Introduction

37	Various routing protocols, including OSPF [1] and ISIS, explicitly allow
38	"Equal-Cost Multipath" routing.  Some router implementations also allow
39	equal-cost multipath usage with RIP and other routing protocols.  Using
40	equal-cost multipath means that if multiple equal-cost routes to the
41	same destination exist, they can be discovered and used to provide load
42	balancing among redundant paths.

44	The effect of multipath routing on a forwarder is that the forwarder
45	potentially has several next-hops for any given destination and must use
46	some method to choose which next-hop should be used for a given data
47	packet. This memo summarizes current practices, problems, and solutions.

49	2.  Concerns

51	Several router implementations allow multipath forwarding. This is
52	sometimes done naively via round-robin, where each packet matching a
53	given destination route is forwarded using the subsequent next-hop, in a
54	round-robin fashion.  This does provide a form of load balancing, but
55	there are several problems with approaches such as round-robin or
56	random:

58	Variable Path MTU
59	     Since each of the redundant paths may have a different MTU, this
60	     means that the overall path MTU can change on a packet-by-packet
61	     basis, negating the usefulness of path MTU discovery.

63	Variable Latencies
64	     Since each of the redundant paths may have a different latency
65	     involved, having packets take separate paths can cause packets to
66	     always arrive out of order, increasing delivery latency and
67	     buffering requirements.

69	     Packet reordering causes TCP to believe that loss has taken place
70	     when packets with higher sequence numbers arrive before an earlier
71	     one.   When three or more packets are received before a "late"
72	     packet, TCP enters a mode called "fast-retransmit" [6] which
73	     consumes extra bandwidth (which could potentially cause more loss,
74	     decreasing throughput) as it attempts to unnecessarily retransmit
75	     the delayed packet(s).  Hence, reordering can be detrimental to
76	     network performance.

78	Debugging
79	     Common debugging utilities such as ping and traceroute are much
80	     less reliable in the presence of multiple paths and may even
81	     present completely wrong results.

83	In multicast routing, the problem with multiple paths is that multicast
84	routing protocols prevent loops and duplicates by constructing a single
85	tree to all receivers of the same group address.  Multicast routing
86	protocols deployed today (DVMRP, PIM-DM, PIM-SM) [2] construct
87	shortest-path trees rooted at either the source, or another router known
88	as a Core or Rendezvous Point.  Hence, the way they ensure that
89	duplicates will not arise is that a given tree must use only a single
90	next-hop towards the root of the tree.

92	3.  Requirements

94	In the remainder of this document, we will use the term "flow" to
95	represent the granularity at which the router keeps state (if at all)
96	for classes of traffic.  The exact definition of a flow may depend on
97	the actual implementation.  For example, a flow might be identified
98	solely by destination address, or it might be identified by (source
99	address, destination address, protocol id) triplet.  Hence "flow" is not
100	necessarily synonymous with the term "microflow" as used in RFC 2474
101	[7], which also includes port numbers.  Indeed, including transport-
102	layer information in the next-hop selection process can actually be
103	problematic.  For example, if packets are fragmented, the transport-
104	layer information may not be available in every packet.  Furthermore,
105	having the choice of path depend on transport-layer fields may negate
106	the benefit of caching information such as MTU for use in subsequent
107	connections between the same endpoints.

109	All of the problems outlined in the previous section arise when packets
110	in the same unicast or multicast "flow" are split among multiple paths.
111	The natural solution is therefore to ensure that packets for the same
112	flow always use the same path.

114	Two additional features are desirable:

116	Minimal disruption
117	     When multipath is used, meaning that multiple routes contribute
118	     valid next-hops, the chances are higher of routes being added and
119	     deleted from consideration than when only the "best" route is used
120	     (in which case metric changes in alternate routes have no effect on
121	     traffic paths).  Since a higher number of routes may actually be
122	     used for forwarding when multipath is in use, the potential for
123	     packet reordering and packet loss due to route flaps can be much
124	     greater than when not using multipath.  Hence, it is desirable to
125	     minimize the number of active flows affected by the addition or
126	     deletion of another next-hop.

128	Fast implementation
129	     The amount of additional computation required to forward a packet
130	     should be small.  For example, when doing round-robin, this
131	     computation might consist of incrementing (modulo the number of
132	     next-hops) a next-hop index.

134	4.  Solutions

136	We now provide three possible methods for improving the performance of
137	multipath and then discuss their applicability to unicast and multicast
138	forwarding.

140	Modulo-N Hash
141	     To select a next-hop from the list of N next-hops, the router
142	     performs a modulo-N hash over the packet header fields that
143	     identify a flow.  This has the advantage of being fast, at the
144	     expense of (N-1)/N of all flows changing paths whenever a next-hop
145	     is added or removed.

147	Hash-Threshold
148	     The router first selects a key by performing a hash (e.g., modulo-K
149	     where K is large, or CRC16) over the packet header fields that
150	     identify the flow.  The N next-hops have been assigned unique
151	     regions in the key space. By comparing the key against region
152	     boundaries the router can determine which region the key belongs to
153	     and thus which next-hop to use.  This method has the advantage of
154	     only affecting flows near the region boundaries (or thresholds)
155	     when next-hops are added or removed.  Hash-threshold's lookup can
156	     be done in software using a binary search yielding O(logN), or in
157	     hardware in parallel for O(1).  When a next-hop is added or
158	     removed, between 1/4 and 1/2 of all flows change paths. An analysis
159	     of this method can be found in [3].

161	Highest Random Weight (HRW)
162	     The router computes a key for EACH next-hop by performing a hash
163	     over the packet header fields that identify the flow, as well as
164	     over the address of the next-hop.  The router then chooses the
165	     next-hop with the highest resulting key value [4].  This has the
166	     advantage of minimizing the number of flows affected by a next-hop
167	     addition or deletion (only 1/N of them), but is approximately N
168	     times as expensive as a modulo-N hash.

170	The applicability of these three alternatives depends on (at least) two
171	factors: whether the forwarder maintains per-flow state, and how
172	precious CPU is to a multipath forwarder.

174	Some routers may maintain per-flow state for reasons other than for
175	supporting multipath.  For example, routers typically keep per-flow
176	state for multicast flows so that they can maintain the list of
177	interfaces to which packets in the flow should be copied.

179	If per-flow state is maintained in a multipath forwarder, then
180	computation of the next-hop can be done by the router at state creation
181	time. This entails no additional computations at packet forwarding time
182	compared with normal forwarding to a single next-hop, since the next-hop
183	is precomputed.  In this case, any method can be used, including round-
184	robin, random, modulo-N, hash-threshold or HRW.  Hash functions such as
185	modulo-N, hash-threshold and HRW are better if the forwarder state may
186	be deleted for any reason during the lifetime of a flow since subsequent
187	next-hop computations by the router will always select the same path.
188	This also improves the usefulness of debugging utilities such as
189	traceroute.  Finally, to maximize the stability of paths (and hence the
190	usefulness of traceroute, etc.), the use of HRW is recommended over the
191	other methods mentioned herein.

193	If per-flow state is not maintained by the forwarder, then using
194	multiple next-hops requires that the next-hop be calculated at packet
195	arrival time.  When CPU is more precious than stability of flow paths,
196	hash-threshold is recommended over the other methods mentioned herein.

198	4.1.  Unicast Forwarding

200	Depending on the implementation, unicast forwarding may or may not keep
201	per-flow state.  We recommend that where forwarder implementations keep
202	flow state, routers should use HRW at state creation time (and next-hop
203	deletion time) to select the next-hop, and that forwarders without per-
204	flow state use hash-threshold.

206	4.2.  Multicast Forwarding

208	Today's multicast forwarding engines use a cache of forwarding entries
209	indexed by group (or group prefix) and source (or source prefix).  This
210	means that today's multicast forwarder's always keep per-flow state,
211	although for some multicast routing protocols, the "flow" may be fairly
212	coarse (e.g., traffic from all sources to the same destination).  Since
213	per-flow state is kept by the forwarder, it is recommended that the
214	router always use HRW to select the next-hop.

216	Routers using explicit-joining protocols such as PIM-SM [5] should thus
217	use the multipath information when determining to which neighbor a join
218	message should be sent.  For example, when multiple next-hops exist for
219	a given Rendezvous Point (RP) toward which a (*,G) Join should be sent,
220	it is recommended that HRW be used to select the next-hop to use for
221	each group.

223	5.  Applicability

225	The algorithms discussed above (except round-robin) all rely on some
226	form of hash function.  Equal flow distribution is achieved when the
227	hash function is uniformly distributed.  Since the commonly used hash
228	functions only become uniformly distributed when the number of inputs is
229	relatively large, these algorithms are more applicable to routers used
230	to route many flows, than in, for example, a small business setting.

232	6.  Redundant Parallel Links

234	A related problem occurs when multiple parallel links are used between
235	the same pair of routers.  A common solution is to bundle the two links
236	together into a "super"-link when is then used for routing.  For
237	multicast forwarding, this results in the two links being reduced to a
238	single next-hop (over the combined link) which can be used to prevent
239	duplicates.  When a unicast or multicast packet is queued to the
240	combined link, some method, such as those discussed earlier, is still
241	required to determine the physical link on which to transmit the packet.
242	If the parallel links are identical, then most of the concerns discussed
243	in this document are avoided with the combined link. The exception is
244	packet reordering, which can still occur with round-robin, adversely
245	affecting TCP.

247	7.  Security Considerations

249	This document discusses issues with various methods of choosing a next-
250	hop from among multiple valid next-hops.  As such, it does not directly
251	impact the security of the Internet infrastructure or its applications.

253	One issue that is worth mentioning, however, is that when next-hop
254	selection is predictable, an attacker can synthesize traffic that will
255	all hash the same, making it possible to launch a denial-of-service
256	attack that overloads a particular path.  Since a special case of this
257	is when the same (single) next-hop is always selected, such an attack is
258	easiest when multipath is not being used.  Introducing multipath routing
259	can make such an attack more difficult; the more unpredictable the hash
260	is, the harder it becomes to conduct a denial-of-service attack against
261	any single link.

263	8.  References

265	[1]  Moy, J., "OSPF Version 2", RFC 2178, July 1997.

267	[2]  Maufer, T., "Deploying IP Multicast in the Enterprise", Prentice-
268	     Hall, 1998.

270	[3]  Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",, Work
271	     in progress, April 1999.

273	[4]  Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
274	     Increase Hit Rates", IEEE/ACM Transactions on Networking, February
275	     1998.

277	[5]  Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
278	     Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
279	     "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
280	     Specification", RFC 2362, June 1998.

282	[6]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control",
283	     RFC 2581, April 1999.

285	[7]  Nichols, K., Blake, S., Baker, F., and D. Black., "Definition of
286	     the Differentiated Services Field (DS Field) in the IPv4 and IPv6
287	     Headers", RFC 2474, December 1998.

289	9.  Authors' Addresses

291	    Dave Thaler
292	    Microsoft
293	    One Microsoft Way
294	    Redmond, WA  98052
295	    Phone: +1 425 703 8835
296	    EMail: dthaler@dthaler.microsoft.com

298	    Christian E. Hopps
299	    Merit Network
300	    4251 Plymouth Road, Suite C.
301	    Ann Arbor, MI  48105
302	    Phone: +1 734 936 0291
303	    EMail: chopps@merit.edu

305	10.  Full Copyright Statement

307	Copyright (C) The Internet Society (1999).  All Rights Reserved.

309	This document and translations of it may be copied and furnished to
310	others, and derivative works that comment on or otherwise explain it or
311	assist in its implementation may be prepared, copied, published and
312	distributed, in whole or in part, without restriction of any kind,
313	provided that the above copyright notice and this paragraph are included
314	on all such copies and derivative works.  However, this document itself
315	may not be modified in any way, such as by removing the copyright notice
316	or references to the Internet Society or other Internet organizations,
317	except as needed for the purpose of developing Internet standards in
318	which case the procedures for copyrights defined in the Internet
319	languages other than English.

321	The limited permissions granted above are perpetual and will not be
322	revoked by the Internet Society or its successors or assigns.

324	This document and the information contained herein is provided on an "AS
325	IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
326	FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
327	LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
328	INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
329	FITNESS FOR A PARTICULAR PURPOSE.