< draft-thaler-multipath-04.txt   draft-thaler-multipath-05.txt >
Internet Engineering Task Force D. Thaler Internet Engineering Task Force D. Thaler
INTERNET-DRAFT Microsoft INTERNET-DRAFT Microsoft
Expires December 1999 C. Hopps Expires July 2000 C. Hopps
Merit Network Merit Network
21 June 1999 7 February 2000
Multipath Issues in Unicast and Multicast Next-Hop Selection Multipath Issues in Unicast and Multicast Next-Hop Selection
<draft-thaler-multipath-04.txt> <draft-thaler-multipath-05.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with all This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. provisions of Section 10 of RFC2026.
This document is an Internet Draft. Internet Drafts are working This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts. documents as Internet Drafts.
Internet Drafts are valid for a maximum of six months and may be Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet Drafts as reference material or to cite inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress". them other than as a "work in progress".
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved. Copyright (C) The Internet Society (2000). All Rights Reserved.
1. Introduction 1. Introduction
Various routing protocols, including OSPF [1] and ISIS, explicitly allow Various routing protocols, including OSPF [1] and ISIS, explicitly allow
"Equal-Cost Multipath" routing. Some router implementations also allow "Equal-Cost Multipath" routing. Some router implementations also allow
equal-cost multipath usage with RIP and other routing protocols. Using equal-cost multipath usage with RIP and other routing protocols. Using
equal-cost multipath means that if multiple equal-cost routes to the equal-cost multipath means that if multiple equal-cost routes to the
Draft Multipath Issues February 2000
same destination exist, they can be discovered and used to provide load same destination exist, they can be discovered and used to provide load
balancing among redundant paths. balancing among redundant paths.
The effect of multipath routing on a forwarder is that the forwarder The effect of multipath routing on a forwarder is that the forwarder
potentially has several next-hops for any given destination and must use potentially has several next-hops for any given destination and must use
some method to choose which next-hop should be used for a given data some method to choose which next-hop should be used for a given data
packet. This memo summarizes current practices, problems, and solutions. packet. This memo summarizes current practices, problems, and
solutions.
2. Concerns 2. Concerns
Several router implementations allow multipath forwarding. This is Several router implementations allow multipath forwarding. This is
sometimes done naively via round-robin, where each packet matching a sometimes done naively via round-robin, where each packet matching a
given destination route is forwarded using the subsequent next-hop, in a given destination route is forwarded using the subsequent next-hop, in a
round-robin fashion. This does provide a form of load balancing, but round-robin fashion. This does provide a form of load balancing, but
there are several problems with approaches such as round-robin or there are several problems with approaches such as round-robin or
random: random:
Variable Path MTU Variable Path MTU
Since each of the redundant paths may have a different MTU, this Since each of the redundant paths may have a different MTU, this
means that the overall path MTU can change on a packet-by-packet means that the overall path MTU can change on a packet-by-packet
basis, negating the usefulness of path MTU discovery. basis, negating the usefulness of path MTU discovery.
skipping to change at page 2, line 44 skipping to change at page 3, line 4
one. When three or more packets are received before a "late" one. When three or more packets are received before a "late"
packet, TCP enters a mode called "fast-retransmit" [6] which packet, TCP enters a mode called "fast-retransmit" [6] which
consumes extra bandwidth (which could potentially cause more loss, consumes extra bandwidth (which could potentially cause more loss,
decreasing throughput) as it attempts to unnecessarily retransmit decreasing throughput) as it attempts to unnecessarily retransmit
the delayed packet(s). Hence, reordering can be detrimental to the delayed packet(s). Hence, reordering can be detrimental to
network performance. network performance.
Debugging Debugging
Common debugging utilities such as ping and traceroute are much Common debugging utilities such as ping and traceroute are much
less reliable in the presence of multiple paths and may even less reliable in the presence of multiple paths and may even
Draft Multipath Issues February 2000
present completely wrong results. present completely wrong results.
In multicast routing, the problem with multiple paths is that multicast In multicast routing, the problem with multiple paths is that multicast
routing protocols prevent loops and duplicates by constructing a single routing protocols prevent loops and duplicates by constructing a single
tree to all receivers of the same group address. Multicast routing tree to all receivers of the same group address. Multicast routing
protocols deployed today (DVMRP, PIM-DM, PIM-SM) [2] construct protocols deployed today (DVMRP, PIM-DM, PIM-SM) [2] construct shortest-
shortest-path trees rooted at either the source, or another router known path trees rooted at either the source, or another router known as a
as a Core or Rendezvous Point. Hence, the way they ensure that Core or Rendezvous Point. Hence, the way they ensure that duplicates
duplicates will not arise is that a given tree must use only a single will not arise is that a given tree must use only a single next-hop
next-hop towards the root of the tree. towards the root of the tree.
3. Requirements 3. Requirements
In the remainder of this document, we will use the term "flow" to In the remainder of this document, we will use the term "flow" to
represent the granularity at which the router keeps state (if at all) represent the granularity at which the router keeps state (if at all)
for classes of traffic. The exact definition of a flow may depend on for classes of traffic. The exact definition of a flow may depend on
the actual implementation. For example, a flow might be identified the actual implementation. For example, a flow might be identified
solely by destination address, or it might be identified by (source solely by destination address, or it might be identified by (source
address, destination address, protocol id) triplet. Hence "flow" is not address, destination address, protocol id) triplet. Hence "flow" is not
necessarily synonymous with the term "microflow" as used in RFC 2474 necessarily synonymous with the term "microflow" as used in RFC 2474
skipping to change at page 3, line 46 skipping to change at page 4, line 4
Two additional features are desirable: Two additional features are desirable:
Minimal disruption Minimal disruption
When multipath is used, meaning that multiple routes contribute When multipath is used, meaning that multiple routes contribute
valid next-hops, the chances are higher of routes being added and valid next-hops, the chances are higher of routes being added and
deleted from consideration than when only the "best" route is used deleted from consideration than when only the "best" route is used
(in which case metric changes in alternate routes have no effect on (in which case metric changes in alternate routes have no effect on
traffic paths). Since a higher number of routes may actually be traffic paths). Since a higher number of routes may actually be
used for forwarding when multipath is in use, the potential for used for forwarding when multipath is in use, the potential for
packet reordering and packet loss due to route flaps can be much packet reordering and packet loss due to route flaps can be much
Draft Multipath Issues February 2000
greater than when not using multipath. Hence, it is desirable to greater than when not using multipath. Hence, it is desirable to
minimize the number of active flows affected by the addition or minimize the number of active flows affected by the addition or
deletion of another next-hop. deletion of another next-hop.
Fast implementation Fast implementation
The amount of additional computation required to forward a packet The amount of additional computation required to forward a packet
should be small. For example, when doing round-robin, this should be small. For example, when doing round-robin, this
computation might consist of incrementing (modulo the number of computation might consist of incrementing (modulo the number of
next-hops) a next-hop index. next-hops) a next-hop index.
skipping to change at page 4, line 26 skipping to change at page 4, line 31
forwarding. forwarding.
Modulo-N Hash Modulo-N Hash
To select a next-hop from the list of N next-hops, the router To select a next-hop from the list of N next-hops, the router
performs a modulo-N hash over the packet header fields that performs a modulo-N hash over the packet header fields that
identify a flow. This has the advantage of being fast, at the identify a flow. This has the advantage of being fast, at the
expense of (N-1)/N of all flows changing paths whenever a next-hop expense of (N-1)/N of all flows changing paths whenever a next-hop
is added or removed. is added or removed.
Hash-Threshold Hash-Threshold
The router first selects a key by performing a hash (e.g., modulo-K The router first selects a key by performing a hash over the packet
where K is large, or CRC16) over the packet header fields that header fields that identify the flow. The N next-hops have been
identify the flow. The N next-hops have been assigned unique assigned unique regions in the hash function's output space. By
regions in the key space. By comparing the key against region comparing the hash value against region boundaries the router can
boundaries the router can determine which region the key belongs to determine which region the hash value belongs to and thus which
and thus which next-hop to use. This method has the advantage of next-hop to use. This method has the advantage of only affecting
only affecting flows near the region boundaries (or thresholds) flows near the region boundaries (or thresholds) when next-hops are
when next-hops are added or removed. Hash-threshold's lookup can added or removed. For ECMP hash-threshold's lookup can be done
be done in software using a binary search yielding O(logN), or in with a simple division (hash_value / fixed_region_size). When a
hardware in parallel for O(1). When a next-hop is added or next-hop is added or removed, between 1/4 and 1/2 of all flows
removed, between 1/4 and 1/2 of all flows change paths. An analysis change paths. An analysis of this method can be found in [3].
of this method can be found in [3].
Highest Random Weight (HRW) Highest Random Weight (HRW)
The router computes a key for EACH next-hop by performing a hash The router computes a key for EACH next-hop by performing a hash
over the packet header fields that identify the flow, as well as over the packet header fields that identify the flow, as well as
over the address of the next-hop. The router then chooses the over the address of the next-hop. The router then chooses the
Draft Multipath Issues February 2000
next-hop with the highest resulting key value [4]. This has the next-hop with the highest resulting key value [4]. This has the
advantage of minimizing the number of flows affected by a next-hop advantage of minimizing the number of flows affected by a next-hop
addition or deletion (only 1/N of them), but is approximately N addition or deletion (only 1/N of them), but is approximately N
times as expensive as a modulo-N hash. times as expensive as a modulo-N hash.
The applicability of these three alternatives depends on (at least) two The applicability of these three alternatives depends on (at least) two
factors: whether the forwarder maintains per-flow state, and how factors: whether the forwarder maintains per-flow state, and how
precious CPU is to a multipath forwarder. precious CPU is to a multipath forwarder.
Some routers may maintain per-flow state for reasons other than for Some routers may maintain per-flow state for reasons other than for
supporting multipath. For example, routers typically keep per-flow supporting multipath. For example, routers typically keep per-flow
state for multicast flows so that they can maintain the list of state for multicast flows so that they can maintain the list of
interfaces to which packets in the flow should be copied. interfaces to which packets in the flow should be copied.
If per-flow state is maintained in a multipath forwarder, then If per-flow state is maintained in a multipath forwarder, then
computation of the next-hop can be done by the router at state creation computation of the next-hop can be done by the router at state creation
time. This entails no additional computations at packet forwarding time time. This entails no additional computations at packet forwarding time
compared with normal forwarding to a single next-hop, since the next-hop compared with normal forwarding to a single next-hop, since the next-hop
is precomputed. In this case, any method can be used, including round- is precomputed. In this case, any method can be used, including round-
robin, random, modulo-N, hash-threshold or HRW. Hash functions such as robin, random, modulo-N, hash-threshold or HRW. Hash functions such as
modulo-N, hash-threshold and HRW are better if the forwarder state may modulo-N, hash-threshold and HRW are better if the forwarder state may
be deleted for any reason during the lifetime of a flow since subsequent be deleted for any reason during the lifetime of a flow since subsequent
next-hop computations by the router will always select the same path. next-hop computations by the router will always select the same path.
This also improves the usefulness of debugging utilities such as This also improves the usefulness of debugging utilities such as
traceroute. Finally, to maximize the stability of paths (and hence the traceroute. Finally, to maximize the stability of paths (and hence the
usefulness of traceroute, etc.), the use of HRW is recommended over the usefulness of traceroute, etc.), the use of HRW is recommended over the
other methods mentioned herein. other methods mentioned herein.
skipping to change at page 6, line 5 skipping to change at page 6, line 5
hash-threshold is recommended over the other methods mentioned herein. hash-threshold is recommended over the other methods mentioned herein.
4.1. Unicast Forwarding 4.1. Unicast Forwarding
Depending on the implementation, unicast forwarding may or may not keep Depending on the implementation, unicast forwarding may or may not keep
per-flow state. We recommend that where forwarder implementations keep per-flow state. We recommend that where forwarder implementations keep
flow state, routers should use HRW at state creation time (and next-hop flow state, routers should use HRW at state creation time (and next-hop
deletion time) to select the next-hop, and that forwarders without per- deletion time) to select the next-hop, and that forwarders without per-
flow state use hash-threshold. flow state use hash-threshold.
Draft Multipath Issues February 2000
4.2. Multicast Forwarding 4.2. Multicast Forwarding
Today's multicast forwarding engines use a cache of forwarding entries Today's multicast forwarding engines use a cache of forwarding entries
indexed by group (or group prefix) and source (or source prefix). This indexed by group (or group prefix) and source (or source prefix). This
means that today's multicast forwarder's always keep per-flow state, means that today's multicast forwarder's always keep per-flow state,
although for some multicast routing protocols, the "flow" may be fairly although for some multicast routing protocols, the "flow" may be fairly
coarse (e.g., traffic from all sources to the same destination). Since coarse (e.g., traffic from all sources to the same destination). Since
per-flow state is kept by the forwarder, it is recommended that the per-flow state is kept by the forwarder, it is recommended that the
router always use HRW to select the next-hop. router always use HRW to select the next-hop.
skipping to change at page 6, line 42 skipping to change at page 6, line 44
A related problem occurs when multiple parallel links are used between A related problem occurs when multiple parallel links are used between
the same pair of routers. A common solution is to bundle the two links the same pair of routers. A common solution is to bundle the two links
together into a "super"-link when is then used for routing. For together into a "super"-link when is then used for routing. For
multicast forwarding, this results in the two links being reduced to a multicast forwarding, this results in the two links being reduced to a
single next-hop (over the combined link) which can be used to prevent single next-hop (over the combined link) which can be used to prevent
duplicates. When a unicast or multicast packet is queued to the duplicates. When a unicast or multicast packet is queued to the
combined link, some method, such as those discussed earlier, is still combined link, some method, such as those discussed earlier, is still
required to determine the physical link on which to transmit the packet. required to determine the physical link on which to transmit the packet.
If the parallel links are identical, then most of the concerns discussed If the parallel links are identical, then most of the concerns discussed
in this document are avoided with the combined link. The exception is in this document are avoided with the combined link. The exception is
packet reordering, which can still occur with round-robin, adversely packet reordering, which can still occur with round-robin, adversely
affecting TCP. affecting TCP.
Draft Multipath Issues February 2000
7. Security Considerations 7. Security Considerations
This document discusses issues with various methods of choosing a next- This document discusses issues with various methods of choosing a next-
hop from among multiple valid next-hops. As such, it does not directly hop from among multiple valid next-hops. As such, it does not directly
impact the security of the Internet infrastructure or its applications. impact the security of the Internet infrastructure or its applications.
One issue that is worth mentioning, however, is that when next-hop One issue that is worth mentioning, however, is that when next-hop
selection is predictable, an attacker can synthesize traffic that will selection is predictable, an attacker can synthesize traffic that will
all hash the same, making it possible to launch a denial-of-service all hash the same, making it possible to launch a denial-of-service
attack that overloads a particular path. Since a special case of this attack that overloads a particular path. Since a special case of this
skipping to change at page 7, line 29 skipping to change at page 7, line 31
any single link. any single link.
8. References 8. References
[1] Moy, J., "OSPF Version 2", RFC 2178, July 1997. [1] Moy, J., "OSPF Version 2", RFC 2178, July 1997.
[2] Maufer, T., "Deploying IP Multicast in the Enterprise", Prentice- [2] Maufer, T., "Deploying IP Multicast in the Enterprise", Prentice-
Hall, 1998. Hall, 1998.
[3] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",, Work [3] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",, Work
in progress, April 1999. in progress, January 2000.
[4] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to [4] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
Increase Hit Rates", IEEE/ACM Transactions on Networking, February Increase Hit Rates", IEEE/ACM Transactions on Networking, February
1998. 1998.
[5] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., [5] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei, Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
"Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
Specification", RFC 2362, June 1998. Specification", RFC 2362, June 1998.
[6] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", [6] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control",
RFC 2581, April 1999. RFC 2581, April 1999.
[7] Nichols, K., Blake, S., Baker, F., and D. Black., "Definition of [7] Nichols, K., Blake, S., Baker, F., and D. Black., "Definition of
the Differentiated Services Field (DS Field) in the IPv4 and IPv6 the Differentiated Services Field (DS Field) in the IPv4 and IPv6
Headers", RFC 2474, December 1998. Headers", RFC 2474, December 1998.
Draft Multipath Issues February 2000
9. Authors' Addresses 9. Authors' Addresses
Dave Thaler Dave Thaler
Microsoft Microsoft
One Microsoft Way One Microsoft Way
Redmond, WA 98052 Redmond, WA 98052
Phone: +1 425 703 8835 Phone: +1 425 703 8835
EMail: dthaler@dthaler.microsoft.com EMail: dthaler@dthaler.microsoft.com
Christian E. Hopps Christian E. Hopps
Merit Network Merit Network
4251 Plymouth Road, Suite C. 4251 Plymouth Road, Suite C.
Ann Arbor, MI 48105 Ann Arbor, MI 48105
Phone: +1 734 936 0291 Phone: +1 734 936 0291
EMail: chopps@merit.edu EMail: chopps@merit.edu
10. Full Copyright Statement 10. Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved. Copyright (C) The Internet Society (2000). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind, distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included provided that the above copyright notice and this paragraph are included
on all such copies and derivative works. However, this document itself on all such copies and derivative works. However, this document itself
may not be modified in any way, such as by removing the copyright notice may not be modified in any way, such as by removing the copyright notice
or references to the Internet Society or other Internet organizations, or references to the Internet Society or other Internet organizations,
except as needed for the purpose of developing Internet standards in except as needed for the purpose of developing Internet standards in
skipping to change at page 9, line 4 skipping to change at page 9, line 4
languages other than English. languages other than English.
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS This document and the information contained herein is provided on an "AS
IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
Draft Multipath Issues February 2000
FITNESS FOR A PARTICULAR PURPOSE. FITNESS FOR A PARTICULAR PURPOSE.
 End of changes. 23 change blocks. 
31 lines changed or deleted 51 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/