< draft-hopps-ecmp-algo-analysis-03.txt   draft-hopps-ecmp-algo-analysis-04.txt >
Internet Engineering Task Force Christian E. Hopps Internet Engineering Task Force Christian E. Hopps
INTERNET-DRAFT Merit Network INTERNET-DRAFT Merit Network
Expires September 1999 13 April 1999 Expires July 2000 7 February 2000
Analysis of an Equal-Cost Multi-Path Algorithm Analysis of an Equal-Cost Multi-Path Algorithm
<draft-hopps-ecmp-algo-analysis-03.txt> <draft-hopps-ecmp-algo-analysis-04.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 33 skipping to change at page 1, line 32
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved. Copyright (C) The Internet Society (2000). All Rights Reserved.
Draft Analysis of an ECMP Algorithm February 2000
Abstract Abstract
Equal-cost multi-path (ECMP) is a routing technique for routing Equal-cost multi-path (ECMP) is a routing technique for routing
packets along multiple paths of equal cost. The forwarding engine packets along multiple paths of equal cost. The forwarding engine
identifies paths by next-hop. When forwarding a packet the router identifies paths by next-hop. When forwarding a packet the router
must decide which next-hop (path) to use. This document gives an must decide which next-hop (path) to use. This document gives an
analysis of one method for making that decision. The analysis analysis of one method for making that decision. The analysis
includes the performance of the algorithm and the disruption caused includes the performance of the algorithm and the disruption caused
by changes to the set of next-hops. by changes to the set of next-hops.
1. Hash-Threshold 1. Hash-Threshold
One method for determining which next-hop to use when routing with One method for determining which next-hop to use when routing with
ECMP can be called hash-threshold. The router first selects a key by ECMP can be called hash-threshold. The router first selects a key by
performing a hash (e.g., modulo-K where K is large, or CRC16) over performing a hash (e.g., CRC16) over the packet header fields that
the packet header fields that identify a flow. The N next-hops have identify a flow. The N next-hops have been assigned unique regions in
been assigned unique regions in the key space. The router uses the the key space. The router uses the key to determine which region and
key to determine which region and thus which next-hop to use. thus which next-hop to use.
As an example of hash-threshold, upon receiving a packet the router As an example of hash-threshold, upon receiving a packet the router
performs a CRC16 on the packet's header fields that define the flow performs a CRC16 on the packet's header fields that define the flow
(e.g., the source and destination fields of the packet), this is the (e.g., the source and destination fields of the packet), this is the
key. Say for this destination there are 4 next-hops to choose from. key. Say for this destination there are 4 next-hops to choose from.
Each next-hop is assigned a region in 16 bit space (the key space). Each next-hop is assigned a region in 16 bit space (the key space).
For equal usage the router may have chosen to divide it up evenly so For equal usage the router may have chosen to divide it up evenly so
each region is 65536/4 or 16k large. The next-hop is chosen by each region is 65536/4 or 16k large. The next-hop is chosen by
determining which region contains the key (i.e., the CRC result). determining which region contains the key (i.e., the CRC result).
2. Analysis 2. Analysis
There are a few concerns when choosing an algorithm for deciding There are a few concerns when choosing an algorithm for deciding
which next-hop to use. One is performance, the computational which next-hop to use. One is performance, the computational
requirements to run the algorithm. Another is disruption (i.e., the requirements to run the algorithm. Another is disruption (i.e., the
changing of which path a flow uses). Balancing is a third concern; changing of which path a flow uses). Balancing is a third concern;
however since the algorithm's balancing characteristics are directly however, since the algorithm's balancing characteristics are directly
related to the chosen hash function this analysis does not treat this related to the chosen hash function this analysis does not treat this
concern in depth. concern in depth.
For this analysis we will assume regions of equal size. If the hash For this analysis we will assume regions of equal size. If the
function is uniformly distributed the distribution of flows amongst output of the hash function is uniformly distributed the distribution
paths will also be uniform. of flows amongst paths will also be uniform, and so the algorithm
will properly implement ECMP. One can implement non-equal-cost
multi-path routing by using regions of unequal size; however, non-
Draft Analysis of an ECMP Algorithm February 2000
equal-cost multi-path routing is outside the scope of this document.
2.1. Performance 2.1. Performance
The performance of the hash-threshold algorithm can be broken down The performance of the hash-threshold algorithm can be broken down
into three parts: selection of regions for the next-hops, obtaining into three parts: selection of regions for the next-hops, obtaining
the key and comparing the key to the regions to decide which next-hop the key and comparing the key to the regions to decide which next-hop
to use. to use.
Since regions are restricted to be of equal size the calculation of
region boundaries is trivial. The boundaries can be calculated as
follows:
i = 0;
regionssize = keyspace.size / #{ next-hops }
for n in { next-hops }
n.start = i * regionsize;
n.end = n.start + regionsize;
i = i + 1
done
This calculation is O(N). Furthermore the calculation can be done
when the next-hops are added to or removed from the destination
route.
The algorithm doesn't specify the hash function used to obtain the The algorithm doesn't specify the hash function used to obtain the
key. Its performance in this area will be exactly the performance of key. Its performance in this area will be exactly the performance of
the hash function. It is presumed that if this calculation proves to the hash function. It is presumed that if this calculation proves to
be a concern it can be done in hardware parallel to other operations be a concern it can be done in hardware parallel to other operations
that need to complete before deciding which next-hop to use. that need to complete before deciding which next-hop to use.
Finally the next-hop must be chosen. This is done by determining Since regions are restricted to be of equal size the calculation of
which region contains the key. The time required to do this is region boundaries is trivial. Each boundary is exactly regionsize
dependent on the way the next-hops are organized in memory. The away from the previous boundary starting from 0 for the first region.
problem reduces to a search. For example if the next-hops are stored As we will show, for equal sized regions, we don't need to store the
as a linked list the time is O(N) as the router must traverse the boundary values.
list comparing each next-hop's region boundaries against the key. If
the next-hops are stored as an ordered array a binary search can be
used yielding O(logN).
If the number of next-hops is limited to a fixed maximum the To choose the next-hop we must determine which region contains the
comparison can be done in parallel in hardware yielding O(1). key. Because the regions are of equal size determining which region
contains the key is a simple division operation.
regionsize = keyspace.size / #{nexthops}
region = key / regionsize;
Thus the time required to find the next-hop is dependent on the way
the next-hops are organized in memory. The obvious use of an array
indexed by region yields O(1).
2.2. Disruption 2.2. Disruption
Protocols such as TCP perform better if the path they flow along does Protocols such as TCP perform better if the path they flow along does
not change while the stream is connected. Disruption is the not change while the stream is connected. Disruption is the
measurement of how many flows have their paths changed due to some measurement of how many flows have their paths changed due to some
change in the router. We measure disruption as the fraction of total change in the router. We measure disruption as the fraction of total
flows whose path changes in response to some change in the router. flows whose path changes in response to some change in the router.
This can become important if one or more of the paths is flapping.
For a description of disruption and how it affects protocols such as
Draft Analysis of an ECMP Algorithm February 2000
TCP see [1].
Some algorithms such as round-robin (i.e., upon receiving a packet Some algorithms such as round-robin (i.e., upon receiving a packet
the least recently used next-hop is chosen) are disruptive regardless the least recently used next-hop is chosen) are disruptive regardless
of any change in the router. Clearly this is not the case with hash- of any change in the router. Clearly this is not the case with hash-
threshold. As long as the region boundaries remain unchanged the threshold. As long as the region boundaries remain unchanged the
same next-hop will be chosen for a given flow. same next-hop will be chosen for a given flow.
Because we have required regions to be equal in size the only reason Because we have required regions to be equal in size the only reason
for a change in region boundaries is the addition or removal of a for a change in region boundaries is the addition or removal of a
next-hop. In this case the regions must all grow or shrink to fill next-hop. In this case the regions must all grow or shrink to fill
skipping to change at page 5, line 5 skipping to change at page 5, line 5
in region 4 and 1/4 of region 4 is in region 5. Since each of the in region 4 and 1/4 of region 4 is in region 5. Since each of the
original regions represent 1/5 of the flows, the total disruption is original regions represent 1/5 of the flows, the total disruption is
1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10. 1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10.
Note that the disruption to flows when adding a region is equivalent Note that the disruption to flows when adding a region is equivalent
to that of removing a region. That is, we are considering the to that of removing a region. That is, we are considering the
fraction of total flows that changes regions when moving from N to fraction of total flows that changes regions when moving from N to
N-1 regions, and that same fraction of flows will change when moving N-1 regions, and that same fraction of flows will change when moving
from N-1 to N regions. from N-1 to N regions.
Draft Analysis of an ECMP Algorithm February 2000
0123456701234567012345670123456701234567 0123456701234567012345670123456701234567
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+
| 1 | 2 | 3 | 4 | 5 | | 1 | 2 | 3 | 4 | 5 |
+-------+-+-----+---+---+-----+-+-------+ +-------+-+-----+---+---+-----+-+-------+
| 1 | 2 | 3 | 5 | | 1 | 2 | 3 | 5 |
+---------+---------+---------+---------+ +---------+---------+---------+---------+
0123456789012345678901234567890123456789 0123456789012345678901234567890123456789
Figure 2. Before and after deletion of region 4 Figure 2. Before and after deletion of region 4
skipping to change at page 6, line 5 skipping to change at page 6, line 5
K-1 N K-1 N
--- i --- (i-K) --- i --- (i-K)
disruption = \ --- + \ --- disruption = \ --- + \ ---
/ (N)(N-1) / (N)(N-1) / (N)(N-1) / (N)(N-1)
--- --- --- ---
i=1 i=K+1 i=1 i=K+1
We can factor 1/((N)(N-1)) out as it is constant. We can factor 1/((N)(N-1)) out as it is constant.
Draft Analysis of an ECMP Algorithm February 2000
/ K-1 N \ / K-1 N \
1 | --- --- | 1 | --- --- |
= --- | \ i + \ (i-K) | = --- | \ i + \ (i-K) |
(N)(N-1) | / / | (N)(N-1) | / / |
\ --- --- / \ --- --- /
1 i=K+1 1 i=K+1
We now use the the concrete formulas for the sum of integers. The We now use the the concrete formulas for the sum of integers. The
first summation is (K)(K-1)/2. For the second summation notice that first summation is (K)(K-1)/2. For the second summation notice that
we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2. we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2.
skipping to change at page 7, line 5 skipping to change at page 7, line 5
2(N)(N-1) 2(N)(N-1)
K*K - K - NK N + 1 K*K - K - NK N + 1
= -------------- + ------- = -------------- + -------
(N)(N-1) 2(N-1) (N)(N-1) 2(N-1)
Since we are minimizing for K the right side (N+1)/2(N-1) is constant Since we are minimizing for K the right side (N+1)/2(N-1) is constant
as is the denominator (N)(N-1) so we can drop them. To minimize we as is the denominator (N)(N-1) so we can drop them. To minimize we
take the derivative. take the derivative.
Draft Analysis of an ECMP Algorithm February 2000
d d
-- (K*K - (N+1)K) -- (K*K - (N+1)K)
dk dk
= 2K - (N+1) = 2K - (N+1)
Which is zero when K is (N+1)/2. Which is zero when K is (N+1)/2.
The last thing to consider is that K must be an integer. When N is The last thing to consider is that K must be an integer. When N is
odd (N+1)/2 will yield an integer, however when N is even (N+1)/2 odd (N+1)/2 will yield an integer, however when N is even (N+1)/2
skipping to change at page 7, line 40 skipping to change at page 7, line 42
3. Comparison to other algorithms 3. Comparison to other algorithms
Other algorithms exist to decide which next-hop to use. These Other algorithms exist to decide which next-hop to use. These
algorithms all have different performance and disruptive algorithms all have different performance and disruptive
characteristics. Of these algorithms we will only consider ones that characteristics. Of these algorithms we will only consider ones that
are not disruptive by design (i.e., if no change to the set of next- are not disruptive by design (i.e., if no change to the set of next-
hops occurs the path a flow takes remains the same). This will hops occurs the path a flow takes remains the same). This will
exclude round-robin and random choice. We will look at modulo-N and exclude round-robin and random choice. We will look at modulo-N and
highest random weight. highest random weight.
Modulo-N is a simpler form of hash-threshold. Given N next-hops the Modulo-N is a "simpler" form of hash-threshold. Given N next-hops
hash function includes a final modulo-N which directly maps the the packet header fields which describe the flow are run through a
result to one of the next-hops. This operation is the fastest of the hash function. A final modulo-N is applied to the output of the hash.
three we consider, however if a next-hop is added or removed the This result then directly maps to one of the next-hops. Modulo-N is
disruption is (N-1)/N. the most disruptive of the algorithms; if a next-hop is added or
removed the disruption is (N-1)/N. The performance of Modulo-N is
Highest random weight (HRW) is another comparative method similar to Draft Analysis of an ECMP Algorithm February 2000
hash-threshold. The router seeds a pseudo-random number generator
with the packet header fields which describe the flow and the next- equivalent to hash-threshold.
hop to obtain a weight. The next-hop which receives the highest
weight is selected. The advantage with using HRW is that it has Highest random weight (HRW) is a comparative method similar in some
minimal disruption (i.e., disruption due to adding or removing a ways to hash-threshold with non-fixed sized regions. For each next-
next-hop is always 1/N.) The disadvantage with HRW is an only hop, the router seeds a pseudo-random number generator with the
slightly more complex function to choose the next-hop. A description packet header fields which describe the flow and the next-hop to
of HRW along with comparisons to other methods can be found in [1]. obtain a weight. The next-hop which receives the highest weight is
selected. The advantage with using HRW is that it has minimal
disruption (i.e., disruption due to adding or removing a next-hop is
always 1/N.) The disadvantage with HRW is that the next-hop
selection is more expensive than hash-threshold. A description of
HRW along with comparisons to other methods can be found in [2].
Although not used for next-hop calculation an example usage of HRW Although not used for next-hop calculation an example usage of HRW
can be found in [2]. can be found in [3].
If the complexity of HRW's next-hop selection processes is acceptable Since each of modulo-N, hash-threshold and HRW require a hash on the
packet header fields which define a flow, we can factor the
performance of the hash out of the comparison. If the hash can not
be done inexpensively (e.g., in hardware) it too must be considered
when using any of the above methods.
The lookup performance for hash-threshold, like modulo-N is an
optimal O(1). HRW's lookup performance is O(N).
Disruptive behavior is the opposite of performance. HRW is best with
1/N. Hash-threshold is between 1/4 and 1/2. Finally Modulo-N is
(N-1)/N.
If the complexity of HRW's next-hop selection process is acceptable
we think it should be considered as an alternative to hash-threshold. we think it should be considered as an alternative to hash-threshold.
This could be the case when, for example, per-flow state is kept and
thus the next-hop choice is made infrequently.
However, when HRW's next-hop selection is seen as too expensive the
obvious choice is hash-threshold as it performs as well as modulo-N
and is less disruptive.
4. Security Considerations 4. Security Considerations
This document is an analysis of an algorithm used to implement an This document is an analysis of an algorithm used to implement an
ECMP routing decision. This analysis does not directly effect the ECMP routing decision. This analysis does not directly affect the
security of the Internet Infrastructure. security of the Internet Infrastructure.
Draft Analysis of an ECMP Algorithm February 2000
5. References 5. References
[1] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to [1] Thaler, D., and Hopps, C., "Multipath Issues in Unicast and
Multicast", Work in progress, June 1999.
[2] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
Increase Hit Rates", IEEE/ACM Transactions on Networking, February Increase Hit Rates", IEEE/ACM Transactions on Networking, February
1998. 1998.
[2] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., [3] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei, Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
"Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
Specification", RFC 2362, June 1998. Specification", RFC 2362, June 1998.
6. Author's Address 6. Author's Address
Christian E. Hopps Christian E. Hopps
Merit Network Merit Network
4251 Plymouth Road, Suite C. 4251 Plymouth Road, Suite C.
Ann Arbor, MI 48105 Ann Arbor, MI 48105
Phone: +1 734 936 0291 Phone: +1 734 936 0291
EMail: chopps@merit.edu EMail: chopps@merit.edu
7. Full Copyright Statement 7. Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved. Copyright (C) The Internet Society 2000. All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for developing Internet standards in which case the procedures for
copyrights defined in the Internet languages other than English. copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. revoked by the Internet Society or its successors or assigns.
Draft Analysis of an ECMP Algorithm February 2000
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 End of changes. 27 change blocks. 
60 lines changed or deleted 100 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/