idnits 2.17.1 draft-hopps-ecmp-algo-analysis-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (13 April 1999) is 9117 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. '1' ** Obsolete normative reference: RFC 2362 (ref. '2') (Obsoleted by RFC 4601, RFC 5059) Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Christian E. Hopps 3 INTERNET-DRAFT Merit Network 4 Expires September 1999 13 April 1999 6 Analysis of an Equal-Cost Multi-Path Algorithm 7 9 Status of this Memo 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet- Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 Copyright Notice 32 Copyright (C) The Internet Society (1999). All Rights Reserved. 34 Abstract 36 Equal-cost multi-path (ECMP) is a routing technique for routing 37 packets along multiple paths of equal cost. The forwarding engine 38 identifies paths by next-hop. When forwarding a packet the router 39 must decide which next-hop (path) to use. This document gives an 40 analysis of one method for making that decision. The analysis 41 includes the performance of the algorithm and the disruption caused 42 by changes to the set of next-hops. 44 1. Hash-Threshold 46 One method for determining which next-hop to use when routing with 47 ECMP can be called hash-threshold. The router first selects a key by 48 performing a hash (e.g., modulo-K where K is large, or CRC16) over 49 the packet header fields that identify a flow. The N next-hops have 50 been assigned unique regions in the key space. The router uses the 51 key to determine which region and thus which next-hop to use. 53 As an example of hash-threshold, upon receiving a packet the router 54 performs a CRC16 on the packet's header fields that define the flow 55 (e.g., the source and destination fields of the packet), this is the 56 key. Say for this destination there are 4 next-hops to choose from. 57 Each next-hop is assigned a region in 16 bit space (the key space). 58 For equal usage the router may have chosen to divide it up evenly so 59 each region is 65536/4 or 16k large. The next-hop is chosen by 60 determining which region contains the key (i.e., the CRC result). 62 2. Analysis 64 There are a few concerns when choosing an algorithm for deciding 65 which next-hop to use. One is performance, the computational 66 requirements to run the algorithm. Another is disruption (i.e., the 67 changing of which path a flow uses). Balancing is a third concern; 68 however since the algorithm's balancing characteristics are directly 69 related to the chosen hash function this analysis does not treat this 70 concern in depth. 72 For this analysis we will assume regions of equal size. If the hash 73 function is uniformly distributed the distribution of flows amongst 74 paths will also be uniform. 76 2.1. Performance 78 The performance of the hash-threshold algorithm can be broken down 79 into three parts: selection of regions for the next-hops, obtaining 80 the key and comparing the key to the regions to decide which next-hop 81 to use. 83 Since regions are restricted to be of equal size the calculation of 84 region boundaries is trivial. The boundaries can be calculated as 85 follows: 87 i = 0; 88 regionssize = keyspace.size / #{ next-hops } 89 for n in { next-hops } 90 n.start = i * regionsize; 91 n.end = n.start + regionsize; 92 i = i + 1 93 done 95 This calculation is O(N). Furthermore the calculation can be done 96 when the next-hops are added to or removed from the destination 97 route. 99 The algorithm doesn't specify the hash function used to obtain the 100 key. Its performance in this area will be exactly the performance of 101 the hash function. It is presumed that if this calculation proves to 102 be a concern it can be done in hardware parallel to other operations 103 that need to complete before deciding which next-hop to use. 105 Finally the next-hop must be chosen. This is done by determining 106 which region contains the key. The time required to do this is 107 dependent on the way the next-hops are organized in memory. The 108 problem reduces to a search. For example if the next-hops are stored 109 as a linked list the time is O(N) as the router must traverse the 110 list comparing each next-hop's region boundaries against the key. If 111 the next-hops are stored as an ordered array a binary search can be 112 used yielding O(logN). 114 If the number of next-hops is limited to a fixed maximum the 115 comparison can be done in parallel in hardware yielding O(1). 117 2.2. Disruption 119 Protocols such as TCP perform better if the path they flow along does 120 not change while the stream is connected. Disruption is the 121 measurement of how many flows have their paths changed due to some 122 change in the router. We measure disruption as the fraction of total 123 flows whose path changes in response to some change in the router. 125 Some algorithms such as round-robin (i.e., upon receiving a packet 126 the least recently used next-hop is chosen) are disruptive regardless 127 of any change in the router. Clearly this is not the case with hash- 128 threshold. As long as the region boundaries remain unchanged the 129 same next-hop will be chosen for a given flow. 131 Because we have required regions to be equal in size the only reason 132 for a change in region boundaries is the addition or removal of a 133 next-hop. In this case the regions must all grow or shrink to fill 134 the key space. The analysis begins with some examples of this. 136 0123456701234567012345670123456701234567 137 +-------+-------+-------+-------+-------+ 138 | 1 | 2 | 3 | 4 | 5 | 139 +-------+-+-----+---+---+-----+-+-------+ 140 | 1 | 2 | 4 | 5 | 141 +---------+---------+---------+---------+ 142 0123456789012345678901234567890123456789 144 Figure 1. Before and after deletion of region 3 146 In figure 1. region 3 has been deleted. The remaining regions grow 147 equally and shift to compensate. In this case 1/4 of region 2 is now 148 in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is 149 in region 4 and 1/4 of region 4 is in region 5. Since each of the 150 original regions represent 1/5 of the flows, the total disruption is 151 1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10. 153 Note that the disruption to flows when adding a region is equivalent 154 to that of removing a region. That is, we are considering the 155 fraction of total flows that changes regions when moving from N to 156 N-1 regions, and that same fraction of flows will change when moving 157 from N-1 to N regions. 159 0123456701234567012345670123456701234567 160 +-------+-------+-------+-------+-------+ 161 | 1 | 2 | 3 | 4 | 5 | 162 +-------+-+-----+---+---+-----+-+-------+ 163 | 1 | 2 | 3 | 5 | 164 +---------+---------+---------+---------+ 165 0123456789012345678901234567890123456789 167 Figure 2. Before and after deletion of region 4 169 In figure 2. region 4 has been deleted. Again the remaining regions 170 grow equally and shift to compensate. 1/4 of region 2 is now in 171 region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in 172 region 3 and 1/4 of region 4 is in region 5. Since each of the 173 original regions represent 1/5 of the flows the, total disruption is 174 7/20. 176 To generalize, upon removing a region K the remaining N-1 regions 177 grow to fill the 1/N space. This growth is evenly divided between 178 the N-1 regions and so the change in size for each region is 179 1/N/(N-1) or 1/(N(N-1)). This change in size causes non-end regions 180 to move. The first region grows and so the second region is shifted 181 towards K by the change in size of the first region. 1/(N(N-1)) of 182 the flows from region 2 are subsumed by the change in region 1's 183 size. 2/(N(N-1)) of the flows in region 3 are subsumed by region 2. 184 This is because region 2 has shifted by 1/(N(N-1)) and grown by 185 1/(N(N-1)). This continues from both ends until you reach the 186 regions that bordered K. The calculation for the number of flows 187 subsumed from the Kth region into the bordering regions accounts for 188 the removal of the Kth region. Thus we have the following equation. 190 K-1 N 191 --- i --- (i-K) 192 disruption = \ --- + \ --- 193 / (N)(N-1) / (N)(N-1) 194 --- --- 195 i=1 i=K+1 197 We can factor 1/((N)(N-1)) out as it is constant. 199 / K-1 N \ 200 1 | --- --- | 201 = --- | \ i + \ (i-K) | 202 (N)(N-1) | / / | 203 \ --- --- / 204 1 i=K+1 206 We now use the the concrete formulas for the sum of integers. The 207 first summation is (K)(K-1)/2. For the second summation notice that 208 we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2. 210 (K-1)(K) + (N-K)(N-K+1) 211 = ----------------------- 212 2(N)(N-1) 214 Considering the summations, one can see that the least disruption is 215 when K is as close to half way between 1 and N as possible. This can 216 be proven by finding the minimum of the concrete formula for K 217 holding N constant. First break apart the quantities and collect. 219 2K*K - 2K - 2NK + N*N + N 220 = ------------------------- 221 2(N)(N-1) 223 K*K - K - NK N + 1 224 = -------------- + ------- 225 (N)(N-1) 2(N-1) 227 Since we are minimizing for K the right side (N+1)/2(N-1) is constant 228 as is the denominator (N)(N-1) so we can drop them. To minimize we 229 take the derivative. 231 d 232 -- (K*K - (N+1)K) 233 dk 235 = 2K - (N+1) 237 Which is zero when K is (N+1)/2. 239 The last thing to consider is that K must be an integer. When N is 240 odd (N+1)/2 will yield an integer, however when N is even (N+1)/2 241 yields an integer + 1/2. In the case, because of symmetry, we get 242 the least disruption when K is N/2 or N/2 + 1. 244 Now since the formula is quadratic with a global minimum half way 245 between 1 and N the maximum possible disruption must occur when edge 246 regions (1 and N) are removed. If K is 1 or N the formula reduces to 247 1/2. 249 The minimum possible disruption is obtained by letting K=(N+1)/2. In 250 this case the formula reduces to 1/4 + 1/(4*N). So the range of 251 possible disruption is (1/4, 1/2]. 253 To minimize disruption we recommend adding new regions to the center 254 rather than the ends. 256 3. Comparison to other algorithms 258 Other algorithms exist to decide which next-hop to use. These 259 algorithms all have different performance and disruptive 260 characteristics. Of these algorithms we will only consider ones that 261 are not disruptive by design (i.e., if no change to the set of next- 262 hops occurs the path a flow takes remains the same). This will 263 exclude round-robin and random choice. We will look at modulo-N and 264 highest random weight. 266 Modulo-N is a simpler form of hash-threshold. Given N next-hops the 267 hash function includes a final modulo-N which directly maps the 268 result to one of the next-hops. This operation is the fastest of the 269 three we consider, however if a next-hop is added or removed the 270 disruption is (N-1)/N. 272 Highest random weight (HRW) is another comparative method similar to 273 hash-threshold. The router seeds a pseudo-random number generator 274 with the packet header fields which describe the flow and the next- 275 hop to obtain a weight. The next-hop which receives the highest 276 weight is selected. The advantage with using HRW is that it has 277 minimal disruption (i.e., disruption due to adding or removing a 278 next-hop is always 1/N.) The disadvantage with HRW is an only 279 slightly more complex function to choose the next-hop. A description 280 of HRW along with comparisons to other methods can be found in [1]. 281 Although not used for next-hop calculation an example usage of HRW 282 can be found in [2]. 284 If the complexity of HRW's next-hop selection processes is acceptable 285 we think it should be considered as an alternative to hash-threshold. 287 4. Security Considerations 289 This document is an analysis of an algorithm used to implement an 290 ECMP routing decision. This analysis does not directly effect the 291 security of the Internet Infrastructure. 293 5. References 295 [1] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to 296 Increase Hit Rates", IEEE/ACM Transactions on Networking, February 297 1998. 299 [2] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., 300 Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei, 301 "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol 302 Specification", RFC 2362, June 1998. 304 6. Author's Address 306 Christian E. Hopps 307 Merit Network 308 4251 Plymouth Road, Suite C. 309 Ann Arbor, MI 48105 310 Phone: +1 734 936 0291 311 EMail: chopps@merit.edu 313 7. Full Copyright Statement 315 Copyright (C) The Internet Society (1999). All Rights Reserved. 317 This document and translations of it may be copied and furnished to 318 others, and derivative works that comment on or otherwise explain it 319 or assist in its implementation may be prepared, copied, published 320 and distributed, in whole or in part, without restriction of any 321 kind, provided that the above copyright notice and this paragraph are 322 included on all such copies and derivative works. However, this 323 document itself may not be modified in any way, such as by removing 324 the copyright notice or references to the Internet Society or other 325 Internet organizations, except as needed for the purpose of 326 developing Internet standards in which case the procedures for 327 copyrights defined in the Internet languages other than English. 329 The limited permissions granted above are perpetual and will not be 330 revoked by the Internet Society or its successors or assigns.