idnits 2.17.1 draft-hopps-ecmp-algo-analysis-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (5 January 1999) is 9242 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-02) exists of draft-ietf-ospf-omp-01 -- Possible downref: Normative reference to a draft: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' ** Obsolete normative reference: RFC 2362 (ref. '3') (Obsoleted by RFC 4601, RFC 5059) Summary: 10 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force Christian E. Hopps 2 INTERNET-DRAFT Merit Network 3 Expires June 1999 5 January 1999 5 Analysis of an ECMP Algorithm 6 8 Status of this Memo 10 This document is an Internet-Draft. Internet-Drafts are working 11 documents of the Internet Engineering Task Force (IETF), its areas, 12 and its working groups. Note that other groups may also distribute 13 working documents as Internet-Drafts. 15 Internet-Drafts are draft documents valid for a maximum of six months 16 and may be updated, replaced, or obsoleted by other documents at any 17 time. It is inappropriate to use Internet- Drafts as reference 18 material or to cite them other than as "work in progress." 20 To view the entire list of current Internet-Drafts, please check the 21 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 22 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern 23 Europe), ftp.nic.it (Southern Europe), munnari.oz.au (Pacific Rim), 24 ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). 26 Abstract 28 Equal cost multipath (ECMP) is a routing technique for routing 29 packets along multiple paths of equal cost. The forwarding engine 30 identifies paths by next-hop. When forwarding a packet the router 31 must decide which next hop (path) to use. This document gives an 32 analysis of one popular method for making that decision. The 33 analysis includes the performance of the algorithm and the disruption 34 caused by changes to the set of next-hops. 36 Draft Analysis of an ECMP Algorithm January 1999 38 1. Hash-Threshold 40 One popular method for determining which next-hop to use when routing 41 with ECMP can be called hash-threshold. The router first selects a 42 key by performing a hash (e.g., modulo-K where K is large, or CRC16) 43 over the packet header fields that identify a flow. The N next-hops 44 have been assigned unique regions in the key space. The router uses 45 the key to determine which region and thus which next-hop to use. 46 This method is used in [1]. 48 As an example of hash-threshold, upon receiving a packet the router 49 performs a CRC16 on the packet's header fields that define the flow 50 (e.g., the source and destination fields of the packet), this is the 51 key. Say for this destination there are 4 next-hops to choose from. 52 Each next-hop is assigned a region in 16 bit space (the key space). 53 For equal usage the router may have chosen to divide it up evenly so 54 each region is 65536/4 or 16k large. The next-hop is chosen by 55 determining which region contains the key (i.e., the CRC result). 57 2. Analysis 59 There are a few concerns when choosing an algorithm for deciding 60 which next hop to use. One is performance, the computational 61 requirements to run the algorithm. Another is disruption (i.e., the 62 changing of which path a flow uses). Balancing is a third concern; 63 however since the algorithm's balancing characteristics are directly 64 related to the chosen hash function this analysis does not treat this 65 concern in depth. 67 For this analysis we will assume regions of equal size. If the hash 68 function is uniformly distributed the distribution of flows amongst 69 paths will also be uniform. 71 2.1. Performance 73 The performance of the hash-threshold algorithm can be broken down 74 into three parts: selection of regions for the next-hops, obtaining 75 the key and comparing the key to the regions to decide which next-hop 76 to use. 78 Since regions are restricted to be of equal size the calculation of 79 region boundaries is trivial. The boundaries can be calculated as 80 follows: 82 Draft Analysis of an ECMP Algorithm January 1999 84 i = 0; 85 regionssize = keyspace.size / #{ next-hops } 86 for n in { next-hops } 87 n.start = i * regionsize; 88 n.end = n.start + regionsize; 89 i = i + 1 90 done 92 This calculation is O(N). Furthermore the calculation can be done 93 when the next-hops are added to or removed from the destination 94 route. 96 The algorithm doesn't specify the hash function used to obtain the 97 key. Its performance in this area will be exactly the performance of 98 the hash function. It is presumed that if this calculation proves to 99 be a concern it can be done in hardware parallel to other operations 100 that need to complete before deciding which next-hop to use. 102 Finally the next-hop must be chosen. This is done by determining 103 which region contains the key. The time required to do this is 104 dependent on the way the next-hops are organized in memory. The 105 problem reduces to a search. For example if the next-hops are stored 106 as a linked list the time is O(N) as the router must traverse the 107 list comparing each next-hop's region boundaries against the key. If 108 the next-hops are stored as an ordered array a binary search can be 109 used yielding O(logN). 111 As [1] notes if the number of next-hops is limited to a fixed maximum 112 the comparison can be done in parallel in hardware, thus O(1). 114 2.2. Disruption 116 Protocols such as TCP perform better if the path they flow along does 117 not change while the stream is connected. Disruption is the 118 measurement of how many flows have their paths changed due to some 119 change in the router. We measure disruption as the fraction of total 120 flows whose path changes in response to some change in the router. 122 Some algorithms such as round-robin (i.e., upon receiving a packet 123 the least recently used next-hop is chosen) are disruptive regardless 124 of any change in the router. Clearly this is not the case with hash- 125 threshold. As long as the region boundaries remain unchanged the 126 same next-hop will be chosen for a given flow. 128 Draft Analysis of an ECMP Algorithm January 1999 130 Because we have required regions to be equal in size the only reason 131 for a change in region boundaries is the addition or removal of a 132 next-hop. In this case the regions must all grow or shrink to fill 133 the key space. The analysis begins with some examples of this. 135 0123456701234567012345670123456701234567 136 +-------+-------+-------+-------+-------+ 137 | 1 | 2 | 3 | 4 | 5 | 138 +-------+-+-----+---+---+-----+-+-------+ 139 | 1 | 2 | 4 | 5 | 140 +---------+---------+---------+---------+ 141 0123456789012345678901234567890123456789 143 Figure 1. Before and after deletion of region 3 145 In figure 1. region 3 has been deleted. The remaining regions grow 146 equally and shift to compensate. In this case 1/4 of region 2 is now 147 in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is 148 in region 4 and 1/4 of region 4 is in region 5. Since each of the 149 original regions represent 1/5 of the flows, the total disruption is 150 1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10. 152 Note that the disruption to flows when adding a region is equivalent 153 to that of removing a region. That is, we are considering the 154 fraction of total flows that changes regions when moving from N to 155 N-1 regions, and that same fraction of flows will change when moving 156 from N-1 to N regions. 158 0123456701234567012345670123456701234567 159 +-------+-------+-------+-------+-------+ 160 | 1 | 2 | 3 | 4 | 5 | 161 +-------+-+-----+---+---+-----+-+-------+ 162 | 1 | 2 | 3 | 5 | 163 +---------+---------+---------+---------+ 164 0123456789012345678901234567890123456789 166 Figure 2. Before and after deletion of region 4 168 In figure 2. region 4 has been deleted. Again the remaining regions 169 grow equally and shift to compensate. 1/4 of region 2 is now in 170 region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in 172 Draft Analysis of an ECMP Algorithm January 1999 174 region 3 and 1/4 of region 4 is in region 5. Since each of the 175 original regions represent 1/5 of the flows the, total disruption is 176 7/20. 178 To generalize, upon removing a region K the remaining N-1 regions 179 grow to fill the 1/N space. This growth is evenly divided between 180 the N-1 regions and so the change in size for each region is 181 1/N/(N-1) or 1/(N(N-1)). This change in size causes non-end regions 182 to move. The first region grows and so the second region is shifted 183 towards K by the change in size of the first region. 1/(N(N-1)) of 184 the flows from region 2 are subsumed by the change in region 1's 185 size. 2/(N(N-1)) of the flows in region 3 are subsumed by region 2. 186 This is because region 2 has shifted by 1/(N(N-1)) and grown by 187 1/(N(N-1)). This continues from both ends until you reach the 188 regions that bordered K. The calculation for the number of flows 189 subsumed from the Kth region into the bordering regions accounts for 190 the removal of the Kth region. Thus we have the following equation. 192 K-1 N 193 --- i --- (i-K) 194 disruption = \ --- + \ --- 195 / (N)(N-1) / (N)(N-1) 196 --- --- 197 i=1 i=K+1 199 We can factor 1/((N)(N-1)) out as it is constant. 201 / K-1 N \ 202 1 | --- --- | 203 = --- | \ i + \ (i-K) | 204 (N)(N-1) | / / | 205 \ --- --- / 206 1 i=K+1 208 We now use the the concrete formulas for the sum of integers. The 209 first summation is (K)(K-1)/2. For the second summation notice that 210 we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2. 212 Draft Analysis of an ECMP Algorithm January 1999 214 (K-1)(K) + (N-K)(N-K+1) 215 = ----------------------- 216 2(N)(N-1) 218 Considering the summations, one can see that the least disruption is 219 when K is as close to half way between 1 and N as possible. This can 220 be proven by finding the minimum of the concrete formula for K 221 holding N constant. First break apart the quantities and collect. 223 2K*K - 2K - 2NK + N*N + N 224 = ------------------------- 225 2(N)(N-1) 227 K*K - K - NK N + 1 228 = -------------- + ------- 229 (N)(N-1) 2(N-1) 231 Since we are minimizing for K the right side (N+1)/2(N-1) is constant 232 as is the denominator (N)(N-1) so we can drop them. To minimize we 233 take the derivative. 235 d 236 -- (K*K - (N+1)K) 237 dk 239 = 2K - (N+1) 241 Which is zero when K is (N+1)/2. 243 The last thing to consider is that K must be an integer. When N is 244 odd (N+1)/2 will yield an integer, however when N is even (N+1)/2 245 yields an integer + 1/2. In the case, because of symmetry, we get 246 the least disruption when K is N/2 or N/2 + 1. 248 Draft Analysis of an ECMP Algorithm January 1999 250 Now since the formula is quadratic with a global minimum half way 251 between 1 and N the maximum possible disruption must occur when edge 252 regions (1 and N) are removed. If K is 1 or N the formula reduces to 253 1/2. 255 Thus to minimize disruption we recommend adding new regions to the 256 center rather than the ends. 258 3. Comparison to other algorithms 260 Other algorithms exist to decide which next-hop to use. These 261 algorithms all have different performance and disruptive 262 characteristics. Of these algorithms we will only consider ones that 263 are not disruptive by design (i.e., if no change to the set of next- 264 hops occurs the path a flow takes remains the same). This will 265 exclude round-robin and random choice. We will look at modulo-N and 266 highest random weight. 268 Modulo-N is a simpler form of hash-threshold. Given N next-hops the 269 hash function includes a final modulo-N which directly maps the 270 result to one of the next-hops. This operation is the fastest of the 271 three we consider, however if a next-hop is added or removed the 272 disruption is (N-1)/N. 274 Highest random weight (HRW) is another comparative method similar to 275 hash-threshold. The router seeds a pseudo-random number generator 276 with the packet header fields which describe the flow and the next- 277 hop to obtain a weight. The next-hop which receives the highest 278 weight is selected. The advantage with using HRW is that it has 279 minimal disruption (i.e., disruption due to adding or removing a 280 next-hop is always 1/N.) The disadvantage with HRW is an only 281 slightly more complex function to choose the next-hop. A description 282 of HRW along with comparisons to other methods can be found in [2]. 283 Although not used for next-hop calculation an example usage of HRW 284 can be found in [3]. 286 If the complexity of HRW's next-hop selection processes is acceptable 287 we think it should be considered as an alternative to hash-threshold. 289 4. Security Considerations 291 This document is an analysis of an algorithm used to implement an 292 ECMP routing decision. This analysis does not directly effect the 294 Draft Analysis of an ECMP Algorithm January 1999 296 security of the Internet Infrastructure. 298 5. References 300 [1] Villamizar, C., "OSPF Optimized Multipath (OSPF-OMP)", draft-ietf- 301 ospf-omp-01.txt, October, 1998. 303 [2] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to 304 Increase Hit Rates", IEEE/ACM Transactions on Networking, February 305 1998. 307 [3] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., 308 Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei, 309 "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol 310 Specification", RFC 2362, June 1998. 312 6. Author's Address 314 Christian E. Hopps 315 Merit Network 316 4251 Plymouth Road, Suite C. 317 Ann Arbor, MI 48105 318 Phone: +1 734 936 0291 319 EMail: chopps@merit.edu