Internet Engineering Task Force                       Curtis Villamizar
INTERNET-DRAFT                                                    UUNET
draft-ietf-ospf-omp-02                                February 24, 1999

                  OSPF Optimized Multipath (OSPF-OMP)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright (C) The Internet Society (February 24, 1999).  All Rights
Reserved.

Abstract

OSPF may form multiple equal cost paths between points.  This is true
of any link state protocol.  In the absence of any explicit support
to take advantage of this, a path may be chosen arbitrarily.
Techniques have been utilized to divide traffic somewhat evenly among
the available paths.  These techniques have been referred to as Equal
Cost Multipath (ECMP).  An unequal division of traffic among the
available paths is generally preferable.  Routers generally have no
knowledge of traffic loading on distant links and therefore have no
basis to optimize the allocation of traffic.

Optimized Multipath is a compatible extension to OSPF, utilizing the
Opaque LSA to distribute loading information, proposing a means to
adjust forwarding, and providing an algorithm to make the adjustments
gradually enough to ensure stability yet provide reasonably fast
adjustment when needed.

1 Overview

Networks running OSPF are often heavily loaded.  Topologies often
evolve to include multiple paths.  Multiple paths may be initially
designed to provide redundancy but also result from incremental
addition of circuits to accommodate traffic growth.  The redundant
paths provide a potential to distribute traffic loading and reduce
congestion.
Optimized Multipath (OMP) provides a means for OSPF to make better
use of this potential to distribute loading.

1.1 Past Attempts

Early attempts to provide load sensitive routing involved changing
link costs according to loading.  These attempts were doomed to
failure because the adjustment increment was grossly coarse and
oscillation was inevitable [2].  This early experience is largely
responsible for the common belief that any form of load sensitive
routing will fail due to severe oscillations resulting from
instability.

Attempts to use a metric composed of weighted components of delay,
traffic, and fixed costs have also met with very limited success.
The problem again is the granularity of adjustment.  As the
composition of weighted components switches favored paths, large
amounts of traffic are suddenly moved, making the technique prone to
oscillations [3].  The oscillation is damped to some extent by
providing a range of composite metric differences in which composite
metrics are considered equal and equal cost multipath techniques are
used.  Even then the technique still suffers oscillations due to the
coarse adjustments made at equal/unequal metric boundaries.

1.2 Equal Cost Multipath

A widely utilized technique to improve loading is known as Equal Cost
Multipath (ECMP).  ECMP is specified in [5].  In ECMP no attempt is
made to dynamically adjust OSPF costs based on loading, and therefore
ECMP is completely stable.  If the topology is such that equal cost
paths exist, then an attempt is made to divide traffic equally among
the paths.  At least three methods of splitting traffic have been
used.

1. Per packet round robin forwarding.

2. Dividing destination prefixes among available next hops in the
   forwarding entries.

3. Dividing traffic according to a hash function applied to the
   source and destination pair.
The ``per packet round robin forwarding'' technique is only
applicable if the delays on the paths are almost equal.  The delay
difference must be small relative to packet serialization time.
Delay differences greater than three times the packet serialization
time can cause terrible TCP performance.  For example, packets 2, 4,
and 6 may arrive before packet 1, triggering TCP fast retransmit.
The result will be limiting TCP to a very small window and very poor
performance over long delay paths.

The delay differences must be quite small.  A 532 byte packet is
serialized onto a DS1 link in under 2.8 msec.  At DS3 speed,
serialization is accomplished in under 100 usec.  At OC12 it is under
7 usec.  For this reason ``per packet round robin forwarding'' is not
applicable to a high speed WAN.

Dividing destination prefixes among available next hops provides a
very coarse and unpredictable load split.  Very short prefixes are
problematic.  In reaching an end node, the majority of traffic is
often destined to a single prefix.  This technique is applicable to a
high speed WAN, but with the drawbacks just mentioned, better
techniques are needed.

The ``source/destination hash'' based technique was used as far back
as the T1-NSFNET in the IBM RT-PC based routers.  A hash function,
such as CRC-16, is applied over the source address and destination
address.  The hash space is then split evenly among the available
paths by either setting thresholds or performing a modulo operation.
Traffic between any given source and destination remains on the same
path.  Because the technique is based on host addresses, and uses
both the source and destination address, it does not suffer the
coarse granularity problem of the prefix based technique, even when
forwarding to a single prefix.  Source/destination hash is the best
technique available for a high speed WAN.
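The hash-and-split technique just described can be sketched as
follows.  This is an illustrative sketch, not a normative procedure:
the particular CRC-16 polynomial (CRC-16/CCITT here) and the byte
layout of the source/destination key are assumptions, since the text
only names CRC-16 as one possible hash function.

```python
def crc16(data):
    """Bitwise CRC-16/CCITT (polynomial 0x1021); returns 0..65535."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def pick_path(src, dst, n_paths):
    """Map a source/destination IPv4 address pair (as integers) onto
    one of n_paths by hashing and taking a modulo."""
    key = src.to_bytes(4, "big") + dst.to_bytes(4, "big")
    return crc16(key) % n_paths
```

Because the hash is computed over the full source/destination pair,
all packets of a given pair follow the same path, which preserves
ordering within a flow while still splitting aggregate traffic.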
The forwarding decision for the ``source/destination hash'' based
technique is quite simple.  When a packet arrives, look up the
forwarding entry in the radix tree.  The next hop entry can be an
array index into a set of structures, each containing one or more
actual next hops.  If more than one next hop is present, compute a
CRC16 value based on the source and destination addresses.  The
CRC16 can be implemented in hardware and computed in parallel to the
radix tree lookup in high speed implementations, and discarded if not
needed.

                              .----.
                             /      \
                            |   N2   |
                             \      /
                              `----'
                             //    \\
                            //      \\
                           //        \\
                      .----.          .----.
                     /      \        /      \
                    |   N1   |======|   N3   |
                     \      /        \      /
                      `----'          `----'

              Figure 1: A very simple application of ECMP

Each next hop entry in the structure must contain a boundary value
and the next hop itself.  An integer ``less than'' comparison is made
against the boundary value, determining whether to use this next hop
or move on to the next comparison.  In hardware the full set of
comparisons can be made simultaneously for up to some number of next
hops, or a binary search can be performed.  This yields the next hop
to use.

1.3 Optimized Multipath differs from ECMP

For ECMP, the boundary values are set by first dividing one more than
the maximum value that the hash computation can return (65536 for
CRC16) by the number of available next hops and then setting the Nth
boundary to N times that number (with the Nth value fixed at one more
than the maximum value regardless of underflow caused by truncating
during division, 65536 for CRC16).

An equal load split is not always optimal.  Consider the example in
Figure 1 with the offered traffic in Table 1.  If all of the link
costs are set equally, then the link N1---N3 is significantly
overloaded (135.75%) while the path N1---N2---N3 is lightly loaded
(45.25% and 22.62%).
If the cost on the N1---N3 link is equal to the cost of the
N1---N2---N3 path, then N1 will try to split the load destined toward
N3 across the two paths.

                                     No       ECMP      OMP
   Nodes    Node Names               Split    Traffic   Traffic

   n3-n1    Node 3 -> Node 1         60       30        40
   n1-n3    Node 1 -> Node 3         60       30        40
   n3-n2    Node 3 -> Node 2         20       50        40
   n2-n3    Node 2 -> Node 3         20       50        40
   n2-n1    Node 2 -> Node 1         10       40        30
   n1-n2    Node 1 -> Node 2         10       40        30

        Table 1: Traffic loading for the example in Figure 1

Given the offered traffic in Table 1, the loading on N1---N3 is
reduced to 67.87% but the link loading on the path N2---N3 becomes
113.12%.  Ideally node N1 should put 1/3 of the traffic toward N3 on
the path N1---N2---N3 and 2/3 on the path N1---N3.  To know to do
this N1 must know the loading on N2---N3.

This is where Optimized Multipath (OMP) provides additional benefit
over ECMP.  Ignoring for the moment how node N1 knows to put 1/3 of
the traffic toward N3 on the path N1---N2---N3 (described later in
Section 2), the way the distribution of traffic is accomplished from
a forwarding standpoint is to move the boundary in the forwarding
structure from the default value of 1/2 of 65536 to about 1/3 of
65536.  If there is a very large set of source and destination host
address pairs, then the traffic will be split among the 65536
possible hash values.  This provides the means for a very fine
granularity of adjustment.

Having explained how a fine granularity of forwarding adjustment can
be accomplished, what remains is to define how nodes in a large
topology can know what the loading levels are elsewhere in the
topology and to define an algorithm which can allow autonomous
unsynchronized decisions on the parts of many routers in a topology
to quickly converge on a near optimal loading without the risk of
oscillation.  This is covered in the following sections.
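The boundary computation described in Section 1.3 can be sketched as
follows.  The linear scan stands in for the simultaneous hardware
comparisons described earlier; for two paths, ECMP yields boundaries
[32768, 65536], and the OMP 1/3--2/3 split for the Figure 1 example
corresponds to moving the first boundary to about 21845.  This is an
illustrative sketch, not a normative algorithm.

```python
HASH_SPACE = 65536  # one more than the maximum CRC16 value

def ecmp_boundaries(n_paths):
    """Equal split: the Nth boundary is N times the truncated share,
    with the last boundary pinned at HASH_SPACE regardless of any
    underflow caused by truncating during division."""
    share = HASH_SPACE // n_paths
    bounds = [share * n for n in range(1, n_paths + 1)]
    bounds[-1] = HASH_SPACE
    return bounds

def pick_next_hop(hash_value, boundaries):
    """Successive ``less than'' comparisons against boundary values;
    the first boundary exceeding the hash selects the next hop."""
    for i, b in enumerate(boundaries):
        if hash_value < b:
            return i
    raise ValueError("hash_value outside hash space")
```

OMP adjusts loading simply by rewriting the boundary list, so an
arbitrarily fine shift of traffic between paths needs no change to
the lookup procedure itself.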
2 Flooding Loading Information

Loading information is flooded within an OSPF area using Opaque
LSAs [1].  Area local scope (link-state type 10) link state
attributes are flooded containing an ``Opaque Type'' of
LSA_OMP_LINK_LOAD or LSA_OMP_PATH_LOAD.  The type LSA_OMP_LINK_LOAD
Opaque LSA is used to flood link loading information within an area.
The type LSA_OMP_PATH_LOAD Opaque LSA is used to flood loading
information for use with inter-area routes.  Loading information
obtained from an exterior routing protocol may also be considered if
available.  The means of passing loading information in an exterior
routing protocol is beyond the scope of this document.

2.1 Link Loading Information

Within an area link loading is flooded using the type
LSA_OMP_LINK_LOAD Opaque LSA.  The format of this LSA is described
in Appendix A.

The ``Opaque Information'' in the type LSA_OMP_LINK_LOAD Opaque LSA
contains the following.

1. a measure of link loading in each direction as a fraction of link
   capacity,

2. a measure of packets dropped due to queue overflow in each
   direction (if known) expressed as a fraction,

3. the link capacity in kilobits per second (or unity if less than
   1000 bytes per second).

Generally the number of output packets dropped will be known.  In
designs where drops occur on the input, the rate of input queue drops
should be recorded.  These measures of loading and drop are computed
using the interface counters generally maintained for SNMP purposes,
plus a running count of output queue drops if available.  The
counters are sampled every 15 seconds but generally flooded at longer
time intervals.

The previous value of each of the counters is subtracted from the
current value.  The counters required are 1) bytes out, 2) bytes in,
3) packets out, 4) packets in, 5) output queue drops, and 6) input
queue drops.
These counters should already exist to satisfy SNMP requirements.

A value of instantaneous load in each direction is based on byte
count and link capacity.  An instantaneous output queue drop rate is
based on queue drops and packet count.  Some of the values are
filtered as described in Appendix B.1.

The last time that a type LSA_OMP_LINK_LOAD Opaque LSA with the same
Opaque ID was sent is recorded and the values sent are recorded.  For
the purpose of determining when to reflood, an equivalent loading
figure is used.  The computation of equivalent loading is described
in Section 2.3.

The higher of the current equivalent loading computation and the
previous is used when determining whether to send the type
LSA_OMP_LINK_LOAD Opaque LSA.  The type LSA_OMP_LINK_LOAD Opaque LSA
is flooded according to elapsed time since last flooded, the current
equivalent load, and the difference between the current equivalent
load and the previously flooded equivalent load.  The reflooding
decision is described in detail in Appendix B.1.

The point of this graduated reflooding schedule is to reduce the
amount of flooding that is occurring unless links are in trouble or
undergoing a significant traffic shift.  Change may occur in a
quiescent network due to failure external to the network that causes
traffic to take alternate paths.  In this case, the more frequent
flooding will trigger a faster convergence.  Traffic shift may also
occur due to shedding of loading by the OMP algorithm itself as the
algorithm converges in response to an external change.

2.2 Path Loading Information

Path loading information regarding an adjacent area is flooded by an
Area Border Router (ABR) using the type LSA_OMP_PATH_LOAD Opaque LSA.
The format of this LSA is described in Appendix A.

The ``Opaque Information'' in the type LSA_OMP_PATH_LOAD Opaque LSA
contains the following.

1.
the highest loading in the direction toward the destination as a
fraction of link capacity,

2. a measure of total packet drop due to queue overflow in the
   direction toward the destination expressed as a fraction,

3. the smallest link capacity on the path to the destination.

These values are taken from the link on the path from the ABR to the
destination of the summary LSA.  The link with the highest loading
may not be the link with the lowest capacity.  The queue drop value
is one minus the product of the fraction of packets that are not
dropped at each measurement point on the path (input and output in
the direction of the path).  The following computation is used.

   path-loss = 1 - product(1 - link-loss)

The path loading and path loss rate are filtered according to the
algorithm defined in Appendix B.1.  Rather than polling a set of
counters, the current value of the path loading and path loss rate
is used.  An equivalent load is calculated for each path to a
summary LSA destination as described in Section 2.3.  A type
LSA_OMP_PATH_LOAD Opaque LSA is flooded according to the same rate
schedule as described in the prior section and Appendix B.1.

An ABR may be configured to not send type LSA_OMP_PATH_LOAD Opaque
LSA into any given area.  See Appendix C.

2.3 Computing equivalent loading

The equivalent load is the actual fractional loading multiplied by a
factor that provides an estimate, based on loss, of the extent to
which TCP is expected to slow down to avoid congestion.  This
estimate is based on the link bandwidth and loss rate, knowledge of
TCP dynamics, and some assumption about the characteristics of the
TCP flows being passed through the link.  Some of the assumptions
must be configured.

If loss is low or zero, the equivalent load will be equal to the
actual fractional loading (link utilization expressed as a number
between 0 and 1).
If loss is high and loading is at or near 100%, then the equivalent
load calculation provides a means of deciding which links are more
heavily overloaded.  The equivalent load figure is not intended to be
an accurate prediction of offered load, simply a metric for use in
deciding which link to offload.

Mathis et al. provide the following estimate of loss given TCP window
size and round trip time [4].

   p < (MSS / (BW * RTT))**2

The basis for the estimate is that TCP slows down roughly in
proportion to the inverse of the square root of loss.  There is no
way to know how fast TCP would be going if no loss were present if
there are other bottlenecks.  A somewhat arbitrary assumption is made
that TCP would go no faster than if loss were at 0.5%.  If loss is
greater than 0.5% then TCP performance would be reduced.  The
equivalent loading is estimated using the following computation.

   equiv-load = load * K * sqrt(loss)

The inverse of the square root of 1% is 10, so 10 may be used for the
value of ``K''.

The conversion of loss to estimated loading is not at all accurate.
The non-linearity does affect the time to converge, though
convergence still occurs as long as loss is positively correlated to
loading.  This is discussed further in Appendix E.1.

3 Next hop structures

A ``next hop structure'' contains a set of complete paths to a
destination, some of which may share the same immediate next hop.
The name is not meant to imply a single next hop.  A given route can
reference only one next hop structure, which can contain multiple
paths and multiple next hops.  Entries for paths that use the same
next hop are combined before moving information to the forwarding
table.  A next hop structure contains the information necessary to
balance load across a set of next hops.
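The loading computations of Sections 2.2 and 2.3 can be sketched as
follows.  Clamping the multiplier at 1 is an interpretation of the
statement in Section 2.3 that low or zero loss leaves the equivalent
load equal to the fractional loading; the formula alone would drive
the equivalent load to zero at zero loss.

```python
import math

def path_loss(point_losses):
    """path-loss = 1 - product(1 - link-loss), taken over every
    measurement point (input and output) along the path."""
    surviving = 1.0
    for loss in point_losses:
        surviving *= 1.0 - loss
    return 1.0 - surviving

def equiv_load(load, loss, k=10.0):
    """equiv-load = load * K * sqrt(loss), with the loss multiplier
    clamped at 1 so that low or zero loss leaves the fractional
    loading unchanged (an interpretation, see lead-in)."""
    return load * max(1.0, k * math.sqrt(loss))
```

For example, two measurement points each dropping 10% of packets give
a path loss of 19%, not 20%, because the second point only sees the
packets that survived the first.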
For intra-area routes, a separate next hop structure must exist for
each destination router or network.  For inter-area routes (summary
routes), at most one next hop structure is needed for each
combination of ABRs which announce summary routes that are considered
equidistant.  Optimizing inter-area and external routing is discussed
in Section 3.2.

The set of intra-area next hop structures is initialized after the
OSPF SPF calculation is completed.  An additional set of next hops is
then added by relaxing the best path criteria.

The use of the next hop structure and its contents is described in
Section 4.1.

3.1 Relaxing the Best Path Criteria

The exercise of setting link costs to produce the most beneficial set
of equal cost paths is tedious and very difficult for large
topologies.  OSPF as defined in RFC 2328 requires that only the best
path be considered.  For the purpose of Optimized Multipath, this
criterion can be relaxed to allow a greater number of multipaths, but
not to the point of creating routing loops.  Any next hop which is
closer in terms of cost than the current hop and does not cross a
virtual link can be considered a viable next hop for multipath
routing.  If next hops were used where the cost at the next hop is
equal or greater, routing loops would form.

In considering the paths beyond the next hop, only the best paths
should be considered.  There is no way to determine if subsequent
routers have relaxed the best path criterion.  In addition, there is
no need to consider the additional paths if the best path criterion
is relaxed downstream.  If the best path criterion is relaxed
downstream, the best paths must be part of the downstream next hop
structure.  If there are additional paths that the downstream is able
to use to further distribute the load, the entire set of paths will
still converge toward optimal loading.
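The relaxed viability rule of Section 3.1 can be sketched as follows,
assuming the SPF calculation has left a table of costs from every
router to the destination (the `dist` mapping here is an assumed
representation, not a structure defined by this document).

```python
def viable_next_hops(dist, node, dest, neighbors, virtual_links=frozenset()):
    """Return neighbors of 'node' usable as multipath next hops toward
    'dest' under the relaxed criterion: the neighbor's cost to dest
    must be strictly less than ours (strictness is what prevents
    routing loops), and the connecting link must not be virtual."""
    viable = []
    for n in neighbors[node]:
        if (node, n) in virtual_links or (n, node) in virtual_links:
            continue
        if dist[n][dest] < dist[node][dest]:
            viable.append(n)
    return viable
```

Because every forwarding step strictly decreases the remaining cost
to the destination, no sequence of such hops can revisit a router.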
The best path criterion is relaxed only for intra-area routes.  The
best path criterion can also be relaxed when considering the cost to
reach ABRs or ASBRs.  The best path criterion should not be relaxed
when considering the total cost to reach a summary route or external
route.

3.2 Offloading Congestion Outside the OSPF Area

For inter-area routes or external routes, a separate next hop
structure must exist for each such route if it is desirable to reduce
loading outside of the area and the loading within the area is
sufficiently low to safely allow this.

The existing procedures regarding selection of inter-area and
external routes outlined in [5] still apply.  For inter-area routes
the intra-area cost and cost of the summary route are summed.  For
external routes the intra-area cost is summed with a type 1 external
cost and considered before a type 2 external cost.  The best path
criterion is not relaxed when applied to the sum of intra-area cost
and summary route cost or intra-area cost and type 1 external cost.

In order for an ABR or ASBR to be considered as a viable exit point
from the area for a given destination, it must be advertising an
applicable summary route or external route.  The best summary route
or external route must still be chosen.  If a single ABR or ASBR
advertises the best route, multiple paths to that ABR or ASBR may be
used, but traffic cannot be sent toward an ABR or ASBR advertising a
higher cost summary route or external route.  If two or more ABRs or
ASBRs advertise a route at the same cost, then traffic load can be
split among these ABRs or ASBRs.
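The exit-point rule above reduces to a minimum-cost filter: traffic
may be split only among the ABRs or ASBRs advertising the best cost
for the route.  A minimal sketch (the advertisement representation
is an assumption):

```python
def eligible_exits(advertisements):
    """advertisements: (router, cost) pairs for one summary or
    external route.  Only routers advertising the minimum cost are
    viable exit points; load may then be split among them."""
    best = min(cost for _, cost in advertisements)
    return [router for router, cost in advertisements if cost == best]
```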
For inter-area routes, if a type LSA_OMP_PATH_LOAD Opaque LSA exists
for the summary LSA, and more than one ABR is advertising an equally
preferred summary route, and the equivalent load for the summary LSA
is greater than 90%, and the equivalent load within the area is
sufficiently smaller than the inter-area loading, then a next hop
structure can be created specifically to allow offloading of the
inter-area route.  For external routes, if an equivalent loading
exists, and more than one ASBR is advertising an equally preferred
external route, and the equivalent load is greater than 95%, and the
equivalent load within the area is sufficiently smaller than the
external route loading, then a separate structure is used.

Hysteresis must be used in the algorithm for determining if an
equivalent load on a summary LSA or external route is considered
sufficiently larger than the intra-area equivalent load or if an
external route loading is considered sufficiently larger than the
inter-area equivalent load.  For the purpose of describing this
algorithm one equivalent load is referred to as the more external,
and the other as the more internal equivalent load.

If the more external equivalent load exceeds the more internal
equivalent load by 15% and the more internal equivalent load is under
85%, then a separate next hop structure is created.  If the more
external equivalent load falls below 20% of the more internal
equivalent load or the more internal equivalent load exceeds 98%,
then an existing separate next hop structure is marked for removal
and combined with the more internal next hop structure (see
Section 3.3).
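The creation and removal hysteresis just described can be sketched as
follows.  Reading ``falls below 20% of the more internal equivalent
load'' as a 20-point gap is an interpretation, and the threshold
defaults are the configurable values given in the text.

```python
def separate_structure_action(ext_load, int_load, have_separate,
                              create_gap=0.15, destroy_gap=0.20,
                              create_cap=0.85, destroy_cap=0.98):
    """Hysteresis for a separate next hop structure.  ext_load and
    int_load are the more external and more internal equivalent loads
    as fractions.  Returns 'create', 'destroy', or 'keep'."""
    if not have_separate:
        # Create when the external load exceeds the internal load by
        # 15 points and the internal load is still under 85%.
        if ext_load > int_load + create_gap and int_load < create_cap:
            return "create"
    else:
        # Destroy when the external load falls 20 points below the
        # internal load (interpretation) or the internal exceeds 98%.
        if ext_load < int_load - destroy_gap or int_load > destroy_cap:
            return "destroy"
    return "keep"
```

The gap between the create and destroy thresholds is what prevents a
structure from being repeatedly created and torn down as the loads
drift near a single boundary.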
The more external equivalent load should not fall significantly below
the more internal unless either the traffic toward the more external
destination increases or the loading on the more internal increases,
since the more internal equivalent load will become the critical
segment on the separate next hop structure if the load is
sufficiently shifted, but is unlikely to overshoot by 20%.  These
thresholds should be configurable at least per type of routes
(inter-area or external).

The degree to which summary LSA loading and external route loading
will be considered is limited.  This serves two purposes.  First, it
prevents compensating for external congestion to the point of loading
the internal network beyond a fixed threshold.  Second, it prevents
triggering the removal of the next hop structure, which if allowed to
occur could trigger a hysteresis loop.  These mechanisms are
described in Section 3.4 and Appendix C.4.

3.3 Creating and destroying next hop structures

As described in Section 3.2, a separate next hop structure is needed
if the loading indicated by the type LSA_OMP_PATH_LOAD Opaque LSA or
exterior routing protocol is sufficiently high to require separate
balancing for traffic to the summary LSA or exterior route and the
intra-area loading is sufficiently low.

When a separate next hop structure is created, the same available
paths appear in the structure, leading to the same set of ABRs or
ASBRs.  The balance on these available paths should be copied from
the existing more internal next hop structure.  By initializing the
new next hop structure this way, a sudden change in loading is
avoided if a great deal of traffic is destined toward the summary
route or external route.

When a separate next hop structure can be destroyed, the traffic
should be transitioned gradually.  The next hop structure must be
marked for deletion.
The traffic share in this separate next hop structure should be
gradually changed so that it exactly matches the traffic share in the
more internal next hop structure.  The gradual change should follow
the adjustment rate schedule described in Section 4.1, where the move
increment is increased gradually as moves continue in the same
direction.  The only difference is that there is no need to overshoot
when adjusting to match the more internal next hop structure
parameters.  Once the separate next hop structure marked for deletion
matches the more internal next hop structure, the summary route or
external route can be changed to point to the more internal next hop
structure and the deletion can be made.

3.4 Critically loaded segment

For every set of paths, one link or part of the path is identified as
the ``critically loaded'' segment.  This is the part of the path with
the highest equivalent load as defined in Section 2.3.  For an
inter-area route with a separate next hop structure, the critically
loaded segment may be the critically loaded segment for the
intra-area set of paths, or it may be the summary LSA if the
equivalent load on the summary LSA is greater.  For an external route
with a separate next hop structure, the critically loaded segment may
be the critically loaded segment for the internal route or it may be
the external route if the equivalent load on the external route is
greater.  In considering loading reported for summary LSA or external
routes, the loading may be clamped to some configured ceiling (see
Appendix C.4).  If intra-area loading exceeds this ceiling, the
summary LSA loads or external route loads are in effect ignored.

Each next hop structure has exactly one ``critically loaded''
segment.  There may be more than one path in the next hop structure
sharing this critically loaded segment.
A particular Opaque LSA may be the critically loaded segment for no
next hop structures if it is lightly loaded.  Another Opaque LSA may
be the critically loaded segment for many next hop structures if it
is heavily loaded.

3.5 Optimizing Partial Paths

Under some circumstances multiple paths will exist to a destination
where all of the available paths share one or more links.  In some
cases overall system convergence time can be substantially improved
by optimizing a partial path when the most heavily loaded link is
shared by all available paths to a destination.

Computations are actually reduced when partial paths are considered.
The next hop structures kept within the routing process must contain
the full paths used to reach a destination (this is already a
requirement).  After an SPF calculation has changed the next hop
structure and before attempting any optimization, the set of paths is
examined looking for intra-area links which are common to all paths.
If any such links are found, only intra-area links closer than any of
these links can be considered as candidates for the ``critically
loaded'' segment (Section 3.4).  If there is only one immediate hop,
no attempt is made to load balance.

The change in load adjustment parameters should be applied to the
data structures for the full paths even though only a subset of the
links are eligible to be considered as the critically loaded segment.
For the purpose of building the type LSA_OMP_PATH_LOAD Opaque LSA,
loading along the entire path must be considered, including links
shared by all available paths.

4 Adjusting Equal Cost Path Loadings

Next hop structures are described in Section 3.  A next hop structure
contains a set of complete paths to a destination.
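The selection of the critically loaded segment (Section 3.4), with
the partial-path exclusion of Section 3.5 simplified to ``ignore
links common to every path,'' can be sketched as follows.  This is a
deliberate simplification: the full rule only admits links closer
than any shared link, and summary or external loading (with its
configured ceiling) is omitted here.

```python
def critical_segment(paths, equiv_load):
    """paths: list of paths, each a list of link identifiers.
    equiv_load: mapping from link identifier to equivalent load.
    Links shared by every path cannot be offloaded, so they are
    excluded; the remaining link with the highest equivalent load is
    the critically loaded segment, or None if nothing can move."""
    shared = set(paths[0])
    for p in paths[1:]:
        shared &= set(p)
    candidates = [link for p in paths for link in p if link not in shared]
    if not candidates:
        return None
    return max(candidates, key=lambda link: equiv_load[link])
```

With a single path every link is ``shared,'' so the function returns
None, matching the rule that no load balancing is attempted when
there is no alternative.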
Adjustments are made to a next hop structure to reflect differences in loading on the paths as reported by the type LSA_OMP_LINK_LOAD Opaque LSA and type LSA_OMP_PATH_LOAD Opaque LSA. Section 3.4 describes the selection of a ``critically loaded segment'' which is used to determine when to make adjustments and the size of the adjustments. Section 3.5 describes conditions under which some links are excluded from consideration as the ``critically loaded segment''.

An adjustment to the loading of a given set of equal cost paths is made when one of two conditions is true. Either the ``critically loaded segment'' has been reflooded, or a criterion is met involving 1) the difference between the equivalent load of the ``critically loaded segment'' and the lightest loaded path, 2) the equivalent load of the ``critically loaded segment'', 3) the type of destination, intra-area, inter-area, or external, and 4) the amount of time since the last load adjustment. The details of this conditional are described in Appendix B.

The reflooding algorithm is designed to be slightly less aggressive than the adjustment algorithm. This reduces the need to continuously flood small changes except in conditions of overload or substantial change in loading. Some overshoot may occur due to adjustments made in the absence of accurate knowledge of loading.

4.1 Load Adjustment Rate

In order to assure stability the rate of adjustment must be sufficiently limited. An adaptive adjustment rate method is used.

A ``critically loaded'' segment for a next hop structure is determined as described in Section 3.4.
When the type LSA_OMP_LINK_LOAD Opaque LSA or type LSA_OMP_PATH_LOAD Opaque LSA for this segment is updated, or the criteria in Appendix B are met, load is shed from all paths in the next hop structure that include that segment toward all paths in the next hop structure that do not include that segment. A separate set of variables controlling the rate of adjustment is kept for each path receiving load.

The number of paths usually exceeds the number of next hops. The distinction between paths which share a next hop is important if one of the paths sharing a next hop goes down (see Section 4.2). This distinction is only needed in making the computations. When moving the next hop structure into the data structures used for forwarding, paths which share a common next hop may be combined.

The following variables are kept for each path in a next hop structure.

1. The current ``traffic share'' (an integer, the range is 0 to 65535 for a CRC16 hash),

2. The current ``move increment'' used when moving traffic toward this path (an integer, the range is 0 to 65535 for a CRC16 hash),

3. The number of moves in the same direction, referred to as the ``move count''.

If there is no prior history for a path, then the move increment is initialized to a constant, typically about 1% (about 650 for CRC16). The number of moves in the same direction is initialized to 0. No loading adjustment is made on the first iteration.

If the critically loaded segment has changed, all paths now containing the critically loaded segment are first examined. The lowest move increment of any one of these paths is noted.

The move increment is adjusted for each path before any traffic is moved. One of the following actions is taken for each path.

1. If the path contains the critically loaded segment, its move increment is left unchanged.

2. If the path does not contain the critically loaded segment, but the critically loaded segment has changed and the path contains the prior critically loaded segment, then first the move increment is replaced with the lowest move increment from any of the paths containing the critically loaded segment, unless the move increment is already lower. Then, in either case, the move increment is cut in half.

3. If the path does not contain the critically loaded segment, and either the critically loaded segment has not changed or the path does not contain the prior critically loaded segment, then the move increment is increased.

The amount of increase in the move increment is described in Appendix B.4. The increase is designed to minimize the possibility of dramatic overshoot due to too great an increase in adjustment rate.

The move increment is never less than a configured minimum. The increase in move increment is never less than one but is generally constrained to a higher number by virtue of being calculated based on the prior move increment. The configured minimum for the move increment is typically 0.1% (65 for CRC16). The move increment is never allowed to exceed the size of the hash space divided by the number of equal cost paths in the next hop structure.

The dramatic decrease in move increment when the move direction is reversed and the slow increase in move increment when it remains in the same direction keep the algorithm stable. The exponential nature of the increase allows the algorithm to track externally caused changes in traffic loading.
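The halve-on-reversal, grow-in-the-same-direction behaviour described above can be sketched in C. This is a simplification for illustration, not the normative procedure in Appendix B.4: the growth divisor and the function name are assumptions, while the ~1% initial value, the ~0.1% floor, and the hash-space-divided-by-paths ceiling are the constants named in this section.

```c
#include <assert.h>

#define HASH_SPACE 65536  /* CRC16 hash space */
#define MIN_MOVE   65     /* ~0.1%, typical configured minimum */

/* Illustrative only: grow the move increment while moves continue in
 * the same direction, halve it when the direction reverses. */
int adjust_move_increment(int move_increment, int *move_count,
                          int direction_reversed, int npaths)
{
    if (direction_reversed) {
        move_increment /= 2;          /* dramatic decrease on reversal */
        *move_count = 0;
    } else {
        (*move_count)++;
        int inc = move_increment / 4; /* slow, exponential-style growth */
        if (inc < 1)
            inc = 1;                  /* increase is never less than one */
        move_increment += inc;
    }
    if (move_increment < MIN_MOVE)
        move_increment = MIN_MOVE;
    if (move_increment > HASH_SPACE / npaths)
        move_increment = HASH_SPACE / npaths; /* per-path ceiling */
    return move_increment;
}
```

Starting from the typical initial increment of about 650 (~1% of the CRC16 hash space), repeated moves in one direction grow the increment multiplicatively, while a single reversal halves it, which is the source of the stability argued for above.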
The traffic share allocated to a path not containing the critically loaded segment is incremented by the move amount for that path, and the traffic share allocated to the path or paths containing the critically loaded segment is reduced by this amount divided by the number of paths containing the critically loaded segment. This adjustment is described in pseudocode in Appendix B.4.

This adjustment process is repeated for each path in a next hop structure. The new hash space boundaries are then moved to the forwarding engine.

4.2 Dealing with Link Adjacency Changes

Link failures do occur for various reasons. OSPF routing will converge to a new set of paths. Whatever load balance had previously existed will be upset, and the load balancing will have to converge to a new load balanced state. Previous load balancing parameters should remain intact to the extent possible after the SPF calculation has completed. Adjustments for new or deleted paths in the SPF result are described here. These adjustments must be made after the best path criteria are relaxed as described in Section 3.1.

4.2.1 Impact of Link Adjacency Changes

Links which are intermittent may be the most harmful. The OSPF ``Hello'' protocol is inadequate for handling intermittent links. When such a link is up it may draw traffic during periods of high loss, even brief periods of complete loss.

The inadequacies of the OSPF ``Hello'' protocol are well known, and many implementations provide lower level protocol state information to OSPF to indicate a link in the ``down'' state. For example, indications may include carrier loss, excessive framing errors, unavailable seconds, or loss indications from PPP LQM.

Even where the use of a link is avoided by providing indication of lower level link availability, intermittent links are still problematic.
During a brief period immediately after a link state attribute is initially flooded, OSPF state can be inconsistent among routers within the OSPF area. This inconsistency can cause intermittent routing loops and have a severe short term impact on link loading. An oscillating link can cause high levels of loss and is generally better off held in the neighbor adjacency ``down'' state. The algorithm described in [7] can be used when advertising OSPF type 1 or type 2 LSAs (router and network LSAs).

Regardless of whether router and network LSAs are damped, neighbor adjacency state changes will occur and router and network LSAs will have to be handled. The LSA may indicate an up transition or a down transition. In either an up or down transition, when the SPF algorithm is applied, existing paths to specific destinations may no longer be usable and new paths may become usable. In the case of an up transition, some paths may no longer be usable because their cost is no longer among those tied for the best. In the case of a down transition, new paths may become usable because they are now the best paths still available.

4.2.2 Handling the Loss of Paths

When a path becomes unusable, paths which previously had the same cost may remain. This can only occur on an LSA down transition. A new next hop entry should be created in which the proportion of source/destination hash space allocated to the now infeasible path is distributed to the remaining paths proportionally to their prior allocation. Very high loading percentages should result, triggering an increase in the LSA_OMP_LINK_LOAD Opaque LSA flooding rate until convergence is approached.

4.2.3 Handling the Addition of Paths

When a new path becomes usable it may be tied for best with paths carrying existing traffic. This can only occur on an LSA up transition.
A new next hop entry should be created in which the loading on the new path is zero. If such a path were to oscillate, little or no load would be affected. If the path remains usable, the shift of load to this path will accelerate until a balance is reached.

If a completely new set of best paths becomes available, the load should be split across the available paths. The split used in simulations was a share on a given link proportional to 10% of link capacity plus the remaining link bandwidth as determined by prior LSA_OMP_LINK_LOAD Opaque LSA values. The contribution of link capacity in the weighting should be configurable. See Appendix C.5.

Acknowledgements

Numerous individuals have provided valuable comments regarding this work. Dave Ward made a very substantial contribution by pointing out that the best path criteria could be relaxed. Geoffrey Cristallo provided comments on the handling of inter-area and external routes with worked examples which resulted in corrections and clarifications to this document. John Scudder, Tony Li, and Daniel Awduche have also provided particularly valuable review and comments.

References

[1] R. Coltun. The OSPF opaque LSA option. Technical Report RFC 2370, Internet Engineering Task Force, 1998. ftp://ftp.isi.edu/in-notes/rfc2370.txt.

[2] Atul Khanna and John Zinky. The revised ARPAnet routing metric. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 45--56, Austin, Texas, September 1989. ACM.

[3] Steven H. Low and P. Varaiya. Dynamic behavior of a class of adaptive routing protocols (IGRP). In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 610--616, March/April 1993.

[4] M. Mathis, J. Semke, J. Mahdavi, and T. Ott. The macroscopic behavior of the TCP congestion avoidance algorithm. ACM Computer Communication Review, 27(3), July 1997.
[5] J. Moy. OSPF version 2. Technical Report RFC 2328, Internet Engineering Task Force, 1998. ftp://ftp.isi.edu/in-notes/rfc2328.txt.

[6] W. Stevens. TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms. Technical Report RFC 2001, Internet Engineering Task Force, 1997. ftp://ftp.isi.edu/in-notes/rfc2001.txt.

[7] C. Villamizar, R. Chandra, and R. Govindan. BGP route flap damping. Technical Report RFC 2439, Internet Engineering Task Force, 1998. ftp://ftp.isi.edu/in-notes/rfc2439.txt.

Security Considerations

The use of the Opaque LSAs described in this document does not impact the security of OSPF deployments. In deployments which use a strong OSPF authentication method and require signatures on LSAs from the originating router, no leveraging of a partial compromise beyond a localized disruption of service is possible. In deployments which use a strong OSPF authentication method but do not require signatures on LSAs from the originating router, compromise of a single router can be leveraged to cause significant disruption of service with or without the use of these Opaque LSAs, but disruption of service cannot be achieved without such a compromise. In deployments which use a weak OSPF authentication method, significant disruption of service can be caused through forged protocol interactions with or without the use of these Opaque LSAs.

Author's Addresses

Curtis Villamizar
UUNET Network Architecture Group

Full Copyright Statement

Copyright (C) The Internet Society (February 24, 1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an ``AS IS'' basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
A Data Formats

    +----------------+----------------+----------------+----------------+
    |  Link State Advertisement Age   |    Options     |    LSA Type    |
    +----------------+----------------+----------------+----------------+
    |  Opaque Type   |                    Opaque ID                     |
    +----------------+----------------+----------------+----------------+
    |                        Advertising Router                         |
    +----------------+----------------+----------------+----------------+
    |             Link State Advertisement Sequence Number              |
    +----------------+----------------+----------------+----------------+
    |          LSA Checksum           |            LSA length           |
    +----------------+----------------+----------------+----------------+
    |    Version     | Reference Type | Packing Method |    BW Scale    |
    +----------------+----------------+----------------+----------------+
    |      Reference to a Type 1-5 LSA (32 or 64 bits, see below)       |
    +----------------+----------------+----------------+----------------+
    |      Packed Loading Information (variable length, see below)      |
    +----------------+----------------+----------------+----------------+

The ``LSA Age'', ``Options'', and ``LSA Type'' are part of the Link State Advertisement format described in Appendix A of RFC--2328. The LSA Type is 10, signifying an Opaque LSA with area-local scope, as defined in RFC--2370. RFC--2370 splits the Link State ID field into two parts: Opaque Type and Opaque ID. The Opaque Type contains either the value LSA_OMP_LINK_LOAD or LSA_OMP_PATH_LOAD as described in Section 2. The ``Advertising Router'', ``Link State Advertisement Sequence Number'', ``LSA Checksum'', and ``LSA length'' are part of the Link State Advertisement format described in Appendix A of RFC--2328. The remainder of the packet is specific to Opaque Type LSA_OMP_LINK_LOAD or LSA_OMP_PATH_LOAD LSAs.

Opaque Type  The Opaque Type will contain the value LSA_OMP_LINK_LOAD or LSA_OMP_PATH_LOAD as described in Section 2.
Numeric values will be requested from IANA.

Opaque ID  The Opaque ID will contain an integer which will be unique per router and interface, virtual interface, or MAC address for which loading is reported. These numbers are of significance only to the advertising router, except as a means of identification of subsequent LSAs.

    For an LSA_OMP_LINK_LOAD Opaque LSA, the ``Opaque ID'' must contain a 24 bit integer that is unique to the link or virtual link. The method of assignment of these 24 bit integers is a local matter. A router must be capable of uniquely identifying an interface using a 24 bit number.

    For an LSA_OMP_PATH_LOAD Opaque LSA, the ``Opaque ID'' must contain a 24 bit integer that is unique to a summary LSA or AS-external LSA advertised by the same router. The method of assignment of these 24 bit integers is a local matter. A router must be capable of uniquely identifying a summary LSA or AS-external LSA using a 24 bit number.

Version  The version number is 1.

Reference Type  The Reference Type indicates the type of LSA that is being referenced in the ``Reference to a Type 1-5 LSA'' field.

Packing Method  The Packing Method is an integer that indicates how the ``Packed Loading Information'' field is formatted.

BW Scale  Bandwidth numbers must be scaled by shifting the 32 bit integer left by this amount. If this value is non-zero, 64 bit integers or larger should be used to represent the bandwidth.

Reference to a Type 1-5 LSA  This field contains the 32 bit ``Link State ID'' field of a Type 1-5 LSA. Types 1-5 indicate:

    1. Router-LSAs
    2. Network-LSAs
    3. Summary-LSAs (IP network)
    4. Summary-LSAs (ASBR)
    5. AS-external-LSAs

    Loading information for a Type 1 LSA, a Router-LSA, is sent as an LSA_OMP_LINK_LOAD Opaque LSA. For a Type 1 LSA the ``Link State ID'' field is followed by a 32 bit ``Link ID''. This identifies a single link.
There are four types of links.

    1. Point-to-point connection to another router
    2. Connection to a transit network
    3. Connection to a stub network
    4. Virtual link

    Normally loading information is provided for a Link Type 1. Loading information may also be provided for a Link Type 2 or 3. Loading information cannot be provided for a Link Type 4.

    Loading information is not provided for Type 2 LSAs, Network-LSAs.

    Loading information for Type 3 and Type 4 LSAs, Summary-LSAs for IP networks in another area and Summary-LSAs for ASBRs, is sent as an LSA_OMP_PATH_LOAD Opaque LSA as described in Section 2.2. For a Type 3 or Type 4 LSA there is no information in the Reference following the ``Link State ID''.

    Loading information for Type 5 LSAs, AS-external-LSAs, is sent as an LSA_OMP_PATH_LOAD Opaque LSA as described in Section 2.2. For a Type 5 LSA there is no information in the Reference following the ``Link State ID''.

Packed Loading Information  The format of the Packed Loading Information depends on the value of the Packing Method field. Currently only the value 1 is defined.

The following format is used when the Packing Method field contains 1. The LSA must be ignored if a value other than 1 is found in Packing Method.
    +----------------+----------------+----------------+----------------+
    |           In Scaled Link Capacity in kilobits per second          |
    +----------------+----------------+----------------+----------------+
    |    In Link Loading Fraction     | In Link Drop Fraction (packets) |
    +----------------+----------------+----------------+----------------+
    |          Out Scaled Link Capacity in kilobits per second          |
    +----------------+----------------+----------------+----------------+
    |   Out Link Loading Fraction     | Out Link Drop Fraction (packets)|
    +----------------+----------------+----------------+----------------+

The Scaled Link Capacity is an unsigned integer in kilobits per second. If this would be larger than a 32 bit integer, the value should be shifted to the right until the top bit is in the MSB of the 32 bit number, and the number of bit shifts recorded in the BW Scale field.

The Link Loading Fraction is a 16 bit unsigned integer from 0 to 0xffff representing the capacity in bytes being used relative to the capacity in bytes per measurement interval. The hex number 0x10000 would represent unity; if full loading has been achieved, or greater than full loading is computed due to counting or truncation error, 0xffff is substituted. The Link Drop Fraction is a 16 bit unsigned integer from 0 to 0xffff representing the number of packets dropped relative to the number of packets received. This value can be derived from the change in two MIB-2 counters: (ifOutDiscard<<16)/ifInPacket. The hex number 0x10000 would represent unity (all of the packets being dropped), so 0xffff must be substituted.

B Concise Statement of the Algorithms

An OSPF router may play one of two roles, or both. The two functions are flooding loading information and load balancing. Interior routers and edge routers will flood loading information.
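One concrete piece of the flooding role is converting counter deltas into the 16 bit fractions carried in the packed loading information above, substituting 0xffff where the ratio reaches or exceeds unity (0x10000). A minimal sketch, assuming 64 bit counter deltas; the helper name is an illustration, not part of this specification.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: convert a (used, total) counter-delta pair
 * into the 16 bit fraction carried in the packed loading
 * information. 0x10000 would represent unity, so values at or above
 * unity are clamped to 0xffff as the format requires. */
uint16_t load_fraction16(uint64_t used, uint64_t total)
{
    uint64_t f;

    if (total == 0)
        return 0;              /* nothing measured this interval */
    f = (used << 16) / total;  /* 64 bit math avoids overflow for
                                  15 second deltas */
    if (f > 0xffff)
        f = 0xffff;
    return (uint16_t)f;
}
```

The same computation applies to both the loading fraction (byte deltas against capacity per interval) and the drop fraction (drop deltas against packet deltas), per the counter pairs named above.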
A router may choose not to flood information if it is believed that there is no way that the interface could become congested, or if it has no way to measure the load, as is the case on a shared broadcast interface. An ingress or interior router will process loading information and, if it has equal cost paths, will balance load across those paths.

The description of the algorithms is broken down into the following subsections.

    Flooding Loading Information     Appendix B.1
    Building Next Hop Structures     Appendix B.2
    Processing Loading Information   Appendix B.3
    Adjusting Loading                Appendix B.4

The algorithms are defined in the following sections in pseudocode.

B.1 Flooding Loading Information

It is assumed that counters are large enough to avoid multiple overflow (i.e., 64 bit counters are used for high speed interfaces) and that counter overflow is easily detected and compensated for in counter deltas. It is assumed that ifInDiscard and ifOutDiscard accurately count all queueing drops.

The following counters are sampled at 15 second intervals: ifInOctets, ifOutOctets, ifInPacket, ifOutPacket, ifInDiscard and ifOutDiscard. The values of ifInSpeed and ifOutSpeed are assumed to be constant. Some state must be stored. The previously used value of each raw counter is needed to compute deltas. The state variables InFilteredUtil, OutFilteredUtil, InLoss, OutLoss, InEquivLoad and OutEquivLoad must be saved. The last time a reflooding occurred must also be stored.

The input and output utilizations are expressed as fractions using ifInOctets, ifOutOctets, ifInSpeed, and ifOutSpeed. Call the raw 15 second fractional utilizations InRawUtil and OutRawUtil. Compute the following filtered values for both In and Out, making sure to save the previous values.
    PrevFilteredUtil = FilteredUtil;
    if (RawUtil < FilteredUtil) {
        FilteredUtil -= (FilteredUtil >> 3);
        FilteredUtil += (RawUtil >> 3);
    } else if (RawUtil > FilteredUtil) {
        FilteredUtil -= (FilteredUtil >> 1);
        FilteredUtil += (RawUtil >> 1);
    }

InLoss and OutLoss are computed using the ratio of the deltas of the Discard to Packet SNMP counters. Next compute an ``equivalent loading'' for the purpose of determining whether to reflood.

    PrevEquivLoad = EquivLoad;
    if (Loss < 0.005) {
        EquivLoad = FilteredUtil;
    } else {
        if (Loss <= 0.09) {
            LossComp = 10 * sqrt(Loss);
        } else {
            LossComp = 3;
        }
        EquivLoad = FilteredUtil * LossComp;
    }

A square root is somewhat time consuming to compute, so a table lookup can be done to avoid this computation. Increments of 0.1% loss would yield a 90 entry table. A 128-512 entry table would be adequate. The table can be sized so a shift and mask can be used to index it. The computation could then be done with a table lookup, a shift, and an integer multiply. At most this needs to be done for links with nonzero loss once every 15 seconds.

Then decide whether to flood based on the greater of the relative change in InEquivLoad or OutEquivLoad and on the time elapsed since the last reflooding (taking care not to divide by zero).
    Diff = max(abs(InEquivLoad - InPrevEquivLoad)
                   / InPrevEquivLoad,
               abs(OutEquivLoad - OutPrevEquivLoad)
                   / OutPrevEquivLoad);
    Load = max(InEquivLoad, OutEquivLoad);
    Elapsed = Now - LastReflood;
    if (((Load > 1.00) && (Diff > 0.05) && (Elapsed >= 30))
        || ((Load > 1.00) && (Diff > 0.02) && (Elapsed >= 60))
        || ((Load > 1.00) && (Diff > 0.01) && (Elapsed >= 90))
        || ((Load > 1.00) && (Elapsed >= 180))
        || ((Load > 0.90) && (Diff > 0.05) && (Elapsed >= 60))
        || ((Load > 0.90) && (Diff > 0.02) && (Elapsed >= 240))
        || ((Load > 0.90) && (Diff > 0.01) && (Elapsed >= 480))
        || ((Load > 0.90) && (Elapsed >= 600))
        || ((Load > 0.70) && (Diff > 0.10) && (Elapsed >= 60))
        || ((Load > 0.70) && (Diff > 0.05) && (Elapsed >= 120))
        || ((Load > 0.70) && (Diff > 0.02) && (Elapsed >= 480))
        || ((Load > 0.70) && (Elapsed >= 900))
        || ((Load > 0.50) && (Diff > 0.10) && (Elapsed >= 60))
        || ((Load > 0.50) && (Diff > 0.05) && (Elapsed >= 300))
        || ((Load > 0.25) && (Diff > 0.25) && (Elapsed >= 120))
        || ((Load > 0.25) && (Elapsed >= 1200))
       ) {
        /* we passed one of the tests so flood it */
        LastReflood = Now;
        ...

If the decision is made to reflood an LSA according to the test above, then the input and output FilteredUtil and Loss must be packed into an LSA and flooded.

The 15 second timer interval should be jittered by a random number in the range of plus or minus 5 seconds (obviously taking the actual time interval into account in computing rates).

B.2 Building Next Hop Structures

The OSPF SPF calculation is done as per RFC--2328. Minor differences in the outcome result from relaxing the best path criteria as described in Section 3.1. The modification to the SPF algorithm is described in Appendix D. The arrival of loading information does not require new SPF calculations since neither reachability nor costs are changed.
The SPF calculation yields the shortest paths from the given node to all other routers and networks in the OSPF area. In some cases multipaths will already exist. For all destinations, every feasible next hop is examined, and paths through next hops that simply provide an improvement are added, as described in Section 3.1.

After completing the SPF calculation and relaxing the best path criteria, intra-area multipath sets are recorded as next hop structures. If a previous SPF had been in use, loadings are transferred to the new set of data structures, and links are added or removed as needed as described in Section 4.2.

After recording the intra-area next hop structures, the existing set of inter-area next hop structures and the set of external route next hop structures are examined. Paths are added to or removed from next hop structures as needed, as described in Section 3, Section 3.3, and Section 4.2.

Inter-area and external routes map onto the intra-area routing. These therefore share the same set of paths and the same next hop structure as the intra-area route to the nearest ABR or ASBR. Next hop structures may be needed to reach any one of a set of ABRs or ASBRs if 1) there are multiple ABRs equally distant to a summary route, 2) there are multiple equally distant ASBRs advertising an external route at the same cost, or 3) relaxing the best path criteria for an intra-area destination results in going to a next hop which does not share the same closest ABR or ASBR.

Next hop structures may also be needed to offload paths in adjacent areas or external paths. The following conditional is used to determine whether a next hop structure should be added for a Summary LSA.

    if (IntraAreaLoad < 85%
        && SummaryLSALoad > 90%
        && SummaryLSALoad - IntraAreaLoad > 15%) {
        /* add a next hop structure */
        ...
The conditional for an external route is the same, except that the intra-area load would be a more internal load, either intra-area or Summary LSA, and the 90% threshold is increased to 95%.

The following conditional is used to determine whether an existing separate next hop structure for a Summary LSA or external route should be deleted.

    if (MoreInternalLoad > 98%
        || MoreInternalLoad - MoreExternalLoad > 20%) {
        /* delete a next hop structure */
        ...

B.3 Processing Loading Information

Adjustments to loading may be triggered by one of two events. When a new loading LSA is received, if the LSA corresponds to the most heavily loaded link for a next hop structure, then the next hop structure should be readjusted immediately. In addition, the time at which each next hop structure was last readjusted must be maintained, and next hop structures must be periodically readjusted. Timer events are handled as follows.

    foreach NextHopStruct ( AllNextHopStructs ) {
        Elapsed = Now - LastReadjust[NextHopStruct];
        MinLoaded = MinEquivLoad(NextHopStruct);
        MaxLoaded = MaxEquivLoad(NextHopStruct);
        AbsDiff = MaxLoaded - MinLoaded;
        if (((Elapsed >= 60)
             && (AbsDiff > 0.045) && (MaxLoaded > 0.95))
            || ((Elapsed >= 90)
                && (AbsDiff > 0.03) && (MaxLoaded > 0.95))
            || ((Elapsed >= 120)
                && (AbsDiff > 0.01) && (MaxLoaded > 0.97))
            || ((Elapsed >= 240)
                && (AbsDiff > 0.005) && (MaxLoaded > 0.98))
            || ((Elapsed >= 90)
                && (AbsDiff > 0.05) && (MaxLoaded > 0.90))
            || ((Elapsed >= 120)
                && (AbsDiff > 0.03) && (MaxLoaded > 0.90))
            || ((Elapsed >= 180)
                && (AbsDiff > 0.01) && (MaxLoaded > 0.90))
            || (Elapsed >= 300)) {
            /* we need to readjust this multipath set */
            ...

This loop and conditional result in a subset of the available next hop structures being adjusted based on the timer.
The same effect may be accomplished by determining when a next hop structure will need to be adjusted if no further flooding changes arrive, and queueing next hop structures on lists according to how long they will remain idle.

B.4 Adjusting Loading

A next hop structure will need to be adjusted when one of the two criteria in the prior section is met. The adjustment procedure itself relies upon the following stored state.

    NextHopStruct {
        LastReadjust;
        PrevCriticalSegment;
        TotalPaths;
        SetofPaths (
            Path;
            bit HasCriticalSegment, HasPrevCriticalSeg;
            TrafficShare;
            MoveIncrement;
            MoveCount;
        );
    };

Before the path move increments are adjusted, a preliminary pass is made over the next hop structure. This pass notes which paths contain the prior critical segment, notes which paths contain the current critical segment, and counts the number of paths containing the current critical segment.

    NumberWithCritical = 0;
    MinRateWithCritical = 65536;
    foreach Path ( SetofPaths ) {
        SetOrClear HasCriticalSegment;
        SetOrClear HasPrevCriticalSeg;
        if (HasCriticalSegment) {
            ++NumberWithCritical;
            if (MoveIncrement < MinRateWithCritical)
                MinRateWithCritical = MoveIncrement;
        }
    }

Next the move increment for each path is adjusted.
   foreach Path ( SetofPaths ) {
       if (HasCriticalSegment)
           continue;
       if (!HasPrevCriticalSeg) {
           ++MoveCount;
           if (MoveCount > 4) {
               Increase = MoveIncrement
                   / (2 * (1 + NumberWithCritical));
           } else {
               Increase = MoveIncrement
                   / (4 * (1 + NumberWithCritical));
           }
           if (Increase < 1)
               Increase = 1;
           MoveIncrement += Increase;
       } else {
           if (MoveIncrement > MinRateWithCritical)
               MoveIncrement = MinRateWithCritical;
           MoveIncrement /= 2;
           MoveCount = 0;
       }
       if (MoveIncrement < MinMoveIncrement)
           MoveIncrement = MinMoveIncrement;
       if (MoveIncrement > 65535)
           MoveIncrement = 65535;
   }

Then traffic share is adjusted.

   foreach Path1 ( SetofPaths ) {
       if (!Path1.HasCriticalSegment)
           continue;
       foreach Path2 ( SetofPaths ) {
           if (Path2.HasCriticalSegment)
               continue;
           Move = Path2.MoveIncrement / NumberWithCritical;
           if (Move < 1)
               Move = 1;
           if (Move > (65536 - Path2.TrafficShare)) {
               Move = 65536 - Path2.TrafficShare;
               Path2.MoveIncrement = Move;
           }
           if (Move > Path1.TrafficShare)
               Move = Path1.TrafficShare;
           Path2.TrafficShare += Move;
           Path1.TrafficShare -= Move;
       }
   }

The traffic shares for paths sharing a common next hop are then
summed and the appropriate information is transferred to the
forwarding data structures.

C Configuration Options

Many of the capabilities described here must be configurable.  The
ability to enable and disable capability subsets is needed.  Many
parameters used by the algorithm should also be configurable.

C.1 Capability Subsets

There should at least be the ability to enable and disable the
following.
   default  description of capability
   ON       Flooding any loading information
   ON       Flooding loading information for specific links
   -        Relaxing best path criteria
   -        Adjusting traffic shares (default to even split)
   OFF      Flooding loading information for Summary LSA
   OFF      Flooding loading information for specific Summary LSA
   OFF      Flooding loading information for external routes
   OFF      Flooding loading information for specific external routes
   OFF      Considering loading information for Summary LSA
   OFF      Considering loading information for specific Summary LSA
   OFF      Considering loading information for external routes
   OFF      Considering loading information for specific external
            routes

Flooding and considering Summary LSA and external route loading
information should be disabled by default.  Flooding loading
information should be enabled by default.  Relaxing best path
criteria and adjusting traffic shares could be enabled or disabled by
default, depending on vendor preference.

C.2 Parameters for Equivalent Load Calculation

The following values affect the computation of equivalent load.

   default  description of parameter
   10       The value of K in ``equiv-load = load * K * sqrt(loss)''
   0.5%     The minimum loss rate at which to compensate for loss
   9%       The maximum loss rate above which compensation is fixed

C.3 Parameters for Relaxing the Best Path Criteria

The following parameter affects the number of next hops and paths
added as a result of relaxing the best path criteria.  For example,
increasing the metric difference to 2 would require the next hop to
be a metric of ``2'' closer than the current distance to the
destination, and would reduce the number of paths added.
   default  description of parameter
   1        The metric difference required to relax the best path
            criteria

C.4 Parameters for Loading Outside of the OSPF Area

The following parameters affect the creation of separate next hop
structures to compensate for loading on Summary LSA and external
routes when those loadings are high and intra-AS loadings are
substantially lower.

   default  description of parameter
   15%      The loading difference required to consider a more
            external load over a more internal load
   85%      The maximum internal loading at which a more external
            load will become eligible for consideration
   90%      The minimum loading at which a Summary LSA will be
            considered over an intra-area loading
   95%      The minimum loading at which an external route will be
            considered over an intra-area loading
   20%      The load difference at which an external load will be
            removed from consideration due to being well under the
            internal load
   94%      The maximum value used in place of loading for a
            Summary LSA when performing traffic share adjustment
   98%      The internal loading at which a Summary LSA will be
            removed from consideration over the internal load
   90%      The maximum value used in place of loading for an
            external route when performing traffic share adjustment
   98%      The internal loading at which an external route will be
            removed from consideration over the internal load

Limiting the compensation made to accommodate external loading is
consistent with the reason for considering external routes.  Rarely
does a business go out of its way to improve the performance of a
competitor's product, a network service or otherwise.  Avoiding
congestion in a peer's network does one's own customers a favor by
not sending their traffic into known areas of congestion, but only if
it does not significantly impact congestion in one's own network.
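The deletion half of this hysteresis (the conditional given at the
end of Appendix B.2) can be sketched as a small predicate.  Loads are
expressed as fractions of capacity; the function name is illustrative
and not from this specification:

```c
#include <stdbool.h>

/* Hysteresis for removing a separate next hop structure created for
 * a Summary LSA or external route: delete it when the more internal
 * loading exceeds 98%, or when the more internal loading exceeds the
 * more external loading by more than 20%.  Loads are fractions of
 * capacity in the range 0.0 to 1.0. */
static bool delete_separate_nexthop(double more_internal,
                                    double more_external)
{
    return more_internal > 0.98
        || (more_internal - more_external) > 0.20;
}
```

For example, with internal loading 0.90 against external loading 0.85
the structure is kept, but against external loading 0.60 it is
deleted, since the external congestion is no longer the dominant
concern.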
Limiting the compensation for Summary LSA loading and external route
loading avoids triggering the hysteresis criteria, where a separate
next hop structure is deleted if an internal loading exceeds a fixed
threshold.  In effect, the Summary LSA loading and external route
loading are ignored if internal loadings exceed a given threshold,
since the Summary LSA loading or external route loading will no
longer be considered the critical segment.  If internal loading
reaches a point where, even with load balancing, internal paths
exceed the higher threshold, the next hop structure will be removed.

C.5 Parameters for Initial Loading of Paths

When determining the initial loading on a new set of paths, where the
destination was previously unreachable or none of the previous paths
appear in the new next hop structure, the following weighting is
used.

   weight = (link-capacity * 0.10)
            + (link-capacity * (1 - utilization))

The contribution of link capacity in the weighting should be
configurable.

   default  description of parameter
   10%      The fraction of total link capacity to consider in
            addition to the reported unused link capacity

C.6 Parameters associated with Flooding and Traffic Share

Parameters associated with the flooding rate, move increment, and
traffic share adjustment should not be configurable by the user, or
should be well hidden so as to be accessible only to developers.
Adjustment of these parameters can slow convergence or increase
overshoot.  If the parameters are sufficiently altered, instability
could result.  While it is likely that some improvements could result
from further tuning, experimentation on live networks is not
encouraged.

D Modified SPF Calculation

The most common implementation of the SPF calculation is Dijkstra's
algorithm.
Most implementations do not yield the full path as a consequence of
the SPF calculation.  Retaining the full path as the algorithm
proceeds is a relatively minor modification.

If the best path criteria are relaxed, the information obtained from
a single Dijkstra calculation is insufficient.  Dijkstra's algorithm
provides a very efficient single-source shortest path calculation.
For the relaxed best path criteria, the cost to any destination from
any immediately adjacent node is needed in addition to the set of
best paths and costs from the current node.

It is believed to be more efficient to compute an SPF using
Dijkstra's algorithm from the standpoint of each adjacent node, in
addition to an SPF from the current node, than to use an algorithm
that computes the costs from any node to any other node.  The former
runs in order N squared, while algorithms to accomplish the latter
run in order N cubed, where N is the number of nodes in the graph.
The amount of computation would be expected to be about equal in the
case where all nodes are immediately adjacent to the current node.

There are likely to be more efficient methods of computing the costs
from a subset of nodes to all destinations than either using multiple
Dijkstra calculations or computing the costs from all nodes to all
others and only making use of a subset of the results.  This
efficiency consideration is left as an exercise for the implementor.

E Algorithm Performance

A number of simulations have been performed to test the OSPF-OMP
algorithms.  In these simulations the algorithm has been demonstrated
to be very stable.  This work has not been formally published yet but
is currently available at http://engr.ans.net/ospf-omp.

The simulations done to date have only modeled the behavior of load
balancing intra-area routes.
This would be applicable to networks in which external routing was
mapped onto IGP routing with a single OSPF area.  Passing loading
information between areas, allowing loading in one area to affect an
adjacent area, has not been simulated.  Similarly, passing loading
information with external routes and affecting loading in a peer AS
has not been simulated.

E.1 Conversion from Loss to Equivalent Load

The current adjustment for loss in the equivalent load is based on
the premise that traffic is dominated by TCP and that TCP flows
sufficiently unconstrained by other bottlenecks and of sufficient
duration exist to keep the aggregate traffic flow elastic.  This is
considered a very safe assumption for Internet traffic.  Enterprise
networks may have a higher contribution of incompressible traffic
(traffic not conforming to a form of congestion avoidance, such as
the TCP congestion avoidance described in RFC 2001).

The assumed relationship between packet loss and link utilization is
based on the work of Mathis et al [4].  The constants in this
relationship cannot be determined, as they depend on the
delay-bandwidth product of TCP flows, the number and duration of TCP
flows, and whether TCP flows are severely constrained elsewhere.

The objective is to estimate the offered load, which cannot be
measured directly when links are at full utilization, using the link
utilization and loss.  The load adjustment algorithm should remain
stable as long as the first derivative of the estimator over offered
load remains positive.  If the first derivative is negative within
some region, then oscillation will occur within that range of
operation.  The greatest risk of this occurring is in routers where
active queue management is not used (i.e., where drop-tail queues are
used) and in particular where buffering is too small.
In such cases, as offered load increases just beyond full
utilization, loss increases somewhat, but utilization can drop
substantially (typically to about 90%) as offered load increases.  In
this region, as the offered load increases, the estimator of offered
load may decrease, causing the link to appear less loaded than
another.  The rather aggressive compensation for loss is intended to
ensure that this effect either does not occur, or occurs only within
a very narrow range of offered load just over full utilization.  If
the derivative is negative within a narrow range, oscillations can
occur only within that range, and the oscillations are well bounded.

E.2 Performance as traffic loads change

This work has considerable advantages over other approaches,
particularly traffic engineering approaches that involve adjustment
of virtual circuit layouts based on historic traffic figures.  The
advantage is the ability to adjust loading gradually as the actual
offered traffic deviates from that expected.  The advantage is even
greater when compared to dynamic virtual circuit layouts, using PNNI
for example, since these algorithms have proven to often converge on
very suboptimal layouts.

Simulations demonstrating behavior in these particular cases can be
found at http://engr.ans.net/ospf-omp/ramp.html.

E.3 Convergence after a major perturbation

Simulations have been performed in which link failures are examined.
Without relaxing the best path criteria, OSPF-OMP is quite dependent
on the set of link metrics to create a set of equal cost paths that
will allow the most heavily loaded portions of the topology to be
offloaded.  When links fail, the set of metrics is often far from
ideal for the remaining topology.  Relaxing the best path criteria
significantly improves the response to link failure.
Simulations are available at http://engr.ans.net/ospf-omp/fail.html.

F Examples

Figure 2 provides a simple topology.  For the purpose of
illustrating how OMP works, only the traffic flow from left to right
between a few pairs of dominant traffic contributors is considered.

The traffic mapped onto the topology in Figure 2 is dominated by the
ingress nodes A and F and the egress nodes E and G.  The capacity of
each link is 1, except link E-G, which has a capacity of 2.  The
loads contributed by the ingress-egress pairs A-E, F-G, and F-E are
each 0.5.  The node pair A-G contributes a load of 1.  Link costs are
all 2, except F-D, which is 3, and F-G, which is 6.

        cost=2        cost=2        cost=2        cost=2
   A ----------> B ----------> C ----------> E ----------> G
   |  load=1.5   |  load=0.5     load=0.5  ^   load<25%  ^
   |             |                         |             |
   |             |   cost=2       cost=2   |             |
   |             \-----------> D ----------/             |
   |              load=0.5     ^  load=1.0               |
   |                           |                         |
   |                  cost=3   |  load=0.5               |
   |      cost=2               |             cost=6      |
   \-------------------------> F ------------------------/
            load=0.5                 load=1.0

                 Figure 2: A Simple Example

If ECMP were used, all the traffic from F to E would take the path
F-D-E.  All the traffic from F to G would take the link F-G.  All
the traffic from A to E would take the hop from A to B, and then at B
it would be split evenly between the paths B-C-E and B-D-E.  Half of
the traffic from A to G would take the hop A-B and half would take
the hop A-F.
   ingress-egress   load   path
   F-E              0.5    0.50  F-D-E
   F-G              0.5    0.50  F-G
   A-E              0.5    0.25  A-B-C-E
                           0.25  A-B-D-E
   A-G              1.0    0.33  A-B-C-E-G
                           0.33  A-B-D-E-G
                           0.33  A-F-G

   link   loading   status
   A-B    1.16      overloaded
   A-F    0.33      underutilized
   B-C    0.58      underutilized
   B-D    0.58      underutilized
   C-E    0.58      underutilized
   D-E    1.08      overloaded
   E-G    0.66      underutilized
   F-D    0.50      underutilized
   F-G    0.83      near capacity

Above is the initial loading for OMP, which differs slightly from
ECMP.  With ECMP, half the traffic from A to G would take link A-F,
where OMP starts out with one third of the A-G traffic on link A-F.

Note that using OMP the path F-D-E-G, with cost 7, is considered
close enough to equal to the path F-G, with cost 6.  This is because
the next hop D is closer to G, with a cost of 4, than F is, with a
cost of 6.  Initially node F would not move load over, because link
D-E, at a loading of 1.08, is in worse shape than link F-G at a
loading of 0.83.

For illustrative purposes, three opportunities for moving load are
considered separately.  These are shown in Figure 3, Figure 4, and
Figure 5.

        cost=2        cost=2        cost=2        cost=2
   A ----------> B ----------> C ----------> E ----------> G
   |  load=1.25  |  load=.75     load=.75  ^   load<25%  ^
   |             |                         |             |
   |             |   cost=2       cost=2   |             |
   |             \-----------> D ----------/             |
   |              load=.25     ^  load=.75               |
   |                           |                         |
   |                  cost=3   |  load=0.5               |
   |      cost=2               |             cost=6      |
   \-------------------------> F ------------------------/
            load=0.75                load=1.25

         Figure 3: First Opportunity for Load Adjustment

Node B has a clear opportunity to reduce the load on the link D-E by
moving traffic from the next hop D to the next hop C.  If this
optimization were applied alone, utilizations on the links B-C, C-E,
and D-E would approach 0.75 and utilization on the link B-D would
approach 0.25.  This is illustrated in Figure 3.
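The initial link loadings in the table above follow directly from
summing the per-path traffic contributions over each link.  A small
sketch in C; the topology and share values are from this example,
while the structure and function names are local to the sketch:

```c
#include <string.h>
#include <stddef.h>

/* A path is a traffic share plus the NULL-terminated list of links
 * it traverses.  The shares are the initial OMP shares from the
 * table above; the A-G traffic is split three ways. */
struct path { double share; const char *links[6]; };

static const struct path initial[] = {
    { 0.50,    { "F-D", "D-E", NULL } },                /* F-E */
    { 0.50,    { "F-G", NULL } },                       /* F-G */
    { 0.25,    { "A-B", "B-C", "C-E", NULL } },         /* A-E */
    { 0.25,    { "A-B", "B-D", "D-E", NULL } },         /* A-E */
    { 1.0 / 3, { "A-B", "B-C", "C-E", "E-G", NULL } },  /* A-G */
    { 1.0 / 3, { "A-B", "B-D", "D-E", "E-G", NULL } },  /* A-G */
    { 1.0 / 3, { "A-F", "F-G", NULL } },                /* A-G */
};

/* Sum the shares of every path that traverses the given link. */
static double link_load(const char *link)
{
    double load = 0.0;
    for (size_t i = 0; i < sizeof initial / sizeof initial[0]; i++)
        for (int j = 0; initial[i].links[j] != NULL; j++)
            if (strcmp(initial[i].links[j], link) == 0)
                load += initial[i].share;
    return load;
}
```

This reproduces the table: link A-B carries 0.25 + 0.25 + 1/3 + 1/3,
about 1.16; D-E carries 0.50 + 0.25 + 1/3, about 1.08; and F-G
carries 0.50 + 1/3, about 0.83.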
Node A can reduce the loss on link A-B by putting more load on link
A-F.  This will initially have the effect of lowering A-B to 1.0 and
raising F-G to 1.0.  The links can only pass 100% of their capacity,
so this would just reduce loss on link A-B at the expense of
increasing loss on link F-G.  The load on link A-F would increase to
0.5.

After node B had moved enough traffic away from link D-E so that its
loading fell below the 1.0 loading of link F-G, node F would begin to
move traffic destined to G away from link F-G.  Load would be added
to link D-E, but node B would continue to move load away from D-E.
Utilizations of B-C, C-E, and D-E would rise.  Utilization of F-D
would also approach 0.5, and utilization of F-G would fall.  This is
illustrated in Figure 4.

Node A would be faced with an overloaded link A-B and a better path
to G of A-F-G, with the worst loading being at link F-G, initially
only slightly over capacity.  Both links A-B and F-G would be
reporting 100% utilization, but link A-B would be expected to report
higher loss.  In addition, as the other optimizations proceed, the
utilization of link F-G would approach 100%.
        cost=2        cost=2        cost=2        cost=2
   A ----------> B ----------> C ----------> E ----------> G
   |  load=1.25  |  load=.92     load=.92  ^   load<25%  ^
   |             |                         |             |
   |             |   cost=2       cost=2   |             |
   |             \-----------> D ----------/             |
   |              load=.08     ^  load=.92               |
   |                           |                         |
   |                  cost=3   |  load=0.92              |
   |      cost=2               |             cost=6      |
   \-------------------------> F ------------------------/
            load=0.75                load=0.92

         Figure 4: Second Opportunity for Load Adjustment

        cost=2        cost=2        cost=2        cost=2
   A ----------> B ----------> C ----------> E ----------> G
   |  load=1.0   |  load=1.0     load=1.0  ^   load=25%  ^
   |             |                         |             |
   |             |   cost=2       cost=2   |             |
   |             \-----------> D ----------/             |
   |              load=0       ^  load=1.0               |
   |                           |                         |
   |                  cost=3   |  load=1.0               |
   |      cost=2               |             cost=6      |
   \-------------------------> F ------------------------/
            load=1.0                 load=1.0

        Figure 5: Another Opportunity for Load Adjustment

Node A would move traffic from the next hop of B to the next hop of
F.  Node F will continue to move load from F-G to F-D-E.  Node B will
continue to move load from B-D-E to B-C-E.  The utilizations of links
A-B, B-C, C-E, D-E, F-D, and F-G will approach 0.83.  Utilization of
link A-F will approach 0.63.  Utilization of link B-D will approach
zero.  This is illustrated in Figure 5.

The following table provides the approximate state of traffic loading
achieved in a brief simulation.  Of the 6 links that could be
balanced at about 0.83, 3 converged to about 0.85 and 3 to about
0.82.  Note that the path F-G-E was used to get from F to E, in
addition to the lower cost F-D-E.
   ingress-egress   load   path
   F-E              0.5    0.25  F-D-E
                           0.25  F-G-E
   F-G              0.5    0.25  F-G
                           0.25  F-D-E-G
   A-E              0.5    0.25  A-B-C-E
                           0.00  A-B-D-E
                           0.12  A-F-D-E
                           0.12  A-F-G-E
   A-G              1.0    0.60  A-B-C-E-G
                           0.00  A-B-D-E-G
                           0.20  A-F-G
                           0.20  A-F-D-E-G

   link   loading
   A-B    0.85
   A-F    0.65
   B-C    0.85
   B-D    0.00
   C-E    0.85
   D-E    0.82
   E-G    0.80
   F-D    0.82
   F-G    0.82

In this example, multiple links are balanced at 82% to 85%
utilization.  Without using OMP it is difficult (it might be
impossible using only ECMP) to avoid applying an offered load that
exceeds link capacity on parts of the topology.  This example is
intended to provide a more advanced tutorial than the trivial three
node example in Figure 1.

This example is among the simulations at
http://engr.ans.net/ospf-omp/tutorial.  More complex topologies and
traffic patterns have been simulated and are available at the same
URL.
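The relaxed best path criterion used in this example can be checked
mechanically with an all-pairs shortest path computation (the order N
cubed approach mentioned in Appendix D) over the directed Figure 2
topology.  A sketch in C; the node and array names are local to the
sketch:

```c
/* All-pairs shortest path costs for the Figure 2 topology, computed
 * with Floyd-Warshall.  Link costs are the ones given in the
 * example: all 2, except F-D (3) and F-G (6). */
enum { A, B, C, D, E, F, G, N };

#define INF 1000            /* larger than any real path cost */

static int dist[N][N];

static void all_pairs(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            dist[i][j] = (i == j) ? 0 : INF;

    dist[A][B] = 2; dist[A][F] = 2;
    dist[B][C] = 2; dist[B][D] = 2;
    dist[C][E] = 2; dist[D][E] = 2;
    dist[E][G] = 2; dist[F][D] = 3; dist[F][G] = 6;

    /* Floyd-Warshall relaxation over all intermediate nodes. */
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}
```

Running this confirms the relaxation applied at node F: dist[F][G] is
6 while dist[D][G] is 4, so D is at least the required metric
difference of 1 closer to G than F is, and the cost-7 path F-D-E-G
becomes eligible alongside the cost-6 link F-G.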