idnits 2.17.1 draft-ietf-idr-route-damp-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 39 instances of too long lines in the document, the longest one being 5 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 707 has weird spacing: '...in path exc...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 15, 1998) is 9477 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '2' is defined on line 1447, but no explicit reference was found in the text == Unused Reference: '3' is defined on line 1454, but no explicit reference was found in the text == Unused Reference: '4' is defined on line 1458, but no explicit reference was found in the text == Unused Reference: '6' is defined on line 1468, but no explicit reference was found in the text == Unused Reference: '7' is defined on line 1473, but no explicit reference was found in the text == Unused Reference: '8' is defined on line 1477, but no explicit reference was found in the text ** Obsolete normative reference: RFC 1268 (ref. '1') (Obsoleted by RFC 1655) -- Possible downref: Non-RFC (?) normative reference: ref. '2' ** Downref: Normative reference to an Historic RFC: RFC 1267 (ref. '3') ** Obsolete normative reference: RFC 1771 (ref. '5') (Obsoleted by RFC 4271) ** Downref: Normative reference to an Historic RFC: RFC 1520 (ref. '6') ** Downref: Normative reference to an Informational RFC: RFC 1774 (ref. '7') ** Downref: Normative reference to an Informational RFC: RFC 1773 (ref. '8') Summary: 16 errors (**), 0 flaws (~~), 8 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Curtis Villamizar 3 INTERNET-DRAFT ANS 4 draft-ietf-idr-route-damp-03 Ravi Chandra 5 Cisco 6 Ramesh Govindan 7 ISI 8 May 15, 1998 10 BGP Route Flap Damping 12 Status of this Memo 14 This document is an Internet-Draft. Internet-Drafts are working 15 documents of the Internet Engineering Task Force (IETF), its areas, 16 and its working groups. 
Note that other groups may also distribute 17 working documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as ``work in progress.'' 24 To view the entire list of current Internet-Drafts, please check 25 the "1id-abstracts.txt" listing contained in the Internet-Drafts 26 Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net 27 (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au 28 (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu 29 (US West Coast). 31 Abstract 33 A usage of the BGP routing protocol is described which is capable of 34 reducing the routing traffic passed on to routing peers and therefore 35 the load on these peers without adversely affecting route convergence 36 time for relatively stable routes. This technique has been 37 implemented in commercial products supporting BGP. The technique is 38 also applicable to IDRP. 40 The overall goals are: 42 o to provide a mechanism capable of reducing router processing load 43 caused by instability 45 o in doing so prevent sustained routing oscillations 47 o to do so without sacrificing route convergence time for generally 48 well behaved routes. 50 This must be accomplished keeping other goals of BGP in mind: 52 o pack changes into a small number of updates 54 o preserve consistent routing 56 o minimal additional space and computational overhead 58 An excessive rate of update to the advertised reachability of a subset 59 of Internet prefixes has been widespread in the Internet. This 60 observation was made in the early 1990s by many people involved in 61 Internet operations and remains the case. These excessive updates are 62 not necessarily periodic so route oscillation would be a misleading 63 term.
The informal term used to describe this effect is ``route 64 flap''. The techniques described here are now widely deployed and are 65 commonly referred to as ``route flap damping''. 67 1 Overview 69 To maintain scalability of a routed internet, it is necessary to 70 reduce the amount of change in routing state propagated by BGP in 71 order to limit processing requirements. The primary contributors to 72 processing load resulting from BGP updates are the BGP decision 73 process and adding and removing forwarding entries. 75 Consider the following example. A widely deployed BGP implementation 76 may tend to fail due to high routing update volume. For example, it 77 may be unable to maintain its BGP or IGP sessions if sufficiently 78 loaded. The failure of one router can further contribute to the load 79 on other routers. This additional load may cause failures in other 80 instances of the same implementation or other implementations with a 81 similar weakness. In the worst case, a stable oscillation could 82 result. Such worst cases have already been observed in practice. 84 A BGP implementation must be prepared for a large volume of routing 85 traffic. A BGP implementation cannot rely upon the sender to 86 sufficiently shield it from route instabilities. The guidelines here 87 are designed to prevent sustained oscillations, but do not eliminate 88 the need for robust and efficient implementations. The mechanisms 89 described here allow routing instability to be contained at an AS 90 border router bordering the instability. 92 Even where BGP implementations are highly robust, the performance of 93 the routing process is limited. Limiting the propagation of 94 unnecessary change then becomes an issue of maintaining reasonable 95 route change convergence time as a routing topology grows. 97 2 Methods of Limiting Route Advertisement 99 Two methods of controlling the frequency of route advertisement are 100 described here. The first involves fixed timers.
The fixed timer 101 technique has no space overhead per route but has the disadvantage of 102 slowing route convergence for the normal case where a route does not 103 have a history of instability. The second method overcomes this 104 limitation at the expense of maintaining some additional space 105 overhead. The additional overhead includes a small amount of state 106 per route and a very small processing overhead. 108 It is possible and desirable to combine both techniques. In practice, 109 fixed timers have been set to very short time intervals and have 110 proven useful to pack routes (NLRI) into a smaller number of updates 111 when routes arrive in separate updates. 113 Seldom are fixed timers set to the tens of minutes to hours that would 114 be necessary to actually damp route flap. To do so would produce the 115 undesirable effect of severely limiting routing convergence. 117 2.1 Existing Fixed Timer Recommendations 119 BGP-3 does not make specific recommendations in this area [1]. The 120 short section entitled ``Frequency of Route Selection'' simply 121 recommends that something be done and makes broad statements regarding 122 certain properties that are desirable or undesirable. 124 BGP-4 retains the ``Frequency of Route Advertisement'' section and adds 125 a ``Frequency of Route Origination'' section. BGP-4 describes a 126 method of limiting route advertisement involving a fixed 127 (configurable) MinRouteAdvertisementInterval timer and fixed 128 MinASOriginationInterval timer [5]. The recommended timer values are 129 30 seconds for MinRouteAdvertisementInterval and 15 seconds for 130 MinASOriginationInterval. 132 2.2 Desirable Properties of Damping Algorithms 134 Before describing damping algorithms the objectives need to be clearly 135 defined. Some key properties are examined to clarify the design 136 rationale. 138 The overall objective is to reduce the route update load without 139 limiting convergence time for well behaved routes.
To accomplish 140 this, criteria must be defined for well behaved and poorly behaved 141 routes. An algorithm must be defined which allows poorly behaved 142 routes to be identified. Ideally, this measure would be a prediction 143 of the future stability of a route. 145 Any delay in propagation of well behaved routes should be minimal. 146 Some delay is tolerable to support better packing of updates. Delay 147 of poorly behaved routes should, if possible, be proportional to a 148 measure of the expected future instability of the route. Delay in 149 propagating an unstable route should cause the unstable route to be 150 suppressed until there is some degree of confidence that the route has 151 stabilized. 153 If a large number of route changes are received in separate updates 154 over some very short period of time and these updates have the 155 potential to be combined into a single update then these should be 156 packed as efficiently as possible before propagating further. Some 157 small delay in propagating well behaved routes is tolerable and is 158 necessary to allow better packing of updates. 160 Where routes are unstable, use and announcement of the routes should 161 be suppressed rather than suppressing their removal. Where one route 162 to a destination is stable, and another route to the same destination 163 is somewhat unstable, if possible, the unstable route should be 164 suppressed more aggressively than if there were no alternate path. 166 Routing consistency within an AS is very important. Only very minimal 167 delay of internal BGP (IBGP) should be done. Routing consistency 168 across AS boundaries is also very important. It is highly undesirable 169 to advertise a route that is different from the route that is being 170 used, except for a very minimal time. It is more desirable to 171 suppress the acceptance of a route (and therefore the use of that 172 route in the IGP) rather than suppress only the redistribution.
174 It is clearly not possible to accurately predict the future stability 175 of a route. The recent history of stability is generally regarded as 176 a good basis for estimating the likelihood of future stability. The 177 criterion that is used to distinguish well behaved from poorly behaved 178 routes is therefore based on the recent history of stability of the 179 route. There is no simple quantitative expression of recent stability 180 so a figure of merit must be defined. Some desirable characteristics 181 of this figure of merit would be that the farther in the past that 182 instability occurred, the less its effect on the figure of merit and 183 that the instability measure would be cumulative rather than 184 reflecting only the most recent event. 186 The algorithms should behave such that for routes which have a history 187 of stability but make a few transitions, those transitions should be 188 made quickly. If transitions continue, advertisement of the route 189 should be suppressed. There should be some memory of prior 190 instability. The degree to which prior instability is considered should be 191 gradually reduced as long as the route remains announced and stable. 193 2.3 Design Choices 195 After routes have been accepted their readvertisement will be briefly 196 suppressed to improve packing of updates. There may be a lengthy 197 suppression of the acceptance of an external route. How long a route 198 will be suppressed is based on a figure of merit that is expected to 199 be correlated to the probability of future instability of a route. 200 Routes with high figure of merit values will be suppressed. An 201 exponential decay algorithm was chosen as the basis for reducing the 202 figure of merit over time. These choices should be viewed as 203 suggestions for implementation. 205 An exponential decay function has the property that previous 206 instability can be remembered for a fairly long time.
The rate at 207 which the instability figure of merit decays slows as time goes on. 208 Exponential decay has the following property. 210 f(f(figure-of-merit, t1), t2) = f(figure-of-merit, t1+t2) 212 This property allows the decay for a long period to be computed in a 213 single operation regardless of the current value (figure-of-merit). 214 As a performance optimization, the decay can be applied in fixed time 215 increments. Given a desired decay half life, the decay for a single 216 time increment can be computed ahead of time. The decay for multiple 217 time increments is expressed below. 219 f(figure-of-merit, n*t0) = figure-of-merit * K**n, where K = f(1, t0) 221 The values of K ** n can be precomputed for a reasonable number of 222 ``n'' and stored in an array. The value of ``K'' is always less than 223 one. The array size can be bounded since the value quickly approaches 224 zero. This makes the decay easy to compute using an array bound 225 check, an array lookup and a single multiply regardless of how much 226 time has elapsed. 228 3 Limiting Route Advertisements using Fixed Timers 230 This method of limiting route advertisements involves the use of fixed 231 timers applied to the process of sending routes. Its primary purpose 232 is to improve the packing of routes in BGP update messages. The delay 233 in advertising a stable route should be bounded and minimal. The 234 delay in advertising an unreachable need not be zero, but should also 235 be bounded and should probably have a separate bound set less than or 236 equal to the bound for a reachable advertisement. 238 Routes that need to be readvertised can be marked in the RIB or an 239 external set of structures maintained, which references the RIB. 240 Periodically, a subset of the marked routes can be flushed. This is 241 fairly straightforward and accomplishes the objectives. Computation 242 for too simple an implementation may be order N squared.
To avoid N 243 squared performance, some form of data structure is needed to group 244 routes with common attributes. 246 An implementation should pack updates efficiently, provide a minimum 247 readvertisement delay, provide a bound on the maximum readvertisement 248 delay that would be experienced solely as a result of the algorithm 249 used to provide a minimum delay, and must be computationally efficient 250 in the presence of a very large number of candidates for 251 readvertisement. 253 4 Stability Sensitive Suppression of Route Advertisement 255 This method of limiting route advertisements uses a measure of route 256 stability applied on a per route basis. This technique is applied 257 when receiving updates from external peers only (EBGP). Applying this 258 technique to IBGP learned routes or to advertisement to IBGP or EBGP 259 peers after making a route selection can result in routing loops. 261 A figure of merit based on a measure of instability is maintained on a 262 per route basis. This figure of merit is used in the decision to 263 suppress the use of the route. Routes with high figure of merit are 264 suppressed. Each time a route is withdrawn, the figure of merit is 265 incremented. While the route is not changing the figure of merit 266 value is decayed exponentially with separate decay rates depending on 267 whether the route is stable and reachable or has been stable and 268 unreachable. The decay rate may be slower when the route is 269 unreachable, or the stability figure of merit could remain fixed (not decay 270 at all) while the route remains unreachable. Whether to decay 271 unreachable routes at the same rate, a slower rate, or not at all is an 272 implementation choice. Decaying at a slower rate is recommended. 274 A very efficient implementation is suggested in the following 275 sections.
The implementation only requires computation for the routes 276 contained in an update, when an update is received or withdrawn (as 277 opposed to the simplistic approach of periodically decaying each 278 route). The suggested implementation involves only a small number of 279 simple operations, and can be implemented using scaled integers. 281 The behavior of unstable routes is fairly predictable. Severely 282 flapping routes will often be advertised and withdrawn at regular time 283 intervals corresponding to the timers of a particular protocol (the 284 IGP or exterior protocol in use where the problem exists). Marginal 285 circuits or mild congestion can result in a long term pattern of 286 occasional brief route withdrawal or occasional brief connectivity. 288 4.1 Single vs. Multiple Configuration Parameter Sets 290 The behavior of the algorithm is modified by a number of configurable 291 parameters. It is possible to configure separate sets of parameters 292 designed to handle short term severe route flap and chronic milder 293 route flap (a pattern of occasional drops over a long time period). 294 The former would require a fast decay and low threshold (allowing a 295 small number of consecutive flaps to cause a route to be suppressed, 296 but allowing it to be reused after a relatively short period of 297 stability). The latter would require a very slow decay and a higher 298 threshold and might be appropriate for routes for which there was an 299 alternate path of similar bandwidth. 301 It may also be desirable to configure different thresholds for routes 302 with roughly equivalent alternate paths than for routes where the 303 alternate paths have a lower bandwidth or tend to be congested. This 304 can be solved by associating a different set of parameters with 305 different ranges of preference values. Parameter selection could be 306 based on BGP LOCAL_PREF. 308 Parameter selection could also be based on whether an alternate route 309 was known. 
A route would be considered if, for any applicable 310 parameter set, an alternate route with the specified preference value 311 existed and the figure of merit associated with the parameter set did 312 not indicate a need to suppress the route. A less aggressive 313 suppression would be applied to the case where no alternate route at 314 all existed. In the simplest case, a more aggressive suppression 315 would be applied if any alternate route existed. Only the highest 316 preference (most preferred) value needs to be specified, since the 317 ranges may overlap. 319 It might also be desirable to configure a different set of thresholds 320 for routes which rely on switched services and may disconnect at times 321 to reduce connect charges. Such routes might be expected to change 322 state somewhat more often, but should be suppressed if continuous 323 state changes indicate instability. 325 While not essential, it might be desirable to be able to configure 326 multiple sets of configuration parameters per route. It may also be 327 desirable to be able to configure sets of parameters that only 328 correspond to a set of routes (identified by AS path, peer router, 329 specific destinations or other means). Experience may dictate how 330 much flexibility is needed and how best to set the parameters. 331 Whether to allow different damping parameter sets for different 332 routes, and whether to allow multiple figures of merit per route is an 333 implementation choice. 335 Parameter selection can also be based on prefix length. The rationale 336 is that longer prefixes tend to reach fewer end systems and are less 337 important and these less important prefixes can be damped more 338 aggressively. This technique is in fairly widespread use. Small 339 sites or those with dense address allocation who are multihomed are 340 often reachable by long prefixes which are not easily aggregated.
341 These sites tend to dispute the choice of prefix length for parameter 342 selection. Advocates of the technique point out that it encourages 343 better aggregation. 345 4.2 Configuration Parameters 347 At configuration time, a number of parameters may be specified by the 348 user. The configuration parameters are expressed in units meaningful 349 to the user. These differ from the parameters used at run time which 350 are in units convenient for computation. The run time parameters are 351 derived from the configuration parameters. Suggested configuration 352 parameters are listed below. 354 cutoff threshold (cut) 356 This value is expressed as a number of route withdrawals. It is 357 the value above which a route advertisement will be suppressed. 359 reuse threshold (reuse) 361 This value is expressed as a number of route withdrawals. It is 362 the value below which a suppressed route will be used again. 364 maximum hold down time (T-hold) 366 This value is the maximum time a route can be suppressed no matter 367 how unstable it has been prior to this period of stability. 369 decay half life while reachable (decay-ok) 371 This value is the time duration in minutes or seconds during which 372 the accumulated stability figure of merit will be reduced by half 373 if the route is considered reachable (whether suppressed or not). 375 decay half life while unreachable (decay-ng) 377 This value is the time duration in minutes or seconds during which 378 the accumulated stability figure of merit will be reduced by half 379 if the route is considered unreachable. If not specified or set to 380 zero, no decay will occur while a route remains unreachable. 382 decay memory limit (Tmax-ok or Tmax-ng) 384 This is the maximum time that any memory of previous instability 385 will be retained given that the route's state remains unchanged, 386 whether reachable or unreachable. This parameter is generally used 387 to determine array sizes.
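The decay half life and decay memory limit listed above, together with a time granularity, are enough to build the precomputed decay array described in Section 2.3. The following Python sketch is illustrative only. The function names are hypothetical, and floating point is used for readability where the draft suggests scaled integers.

```python
import math


def build_decay_array(half_life_s: float, tmax_s: float, delta_t_s: float) -> list:
    """Precompute K**n for n = 0 .. ceil(Tmax / delta-t).

    K is the decay factor for one time increment, chosen so that the
    figure of merit halves after one half life. (Illustrative helper,
    not taken from the draft.)
    """
    k = 0.5 ** (delta_t_s / half_life_s)
    size = int(math.ceil(tmax_s / delta_t_s)) + 1
    return [k ** n for n in range(size)]


def decay(figure_of_merit: float, elapsed_s: float,
          decay_array: list, delta_t_s: float) -> float:
    """Apply decay with a bound check, one array lookup and one multiply."""
    n = int(elapsed_s // delta_t_s)
    if n >= len(decay_array):
        # Past the decay memory limit: treat the remaining total as zero.
        return 0.0
    return figure_of_merit * decay_array[n]


# Example values (assumptions): 15 minute half life, 60 minute memory
# limit, 1 second granularity.
array_ok = build_decay_array(half_life_s=900, tmax_s=3600, delta_t_s=1)
halved = decay(2.0, 900, array_ok, 1)      # close to 1.0 after one half life
forgotten = decay(2.0, 4000, array_ok, 1)  # 0.0, beyond the memory limit
```

A separate array would be built for each decay rate in use, for example one for decay-ok and one for decay-ng.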
389 There may be multiple sets of the parameters above as described in 390 Section 4.1. The configuration parameters listed below would be 391 applied system wide. These include the time granularity of all 392 computations, and the parameters used to control reevaluation of 393 routes that have previously been suppressed. 395 time granularity (delta-t) 397 This is the time granularity in seconds used to perform all decay 398 computations. 400 reuse list time granularity (delta-reuse) 402 This is the time interval between evaluations of the reuse lists. 403 Each reuse list corresponds to an additional time increment. 405 reuse list memory (reuse-list-max) 407 This is the time value corresponding to the last reuse list. This 408 may be the maximum value of T-hold for all parameter sets or may be 409 configured. 411 number of reuse lists (reuse-list-size) 413 This is the number of reuse lists. It may be determined from 414 reuse-list-max or set explicitly. 416 A necessary optimization is described in Section 4.8.6 that involves 417 an array referred to as the ``reuse index array''. A reuse index 418 array is needed for each decay rate in use. The reuse index array is 419 used to estimate in which reuse list to place a route when it is 420 suppressed. Proper placement avoids the need to periodically evaluate 421 decay to determine if a route can be reused or when storage can be 422 recovered. Using the reuse index array avoids the need to compute a 423 logarithm to determine placement. One additional system wide 424 parameter can be introduced. 426 reuse index array size (reuse-index-array-size) 428 This is the size of reuse index arrays. This size determines the 429 accuracy with which suppressed routes can be placed within the set 430 of reuse lists when suppressed for a long time. 432 4.3 Guidelines for Setting Parameters 434 The decay half life should be set to a time considerably longer than 435 the period of the route flap it is intended to address.
For example, 436 if the decay half life is set to ten minutes and a route is withdrawn and 437 readvertised exactly every ten minutes, the route would continue to 438 flap if the cutoff was set to a value of 2 or above. 440 The stability figure of merit itself is an accumulated time decayed 441 total. This must be kept in mind in setting the decay time, cutoff 442 values and reuse values. For example, if a route flaps four times 443 per decay half life, it will reach 3 in 4 cycles, 4 in 6 cycles, 5 in 10 444 cycles, and will converge at about 6.3. At two flaps per decay half life, it 445 will reach 3 in 7 cycles, and converge at a value of less than 3.5. 447 Figure 1 shows the stability figure of merit for route flap at a 448 constant rate. The time axis is labeled in multiples of the decay 449 half life. The plots represent route flap with a period of 1/2, 1/3, 450 1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set, 451 which can be seen to affect three of the plots, effectively limiting 452 the time it takes to readvertise the route regardless of the prior 453 history. With the cutoff and reuse thresholds suggested by the dotted 454 lines, routes would be suppressed after being declared unreachable 2-3 455 times and be used again after approximately 2 decay half life periods 456 of stability. 458 From the maximum hold time value (T-hold), a ratio of the reuse value 459 to a ceiling can be determined. An integer value for the ceiling can 460 then be chosen such that overflow will not be a problem and all other 461 values can be scaled accordingly. If both cutoffs are specified or if 462 multiple parameter sets are used the highest ceiling will be used.

464 time figure-of-merit as a function of time (columns, left to right: flap period of 1/2, 1/3, 1/4, 1/8 of the decay half life)

466 0.00 0.000 . 0.000 . 0.000 . 0.000 .
467 0.08 0.000 . 0.000 . 0.000 . 0.000 .
468 0.16 0.000 . 0.000 . 0.000 . 0.973 .
469 0.24 0.000 . 0.000 . 0.000 . 0.920 .
470 0.32 0.000 . 0.000 . 0.946 . 1.817 .
471 0.40 0.000 . 0.953 . 0.895 . 2.698 .
472 0.48 0.000 . 0.901 . 0.847 . 2.552 .
473 0.56 0.953 . 0.853 . 1.754 . 3.367 .
474 0.64 0.901 . 0.807 . 1.659 . 4.172 .
475 0.72 0.853 . 1.722 . 1.570 . 3.947 .
476 0.80 0.807 . 1.629 . 2.444 . 4.317 .
477 0.88 0.763 . 1.542 . 2.312 . 4.469 .
478 0.96 0.722 . 1.458 . 2.188 . 4.228 .
479 1.04 1.649 . 2.346 . 3.036 . 4.347 .
480 1.12 1.560 . 2.219 . 2.872 . 4.112 .
481 1.20 1.476 . 2.099 . 2.717 . 4.257 .
482 1.28 1.396 . 1.986 . 3.543 . 4.377 .
483 1.36 1.321 . 2.858 . 3.352 . 4.141 .
484 1.44 1.250 . 2.704 . 3.171 . 4.287 .
485 1.52 2.162 . 2.558 . 3.979 . 4.407 .
486 1.60 2.045 . 2.420 . 3.765 . 4.170 .
487 1.68 1.935 . 3.276 . 3.562 . 4.317 .
488 1.76 1.830 . 3.099 . 4.356 . 4.438 .
489 1.84 1.732 . 2.932 . 4.121 . 4.199 .
490 1.92 1.638 . 2.774 . 3.899 . 3.972 .
491 2.00 1.550 . 2.624 . 3.688 . 3.758 .
492 2.08 1.466 . 2.483 . 3.489 . 3.555 .
493 2.16 1.387 . 2.349 . 3.301 . 3.363 .
494 2.24 1.312 . 2.222 . 3.123 . 3.182 .
495 2.32 1.242 . 2.102 . 2.955 . 3.010 .
496 2.40 1.175 . 1.989 . 2.795 . 2.848 .
497 2.48 1.111 . 1.882 . 2.644 . 2.694 .
498 2.56 1.051 . 1.780 . 2.502 . 2.549 .
499 2.64 0.995 . 1.684 . 2.367 . 2.411 .
500 2.72 0.941 . 1.593 . 2.239 . 2.281 .
501 2.80 0.890 . 1.507 . 2.118 . 2.158 .
502 2.88 0.842 . 1.426 . 2.004 . 2.042 .
503 2.96 0.797 . 1.349 . 1.896 . 1.932 .
504 3.04 0.754 . 1.276 . 1.794 . 1.828 .
505 3.12 0.713 . 1.207 . 1.697 . 1.729 .
506 3.20 0.675 . 1.142 . 1.605 . 1.636 .
507 3.28 0.638 . 1.081 . 1.519 . 1.547 .
508 3.36 0.604 . 1.022 . 1.437 . 1.464 .
509 3.44 0.571 . 0.967 . 1.359 . 1.385 .

511 Figure 1: Instability figure of merit for flap at a constant rate

512 time figure-of-merit as a function of time (columns, left to right: route reachable 1/8, 1/2, 7/8 of the time)

514 0.00 0.000 . 0.000 . 0.000 .
515 0.20 0.000 . 0.000 . 0.000 .
516 0.40 0.000 . 0.000 . 0.000 .
517 0.60 0.000 . 0.000 . 0.000 .
518 0.80 0.000 . 0.000 . 0.000 .
519 1.00 0.999 . 0.999 . 0.999 .
520 1.20 0.971 . 0.971 . 0.929 .
521 1.40 0.945 . 0.945 . 0.809 .
522 1.60 0.919 . 0.865 . 0.704 .
523 1.80 0.894 . 0.753 . 0.613 .
524 2.00 1.812 . 1.657 . 1.535 .
525 2.20 1.762 . 1.612 . 1.428 .
526 2.40 1.714 . 1.568 . 1.244 .
527 2.60 1.667 . 1.443 . 1.083 .
528 2.80 1.622 . 1.256 . 0.942 .
529 3.00 1.468 . 1.094 . 0.820 .
530 3.20 2.400 . 2.036 . 1.694 .
531 3.40 2.335 . 1.981 . 1.475 .
532 3.60 2.271 . 1.823 . 1.284 .
533 3.80 2.209 . 1.587 . 1.118 .
534 4.00 1.999 . 1.381 . 0.973 .
535 4.20 2.625 . 2.084 . 1.727 .
536 4.40 2.285 . 1.815 . 1.503 .
537 4.60 1.990 . 1.580 . 1.309 .
538 4.80 1.732 . 1.375 . 1.139 .
539 5.00 1.508 . 1.197 . 0.992 .
540 5.20 1.313 . 1.042 . 0.864 .
541 5.40 1.143 . 0.907 . 0.752 .
542 5.60 0.995 . 0.790 . 0.654 .
543 5.80 0.866 . 0.688 . 0.570 .
544 6.00 0.754 . 0.599 . 0.496 .
545 6.20 0.656 . 0.521 . 0.432 .
546 6.40 0.571 . 0.454 . 0.376 .
547 6.60 0.497 . 0.395 . 0.327 .
548 6.80 0.433 . 0.344 . 0.285 .
549 7.00 0.377 . 0.299 . 0.248 .
550 7.20 0.328 . 0.261 . 0.216 .
551 7.40 0.286 . 0.227 . 0.188 .
552 7.60 0.249 . 0.197 . 0.164 .
553 7.80 0.216 . 0.172 . 0.142 .
554 8.00 0.188 . 0.150 . 0.124 .

556 Figure 2: Separate decay constants when unreachable

558 Figure 2 shows the effect of configuring separate decay rates to be 559 used when the route is reachable or unreachable. The decay rate is 560 5 times slower when the route is unreachable. In the three cases 561 shown, the period of the route flap is equal to the decay half life 562 but the route is reachable 1/8 of the time in one, reachable 1/2 the 563 time in one, and reachable 7/8 of the time in the other. In the last 564 case the route is not suppressed until after the third unreachable 565 (when it is above the top threshold after becoming reachable again). 567 In both Figure 1 and Figure 2, routes would be suppressed. Routes 568 flapping at the decay half life or less would be withdrawn two or 569 three times and then remain withdrawn until they had remained 570 announced and stable for on the order of 1 1/2 to 2 1/2 times the 571 decay half life (given the ceiling in the example).
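The accumulation figures quoted in this section for a route flapping at a constant rate can be checked with a few lines of Python. This is a sketch under stated assumptions, not part of the draft: each withdrawal adds exactly 1 to the figure of merit, a single decay half life applies throughout, and no ceiling is imposed. The function names are hypothetical.

```python
def flaps_to_reach(target: float, flaps_per_half_life: float,
                   max_flaps: int = 1000):
    """Return how many withdrawals it takes for the figure of merit to
    first reach `target`, or None if it never does within max_flaps.

    Assumes a penalty of 1 per withdrawal and evenly spaced flaps.
    """
    # Decay factor applied between two consecutive flaps.
    k = 0.5 ** (1.0 / flaps_per_half_life)
    figure_of_merit = 0.0
    for n in range(1, max_flaps + 1):
        figure_of_merit = figure_of_merit * k + 1.0
        if figure_of_merit >= target:
            return n
    return None


def steady_state_peak(flaps_per_half_life: float) -> float:
    """Limit of the peak value: the geometric series 1 + k + k**2 + ..."""
    k = 0.5 ** (1.0 / flaps_per_half_life)
    return 1.0 / (1.0 - k)


# Four flaps per half life: reaches 3, 4 and 5 after 4, 6 and 10 flaps.
fast = [flaps_to_reach(t, 4) for t in (3, 4, 5)]
# Two flaps per half life: reaches 3 after 7 flaps, never reaches 3.5.
slow = (flaps_to_reach(3, 2), flaps_to_reach(3.5, 2))
```

Under these assumptions steady_state_peak(4) is about 6.3 and steady_state_peak(2) is about 3.4. With one flap per half life the peak approaches but never exceeds 2, which is why a cutoff of 2 or above fails to suppress a route that flaps exactly once per half life.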
A larger time granularity will keep table storage down. The time
granularity should be less than a minimal reasonable time between
expected worst case route flaps. It might be reasonable to fix this
parameter at compile time or set a default and strongly recommend that
the user leave it alone. With an exponential decay, array size can be
greatly reduced by setting a period of complete stability after which
the decayed total will be considered zero rather than retaining a tiny
quantity. Alternately, very long decays can be implemented by
multiplying more than once if array bounds are exceeded.

The reuse lists hold suppressed routes grouped according to how long
it will be before the routes are eligible for reuse. Periodically
each list will be advanced by one position and one list removed as
described in Section 4.8.7. All of the suppressed routes in the
removed list will be reevaluated and either used or placed in another
list according to how much additional time must elapse before the
route can be reused. The last list will always contain all the routes
which will not be advertised for more time than is appropriate for the
remaining list heads. When the last list advances to the front, some
of the routes will not be ready to be used and will have to be
requeued. The time interval for reconsidering suppressed routes and
the number of list heads should be configurable. Reasonable defaults
might be 30 seconds and 64 list heads. A route suppressed for a long
time would need to be reevaluated every 32 minutes.

4.4 Run Time Data Structures

A fixed small amount of per system storage will be required. Where
sets of multiple configuration parameters are used, storage will be
required per set of parameters. A small amount of per route storage
is required. A set of list heads is needed.
These list heads are used to arrange suppressed routes according to
the time remaining until they can be reused.

A separate reuse list can be used to hold unreachable routes for the
purpose of later recovering storage if they remain unreachable too
long. This might be more accurately described as a recycling list.
The advantage this would provide is making free data structures
available as soon as possible. Alternately, the data structures can
simply be placed on a queue and the storage recovered when the route
reaches the front of the queue and storage is needed. The latter is
less optimal but simpler.

If multiple sets of configuration parameters are allowed per route,
there is a need for some means of associating more than one figure of
merit and set of parameters with each route. Building a linked list
of these objects seems like one of a number of reasonable
implementations. Similarly, a means of associating a route to a reuse
list is required. A small overhead will be required for the pointers
needed to implement whatever data structure is chosen for the reuse
lists. The suggested implementation uses doubly linked lists and so
requires two pointers per figure of merit.

Each set of configuration parameters can reference decay arrays and
reuse arrays. These arrays should be shared among multiple sets of
parameters since their storage requirement is not negligible. There
will be only one set of reuse list heads for the entire router.

4.4.1 Data Structures for Configuration Parameter Sets

Based on the configuration parameters described in the previous
section, the following values can be computed as scaled integers
directly from the corresponding configuration parameters.

   o  decay array scale factor (decay-array-scale-factor)

   o  cutoff value (cut)

   o  reuse value (reuse)

   o  figure of merit ceiling (ceiling)

Each configuration parameter set will reference one or two decay
arrays and one or two reuse arrays. Only one array will be needed if
the decay rate is the same while a route is unreachable as while it is
reachable, or if the stability figure of merit does not decay while a
route is unreachable.

4.4.2 Data Structures per Decay Array and Reuse Index Array

The following are also computed from the configuration parameters,
though not as directly.

   o  decay rate per tick (decay-delta-t)

   o  decay array size (decay-array-size)

   o  decay array (decay[])

   o  reuse index array size (reuse-index-array-size)

   o  reuse index array (reuse-index-array[])

For each decay rate specified, an array will be used to store the
value of a computed parameter raised to the power of the index of each
array element. This is to speed computations. The decay rate per
tick is an intermediate value expressed as a real number and used to
compute the values stored in the decay arrays. The array size is
computed from the decay memory limit configuration parameter expressed
as an array size or as a maximum hold time.

The decay array must be of sufficient size to accommodate the
specified decay memory given the time granularity, or sufficient to
hold the number of array elements until integer rounding produces a
zero result if that value is smaller, or an implementation-imposed
reasonable size to prevent configurations which use excessive memory.
Implementations may choose to make the array shorter and multiply
more than once when decaying a long time interval to reduce storage.

The reuse index arrays serve a similar purpose to the decay arrays.
The amount of time until a route can be reused can be determined using
an array lookup.
The array can be built given the decay rate. The array is indexed
using a scaled integer proportional to the ratio between a current
stability figure of merit value and the value needed for the route to
be reused.

4.4.3 Per Route State

Information must be maintained per some tuple representing a route.
At the very minimum, the NLRI (BGP prefix and length) must be
contained in the tuple. Different BGP attributes may be included or
excluded depending on the specific situation. The AS path should also
be contained in the tuple by default. The tuple may also optionally
contain other BGP attributes such as MULTI_EXIT_DISCRIMINATOR (MED).

The tuple representing a route for the purpose of route flap damping
is:

   tuple entry            default     options
   -----------------------------------------------------
   NLRI
     prefix               required
     length               required
   AS path                included    option to exclude
   last AS set in path    excluded    option to include
   next hop               excluded    option to include
   MED                    excluded    option to include
                                      in comparisons only

The AS path is generally included in order to identify downstream
instability which is not being damped or not being sufficiently damped
and is alternating between a stable and an unstable path. Under rare
circumstances it may be desirable to exclude AS path for all or a
subset of prefixes. If an AS path ends in an AS set, in practice the
path is always for an aggregate. Changes to the trailing AS set
should be ignored. Ideally the AS path comparison should ensure that
at least one AS has remained constant in the old and new AS set, but
completely ignoring the contents of a trailing AS set is also
acceptable.

Including next hop and MED changes can help suppress the use of an AS
which is internally unstable or avoid a next hop which is closer to an
unstable IGP path in the adjacent AS.
If a large number of MED values are used, the increase in the amount
of state may become a problem. For this reason MED is disabled by
default and enabled only as part of the tuple comparison, using a
single state entry regardless of MED value. Including MED will
suppress the use of the adjacent AS even though the change need not be
propagated further. Using MED is only a safe practice if a path is
known to exist through another AS or where there are enough peering
sites with the adjacent AS such that routes heard at only a subset of
the peering sites will be suppressed.

4.4.4 Data Structures per Route

The following information must be maintained per route. A route here
is considered to be a tuple usually containing NLRI, next hop, and AS
path as defined in Section 4.4.3.

stability figure of merit (figure-of-merit)

   Each route must have a stability figure of merit per applicable
   parameter set.

last time updated (time-update)

   The exact last time updated must be maintained to allow exponential
   decay of the accumulated figure of merit to be deferred until the
   route might reasonably be considered eligible for a change in
   status (having gone from unreachable to reachable or advancing
   within the reuse lists).

config block pointer

   Any implementation that supports multiple parameter sets must
   provide a means of quickly identifying which set of parameters
   corresponds to the route currently being considered. For
   implementations supporting only parameter sets where all routes
   must be treated the same, this pointer is not required.

reuse list traversal pointers

   If doubly linked lists are used to implement reuse lists, then two
   pointers will be needed, previous and next.
   Generally there is a doubly linked list which is unused when a
   route is suppressed from use. It can be used for reuse list
   traversal, eliminating the need for additional pointer storage.

4.5 Processing Configuration Parameters

From the configuration parameters, it is possible to precompute a
number of values that will be used repeatedly and retain these to
speed later computations that will be required frequently.

Scaling is usually dependent on the highest value that figure-of-merit
can attain, referred to here as the ceiling. The real number value of
the ceiling will typically be determined by the following equation.

   ceiling = reuse * exp((T-hold/decay-half-life) * log(2))

The methods of scaled integer arithmetic are not described in detail
here. The methods of determining the real values are given.
Translation into scaled integer values and the details of scaled
integer arithmetic are left up to the individual implementations.

figure of merit scale factor (scale-figure-of-merit)

   The ceiling value can be set to be the largest integer that can fit
   in half the bits available for an unsigned integer. This will
   allow the scaled integers to be multiplied by the scaled decay
   value and then shifted down. Implementations may prefer to use
   real numbers or may use any integer scaling deemed appropriate for
   their architecture.

penalty value and thresholds (as proportional scaled integers)

   The figure of merit penalty for one route withdrawal and the cutoff
   values must be scaled according to the above scaling factor.

decay rate per tick (decay[1])

   The decay value per increment of time as defined by the time
   granularity must be determined (at least initially as a floating
   point number). The per tick decay is a number slightly less than
   one. It is the Nth root of one half, where N is the half life
   divided by the time granularity.

   decay[1] = exp((1 / (decay-half-life/delta-t)) * log(1/2))

decay array size (decay-array-size)

   The decay array size is the decay memory divided by the time
   granularity. If integer truncation brings the value of an array
   element to zero, the array can be made smaller. An implementation
   should also impose a maximum reasonable array size or allow more
   than one multiplication.

   decay-array-size = (Tmax/delta-t)

decay array (decay[])

   Each i-th element of the decay array is the per tick decay raised
   to the i-th power. This might be best done by successive floating
   point multiplies followed by scaling and integer rounding or
   truncation. The array itself need only be computed at startup.

   decay[i] = decay[1] ** i

4.6 Building the Reuse Index Arrays

The reuse lists may be accessed quite frequently if a lot of routes
are flapping sufficiently to be suppressed. A method of speeding the
determination of which reuse list to use for a given route is
suggested. This method is introduced in Section 4.2, its
configuration is described in Section 4.4.2, and the algorithms are
described in Section 4.8.6 and Section 4.8.7. This section describes
building the reuse list index arrays.

A ratio of the figure of merit of the route under consideration to the
cutoff value is used as the basis for an array lookup. The ratio is
scaled and truncated to an integer and used to index the array. The
array entry is an integer used to determine which reuse list to use.

reuse array maximum ratio (max-ratio)

   This is the maximum ratio between the current value of the
   stability figure of merit and the target reuse value that can be
   indexed by the reuse array. It may be limited by the ceiling
   imposed by the maximum hold time or by the amount of time that the
   reuse lists cover.

   max-ratio = min(ceiling/reuse,
                   exp((1 / (half-life/reuse-array-time)) * log(2)))

reuse array scale factor (scale-factor)

   Since the reuse array is an estimator, the reuse array scale factor
   has to be computed such that the full size of the reuse array is
   used.

   scale-factor = reuse-index-array-size / (max-ratio - 1)

reuse index array (reuse-index-array[])

   Each reuse index array entry should contain an index into the reuse
   list array pointing to one of the list heads. This index should
   correspond to the reuse list that will be evaluated just after a
   route would be eligible for reuse given the ratio of current value
   of the stability figure of merit to target reuse value
   corresponding to the reuse array entry.

   reuse-index-array[j] = integer((decay-half-life /
      reuse-time-granularity) * log(1/(reuse * (1 + (j /
      scale-factor)))) / log(1/2))

To determine the reuse queue in which to place a route that is being
suppressed, the following procedure is used. Divide the current
figure of merit by the cutoff. Subtract one. Multiply by the scale
factor. This is the index into the reuse index array
(reuse-index-array[]). The value fetched from the reuse index array
(reuse-index-array[]) is an index into the array of reuse lists
(reuse-array[]). If this index is off the end of the array, use the
last queue; otherwise look in the array and pick the number of the
queue from the array at that index. This is quite fast and well worth
the setup and storage required.

4.7 A Sample Configuration

A simple example is presented here in which the space overhead is
estimated for a set of configuration parameters. The design here
assumes:

   1. there is a single parameter set used for all routes,

   2. decay time for unreachable routes is slower than for reachable
      routes,

   3. the arrays must be full size, rather than allow more than one
      multiply per decay operation to reduce the array size.

This example is used in later sections. The use of multiple parameter
sets complicates the examples somewhat. Where multiple parameter sets
are allowed for a single route, the decay portion of the algorithm is
repeated for each parameter set. If different routes are allowed to
have different parameter sets, the routes must have pointers to the
parameter sets to keep the time needed to locate them to a minimum,
but the algorithms are otherwise unchanged.

A sample set of configuration parameters and a sample set of
implementation parameters are provided in the two following lists.

   1. Configuration Parameters

      o  cut = 1.25

      o  reuse = 0.5

      o  T-hold = 15 mins

      o  decay-ok = 5 min

      o  decay-ng = 15 min

      o  Tmax-ok, Tmax-ng = 15, 30 mins

   2. Implementation Parameters

      o  delta-t = 1 sec

      o  delta-reuse

      o  reuse-list-size = 256

      o  reuse-index-array-size = 1,024

Using these configuration and implementation parameters and the
equations in Section 4.5, the space overhead can be computed. There
is a fixed space overhead that is independent of the number of routes.
There is a space requirement associated with a stable route. There is
a larger space requirement associated with an unstable route. The
space requirements for the parameters above are provided in the lists
below.

   1. fixed overhead (using parameters from previous example)

      o  900 * integer - decay array

      o  1,800 * integer - decay array

      o  120 * pointer - reuse list-heads

      o  2,048 * integer - reuse index arrays

   2. overhead per stable route

      o  pointer - containing null entry

   3. overhead per unstable route

      o  pointer - to a damping structure containing the following

      o  integer - figure of merit + bit for state

      o  integer - last time updated

      o  pointer (optional) to configuration parameter block

      o  2 * pointer - reuse list pointers (prev, next)

Figure 3 shows the behavior of the algorithm with the parameters given
above. Four cases are given in this example. In all four, there is a
twelve minute period of route oscillations. Two periods of
oscillation are used, 2 minutes and 4 minutes. Two duty cycles are
used, one in which the route is reachable during 20% of the cycle and
the other where the route is reachable during 80% of the cycle. In
all four cases, the route becomes suppressed after it becomes
unreachable the second time. Once suppressed, it remains suppressed
until some period after becoming stable. The routes which oscillate
over a 4 minute period are no longer suppressed within 9-11 minutes
after becoming stable. The routes with a 2 minute period of
oscillation are suppressed for nearly the maximum 15 minute period
after becoming stable.

4.8 Processing Routing Protocol Activity

The prior sections concentrate on configuration parameters and their
relationship to the parameters and arrays used at run time and provide
the algorithms for initializing run time storage. This section
provides the steps taken in processing routing events and timer events
when running.

The routing events are:

   1. A BGP peer or new route comes up for the first time (or after an
      extended down time) (Section 4.8.1)

   2. A route becomes unreachable (Section 4.8.2)

   3. A route becomes reachable again (Section 4.8.3)

   4. A route changes (Section 4.8.4)

   5. A peer goes down (Section 4.8.5)

The reuse list is used to provide a means of fast evaluation of routes
that had been suppressed but have been stable long enough to be reused
again, or have been suppressed long enough that they can be treated as
new routes. The following two operations are described.

   1. Inserting into a reuse list (Section 4.8.6)

   2. Reuse list processing every delta-t seconds (Section 4.8.7)

    time   figure-of-merit as a function of time

    0.00   0.000   0.000   0.000   0.000
    0.62   0.000   0.000   0.000   0.000
    1.25   0.000   0.000   0.000   0.000
    1.88   0.000   0.000   0.000   0.000
    2.50   0.977   0.968   0.000   0.000
    3.12   0.949   0.888   0.000   0.000
    3.75   0.910   0.814   0.000   0.000
    4.37   1.846   1.756   0.983   0.983
    5.00   1.794   1.614   0.955   0.935
    5.63   1.735   1.480   0.928   0.858
    6.25   2.619   2.379   0.901   0.786
    6.88   2.544   2.207   0.876   0.721
    7.50   2.472   2.024   0.825   0.661
    8.13   3.308   2.875   1.761   1.608
    8.75   3.213   2.698   1.711   1.562
    9.38   3.122   2.474   1.662   1.436
   10.00   3.922   3.273   1.615   1.317
   10.63   3.810   3.107   1.569   1.207
   11.25   3.702   2.849   1.513   1.107
   11.88   3.498   2.613   1.388   1.015
   12.50   3.904   3.451   2.312   1.953
   13.13   3.580   3.164   2.120   1.791
   13.75   3.283   2.902   1.944   1.643
   14.38   3.010   2.661   1.783   1.506
   15.00   2.761   2.440   1.635   1.381
   15.63   2.532   2.238   1.499   1.267
   16.25   2.321   2.052   1.375   1.161
   16.88   2.129   1.882   1.261   1.065
   17.50   1.952   1.725   1.156   0.977
   18.12   1.790   1.582   1.060   0.896
   18.75   1.641   1.451   0.972   0.821
   19.38   1.505   1.331   0.891   0.753
   20.00   1.380   1.220   0.817   0.691
   20.62   1.266   1.119   0.750   0.633
   21.25   1.161   1.026   0.687   0.581
   21.87   1.064   0.941   0.630   0.533
   22.50   0.976   0.863   0.578   0.488
   23.12   0.895   0.791   0.530   0.448
   23.75   0.821   0.725   0.486   0.411
   24.37   0.753   0.665   0.446   0.377
   25.00   0.690   0.610   0.409   0.345

   Figure 3: Some fairly long route flap cycles, repeated for 12
   minutes, followed by a period of stability.

4.8.1 Processing a New Peer or New Routes

When a peer comes up, no action is required if the routes had no
previous history of instability, for example if this is the first time
the peer is coming up and announcing these routes. For each route,
the pointer to the damping structure would be zeroed and the route
used. The same action is taken for a new route or a route that has
been down long enough that the figure of merit reached zero and the
damping structure was deleted.

4.8.2 Processing Unreachable Messages

When a route is withdrawn or changed (Section 4.8.4 describes how a
change is handled), the following procedure is used.

If there is no previous stability history (the damping structure
pointer is zero), then:

   1. allocate a damping structure

   2. set figure-of-merit = 1

   3. withdraw the route

Otherwise, if there is an existing damping structure, then:

   1. set t-diff = t-now - t-updated

   2. if (t-diff puts you off the end of the array) {
         set figure-of-merit = 1
      } else {
         set figure-of-merit = figure-of-merit * decay-array-ok [t-diff] + 1
         if (figure-of-merit > ceiling) {
            set figure-of-merit = ceiling
         }
      }

   3. remove the route from a reuse list if it is on one

   4. withdraw the route unless it is already suppressed

In either case then:

   1. set t-updated = t-now

   2. insert into a reuse list (see Section 4.8.6)

If there was a stability history, the previous value of the stability
figure of merit is decayed.
This is done using the decay array (decay-array). The index is
determined by subtracting the last time updated from the current time,
then dividing by the time granularity. If the index is zero, the
figure of merit is unchanged (no decay). If the index is greater than
the array size, the figure of merit is zeroed. Otherwise use the
index to fetch a decay array element and multiply the figure of merit
by the array element. If using the suggested scaled integer method,
shift down half an integer. Add the scaled penalty for one more
unreachable (shown above as 1). If the result is above the ceiling,
replace it with the ceiling value. Now update the last time updated
field (preferably taking into account how much time was truncated
before doing the decay calculation).

When a route becomes unreachable, alternate paths must be considered.
This process is complicated slightly if different configuration
parameters are used in the presence or absence of viable alternate
paths. If all of these alternate paths have been suppressed because
there had previously been an alternate route and the new route
withdrawal changes that condition, the suppressed alternate paths must
be reevaluated. They should be reevaluated in order of normal route
preference. When one of these alternate routes is encountered that
had been suppressed but is now usable since there is no alternate
route, no further routes need to be reevaluated. This only applies if
routes are given two different reuse thresholds, one for use when
there is an alternate path and a higher threshold to use when
suppressing the route would result in making the destination
completely unreachable.

4.8.3 Processing Route Advertisements

When a route is readvertised and there is no damping structure, the
procedure is the same as in Section 4.8.1.

   1. don't create a new damping structure

   2. use the route

If a damping structure exists, the figure of merit is decayed and the
figure of merit and last time updated fields are updated. A decision
is now made as to whether the route can be used immediately or needs
to be suppressed for some period of time.

   1. set t-diff = t-now - t-updated

   2. if (t-diff puts you off the end of the array) {
         set figure-of-merit = 0
      } else {
         set figure-of-merit = figure-of-merit * decay-array-ng [t-diff]
      }

   3. if (not suppressed and figure-of-merit < cut) {
         use the route
      } else if (suppressed and figure-of-merit < reuse) {
         set state to not suppressed
         remove the route from a reuse list
         use the route
      } else {
         set state to suppressed
         don't use the route
         insert into a reuse list (see Section 4.8.6)
      }

   4. if (figure-of-merit > 0) {
         set t-updated = t-now
      } else {
         recover memory for damping struct
         zero pointer to damping struct
      }

If the route is deemed usable, a search for the current best route
must be made. The newly reachable route is then evaluated according
to the BGP protocol rules for route selection.

If the new route is usable, the previous best route is examined.
Prior to route comparisons, the current best route may have to be
reevaluated if separate parameter sets are used depending on the
presence or absence of an alternate route. If there had been no
alternate, the previous best route may be suppressed.

If the new route is to be suppressed, it is placed on a reuse list
only if it would have been preferred to the current best route had the
new route been accepted as stable. There is no reason to queue a
route on a reuse list if after the route becomes usable it would not
be used anyway due to the existence of a more preferred route.
Such a route would not have to be reevaluated unless the preferred
route became unreachable. As specified here, the less preferred route
would be reevaluated and potentially used or potentially added to a
reuse list when processing the withdrawal of a more preferred best
route.

4.8.4 Processing Route Changes

If a route is replaced by a peer router by supplying a new path, the
route that is being replaced should be treated as if an unreachable
were received (see Section 4.8.2). This will occur when a peer
somewhere back in the AS path is continuously switching between two AS
paths and that peer is not damping route flap (or is applying less
damping). There is no way to determine if one AS path is stable and
the other is flapping, or if they are both flapping. If the cycle is
sufficiently short compared to convergence times, neither route
through that peer will deliver packets very reliably. Since there is
no way to affect the peer such that it chooses the stable one of the
two AS paths, the only viable option is to penalize both routes by
considering each change as an unreachable followed by a route
advertisement.

4.8.5 Processing a Peer Router Loss

When a peer routing session is broken, either all individual routes
advertised by that peer may be marked as unstable, or the peering
session itself may be marked as unstable. Marking the peer will save
considerable memory. Since the individual routes are advertised as
unreachable to routers beyond the immediate problem, per route state
will be incurred beyond the peer immediately adjacent to the BGP
session that went down. If the instability continues, the immediately
adjacent router need only keep track of the peer stability history.
The routers beyond that point will receive no further advertisements
or withdrawals of routes and will dispose of the damping structure
over time.

BGP notification through an optional transitive attribute that damping
will already be applied may be considered in the future to reduce the
number of routers that incur damping structure storage overhead.

4.8.6 Inserting into the Reuse Timer List

The reuse lists are used to provide a means of fast evaluation of
routes that had been suppressed but have been stable long enough to be
reused again. The data structure consists of a series of list heads.
Each list contains a set of routes that are scheduled for reevaluation
at approximately the same time. The set of reuse list heads is
treated as a circular array.

A simple implementation of the circular array of list heads would be
an array containing the list heads with an offset. The offset would
identify the first list. The Nth list would be at the index
corresponding to N plus the offset modulo the number of list heads.
This design will be assumed in the examples that follow.

A key requirement is to be able to insert an entry in the most
appropriate queue with a minimum of computation. The computation is
given only the current value of figure-of-merit. The array, scale,
and bounds are precomputed to map figure-of-merit to the nearest list
head without requiring a logarithm to be computed (see Section 4.6).

   1. scale figure-of-merit for the index array lookup producing index

   2. check index against the array bound

   3. if (within the array bound) {
         set index = reuse-array [index]
      } else {
         set index = reuse-list-size - 1
      }

   4. insert into the list

         reuse-list [modulo reuse-list-size (index + offset)]

Choosing the correct reuse list involves only a multiply and shift to
do the scaling, an integer truncation, then an array lookup.
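
The lookup and insertion steps above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not a
reference implementation: the class and attribute names are
hypothetical, the ratio is taken relative to the reuse value, plain
Python lists stand in for the doubly linked lists, floating point
replaces scaled integer arithmetic, and the index entries are rounded up
so that a route lands on the list evaluated just after it becomes
eligible for reuse.

```python
import math


class ReuseLists:
    """Circular array of reuse lists plus a precomputed reuse index array."""

    def __init__(self, reuse, ceiling, half_life, reuse_tick,
                 n_lists, index_size):
        self.reuse = reuse
        self.n_lists = n_lists
        self.offset = 0                      # position of the first list
        self.lists = [[] for _ in range(n_lists)]
        # Largest figure-of-merit/reuse ratio the index array must cover:
        # bounded by the ceiling and by the time span of the reuse lists.
        max_ratio = min(ceiling / reuse,
                        2.0 ** (n_lists * reuse_tick / half_life))
        self.scale = index_size / (max_ratio - 1.0)
        # Entry j holds the number of reuse ticks needed for a ratio of
        # (1 + j / scale) to decay to 1, rounded up and clamped to the
        # last list.
        self.index = [
            min(n_lists - 1,
                math.ceil(half_life / reuse_tick
                          * math.log2(1.0 + j / self.scale)))
            for j in range(index_size)
        ]

    def insert(self, route, figure_of_merit):
        """Queue route on the list matching its figure of merit."""
        j = max(0, int((figure_of_merit / self.reuse - 1.0) * self.scale))
        # Off the end of the index array: use the last list.
        slot = self.index[j] if j < len(self.index) else self.n_lists - 1
        self.lists[(slot + self.offset) % self.n_lists].append(route)
        return slot
```

With a half life of 900 seconds and a 30 second reuse interval, a route
whose figure of merit is twice the reuse value is queued 30 lists ahead
of the current offset, which corresponds to one half life.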
The most common method of implementing a circular array is to use an
array and apply an offset and modulo operation to pick the correct
array entry. The offset is incremented to rotate the circular array.

4.8.7 Handling Reuse Timer Events

The granularity of the reuse timer should be coarser than that of
the decay timer. As a result, when the reuse timer fires, suppressed
routes should be decayed by multiple increments of decay time. Some
computation can be avoided by always inserting into the reuse list
corresponding to one time increment past reuse eligibility. In cases
where the reuse lists have a longer "memory" than the "decay memory"
(described above), all of the routes in the first queue will be
available for immediate reuse if reachable, or the history entry
could be disposed of if unreachable.

When it is time to advance the lists, the first queue on the reuse
list must be processed and the circular queue must be rotated. Using
an array and an offset as a circular array (as described in
Section 4.8.6), the algorithm below is repeated every t-reuse seconds.

   1. save a pointer to the current zeroth queue head and zero the list
      head entry

   2. set offset = modulo reuse-list-size ( offset + 1 ), thereby
      rotating the circular queue of list-heads

   3. if (the saved list head pointer is non-empty)
         foreach entry {
            set t-diff = t-now - t-updated
            set figure-of-merit = figure-of-merit * decay-array-ok [t-diff]
            set t-updated = t-now
            if (figure-of-merit < reuse)
               reuse the route
            else
               re-insert into another list (see Section 4.8.6)
         }

The value of the zeroth list head would be saved and the array entry
itself zeroed. The list heads would then be advanced by incrementing
the offset.
Starting with the saved head of the old zeroth list, each 1344 route would be reevaluated and used, disposed of entirely or requeued 1345 if it were not ready for reuse. If a route is used, it must be 1346 treated as if it were a new route advertisement as described in 1347 Section 4.8.3. 1349 5 Implementation Experience 1351 The first implementations of ``route flap damping'' were the route 1352 server daemon (rsd) coding by Ramesh Govindan (ISI) and the Cisco IOS 1353 implementation by Ravi Chandra. Both implementations first became 1354 available in 1995 and have been used extensively. The rsd 1355 implementation has been in use in route servers at the NSF funded 1356 Network Access Points (NAPs) and at other major Internet 1357 interconnects. The Cisco IOS version has been in use by Internet 1358 Service Providers worldwide. The rsd implementation has been 1359 integrated in releases of gated (see http://www.gated.org) and is 1360 available in commercial routers using gated. 1362 There are now more than 2 years of BGP route damping deployment 1363 experience. Some problems have occurred in deployment. So far these 1364 are solvable by careful implementation of the algorithm and by careful 1365 deployment. In some topologies coordinated deployment can be helpful 1366 and in all cases disclosure of the use of route damping and the param- 1367 eters used is highly beneficial in debugging connectivity problems. 1369 Some of the problems have occurred due to subtle implementation 1370 errors. Route damping should never be applied on IBGP learned routes. 1371 To do so can open the possibility for persistent route loops. 1372 Implementations should disallow this configuration. Penalties for 1373 flapping should only be applied when a route is removed or replaced 1374 and not when a route is added. 
If damping parameters are applied 1375 consistently, this implementation constraint will result in a stable 1376 secondary path being preferred over an unstable primary path due to 1377 damping of the primary path near the source. 1379 In topologies where multiple AS paths to a given destination exist, 1380 flapping of the primary path can result in suppression of the 1381 secondary path. This can occur if no damping is being done near the 1382 cause of the route flap or if damping is being applied more 1383 aggressively by a distant AS. This problem can be solved in one of two 1384 ways. Damping can be done near the source of the route flap and the 1385 damping parameters can be made consistent. Alternatively, a distant AS 1386 which insists on more aggressive damping parameters can disable 1387 penalizing routes on AS path change, penalizing routes only if they 1388 are withdrawn completely. In order to do so, the implementation must 1389 support this option (as described in Section 4.4.3). 1391 Route flap should be damped near the source. Single-homed 1392 destinations can be covered by static routes. Aggregation provides 1393 another means of damping. Providers should damp their own internal 1394 problems; however, damping on IGP link state origination is not yet 1395 implemented by router vendors. Providers which use multiple AS within 1396 their own topology should damp between their own AS. Providers should 1397 damp adjacent providers' AS. 1399 Damping provides a means to limit propagation of excessive route change 1400 when connectivity is highly intermittent. Once a problem is 1401 corrected, selected damping state can be manually cleared. In order to 1402 determine where damping may have occurred after connectivity problems, 1403 providers should publish their damping parameters.
Providers should 1404 be willing to manually clear damping on specific prefixes or AS paths 1405 at the request of other providers when the request is accompanied by 1406 assurance that the problem has truly been addressed. 1408 By damping their own routing information, providers can reduce their 1409 own need to make requests of other providers to clear damping state 1410 after correcting a problem. Providers should be pro-active and 1411 monitor what prefixes and paths are suppressed in addition to 1412 monitoring link states and BGP session state. 1414 Acknowledgements 1416 This work and this document may not have been completed without the 1417 advice, comments and encouragement of Yakov Rekhter (Cisco). Dennis 1418 Ferguson (MCI) provided a description of the algorithms in the gated 1419 BGP implementation and many valuable comments and insights. David 1420 Bolen (ANS) and Jordan Becker (ANS) provided valuable comments, 1421 particularly regarding early simulations. Over four years elapsed 1422 between the initial draft presented to the BGP WG (October 1993) and 1423 this iteration. At the time of this writing there is significant 1424 experience with two implementations, each having been deployed since 1425 1995. One was led by Ramesh Govindan (ISI) for the NSF Routing Ar- 1426 biter project. The second was led by Ravi Chandra (Cisco). Sean Doran 1427 (Sprintlink) and Serpil Bayraktar (ANS) were among the early independent 1428 testers of the Cisco pre-beta implementation. Valuable comments and im- 1429 plementation feedback were shared by many individuals on the IETF IDR WG 1430 and the RIPE Routing Work Group and in NANOG and IEPG. 1432 Thanks also to Rob Coltun (Fore Systems), Sanjay Wadhwa (Fore), John 1433 Scudder (IENG), Eric Bennet (IENG) and Jayesh Bhatt (Bay Networks) for 1434 pointing out errors in the math uncovered during coding of more recent 1435 implementations.
These errors appeared in the details of the 1436 implementation suggestion sections written after the first two 1437 implementations were completed. 1439 References 1441 [1] P. Gross and Y. Rekhter. Application of the border gateway proto- 1442 col in 1443 the internet. Request for Comments (Draft Standard) RFC 1268, In- 1444 ternet Engineering Task Force, October 1991. (Obsoletes RFC1164); 1445 (Obsoleted by RFC1655). ftp://ds.internic.net/rfc/rfc1268.txt. 1447 [2] ISO/IEC. Iso/iec 10747 - information technology - telecommunica- 1448 tions and information exchange between systems - protocol for 1449 exchange of inter-domain routeing information among intermediate 1450 systems to support forwarding of iso 1451 8473 pdus. Technical report, International Organization for Stan- 1452 dardization, August 1994. ftp://merit.edu/pub/iso/idrp.ps.gz. 1454 [3] K. Lougheed and Y. Rekhter. A border gateway protocol 3 (BGP-3). 1455 Request for Comments (Draft Standard) RFC 1267, In- 1456 ternet Engineering Task Force, October 1991. (Obsoletes RFC1163). 1457 ftp://ds.internic.net/rfc/rfc1267.txt. 1458 [4] Y. Rekhter and P. Gross. Application of the border gateway proto- 1459 col in the internet. Request for Comments (Draft Standard) 1460 RFC 1772, Internet Engineering Task Force, March 1995. (Obsoletes 1461 RFC1655). ftp://ds.internic.net/rfc/rfc1772.txt. 1463 [5] Y. Rekhter and T. Li. A border 1464 gateway protocol 4 (BGP-4). Request for Comments (Draft Standard) 1465 RFC 1771, Internet Engineering Task Force, March 1995. (Obsoletes 1466 RFC1654). ftp://ds.internic.net/rfc/rfc1771.txt. 1468 [6] Y. Rekhter and C. Topolcic. Exchanging routing information across 1469 provider boundaries in the CIDR environment. Request for Comments 1470 (Informational) RFC 1520, Internet Engineering Task Force, 1471 September 1993. ftp://ds.internic.net/rfc/rfc1520.txt. 1473 [7] P. Traina. BGP-4 protocol analysis. 
Request for Comments (Infor- 1474 mational) RFC 1774, Internet Engineering Task Force, March 1995. 1475 ftp://ds.internic.net/rfc/rfc1774.txt. 1477 [8] P. Traina. Experience with the BGP-4 protocol. Request for Com- 1478 ments (Informational) RFC 1773, 1479 Internet Engineering Task Force, March 1995. (Obsoletes RFC1656). 1480 ftp://ds.internic.net/rfc/rfc1773.txt. 1482 Security Considerations 1484 The practices outlined in this document do not further weaken the 1485 security of the routing protocols. Denial of service is possible in 1486 an already insecure routing environment but these practices only 1487 contribute to the persistence of such attacks and do not impact the 1488 methods of prevention and the methods of determining the source. 1490 Author's Addresses 1492 Curtis Villamizar 1493 ANS Communications 1494 1496 Ravi Chandra 1497 Cisco Systems 1498 1500 Ramesh Govindan 1501 ISI 1502