idnits 2.17.1 

draft-ietf-idr-route-damp-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 39 instances of too long lines in the document, the longest
     one being 5 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 707 has weird spacing: '...in path    exc...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 15, 1998) is 9477 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '2' is defined on line 1447, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 1454, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 1458, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 1468, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 1473, but no explicit reference
     was found in the text

  == Unused Reference: '8' is defined on line 1477, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 1268 (ref. '1') (Obsoleted by RFC 1655)

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Downref: Normative reference to an Historic RFC: RFC 1267 (ref. '3')

  ** Obsolete normative reference: RFC 1771 (ref. '5') (Obsoleted by RFC 4271)

  ** Downref: Normative reference to an Historic RFC: RFC 1520 (ref. '6')

  ** Downref: Normative reference to an Informational RFC: RFC 1774 (ref. '7')

  ** Downref: Normative reference to an Informational RFC: RFC 1773 (ref. '8')


     Summary: 16 errors (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                        Curtis Villamizar
3	INTERNET-DRAFT                                                       ANS
4	draft-ietf-idr-route-damp-03                               Ravi Chandra
5	                                                                  Cisco
6	                                                        Ramesh Govindan
7	                                                                    ISI
8	                                                           May 15, 1998

10	                          BGP Route Flap Damping

12	Status of this Memo

14	  This document is an Internet-Draft.  Internet-Drafts are working
15	  documents of the Internet Engineering Task Force (IETF), its areas,
16	  and its working groups.  Note that other groups may also distribute
17	  working documents as Internet-Drafts.

19	  Internet-Drafts are draft documents valid for a maximum of six months
20	  and may be updated, replaced, or obsoleted by other documents at any
21	  time.  It is inappropriate to use Internet-Drafts as reference
22	  material or to cite them other than as ``work in progress.''

24	  To view the entire list of current Internet-Drafts, please check
25	  the "1id-abstracts.txt" listing contained in the Internet-Drafts
26	  Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
27	  (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
28	  (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
29	  (US West Coast).

31	Abstract

33	  A usage of the BGP routing protocol is described which is capable of
34	  reducing the routing traffic passed on to routing peers and therefore
35	  the load on these peers without adversely affecting route convergence
36	  time for relatively stable routes.  This technique has been
37	  implemented in commercial products supporting BGP. The technique is
38	  also applicable to IDRP.

40	  The overall goals are:

42	  o  to provide a mechanism capable of reducing router processing load
43	     caused by instability

45	  o  in doing so prevent sustained routing oscillations

47	  o  to do so without sacrificing route convergence time for generally
48	     well behaved routes.

50	  This must be accomplished keeping other goals of BGP in mind:

52	  o  pack changes into a small number of updates

54	  o  preserve consistent routing

56	  o  minimal additional space and computational overhead

58	  An excessive rate of update to the advertised reachability of a subset
59	  of Internet prefixes has been widespread in the Internet.  This
60	  observation was made in the early 1990s by many people involved in
61	  Internet operations and remains the case.  These excessive updates are
62	  not necessarily periodic so route oscillation would be a misleading
63	  term.  The informal term used to describe this effect is ``route
64	  flap''.  The techniques described here are now widely deployed and are
65	  commonly referred to as ``route flap damping''.

67	1  Overview

69	  To maintain scalability of a routed internet, it is necessary to
70	  reduce the amount of change in routing state propagated by BGP in
71	  order to limit processing requirements.  The primary contributors of
72	  processing load resulting from BGP updates are the BGP decision
73	  process and adding and removing forwarding entries.

75	  Consider the following example.  A widely deployed BGP implementation
76	  may tend to fail due to high routing update volume.  For example, it
77	  may be unable to maintain its BGP or IGP sessions if sufficiently
78	  loaded.  The failure of one router can further contribute to the load
79	  on other routers.  This additional load may cause failures in other
80	  instances of the same implementation or other implementations with a
81	  similar weakness.  In the worst case, a stable oscillation could
82	  result.  Such worst cases have already been observed in practice.

84	  A BGP implementation must be prepared for a large volume of routing
85	  traffic.  A BGP implementation cannot rely upon the sender to
86	  sufficiently shield it from route instabilities.  The guidelines here
87	  are designed to prevent sustained oscillations, but do not eliminate
88	  the need for robust and efficient implementations.  The mechanisms
89	  described here allow routing instability to be contained at an AS
90	  border router bordering the instability.

92	  Even where BGP implementations are highly robust, the performance of
93	  the routing process is limited.  Limiting the propagation of
94	  unnecessary change then becomes an issue of maintaining reasonable
95	  route change convergence time as a routing topology grows.

97	2  Methods of Limiting Route Advertisement

99	  Two methods of controlling the frequency of route advertisement are
100	  described here.  The first involves fixed timers.  The fixed timer
101	  technique has no space overhead per route but has the disadvantage of
102	  slowing route convergence for the normal case where a route does not
103	  have a history of instability.  The second method overcomes this
104	  limitation at the expense of maintaining some additional space
105	  overhead.  The additional overhead includes a small amount of state
106	  per route and a very small processing overhead.

108	  It is possible and desirable to combine both techniques.  In practice,
109	  fixed timers have been set to very short time intervals and have
110	  proven useful to pack routes (NLRI) into a smaller number of updates
111	  when routes arrive in separate updates.

113	  Seldom are fixed timers set to the tens of minutes to hours that would
114	  be necessary to actually damp route flap.  To do so would produce the
115	  undesirable effect of severely limiting routing convergence.

117	2.1  Existing Fixed Timer Recommendations

119	  BGP-3 does not make specific recommendations in this area [1].  The
120	  short section entitled ``Frequency of Route Selection'' simply
121	  recommends that something be done and makes broad statements regarding
122	  certain properties that are desirable or undesirable.

124	  BGP4 retains the ``Frequency of Route Advertisement'' section and adds
125	  a ``Frequency of Route Origination'' section.  BGP-4 describes a
126	  method of limiting route advertisement involving a fixed
127	  (configurable) MinRouteAdvertisementInterval timer and fixed
128	  MinASOriginationInterval timer [5].  The recommended timer values are
129	  30 seconds for MinRouteAdvertisementInterval and 15 seconds for
130	  MinASOriginationInterval.

132	2.2  Desirable Properties of Damping Algorithms

134	  Before describing damping algorithms the objectives need to be clearly
135	  defined.  Some key properties are examined to clarify the design
136	  rationale.

138	  The overall objective is to reduce the route update load without
139	  limiting convergence time for well behaved routes.  To accomplish
140	  this, criteria must be defined for well behaved and poorly behaved
141	  routes.  An algorithm must be defined which allows poorly behaved
142	  routes to be identified.  Ideally, this measure would be a prediction
143	  of the future stability of a route.

145	  Any delay in propagation of well behaved routes should be minimal.
146	  Some delay is tolerable to support better packing of updates.  Delay
147	  of poorly behaved routes should, if possible, be proportional to a
148	  measure of the expected future instability of the route.  Delay in
149	  propagating an unstable route should cause the unstable route to be
150	  suppressed until there is some degree of confidence that the route has
151	  stabilized.

153	  If a large number of route changes are received in separate updates
154	  over some very short period of time and these updates have the
155	  potential to be combined into a single update then these should be
156	  packed as efficiently as possible before propagating further.  Some
157	  small delay in propagating well behaved routes is tolerable and is
158	  necessary to allow better packing of updates.

160	  Where routes are unstable, use and announcement of the routes should
161	  be suppressed rather than suppressing their removal.  Where one route
162	  to a destination is stable, and another route to the same destination
163	  is somewhat unstable, if possible, the unstable route should be
164	  suppressed more aggressively than if there were no alternate path.

166	  Routing consistency within an AS is very important.  Only very minimal
167	  delay of internal BGP (IBGP) should be done.  Routing consistency
168	  across AS boundaries is also very important.  It is highly undesirable
169	  to advertise a route that is different from the route that is being
170	  used, except for a very minimal time.  It is more desirable to
171	  suppress the acceptance of a route (and therefore the use of that
172	  route in the IGP) rather than suppress only the redistribution.

174	  It is clearly not possible to accurately predict the future stability
175	  of a route.  The recent history of stability is generally regarded as
176	  a good basis for estimating the likelihood of future stability.  The
177	  criterion that is used to distinguish well behaved from poorly behaved
178	  routes is therefore based on the recent history of stability of the
179	  route.  There is no simple quantitative expression of recent stability
180	  so a figure of merit must be defined.  Some desirable characteristics
181	  of this figure of merit would be that the farther in the past that
182	  instability occurred, the less its effect on the figure of merit and
183	  that the instability measure would be cumulative rather than
184	  reflecting only the most recent event.

186	  The algorithms should behave such that for routes which have a history
187	  of stability but make a few transitions, those transitions should be
188	  made quickly.  If transitions continue, advertisement of the route
189	  should be suppressed.  Some memory of prior instability should be
190	  kept.  The degree to which prior instability is considered should be
191	  gradually reduced as long as the route remains announced and stable.

193	2.3  Design Choices

195	  After routes have been accepted their readvertisement will be briefly
196	  suppressed to improve packing of updates.  There may be a lengthy
197	  suppression of the acceptance of an external route.  How long a route
198	  will be suppressed is based on a figure of merit that is expected to
199	  be correlated to the probability of future instability of a route.
200	  Routes with high figure of merit values will be suppressed.  An
201	  exponential decay algorithm was chosen as the basis for reducing the
202	  figure of merit over time.  These choices should be viewed as
203	  suggestions for implementation.

205	  An exponential decay function has the property that previous
206	  instability can be remembered for a fairly long time.  The rate at
207	  which the instability figure of merit decays slows as time goes on.
208	  Exponential decay has the following property.

210	        f(f(figure-of-merit, t1), t2) = f(figure-of-merit, t1+t2)

212	  This property allows the decay for a long period to be computed in a
213	  single operation regardless of the current value (figure-of-merit).
214	  As a performance optimization, the decay can be applied in fixed time
215	  increments.  Given a desired decay half life, the decay for a single
216	  time increment can be computed ahead of time.  The decay for multiple
217	  time increments is expressed below.

219	        f(figure-of-merit, n*t0) = f(figure-of-merit, t0)**n = K**n

221	  The values of K ** n can be precomputed for a reasonable number of
222	  ``n'' and stored in an array.  The value of ``K'' is always less than
223	  one.  The array size can be bounded since the value quickly approaches
224	  zero.  This makes the decay easy to compute using an array bound
225	  check, an array lookup and a single multiply regardless as to how much
226	  time has elapsed.
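
  A minimal sketch of the precomputed decay array, in Python, is given
  here for illustration only.  The names build_decay_array, decay,
  half_life_s, delta_t_s and max_time_s are assumptions and do not
  appear in this document.  Floating point is used for brevity where an
  implementation might prefer scaled integers.

```python
def build_decay_array(half_life_s, delta_t_s, max_time_s):
    """Precompute K**n for n = 0..N, where K is the decay over one tick."""
    # K is the decay factor for a single time increment; always less than one.
    k = 0.5 ** (delta_t_s / half_life_s)
    n_ticks = int(max_time_s // delta_t_s)
    return [k ** n for n in range(n_ticks + 1)]


def decay(figure_of_merit, elapsed_s, delta_t_s, decay_array):
    """Decay a figure of merit with a bound check, a lookup and one multiply."""
    n = int(elapsed_s // delta_t_s)
    if n >= len(decay_array):
        # Beyond the end of the array the factor is treated as zero.
        return 0.0
    return figure_of_merit * decay_array[n]
```

  With a 900 second half life and 15 second increments, a value of 2.0
  decays to about 1.0 after 900 seconds, and decaying in two steps
  gives the same result as decaying once over the combined time.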

228	3  Limiting Route Advertisements using Fixed Timers

230	  This method of limiting route advertisements involves the use of fixed
231	  timers applied to the process of sending routes.  Its primary purpose
232	  is to improve the packing of routes in BGP update messages.  The delay
233	  in advertising a stable route should be bounded and minimal.  The
234	  delay in advertising an unreachable need not be zero, but should also
235	  be bounded and should probably have a separate bound set less than or
236	  equal to the bound for a reachable advertisement.

238	  Routes that need to be readvertised can be marked in the RIB or an
239	  external set of structures maintained, which references the RIB.
240	  Periodically, a subset of the marked routes can be flushed.  This is
241	  fairly straightforward and accomplishes the objectives.  Computation
242	  for too simple an implementation may be order N squared.  To avoid N
243	  squared performance, some form of data structure is needed to group
244	  routes with common attributes.

246	  An implementation should pack updates efficiently, provide a minimum
247	  readvertisement delay, provide a bound on the maximum readvertisement
248	  delay that would be experienced solely as a result of the algorithm
249	  used to provide a minimum delay, and must be computationally efficient
250	  in the presence of a very large number of candidates for
251	  readvertisement.
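
  One possible way to group marked routes by common attributes is
  sketched below in Python.  This is an illustration under stated
  assumptions, not a prescribed data structure: a hash table keyed on
  a hashable attribute tuple is assumed, and the names pack_updates
  and max_prefixes_per_update are hypothetical.

```python
from collections import defaultdict


def pack_updates(marked_routes, max_prefixes_per_update=100):
    """Group (prefix, attrs) pairs by attrs and emit one update per group.

    attrs must be hashable (for example a tuple), so grouping costs one
    hash lookup per route instead of comparing every pair of routes.
    """
    groups = defaultdict(list)
    for prefix, attrs in marked_routes:
        groups[attrs].append(prefix)

    updates = []
    for attrs, prefixes in groups.items():
        # Split very large groups so that no single update grows unbounded.
        for i in range(0, len(prefixes), max_prefixes_per_update):
            updates.append((attrs, prefixes[i:i + max_prefixes_per_update]))
    return updates
```

  Routes sharing the same attributes end up in the same update, which
  is the packing behavior the fixed timer is intended to improve.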

253	4  Stability Sensitive Suppression of Route Advertisement

255	  This method of limiting route advertisements uses a measure of route
256	  stability applied on a per route basis.  This technique is applied
257	  when receiving updates from external peers only (EBGP). Applying this
258	  technique to IBGP learned routes or to advertisement to IBGP or EBGP
259	  peers after making a route selection can result in routing loops.

261	  A figure of merit based on a measure of instability is maintained on a
262	  per route basis.  This figure of merit is used in the decision to
263	  suppress the use of the route.  Routes with high figure of merit are
264	  suppressed.  Each time a route is withdrawn, the figure of merit is
265	  incremented.  While the route is not changing the figure of merit
266	  value is decayed exponentially with separate decay rates depending on
267	  whether the route is stable and reachable or has been stable and
268	  unreachable.  The decay rate may be slower when the route is
269	  unreachable, or the stability figure of merit could remain fixed (not
270	  decay at all) while the route remains unreachable.  Whether to decay
271	  unreachable routes at the same rate, a slower rate, or not at all is
272	  an implementation choice.  Decaying at a slower rate is recommended.

274	  A very efficient implementation is suggested in the following
275	  sections.  The implementation only requires computation for the routes
276	  contained in an update, when an update is received or withdrawn (as
277	  opposed to the simplistic approach of periodically decaying each
278	  route).  The suggested implementation involves only a small number of
279	  simple operations, and can be implemented using scaled integers.
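
  The event driven update can be sketched as follows in Python.  This
  is a simplified illustration, not the suggested implementation: it
  uses floating point instead of scaled integers, re-evaluates
  suppression only when an event arrives (so it omits the reuse
  lists), and all names and default values (DampState, on_event, the
  half lives and thresholds) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DampState:
    figure_of_merit: float = 0.0
    last_update_s: float = 0.0
    reachable: bool = True
    suppressed: bool = False


def on_event(state, now_s, withdrawn, *, half_life_ok_s=900.0,
             half_life_ng_s=1800.0, cutoff=2.0, reuse=1.0):
    """Apply the decay since the last event, then record the new event."""
    # The decay rate depends on the state the route was in while idle.
    half_life = half_life_ok_s if state.reachable else half_life_ng_s
    if half_life > 0:  # zero is taken to mean no decay in that state
        elapsed = now_s - state.last_update_s
        state.figure_of_merit *= 0.5 ** (elapsed / half_life)
    state.last_update_s = now_s

    if withdrawn:
        state.figure_of_merit += 1.0  # one increment per withdrawal
    state.reachable = not withdrawn

    if state.figure_of_merit > cutoff:
        state.suppressed = True
    elif state.figure_of_merit < reuse:
        state.suppressed = False
    return state
```

  With these example values, three withdrawals ten seconds apart push
  the figure of merit just under 3 and suppress the route, while a
  long quiet period lets it fall below the reuse threshold again.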

281	  The behavior of unstable routes is fairly predictable.  Severely
282	  flapping routes will often be advertised and withdrawn at regular time
283	  intervals corresponding to the timers of a particular protocol (the
284	  IGP or exterior protocol in use where the problem exists).  Marginal
285	  circuits or mild congestion can result in a long term pattern of
286	  occasional brief route withdrawal or occasional brief connectivity.

288	4.1  Single vs.  Multiple Configuration Parameter Sets

290	  The behavior of the algorithm is modified by a number of configurable
291	  parameters.  It is possible to configure separate sets of parameters
292	  designed to handle short term severe route flap and chronic milder
293	  route flap (a pattern of occasional drops over a long time period).
294	  The former would require a fast decay and low threshold (allowing a
295	  small number of consecutive flaps to cause a route to be suppressed,
296	  but allowing it to be reused after a relatively short period of
297	  stability).  The latter would require a very slow decay and a higher
298	  threshold and might be appropriate for routes for which there was an
299	  alternate path of similar bandwidth.

301	  It may also be desirable to configure different thresholds for routes
302	  with roughly equivalent alternate paths than for routes where the
303	  alternate paths have a lower bandwidth or tend to be congested.  This
304	  can be solved by associating a different set of parameters with
305	  different ranges of preference values.  Parameter selection could be
306	  based on BGP LOCAL_PREF.

308	  Parameter selection could also be based on whether an alternate route
309	  was known.  A route would be considered if, for any applicable
310	  parameter set, an alternate route with the specified preference value
311	  existed and the figure of merit associated with the parameter set did
312	  not indicate a need to suppress the route.  A less aggressive
313	  suppression would be applied to the case where no alternate route at
314	  all existed.  In the simplest case, a more aggressive suppression
315	  would be applied if any alternate route existed.  Only the highest
316	  preference (most preferred) value needs to be specified, since the
317	  ranges may overlap.

319	  It might also be desirable to configure a different set of thresholds
320	  for routes which rely on switched services and may disconnect at times
321	  to reduce connect charges.  Such routes might be expected to change
322	  state somewhat more often, but should be suppressed if continuous
323	  state changes indicate instability.

325	  While not essential, it might be desirable to be able to configure
326	  multiple sets of configuration parameters per route.  It may also be
327	  desirable to be able to configure sets of parameters that only
328	  correspond to a set of routes (identified by AS path, peer router,
329	  specific destinations or other means).  Experience may dictate how
330	  much flexibility is needed and how best to set the parameters.
331	  Whether to allow different damping parameter sets for different
332	  routes, and whether to allow multiple figures of merit per route is an
333	  implementation choice.

335	  Parameter selection can also be based on prefix length.  The rationale
336	  is that longer prefixes tend to reach fewer end systems and are less
337	  important and these less important prefixes can be damped more
338	  aggressively.  This technique is in fairly widespread use.  Small
339	  sites or those with dense address allocation who are multihomed are
340	  often reachable by long prefixes which are not easily aggregated.
341	  These sites tend to dispute the choice of prefix length for parameter
342	  selection.  Advocates of the technique point out that it encourages
343	  better aggregation.

345	4.2  Configuration Parameters

347	  At configuration time, a number of parameters may be specified by the
348	  user.  The configuration parameters are expressed in units meaningful
349	  to the user.  These differ from the parameters used at run time which
350	  are in units convenient for computation.  The run time parameters are
351	  derived from the configuration parameters.  Suggested configuration
352	  parameters are listed below.

354	cutoff threshold (cut)

356	     This value is expressed as a number of route withdrawals.  It is
357	     the value above which a route advertisement will be suppressed.

359	reuse threshold (reuse)

361	     This value is expressed as a number of route withdrawals.  It is
362	     the value below which a suppressed route will now be used again.

364	maximum hold down time (T-hold)

366	     This value is the maximum time a route can be suppressed no matter
367	     how unstable it has been prior to this period of stability.

369	decay half life while reachable (decay-ok)

371	     This value is the time duration in minutes or seconds during which
372	     the accumulated stability figure of merit will be reduced by half
373	     if the route is considered reachable (whether suppressed or not).

375	decay half life while unreachable (decay-ng)

377	     This value is the time duration in minutes or seconds during which
378	     the accumulated stability figure of merit will be reduced by half
379	     if the route is considered unreachable.  If not specified or set to
380	     zero, no decay will occur while a route remains unreachable.

382	decay memory limit (Tmax-ok or Tmax-ng)

384	     This is the maximum time that any memory of previous instability
385	     will be retained given that the route's state remains unchanged,
386	     whether reachable or unreachable.  This parameter is generally used
387	     to determine array sizes.

389	  There may be multiple sets of the parameters above as described in
390	  Section 4.1.  The configuration parameters listed below would be
391	  applied system wide.  These include the time granularity of all
392	  computations, and the parameters used to control reevaluation of
393	  routes that have previously been suppressed.

395	time granularity (delta-t)

397	     This is the time granularity in seconds used to perform all decay
398	     computations.

400	reuse list time granularity (delta-reuse)

402	     This is the time interval between evaluations of the reuse lists.
403	     Each reuse list corresponds to an additional time increment.

405	reuse list memory (reuse-list-max)

407	     This is the time value corresponding to the last reuse list.  This
408	     may be the maximum value of T-hold for all parameter sets or may be
409	     configured.

411	number of reuse lists (reuse-list-size)

413	     This is the number of reuse lists.  It may be determined from
414	     reuse-list-max or set explicitly.

416	  A necessary optimization is described in Section 4.8.6 that involves
417	  an array referred to as the ``reuse index array''.  A reuse index
418	  array is needed for each decay rate in use.  The reuse index array is
419	  used to estimate in which reuse list to place a route when it is
420	  suppressed.  Proper placement avoids the need to periodically evaluate
421	  decay to determine if a route can be reused or when storage can be
422	  recovered.  Using the reuse index array avoids the need to compute a
423	  logarithm to determine placement.  One additional system wide
424	  parameter can be introduced.

426	reuse index array size (reuse-index-array-size)

428	     This is the size of reuse index arrays.  This size determines the
429	     accuracy with which suppressed routes can be placed within the set
430	     of reuse lists when suppressed for a long time.
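
  The configuration parameters above could be held in two records, one
  per parameter set and one system wide, as in the Python sketch
  below.  The class and field names are hypothetical renderings of the
  names used in this section.  The check that reuse is below cut and
  the two size derivations are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DampingParameterSet:
    cut: float          # cutoff threshold, in route withdrawals
    reuse: float        # reuse threshold, in route withdrawals
    t_hold_s: float     # maximum hold down time
    decay_ok_s: float   # decay half life while reachable
    decay_ng_s: float   # decay half life while unreachable (0: no decay)
    tmax_ok_s: float    # decay memory limit while reachable
    tmax_ng_s: float    # decay memory limit while unreachable

    def __post_init__(self):
        # Assumption: reuse must lie below cut so the two differ.
        if not 0 < self.reuse < self.cut:
            raise ValueError("expected 0 < reuse < cut")


@dataclass(frozen=True)
class SystemParameters:
    delta_t_s: float             # time granularity of decay computations
    delta_reuse_s: float         # interval between reuse list evaluations
    reuse_list_max_s: float      # time value of the last reuse list
    reuse_index_array_size: int  # size of each reuse index array

    @property
    def reuse_list_size(self):
        # One plausible derivation of the number of reuse lists.
        return math.ceil(self.reuse_list_max_s / self.delta_reuse_s)

    def decay_array_len(self, tmax_s):
        # One plausible array size for a given decay memory limit.
        return math.ceil(tmax_s / self.delta_t_s)
```

  For example, a reuse-list-max of 1800 seconds with a delta-reuse of
  15 seconds yields 120 reuse lists under this derivation.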

432	4.3  Guidelines for Setting Parameters

434	  The decay half life should be set to a time considerably longer than
435	  the period of the route flap it is intended to address.  For example,
436	  if the decay is set to ten minutes and a route is withdrawn and
437	  readvertised exactly every ten minutes, the route would continue to
438	  flap if the cutoff was set to a value of 2 or above.

440	  The stability figure of merit itself is an accumulated time decayed
441	  total.  This must be kept in mind in setting the decay time, cutoff
442	  values and reuse values.  For example, if a route flaps four times per
443	  decay half life, it will reach 3 in 4 cycles, 4 in 6 cycles, 5 in 10
444	  cycles, and will converge at about 6.3.  At two flaps per half life,
445	  it will reach 3 in 7 cycles, and converge at a value of less than 3.5.
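
  The figures quoted above can be checked with the short Python sketch
  below.  It assumes each cycle adds exactly 1 to the figure of merit
  and that the value is sampled immediately after the increment.  The
  function names are hypothetical.

```python
def flaps_to_reach(target, flaps_per_half_life, max_flaps=1000):
    """Return the number of flaps needed to reach target, or None."""
    # Decay factor applied between two consecutive flaps.
    k = 0.5 ** (1.0 / flaps_per_half_life)
    value = 0.0
    for n in range(1, max_flaps + 1):
        value = value * k + 1.0  # decay since the last flap, then add 1
        if value >= target:
            return n
    return None


def limit(flaps_per_half_life):
    """Value the figure of merit converges to: the sum of K**n for n >= 0."""
    k = 0.5 ** (1.0 / flaps_per_half_life)
    return 1.0 / (1.0 - k)
```

  Under these assumptions, four flaps per half life converge at about
  6.28, two flaps per half life converge at about 3.41, and one flap
  per half life converges at exactly 2, which is why a cutoff of 2 or
  above never suppresses a route in the ten minute example.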

447	  Figure 1 shows the stability figure of merit for route flap at a
448	  constant rate.  The time axis is labeled in multiples of the decay
449	  half life.  The plots represent route flap with a period of 1/2, 1/3,
450	  1/4, and 1/8 times the decay half life.  A ceiling of 4.5 was set,
451	  which can be seen to affect three of the plots, effectively limiting
452	  the time it takes to readvertise the route regardless of the prior
453	  history.  With the cutoff and reuse thresholds suggested by the dotted
454	  lines, routes would be suppressed after being declared unreachable 2-3
455	  times and be used again after approximately 2 decay half life periods
456	  of stability.

458	  From the maximum hold time value (T-hold), a ratio of the reuse value
459	  to a ceiling can be determined.  An integer value for the ceiling can
460	  then be chosen such that overflow will not be a problem and all other
461	  values can be scaled accordingly.  If both cutoffs are specified or if
462	  multiple parameter sets are used the highest ceiling will be used.
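
  The relation between T-hold, the reuse value and the ceiling can be
  sketched in Python as follows.  The sketch assumes that a route at
  the ceiling decays at the reachable half life and must reach the
  reuse value after exactly T-hold.  The function names and the
  integer ceiling of 60000 are hypothetical choices for illustration.

```python
def ceiling_for(reuse, t_hold_s, half_life_s):
    """Ceiling from which the figure of merit decays to reuse in t_hold_s."""
    # reuse = ceiling * 0.5 ** (t_hold / half_life), solved for ceiling.
    return reuse * 2.0 ** (t_hold_s / half_life_s)


def scale_thresholds(cut, reuse, ceiling, int_ceiling=60000):
    """Scale the thresholds so the ceiling maps to a chosen integer."""
    scale = int_ceiling / ceiling
    return round(cut * scale), round(reuse * scale), int_ceiling
```

  For instance, a reuse value of 1 with T-hold equal to two half lives
  gives a ceiling of 4, and scaling that ceiling to 60000 maps a
  cutoff of 2 to 30000 and the reuse value to 15000.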

464	     time      figure-of-merit as a function of time

466	     0.00    0.000 .         0.000 .         0.000 .         0.000 .
467	     0.08    0.000 .         0.000 .         0.000 .         0.000 .
468	     0.16    0.000 .         0.000 .         0.000 .         0.973  .
469	     0.24    0.000 .         0.000 .         0.000 .         0.920  .
470	     0.32    0.000 .         0.000 .         0.946  .        1.817    .
471	     0.40    0.000 .         0.953  .        0.895  .        2.698     .
472	     0.48    0.000 .         0.901  .        0.847  .        2.552     .
473	     0.56    0.953  .        0.853  .        1.754    .      3.367      .
474	     0.64    0.901  .        0.807  .        1.659   .       4.172        .
475	     0.72    0.853  .        1.722    .      1.570   .       3.947        .
476	     0.80    0.807  .        1.629   .       2.444     .     4.317        .
477	     0.88    0.763  .        1.542   .       2.312     .     4.469        .
478	     0.96    0.722  .        1.458   .       2.188    .      4.228        .
479	     1.04    1.649   .       2.346     .     3.036      .    4.347        .
480	     1.12    1.560   .       2.219    .      2.872      .    4.112        .
481	     1.20    1.476   .       2.099    .      2.717     .     4.257        .
482	     1.28    1.396   .       1.986    .      3.543       .   4.377        .
483	     1.36    1.321   .       2.858      .    3.352      .    4.141        .
484	     1.44    1.250   .       2.704     .     3.171      .    4.287        .
485	     1.52    2.162    .      2.558     .     3.979        .  4.407        .
486	     1.60    2.045    .      2.420     .     3.765       .   4.170        .
487	     1.68    1.935    .      3.276      .    3.562       .   4.317        .
488	     1.76    1.830    .      3.099      .    4.356        .  4.438        .
489	     1.84    1.732    .      2.932      .    4.121        .  4.199        .
490	     1.92    1.638   .       2.774     .     3.899       .   3.972        .
491	     2.00    1.550   .       2.624     .     3.688       .   3.758       .
492	     2.08    1.466   .       2.483     .     3.489       .   3.555       .
493	     2.16    1.387   .       2.349     .     3.301      .    3.363      .
494	     2.24    1.312   .       2.222    .      3.123      .    3.182      .
495	     2.32    1.242   .       2.102    .      2.955      .    3.010      .
496	     2.40    1.175   .       1.989    .      2.795     .     2.848      .
497	     2.48    1.111  .        1.882    .      2.644     .     2.694     .
498	     2.56    1.051  .        1.780    .      2.502     .     2.549     .
499	     2.64    0.995  .        1.684   .       2.367     .     2.411     .
500	     2.72    0.941  .        1.593   .       2.239    .      2.281     .
501	     2.80    0.890  .        1.507   .       2.118    .      2.158    .
502	     2.88    0.842  .        1.426   .       2.004    .      2.042    .
503	     2.96    0.797  .        1.349   .       1.896    .      1.932    .
504	     3.04    0.754  .        1.276   .       1.794    .      1.828    .
505	     3.12    0.713  .        1.207   .       1.697    .      1.729    .
506	     3.20    0.675  .        1.142   .       1.605   .       1.636   .
507	     3.28    0.638  .        1.081  .        1.519   .       1.547   .
508	     3.36    0.604  .        1.022  .        1.437   .       1.464   .
509	     3.44    0.571  .        0.967  .        1.359   .       1.385   .

511	    Figure 1:  Instability figure of merit for flap at a constant rate
512	     time      figure-of-merit as a function of time

514	     0.00    0.000 .         0.000 .         0.000 .
515	     0.20    0.000 .         0.000 .         0.000 .
516	     0.40    0.000 .         0.000 .         0.000 .
517	     0.60    0.000 .         0.000 .         0.000 .
518	     0.80    0.000 .         0.000 .         0.000 .
519	     1.00    0.999  .        0.999  .        0.999  .
520	     1.20    0.971  .        0.971  .        0.929  .
521	     1.40    0.945  .        0.945  .        0.809  .
522	     1.60    0.919  .        0.865  .        0.704  .
523	     1.80    0.894  .        0.753  .        0.613  .
524	     2.00    1.812    .      1.657   .       1.535   .
525	     2.20    1.762    .      1.612   .       1.428   .
526	     2.40    1.714    .      1.568   .       1.244   .
527	     2.60    1.667   .       1.443   .       1.083  .
528	     2.80    1.622   .       1.256   .       0.942  .
529	     3.00    1.468   .       1.094  .        0.820  .
530	     3.20    2.400     .     2.036    .      1.694    .
531	     3.40    2.335     .     1.981    .      1.475   .
532	     3.60    2.271     .     1.823    .      1.284   .
533	     3.80    2.209    .      1.587   .       1.118  .
534	     4.00    1.999    .      1.381   .       0.973  .
535	     4.20    2.625     .     2.084    .      1.727    .
536	     4.40    2.285     .     1.815    .      1.503   .
537	     4.60    1.990    .      1.580   .       1.309   .
538	     4.80    1.732    .      1.375   .       1.139   .
539	     5.00    1.508   .       1.197   .       0.992  .
540	     5.20    1.313   .       1.042  .        0.864  .
541	     5.40    1.143   .       0.907  .        0.752  .
542	     5.60    0.995  .        0.790  .        0.654  .
543	     5.80    0.866  .        0.688  .        0.570  .
544	     6.00    0.754  .        0.599  .        0.496 .
545	     6.20    0.656  .        0.521 .         0.432 .
546	     6.40    0.571  .        0.454 .         0.376 .
547	     6.60    0.497 .         0.395 .         0.327 .
548	     6.80    0.433 .         0.344 .         0.285 .
549	     7.00    0.377 .         0.299 .         0.248 .
550	     7.20    0.328 .         0.261 .         0.216 .
551	     7.40    0.286 .         0.227 .         0.188 .
552	     7.60    0.249 .         0.197 .         0.164 .
553	     7.80    0.216 .         0.172 .         0.142 .
554	     8.00    0.188 .         0.150 .         0.124 .

556	           Figure 2:  Separate decay constants when unreachable

558	  Figure 2 shows the effect of configuring separate decay rates to be
559	  used when the route is reachable or unreachable.  The decay rate is
560	  5 times slower when the route is unreachable.  In the three cases
561	  shown, the period of the route flap is equal to the decay half life
562	  but the route is reachable 1/8 of the time in one, reachable 1/2 the
563	  time in one, and reachable 7/8 of the time in the other.  In the last
564	  case the route is not suppressed until after the third unreachable
565	  (when it is above the top threshold after becoming reachable again).

567	  In both Figure 1 and Figure 2, routes would be suppressed.  Routes
568	  flapping at the decay half life or less would be withdrawn two or
569	  three times and then remain withdrawn until they had remained stably
570	  announced for on the order of 1 1/2 to 2 1/2 times the
571	  decay half life (given the ceiling in the example).

573	  A larger time granularity will keep table storage down.  The time
574	  granularity should be less than a minimal reasonable time between
575	  expected worst case route flaps.  It might be reasonable to fix this
576	  parameter at compile time or set a default and strongly recommend that
577	  the user leave it alone.  With an exponential decay, array size can be
578	  greatly reduced by setting a period of complete stability after which
579	  the decayed total will be considered zero rather than retaining a tiny
580	  quantity.  Alternatively, very long decays can be implemented by
581	  multiplying more than once if array bounds are exceeded.

583	  The reuse lists hold suppressed routes grouped according to how long
584	  it will be before the routes are eligible for reuse.  Periodically
585	  each list will be advanced by one position and one list removed as
586	  described in Section 4.8.7.  All of the suppressed routes in the removed
587	  list will be reevaluated and either used or placed in another list
588	  according to how much additional time must elapse before the route can
589	  be reused.  The last list will always contain all the routes which
590	  will not be advertised for more time than is appropriate for the
591	  remaining list heads.  When the last list advances to the front, some of
592	  the routes will not be ready to be used and will have to be requeued.
593	  The time interval for reconsidering suppressed routes and the number of
594	  list heads should be configurable.  Reasonable defaults might be 30
595	  seconds and 64 list heads.  With those defaults, a route suppressed for
596	  a long time would need to be reevaluated every 32 minutes (64 * 30 s).
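The grouping described above can be pictured as a ring of list heads. The following Python sketch is illustrative only and is not the procedure of Section 4.8.7. The class name ReuseLists and its methods are hypothetical, and plain Python lists stand in for linked lists.

```python
class ReuseLists:
    """Ring of reuse lists; one list is examined per tick (hypothetical sketch)."""

    def __init__(self, heads=64, tick_seconds=30):
        self.heads = heads
        self.tick_seconds = tick_seconds
        self.lists = [[] for _ in range(heads)]
        self.offset = 0  # index of the list that will be examined next

    def span_seconds(self):
        # Longest delay the ring covers before a route must be requeued.
        return self.heads * self.tick_seconds

    def insert(self, route, ticks_until_reuse):
        # Routes needing more time than the ring covers go in the last list.
        ticks = min(max(ticks_until_reuse, 0), self.heads - 1)
        self.lists[(self.offset + ticks) % self.heads].append(route)

    def advance(self):
        # Remove the front list, rotate the ring, and return the routes
        # that the caller must now reevaluate (use or requeue).
        due = self.lists[self.offset]
        self.lists[self.offset] = []
        self.offset = (self.offset + 1) % self.heads
        return due
```

With the suggested defaults, span_seconds() gives 1920 seconds, which is the 32 minute reevaluation interval for long-suppressed routes.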

598	4.4  Run Time Data Structures

600	  A fixed small amount of per system storage will be required.  Where
601	  sets of multiple configuration parameters are used, storage will be
602	  required per set of parameters.  A small amount of per route storage
603	  is required.  A set of list heads is needed.  These list heads are
604	  used to arrange suppressed routes according to the time remaining
605	  until they can be reused.

607	  A separate reuse list can be used to hold unreachable routes for the
608	  purpose of later recovering storage if they remain unreachable too
609	  long.  This might be more accurately described as a recycling list.
610	  The advantage this would provide is making free data structures
611	  available as soon as possible.  Alternatively, the data structures can
612	  simply be placed on a queue and the storage recovered when the route
613	  reaches the front of the queue and storage is needed.  The latter is
614	  less optimal but simpler.

616	  If multiple sets of configuration parameters are allowed per route,
617	  there is a need for some means of associating more than one figure of
618	  merit and set of parameters with each route.  Building a linked list
619	  of these objects seems like one of a number of reasonable
620	  implementations.  Similarly, a means of associating a route to a reuse
621	  list is required.  A small overhead will be required for the pointers
622	  needed to implement whatever data structure is chosen for the reuse
623	  lists.  The suggested implementation uses doubly linked lists and so
624	  requires two pointers per figure of merit.

626	  Each set of configuration parameters can reference decay arrays and
627	  reuse arrays.  These arrays should be shared among multiple sets of
628	  parameters since their storage requirement is not negligible.  There
629	  will be only one set of reuse list heads for the entire router.

631	4.4.1  Data Structures for Configuration Parameter Sets

633	  Based on the configuration parameters described in the previous
634	  section, the following values can be computed as scaled integers
635	  directly from the corresponding configuration parameters.

637	  o  decay array scale factor (decay-array-scale-factor)

639	  o  cutoff value (cut)

641	  o  reuse value (reuse)

643	  o  figure of merit ceiling (ceiling)

645	  Each configuration parameter set will reference one or two decay
646	  arrays and one or two reuse arrays.  Only one array will be needed if
647	  the decay rate is the same while a route is unreachable as while it is
648	  reachable, or if the stability figure of merit does not decay while a
649	  route is unreachable.

651	4.4.2  Data Structures per Decay Array and Reuse Index Array

653	  The following are also computed from the configuration parameters
654	  though not as directly.

656	  o  decay rate per tick (decay-delta-t)

658	  o  decay array size (decay-array-size)

660	  o  decay array (decay[])

662	  o  reuse index array size (reuse-index-array-size)

664	  o  reuse index array (reuse-index-array[])

666	  For each decay rate specified, an array will be used to store the
667	  value of a computed parameter raised to the power of the index of each
668	  array element.  This is to speed computations.  The decay rate per
669	  tick is an intermediate value expressed as a real number and used to
670	  compute the values stored in the decay arrays.  The array size is
671	  computed from the decay memory limit configuration parameter expressed
672	  as an array size or as a maximum hold time.

674	  The decay array size must be of sufficient size to accommodate the
675	  specified decay memory given the time granularity, or sufficient to
676	  hold the number of array elements until integer rounding produces a
677	  zero result if that value is smaller, or an implementation-imposed
678	  reasonable size to prevent configurations which use excessive memory.
679	  Implementations may choose to make the array size shorter and multiply
680	  more than once when decaying a long time interval to reduce storage.
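Multiplying more than once with a shortened array might look like the sketch below. It uses floating point values for readability, whereas an implementation would more likely use scaled integers. The function name decay_long is hypothetical, and the table is assumed to hold decay[i] = d ** i.

```python
def decay_long(figure_of_merit, ticks, decay):
    """Decay over `ticks` intervals using a short table decay[i] = d ** i."""
    last = len(decay) - 1            # largest exponent stored in the table
    if last < 1:
        raise ValueError("decay table needs at least two entries")
    while ticks > last:
        figure_of_merit *= decay[last]   # one extra multiply per table span
        ticks -= last
    return figure_of_merit * decay[ticks]
```

The trade is a smaller array in exchange for a few additional multiplications on the rare occasions when a long interval must be decayed.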

682	  The reuse index arrays serve a similar purpose to the decay arrays.
683	  The amount of time until a route can be reused can be determined using
684	  an array lookup.  The array can be built given the decay rate.  The
685	  array is indexed using a scaled integer proportional to the ratio
686	  between a current stability figure of merit value and the value needed
687	  for the route to be reused.

689	4.4.3  Per Route State

691	  Information must be maintained per some tuple representing a route.
692	  At the very minimum, the NLRI (BGP prefix and length) must be
693	  contained in the tuple.  Different BGP attributes may be included or
694	  excluded depending on the specific situation.  The AS path should also
695	  be contained in the tuple by default.  The tuple may also optionally
696	  contain other BGP attributes such as MULTI_EXIT_DISCRIMINATOR (MED).

698	  The tuple representing a route for the purpose of route flap damping
699	  is:

701	      tuple entry            default      options
702	      -------------------------------------------
703	      NLRI
704	        prefix               required
705	        length               required
706	      AS path                included     option to exclude
707	      last AS set in path    excluded     option to include
708	      next hop               excluded     option to include
709	      MED                    excluded     option to include
710	                                          in comparisons only

712	  The AS path is generally included in order to identify downstream
713	  instability which is not being damped or not being sufficiently damped
714	  and is alternating between a stable and an unstable path.  Under rare
715	  circumstances it may be desirable to exclude AS path for all or a
716	  subset of prefixes.  If an AS path ends in an AS set, in practice the
717	  path is always for an aggregate.  Changes to the trailing AS set
718	  should be ignored.  Ideally the AS path comparison should ensure that
719	  at least one AS has remained constant in the old and new AS set, but
720	  completely ignoring the contents of a trailing AS set is also
721	  acceptable.

723	  Including next hop and MED changes can help suppress the use of an AS
724	  which is internally unstable or avoid a next hop which is closer to an
725	  unstable IGP path in the adjacent AS. If a large number of MED values
726	  are used, the increase in the amount of state may become a problem.
727	  For this reason MED is disabled by default and enabled only as part of
728	  the tuple comparison, using a single state entry regardless of MED
729	  value.  Including MED will suppress the use of the adjacent AS even
730	  though the change need not be propagated further.  Using MED is only a
731	  safe practice if a path is known to exist through another AS or where
732	  there are enough peering sites with the adjacent AS such that routes
733	  heard at only a subset of the peering sites will be suppressed.
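One way to form the tuple used as the key for damping state is sketched below. The names DampingKey and damping_key are hypothetical. An AS set is modelled as a Python frozenset, and the sketch takes the simpler acceptable option of ignoring the contents of a trailing AS set completely. MED is left out of the key, so a single state entry is kept regardless of MED value.

```python
from typing import NamedTuple, Optional


class DampingKey(NamedTuple):
    prefix: str
    length: int
    as_path: Optional[tuple]     # None when the AS path is excluded
    next_hop: Optional[str]      # None when the next hop is excluded


def damping_key(prefix, length, as_path, next_hop,
                include_as_path=True, include_last_as_set=False,
                include_next_hop=False):
    path = None
    if include_as_path:
        segments = list(as_path)
        # By default, drop a trailing AS set so that changes to it are ignored.
        if (not include_last_as_set and segments
                and isinstance(segments[-1], frozenset)):
            segments.pop()
        path = tuple(segments)
    return DampingKey(prefix, length, path,
                      next_hop if include_next_hop else None)
```

Two announcements that differ only in the trailing AS set or in the next hop then map to the same damping state under the defaults.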

735	4.4.4  Data Structures per Route

737	  The following information must be maintained per route.  A route here
738	  is considered to be a tuple usually containing NLRI, next hop, and AS
739	  path as defined in Section 4.4.3.

741	stability figure of merit (figure-of-merit)
742	     Each route must have a stability figure of merit per applicable
743	     parameter set.

745	last time updated (time-update)

747	     The exact last time updated must be maintained to allow exponential
748	     decay of the accumulated figure of merit to be deferred until the
749	     route might reasonably be considered eligible for a change in
750	     status (having gone from unreachable to reachable or advancing
751	     within the reuse lists).

753	config block pointer

755	     Any implementation that supports multiple parameter sets must
756	     provide a means of quickly identifying which set of parameters
757	     corresponds to the route currently being considered.  For
758	     implementations supporting only parameter sets where all routes
759	     must be treated the same, this pointer is not required.

761	reuse list traversal pointers

763	     If doubly linked lists are used to implement reuse lists, then two
764	     pointers will be needed, previous and next.  Generally there is a
765	     doubly linked list which is unused when a route is suppressed from
766	     use that can be used for reuse list traversal eliminating the need
767	     for additional pointer storage.
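The per route items listed above could be grouped as in the following sketch. The class name DampingState and the helper functions are hypothetical. The suppressed flag corresponds to a state bit kept alongside the figure of merit, and dedicated prev/next fields are used here instead of reusing an existing list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class DampingState:
    figure_of_merit: int = 0            # scaled integer
    time_update: int = 0                # last time updated
    suppressed: bool = False            # state bit
    config: Optional[object] = None     # parameter set, if more than one
    prev: Optional["DampingState"] = None
    next: Optional["DampingState"] = None


def insert_after(head, node):
    """Link node into a doubly linked reuse list just after head."""
    node.prev, node.next = head, head.next
    if head.next is not None:
        head.next.prev = node
    head.next = node


def unlink(node):
    """Remove node from whatever reuse list it is on."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None
```

Doubly linked lists make removal from a reuse list a constant time operation, which matters when a suppressed route is withdrawn again.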

769	4.5  Processing Configuration Parameters

771	  From the configuration parameters, it is possible to precompute a
772	  number of values that will be used repeatedly and retain these to
773	  speed later computations that will be required frequently.

775	  Scaling is usually dependent on the highest value that figure-of-merit
776	  can attain, referred to here as the ceiling.  The real number value of
777	  the ceiling will typically be determined by the following equation.

779	        ceiling = reuse * exp((T-hold/decay-half-life) * log(2))
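As a worked example, the ceiling can be computed as sketched below. The sketch reads the equation as reuse multiplied by 2 raised to the power T-hold / decay-half-life, that is, with log(2) inside the exponential, in the same way the per tick decay is defined later in this section. That grouping is an assumption of the sketch, and the input values are illustrative.

```python
import math


def ceiling(reuse, t_hold, decay_half_life):
    # reuse * 2 ** (t_hold / decay_half_life), written with exp and log.
    # t_hold and decay_half_life must be in the same time unit.
    return reuse * math.exp((t_hold / decay_half_life) * math.log(2))


# Illustrative inputs: reuse = 0.5, T-hold = 15 min, half life = 5 min.
example = ceiling(0.5, 15, 5)
```

With these inputs the hold time spans three half lives, so the ceiling is eight times the reuse value.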

781	  The methods of scaled integer arithmetic are not described in detail
782	  here.  The methods of determining the real values are given.
783	  Translation into scaled integer values and the details of scaled
784	  integer arithmetic are left up to the individual implementations.

786	figure of merit scale factor ( scale-figure-of-merit )

788	     The ceiling value can be set to be the largest integer that can fit
789	     in half the bits available for an unsigned integer.  This will
790	     allow the scaled integers to be multiplied by the scaled decay
791	     value and then shifted down.  Implementations may prefer to use
792	     real numbers or may use any integer scaling deemed appropriate for
793	     their architecture.

795	penalty value and thresholds (as proportional scaled integers)

797	     The figure of merit penalty for one route withdrawal and the cutoff
798	     values must be scaled according to the above scaling factor.

800	decay rate per tick (decay[1])

802	     The decay value per increment of time as defined by the time
803	     granularity must be determined (at least initially as a floating
804	     point number).  The per tick decay is a number slightly less than
805	     one.  It is the Nth root of the one half where N is the half life
806	     divided by the time granularity.

808	          decay[1] = exp((1 / (decay-half-life/delta-t)) * log(1/2))

811	decay array size (decay-array-size)

813	     The decay array size is the decay memory divided by the time
814	     granularity.  If integer truncation brings the value of an array
815	     element to zero, the array can be made smaller.  An implementation
816	     should also impose a maximum reasonable array size or allow more
817	     than one multiplication.

819	          decay-array-size = (Tmax/delta-t)

821	decay array (decay[])

823	     Each i-th element of the decay array is the per tick decay raised
824	     to the i-th power.  This might be best done by successive floating
825	     point multiplies followed by scaling and integer rounding or
826	     truncation.  The array itself need only be computed at startup.

828	          decay[i] = decay[1] ** i
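The three definitions above can be combined into a single table-building routine, sketched here under stated assumptions: a 16 bit scale (half of a 32 bit unsigned integer), truncation rather than rounding, and hypothetical function names.

```python
import math


def build_decay_array(half_life_s, delta_t_s, t_max_s, scale_bits=16):
    """Return (per tick decay, table of scaled integers decay[i])."""
    per_tick = math.exp((1.0 / (half_life_s / delta_t_s)) * math.log(0.5))
    size = int(t_max_s / delta_t_s)
    scale = 1 << scale_bits
    table = []
    value = 1.0                       # decay[0] = per_tick ** 0
    for _ in range(size):
        scaled = int(value * scale)   # truncate to a scaled integer
        if scaled == 0:
            break                     # remaining entries would all be zero
        table.append(scaled)
        value *= per_tick             # successive floating point multiplies
    return per_tick, table


def decay(figure_of_merit, ticks, table, scale_bits=16):
    """Decay a scaled integer figure of merit by `ticks` time increments."""
    if ticks >= len(table):
        return 0
    return (figure_of_merit * table[ticks]) >> scale_bits
```

For a 5 minute half life, 1 second granularity, and 15 minutes of decay memory this yields a 900 entry table whose entry 300 is close to half the scale.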

830	4.6  Building the Reuse Index Arrays

832	  The reuse lists may be accessed quite frequently if a lot of routes
833	  are flapping sufficiently to be suppressed.  A method of speeding the
834	  determination of which reuse list to use for a given route is
835	  suggested.  This method is introduced in Section 4.2, its
836	  configuration described in Section 4.4.2 and the algorithms described
837	  in Section 4.8.6 and Section 4.8.7.  This section describes building
838	  the reuse list index arrays.

840	  A ratio of the figure of merit of the route under consideration to the
841	  cutoff value is used as the basis for an array lookup.  The ratio is
842	  scaled and truncated to an integer and used to index the array.  The
843	  array entry is an integer used to determine which reuse list to use.

845	reuse array maximum ratio (max-ratio)

847	     This is the maximum ratio between the current value of the
848	     stability figure of merit and the target reuse value that can be
849	     indexed by the reuse array.  It may be limited by the ceiling
850	     imposed by the maximum hold time or by the amount of time that the
851	     reuse lists cover.

853	          max-ratio = min(ceiling/reuse,
854	                          exp((1 / (half-life/reuse-array-time)) * log(2)))

856	reuse array scale factor ( scale-factor )

858	     Since the reuse array is an estimator, the reuse array scale factor
859	     has to be computed such that the full size of the reuse array is
860	     used.

862	          scale-factor = reuse-index-array-size / (max-ratio - 1)

864	reuse index array (reuse-index-array[])

866	     Each reuse index array entry should contain an index into the reuse
867	     list array pointing to one of the list heads.  This index should
868	     correspond to the reuse list that will be evaluated just after a
869	     route would be eligible for reuse given the ratio of current value
870	     of the stability figure of merit to target reuse value
871	     corresponding to the reuse array entry.

873	          reuse-index-array[j] =
875	            integer((decay-half-life / reuse-time-granularity)
876	                    * log(1/(reuse * (1 + (j / scale-factor)))) / log(1/2))

878	  To determine the reuse queue in which to place a route that is being
879	  suppressed, the following procedure is used.  Divide the current figure
880	  of merit by the cutoff.  Subtract one.  Multiply by the scale factor.
881	  This is the index into the reuse index array (reuse-index-array[]).
882	  The value fetched from the reuse index array (reuse-index-array[]) is
883	  an index into the array of reuse lists (reuse-array[]).  If this index
884	  is off the end of the array, use the last queue; otherwise look in the
885	  array and pick the number of the queue from the array at that index.
886	  This is quite fast and well worth the setup and storage required.
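The setup and the lookup might be sketched as follows. The sketch assumes that each entry holds the number of reuse list intervals needed for the figure of merit to decay from the indexed ratio down to the threshold, computed as half life * log2(ratio) / granularity. That is one reading of the reuse-index-array[j] definition, not a transcription of it. The function names and the threshold parameter are hypothetical, and the caller supplies whichever threshold it divides by.

```python
import math


def build_reuse_index_array(half_life_s, granularity_s, reuse_array_time_s,
                            ceiling, reuse, size=1024):
    """Return (scale factor, table mapping scaled ratio to reuse list number)."""
    max_ratio = min(ceiling / reuse,
                    math.exp((1.0 / (half_life_s / reuse_array_time_s))
                             * math.log(2)))
    scale_factor = size / (max_ratio - 1.0)
    table = []
    for j in range(size):
        ratio = 1.0 + j / scale_factor
        # Intervals needed to decay from `ratio` times the threshold to it.
        ticks = (half_life_s / granularity_s) * math.log(ratio) / math.log(2)
        table.append(int(ticks))
    return scale_factor, table


def reuse_list_index(figure_of_merit, threshold, scale_factor, table, last_list):
    """Pick a reuse list: divide, subtract one, scale, then look up."""
    j = int((figure_of_merit / threshold - 1.0) * scale_factor)
    j = max(j, 0)
    if j >= len(table):
        return last_list              # off the end: use the last queue
    return min(table[j], last_list)
```

A route whose figure of merit is twice the threshold lands roughly one half life away, measured in reuse list intervals.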

888	4.7  A Sample Configuration

890	  A simple example is presented here in which the space overhead is
891	  estimated for a set of configuration parameters.  The design here
892	  assumes:

894	 1.  there is a single parameter set used for all routes,

896	 2.  decay time for unreachable routes is slower than for reachable
897	     routes

899	 3.  the arrays must be full size, rather than allow more than one
900	     multiply per decay operation to reduce the array size.

902	  This example is used in later sections.  The use of multiple parameter
903	  sets complicates the examples somewhat.  Where multiple parameter sets
904	  are allowed for a single route, the decay portion of the algorithm is
905	  repeated for each parameter set.  If different routes are allowed to
906	  have different parameter sets, the routes must have pointers to the
907	  parameter sets to keep the time to locate to a minimum, but the
908	  algorithms are otherwise unchanged.

910	  A sample set of configuration parameters and a sample set of
911	  implementation parameters are provided in the two following lists.

913	 1.  Configuration Parameters

915	     o cut = 1.25

917	     o reuse = 0.5
918	     o T-hold = 15 mins

920	     o decay-ok = 5 min

922	     o decay-ng = 15 min

924	     o Tmax-ok, Tmax-ng = 15, 30 mins

926	 2.  Implementation Parameters

928	     o delta-t = 1 sec

930	     o delta-reuse

932	     o reuse-list-size = 256

934	     o reuse-index-array-size = 1,024

936	  Using these configuration and implementation parameters and the
937	  equations in Section 4.5, the space overhead can be computed.  There
938	  is a fixed space overhead that is independent of the number of routes.
939	  There is a space requirement associated with a stable route.  There is
940	  a larger space requirement associated with an unstable route.  The
941	  space requirements for the parameters above are provided in the lists
942	  below.

944	 1.  fixed overhead (using parameters from previous example)

946	     o 900 * integer - decay array

948	     o 1,800 * integer - decay array

950	     o 120 * pointer - reuse list-heads

952	     o 2,048 * integer - reuse index arrays

954	 2.  overhead per stable route

956	     o pointer - containing null entry

958	 3.  overhead per unstable route

960	     o pointer - to a damping structure containing the following

962	     o integer - figure of merit  + bit for state
963	     o integer - last time updated

965	     o pointer (optional) to configuration parameter block

967	     o 2 * pointer - reuse list pointers (prev, next)
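The array sizes in the fixed overhead follow from the sample parameters, as the short calculation below shows. It assumes two reuse index arrays, one per decay rate, which matches the 2,048 entry total. The number of reuse list heads is taken as given and is not derived here. The function name is hypothetical.

```python
def fixed_overhead(tmax_ok_s, tmax_ng_s, delta_t_s, reuse_index_array_size,
                   reuse_index_arrays=2):
    """Array entry counts for the fixed overhead (sketch, sizes in entries)."""
    return {
        "decay_array_ok": tmax_ok_s // delta_t_s,      # Tmax-ok / delta-t
        "decay_array_ng": tmax_ng_s // delta_t_s,      # Tmax-ng / delta-t
        "reuse_index_arrays": reuse_index_arrays * reuse_index_array_size,
    }


# Tmax-ok = 15 min, Tmax-ng = 30 min, delta-t = 1 sec, 1,024 entries per array.
sample = fixed_overhead(15 * 60, 30 * 60, 1, 1024)
```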

969	  Figure 3 shows the behavior of the algorithm with the parameters given
970	  above.  Four cases are given in this example.  In all four, there is a
971	  twelve minute period of route oscillations.  Two periods of
972	  oscillation are used, 2 minutes and 4 minutes.  Two duty cycles are
973	  used, one in which the route is reachable during 20% of the cycle and
974	  the other where the route is reachable during 80% of the cycle.  In all
975	  four cases, the route becomes suppressed after it becomes unreachable
976	  the second time.  Once suppressed, it remains suppressed until some
977	  period after becoming stable.  The routes which oscillate over a 4
978	  minute period are no longer suppressed within 9-11 minutes after
979	  becoming stable.  The routes with a 2 minute period of oscillation are
980	  suppressed for nearly the maximum 15 minute period after becoming stable.

982	4.8  Processing Routing Protocol Activity

984	  The prior sections concentrate on configuration parameters and their
985	  relationship to the parameters and arrays used at run time and provide
986	  the algorithms for initializing run time storage.  This section
987	  provides the steps taken in processing routing events and timer events
988	  when running.

990	  The routing events are:

992	 1.  A BGP peer or new route comes up for the first time (or after an
993	     extended down time) (Section 4.8.1)

995	 2.  A route becomes unreachable (Section 4.8.2)

997	 3.  A route becomes reachable again (Section 4.8.3)

999	 4.  A route changes (Section 4.8.4)

1001	 5.  A peer goes down (Section 4.8.5)

1003	  The reuse list is used to provide a means of fast evaluation of routes
1004	  that had been suppressed, but had been stable long enough to be reused
1005	  again or had been suppressed long enough that they can be treated as
1006	  new routes.  The following two operations are described.

1008	     time      figure-of-merit as a function of time

1010	     0.00    0.000 .         0.000 .         0.000 .         0.000 .
1011	     0.62    0.000 .         0.000 .         0.000 .         0.000 .
1012	     1.25    0.000 .         0.000 .         0.000 .         0.000 .
1013	     1.88    0.000 .         0.000 .         0.000 .         0.000 .
1014	     2.50    0.977  .        0.968  .        0.000 .         0.000 .
1015	     3.12    0.949  .        0.888  .        0.000 .         0.000 .
1016	     3.75    0.910  .        0.814  .        0.000 .         0.000 .
1017	     4.37    1.846    .      1.756    .      0.983  .        0.983  .
1018	     5.00    1.794    .      1.614    .      0.955  .        0.935  .
1019	     5.63    1.735    .      1.480   .       0.928  .        0.858  .
1020	     6.25    2.619      .    2.379     .     0.901  .        0.786  .
1021	     6.88    2.544      .    2.207     .     0.876  .        0.721  .
1022	     7.50    2.472     .     2.024     .     0.825  .        0.661  .
1023	     8.13    3.308       .   2.875      .    1.761    .      1.608    .
1024	     8.75    3.213       .   2.698      .    1.711    .      1.562    .
1025	     9.38    3.122       .   2.474     .     1.662    .      1.436   .
1026	    10.00    3.922        .  3.273       .   1.615    .      1.317   .
1027	    10.63    3.810        .  3.107       .   1.569    .      1.207   .
1028	    11.25    3.702        .  2.849      .    1.513    .      1.107   .
1029	    11.88    3.498       .   2.613      .    1.388   .       1.015   .
1030	    12.50    3.904        .  3.451       .   2.312     .     1.953    .
1031	    13.13    3.580        .  3.164       .   2.120     .     1.791    .
1032	    13.75    3.283       .   2.902      .    1.944    .      1.643    .
1033	    14.38    3.010       .   2.661      .    1.783    .      1.506    .
1034	    15.00    2.761      .    2.440     .     1.635    .      1.381   .
1035	    15.63    2.532      .    2.238     .     1.499   .       1.267   .
1036	    16.25    2.321     .     2.052     .     1.375   .       1.161   .
1037	    16.88    2.129     .     1.882    .      1.261   .       1.065   .
1038	    17.50    1.952    .      1.725    .      1.156   .       0.977  .
1039	    18.12    1.790    .      1.582    .      1.060   .       0.896  .
1040	    18.75    1.641    .      1.451   .       0.972  .        0.821  .
1041	    19.38    1.505    .      1.331   .       0.891  .        0.753  .
1042	    20.00    1.380   .       1.220   .       0.817  .        0.691  .
1043	    20.62    1.266   .       1.119   .       0.750  .        0.633  .
1044	    21.25    1.161   .       1.026   .       0.687  .        0.581  .
1045	    21.87    1.064   .       0.941  .        0.630  .        0.533  .
1046	    22.50    0.976  .        0.863  .        0.578  .        0.488 .
1047	    23.12    0.895  .        0.791  .        0.530  .        0.448 .
1048	    23.75    0.821  .        0.725  .        0.486 .         0.411 .
1049	    24.37    0.753  .        0.665  .        0.446 .         0.377 .
1050	    25.00    0.690  .        0.610  .        0.409 .         0.345 .

1052	  Figure 3:  Some fairly long route flap cycles, repeated for 12
1053	  minutes, followed by a period of stability.

1055	 1.  Inserting into a reuse list (Section 4.8.6)

1057	 2.  Reuse list processing every delta-t seconds (Section 4.8.7)

1059	4.8.1  Processing a New Peer or New Routes

1061	  When a peer comes up, no action is required if the routes had no
1062	  previous history of instability, for example if this is the first time
1063	  the peer is coming up and announcing these routes.  For each route,
1064	  the pointer to the damping structure would be zeroed and the route used.
1065	  The same action is taken for a new route or a route that has been down
1066	  long enough that the figure of merit reached zero and the damping
1067	  structure was deleted.

1069	4.8.2  Processing Unreachable Messages

1071	  When a route is withdrawn or changed (Section 4.8.4 describes how a
1072	  change is handled), the following procedure is used.

1074	  If there is no previous stability history (the damping structure
1075	  pointer is zero), then:

1077	 1.  allocate a damping structure

1079	 2.  set figure-of-merit = 1

1081	 3.  withdraw the route

1083	  Otherwise, if there is an existing damping structure, then:

1085	 1.  set t-diff = t-now - t-updated

1087	 2.  if (t-diff puts you off the end of the array) {

1089	       set figure-of-merit = 1

1091	     } else {

1093	       set figure-of-merit = figure-of-merit * decay-array-ok [t-diff] + 1

1095	       if (figure-of-merit > ceiling) {

1097	         set figure-of-merit = ceiling

1099	       }

1101	     }

1103	 3.  remove the route from a reuse list if it is on one

1105	 4.  withdraw the route unless it is already suppressed

1107	  In either case then:

1109	 1.  set t-updated = t-now

1111	 2.  insert into a reuse list (see Section 4.8.6)

1113	  If there was a stability history, the previous value of the stability
1114	  figure of merit is decayed.  This is done using the decay array
1115	  (decay-array).  The index is determined by subtracting the last time
1116	  updated from the current time, then dividing by the time granularity.
1117	  If the index is zero, the figure of merit is unchanged (no decay).  If
1118	  it is greater than the array size, the figure of merit is zeroed.
1119	  Otherwise use the index to fetch a decay array element and multiply the
1120	  figure of merit by the array element.  If using the suggested scaled
1121	  integer method, shift down half an integer.  Add the scaled penalty for
1122	  one more unreachable (shown above as 1).  If the result is above the
1123	  ceiling, replace it with the ceiling value.  Now update the last time
1124	  updated field (preferably taking into account how much time was
1125	  truncated before doing the decay calculation).
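The update for a route with existing history might be sketched as follows. Floating point values and a floating point decay array are used for readability, so the shift used with scaled integers does not appear. The function name on_unreachable is hypothetical. Advancing the timestamp by whole ticks only is one way of accounting for truncated time and is an assumption of the sketch.

```python
def on_unreachable(figure_of_merit, t_updated, t_now, decay_array_ok,
                   ceiling, delta_t=1, penalty=1.0):
    """Return (figure of merit, last time updated) after a withdrawal."""
    index = int((t_now - t_updated) // delta_t)
    if index >= len(decay_array_ok):
        # Old history has fully decayed; only the new penalty remains.
        figure_of_merit = penalty
    else:
        figure_of_merit = figure_of_merit * decay_array_ok[index] + penalty
        if figure_of_merit > ceiling:
            figure_of_merit = ceiling
    # Advance by whole ticks so the truncated fraction is not lost.
    return figure_of_merit, t_updated + index * delta_t
```

Reuse list removal, route withdrawal, and reinsertion into a reuse list are left to the caller, as in the numbered steps.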

1127	  When a route becomes unreachable, alternate paths must be considered.
1128	  This process is complicated slightly if different configuration
1129	  parameters are used in the presence or absence of viable alternate
1130	  paths.  If all of these alternate paths have been suppressed because
1131	  there had previously been an alternate route and the new route
1132	  withdrawal changes that condition, the suppressed alternate paths must
1133	  be reevaluated.  They should be reevaluated in order of normal route
1134	  preference.  When one of these alternate routes is encountered that had
1135	  been suppressed but is now usable since there is no alternate route, no
1136	  further routes need to be reevaluated.  This only applies if routes are
1137	  given two different reuse thresholds, one for use when there is an
1138	  alternate path and a higher threshold to use when suppressing the route
1139	  would result in making the destination completely unreachable.
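
  The reevaluation of suppressed alternates might look like the
  following Python sketch.  It is illustrative only: the Alternate
  class, the reuse_no_alternate parameter (standing for the higher
  reuse threshold) and the convention that a lower preference value is
  more preferred are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alternate:
    """Hypothetical view of one suppressed alternate path."""
    name: str
    preference: int          # lower value = more preferred (assumption)
    figure_of_merit: float
    suppressed: bool = True


def reevaluate_alternates(alternates: List[Alternate],
                          reuse_no_alternate: float) -> Optional[Alternate]:
    """Return the first alternate that becomes usable, or None."""
    # Walk the alternates in order of normal route preference.
    for alt in sorted(alternates, key=lambda a: a.preference):
        if alt.suppressed and alt.figure_of_merit < reuse_no_alternate:
            alt.suppressed = False
            return alt  # no further routes need to be reevaluated
    return None
```

  The loop stops at the first usable alternate, so less preferred
  routes keep their suppressed state and their place on a reuse list.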

1141	4.8.3  Processing Route Advertisements

1143	  When a route is readvertised, if there is no damping structure, then
1144	  the procedure is the same as in Section 4.8.1.

1146	 1.  don't create a new damping structure

1148	 2.  use the route

1150	  If a damping structure exists, the figure of merit is decayed and the
1151	  figure of merit and last time updated fields are updated.  A decision
1152	  is now made as to whether the route can be used immediately or needs
1153	  to be suppressed for some period of time.

1155	 1.  set t-diff = t-now - t-updated

1157	 2.  if (t-diff puts you off the end of the array) {

1159	       set figure-of-merit = 0

1161	     } else {

1163	       set figure-of-merit = figure-of-merit * decay-array-ng [t-diff]

1165	     }

1167	 3.  if (not suppressed and figure-of-merit < cut) {

1169	       use the route

1171	     } else if (suppressed and figure-of-merit < reuse) {

1173	       set state to not suppressed

1175	       remove the route from a reuse list

1177	       use the route

1179	     } else {

1181	       set state to suppressed

1183	       don't use the route

1185	       insert into a reuse list (see Section 4.8.6)

1187	     }

1189	 4.  if (figure-of-merit > 0) {

1191	       set t-updated = t-now

1193	     } else {

1195	       recover memory for damping struct

1197	       zero pointer to damping struct

1199	     }
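
  The four steps above can be sketched in Python as follows.  This is a
  minimal illustration under stated assumptions: the Damping class and
  the process_advertisement function are hypothetical, t-diff is
  assumed to be in decay array ticks, and reuse list insertion or
  removal is left to the caller, which can act on the returned state.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Damping:
    """Hypothetical per-route damping structure."""
    figure_of_merit: float
    t_updated: int
    suppressed: bool = False


def process_advertisement(damp: Optional[Damping], t_now: int,
                          decay_array_ng: Sequence[float],
                          cut: float, reuse: float
                          ) -> Tuple[bool, Optional[Damping]]:
    """Return (use_route, damping structure or None if it was freed)."""
    if damp is None:
        return True, None            # no history: just use the route
    t_diff = t_now - damp.t_updated
    if t_diff >= len(decay_array_ng):
        damp.figure_of_merit = 0.0   # off the end of the array
    else:
        damp.figure_of_merit *= decay_array_ng[t_diff]
    if not damp.suppressed and damp.figure_of_merit < cut:
        use = True
    elif damp.suppressed and damp.figure_of_merit < reuse:
        damp.suppressed = False      # caller removes it from its reuse list
        use = True
    else:
        damp.suppressed = True       # caller inserts it into a reuse list
        use = False
    if damp.figure_of_merit > 0:
        damp.t_updated = t_now
        return use, damp
    return use, None                 # history fully decayed: drop the struct
```

  Returning None in place of the structure mirrors the "zero pointer to
  damping struct" step; in Python the memory is then reclaimed by the
  garbage collector.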

1201	  If the route is deemed usable, a search for the current best route
1202	  must be made.  The newly reachable route is then evaluated according
1203	  to the BGP protocol rules for route selection.

1205	  If the new route is usable, the previous best route is examined.
1206	  Prior to route comparisons, the current best route may have to be
1207	  reevaluated if separate parameter sets are used depending on the
1208	  presence or absence of an alternate route.  If there had been no
1209	  alternate, the previous best route may be suppressed.

1211	  If the new route is to be suppressed it is placed on a reuse list only
1212	  if it would have been preferred to the current best route had the new
1213	  route been accepted as stable.  There is no reason to queue a route on
1214	  a reuse list if after the route becomes usable it would not be used
1215	  anyway due to the existence of a more preferred route.  Such a route
1216	  would not have to be reevaluated unless the preferred route became
1217	  unreachable.  As specified here, the less preferred route would be
1218	  reevaluated and potentially used or potentially added to a reuse list
1219	  when processing the withdrawal of a more preferred best route.

1221	4.8.4  Processing Route Changes

1223	  If a route is replaced by a peer router by supplying a new path, the
1224	  route that is being replaced should be treated as if an unreachable
1225	  were received (see Section 4.8.2).  This will occur when a peer
1226	  somewhere back in the AS path is continuously switching between two AS
1227	  paths and that peer is not damping route flap (or applying less
1228	  damping).  There is no way to determine if one AS path is stable and
1229	  the other is flapping, or if they are both flapping.  If the cycle is
1230	  sufficiently short compared to convergence times neither route through
1231	  that peer will deliver packets very reliably.  Since there is no way
1232	  to affect the peer such that it chooses the more stable of the two AS
1233	  paths, the only viable option is to penalize both routes by considering
1234	  each change as an unreachable followed by a route advertisement.

1236	4.8.5  Processing A Peer Router Loss

1238	  When a peer routing session is broken, either all individual routes
1239	  advertised by that peer may be marked as unstable, or the peering
1240	  session itself may be marked as unstable.  Marking the peer will save
1241	  considerable memory.  Since the individual routes are advertised as
1242	  unreachable to routers beyond the immediate problem, per route state
1243	  will be incurred beyond the peer immediately adjacent to the BGP
1244	  session that went down.  If the instability continues, the immediately
1245	  adjacent router need only keep track of the peer stability history.
1246	  The routers beyond that point will receive no further advertisements
1247	  or withdrawal of routes and will dispose of the damping structure over
1248	  time.

1250	  Signaling through an optional transitive BGP attribute that damping is
1251	  already being applied may be considered in the future to reduce the
1252	  number of routers that incur damping structure storage overhead.

1254	4.8.6  Inserting into the Reuse Timer List

1256	  The reuse lists are used to provide a means of fast evaluation of
1257	  routes that had been suppressed, but had been stable long enough to be
1258	  reused again.  The data structure consists of a series of list heads.
1259	  Each list contains a set of routes that are scheduled for reevaluation
1260	  at approximately the same time.  The set of reuse list heads is
1261	  treated as a circular array.

1263	  A simple implementation of the circular array of list heads would be
1264	  an array containing the list heads with an offset.  The offset would
1265	  identify the first list.  The Nth list would be at the index
1266	  corresponding to N plus the offset modulo the number of list heads.
1267	  This design will be assumed in the examples that follow.

1269	  A key requirement is to be able to insert an entry in the most
1270	  appropriate queue with a minimum of computation.  The computation is
1271	  given only the current value of figure-of-merit.  The array, scale,
1272	  and bounds are precomputed to map figure-of-merit to the nearest list
1273	  head without requiring a logarithm to be computed (see Section 4.5).

1275	 1.  scale figure-of-merit for the index array lookup producing index

1277	 2.  check index against the array bound

1279	 3.  if (within the array bound) {

1281	       set index = reuse-array [index]

1283	     } else {

1285	       set index = reuse-list-size - 1

1287	     }

1289	 4.  insert into the list

1291	       reuse-list [modulo reuse-list-size (index + offset)]

1293	  Choosing the correct reuse list involves only a multiply and shift to
1294	  do the scaling, an integer truncation, then an array lookup.  The most
1295	  common method of implementing a circular array is to use an array and
1296	  apply an offset and modulo operation to pick the correct array entry.
1297	  The offset is incremented to rotate the circular array.
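
  A Python sketch of this insertion is shown below.  It is illustrative
  only: the ReuseLists class and its attribute names are hypothetical,
  and the scaling step is reduced to a single multiply by an assumed
  scale factor, whereas the precomputed array, scale and bounds are
  those of Section 4.5.

```python
from typing import Any, List, Sequence


class ReuseLists:
    """Circular array of reuse lists, addressed through an offset."""

    def __init__(self, size: int, reuse_array: Sequence[int],
                 scale: float) -> None:
        self.lists: List[List[Any]] = [[] for _ in range(size)]
        self.offset = 0                 # physical slot of the zeroth list
        self.reuse_array = reuse_array  # precomputed: index -> list number
        self.scale = scale              # hypothetical scale factor

    def insert(self, route: Any, figure_of_merit: float) -> int:
        """Queue the route and return the physical slot that was used."""
        index = int(figure_of_merit * self.scale)  # scale, then truncate
        if index < len(self.reuse_array):
            index = self.reuse_array[index]
        else:
            index = len(self.lists) - 1            # beyond the bound: last list
        slot = (index + self.offset) % len(self.lists)
        self.lists[slot].append(route)
        return slot
```

  Because only the offset changes when the lists are rotated, entries
  never move between slots; the same logical list number simply maps to
  a different physical slot after each rotation.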

1299	4.8.7  Handling Reuse Timer Events

1301	  The granularity of the reuse timer should be coarser than that of
1302	  the decay timer.  As a result, when the reuse timer fires, suppressed
1303	  routes should be decayed by multiple increments of decay time.  Some
1304	  computation can be avoided by always inserting into the reuse list
1305	  corresponding to one time increment past reuse eligibility.  In cases
1306	  where the reuse lists have a longer ``memory'' than the ``decay
1307	  memory'' (described above), all of the routes in the first queue will
1308	  be available for immediate reuse if reachable or the history entry
1309	  could be disposed of if unreachable.

1311	  When it is time to advance the lists, the first queue on the reuse
1312	  list must be processed and the circular queue must be rotated.  Using
1313	  an array and an offset as a circular array (as described in
1314	  Section 4.8.6), the algorithm below is repeated every delta-reuse seconds.

1316	 1.  save a pointer to the current zeroth queue head and zero the list
1317	     head entry

1319	 2.  set offset = modulo reuse-list-size ( offset + 1 ), thereby
1320	     rotating the circular queue of list-heads

1322	 3.  if (the saved list head pointer is non-empty)

1324	     foreach entry {

1326	       set t-diff = t-now - t-updated

1328	       set figure-of-merit = figure-of-merit * decay-array-ok [t-diff]
1329	       set t-updated = t-now

1331	       if (figure-of-merit < reuse)

1333	         reuse the route

1335	       else

1337	         re-insert into another list (see Section 4.8.6)

1339	     }

1341	  The value of the zeroth list head would be saved and the array entry
1342	  itself zeroed.  The list heads would then be advanced by incrementing
1343	  the offset.  Starting with the saved head of the old zeroth list, each
1344	  route would be reevaluated and used, disposed of entirely or requeued
1345	  if it were not ready for reuse.  If a route is used, it must be
1346	  treated as if it were a new route advertisement as described in
1347	  Section 4.8.3.
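
  The timer handling above could be sketched in Python as follows.  The
  Entry class, the on_reuse_timer function and the reinsert callback
  are hypothetical names chosen for this example.  Zeroing the figure
  of merit when t-diff runs off the end of the decay array is also an
  assumption, since the steps above do not cover that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Entry:
    """Hypothetical suppressed route queued on a reuse list."""
    name: str
    figure_of_merit: float
    t_updated: int


def on_reuse_timer(lists: List[List[Entry]], offset: int, t_now: int,
                   decay_array_ok: Sequence[float], reuse: float,
                   reinsert: Callable[[Entry], None]
                   ) -> Tuple[int, List[Entry]]:
    """Process the zeroth list; return (new offset, reusable entries)."""
    saved = lists[offset]               # save the zeroth list head
    lists[offset] = []                  # and zero the array entry
    offset = (offset + 1) % len(lists)  # rotate the circular array
    reusable: List[Entry] = []
    for entry in saved:
        t_diff = t_now - entry.t_updated
        if t_diff >= len(decay_array_ok):
            entry.figure_of_merit = 0.0  # assumption: fully decayed
        else:
            entry.figure_of_merit *= decay_array_ok[t_diff]
        entry.t_updated = t_now
        if entry.figure_of_merit < reuse:
            reusable.append(entry)       # caller treats it as readvertised
        else:
            reinsert(entry)              # requeue (see Section 4.8.6)
    return offset, reusable
```

  The caller is expected to store the returned offset before any
  requeued entries are inserted, so that they land relative to the
  rotated array.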

1349	5  Implementation Experience

1351	  The first implementations of ``route flap damping'' were the route
1352	  server daemon (rsd) coded by Ramesh Govindan (ISI) and the Cisco IOS
1353	  implementation by Ravi Chandra.  Both implementations first became
1354	  available in 1995 and have been used extensively.  The rsd
1355	  implementation has been in use in route servers at the NSF funded
1356	  Network Access Points (NAPs) and at other major Internet
1357	  interconnects.  The Cisco IOS version has been in use by Internet
1358	  Service Providers worldwide.  The rsd implementation has been
1359	  integrated in releases of gated (see http://www.gated.org) and is
1360	  available in commercial routers using gated.

1362	  There are now more than 2 years of BGP route damping deployment
1363	  experience.  Some problems have occurred in deployment.  So far these
1364	  are solvable by careful implementation of the algorithm and by careful
1365	  deployment.  In some topologies coordinated deployment can be helpful
1366	  and in all cases disclosure of the use of route damping and the
1367	  parameters used is highly beneficial in debugging connectivity problems.

1369	  Some of the problems have occurred due to subtle implementation
1370	  errors.  Route damping should never be applied on IBGP learned routes.
1371	  To do so can open the possibility for persistent route loops.
1372	  Implementations should disallow this configuration.  Penalties for
1373	  flapping should only be applied when a route is removed or replaced
1374	  and not when a route is added.  If damping parameters are applied
1375	  consistently, this implementation constraint will result in a stable
1376	  secondary path being preferred over an unstable primary path due to
1377	  damping of the primary path near the source.

1379	  In topologies where multiple AS paths to a given destination exist,
1380	  flapping of the primary path can result in suppression of the
1381	  secondary path.  This can occur if no damping is being done near the
1382	  cause of the route flap or if damping is being applied more
1383	  aggressively by a distant AS.  This problem can be solved in one of two
1384	  ways.  Damping can be done near the source of the route flap and the
1385	  damping parameters can be made consistent.  Alternately, a distant AS
1386	  which insists on more aggressive damping parameters can disable
1387	  penalizing routes on AS path change, penalizing routes only if they
1388	  are withdrawn completely.  In order to do so, the implementation must
1389	  support this option (as described in Section 4.4.3).

1391	  Route flap should be damped near the source.  Single homed
1392	  destinations can be covered by static routes.  Aggregation provides
1393	  another means of damping.  Providers should damp their own internal
1394	  problems; however, damping on IGP link state origination is not yet
1395	  implemented by router vendors.  Providers which use multiple ASes
1396	  within their own topology should damp between their own ASes.
1397	  Providers should damp adjacent providers' ASes.

1399	  Damping provides a means to limit propagation of excessive route change
1400	  when connectivity is highly intermittent.  Once a problem is
1401	  corrected, selected damping state can be manually cleared.  In order to
1402	  determine where damping may have occurred after connectivity problems,
1403	  providers should publish their damping parameters.  Providers should
1404	  be willing to manually clear damping on specific prefixes or AS paths
1405	  at the request of other providers when the request is accompanied by
1406	  assurance that the problem has truly been addressed.

1408	  By damping their own routing information, providers can reduce their
1409	  own need to make requests of other providers to clear damping state
1410	  after correcting a problem.  Providers should be pro-active and
1411	  monitor what prefixes and paths are suppressed in addition to
1412	  monitoring link states and BGP session state.

1414	Acknowledgements

1416	  This work and this document may not have been completed without the
1417	  advice, comments and encouragement of Yakov Rekhter (Cisco).  Dennis
1418	  Ferguson (MCI) provided a description of the algorithms in the gated
1419	  BGP implementation and many valuable comments and insights.  David
1420	  Bolen (ANS) and Jordan Becker (ANS) provided valuable comments,
1421	  particularly regarding early simulations.  Over four years elapsed
1422	  between the initial draft presented to the BGP WG (October 1993) and
1423	  this iteration.  At the time of this writing there is significant
1424	  experience with two implementations, each having been deployed since
1425	  1995.  One was led by Ramesh Govindan (ISI) for the NSF Routing
1426	  Arbiter project.  The second was led by Ravi Chandra (Cisco).  Sean
1427	  Doran (Sprintlink) and Serpil Bayraktar (ANS) were among the early
1428	  independent testers of the Cisco pre-beta implementation.  Valuable
1429	  comments and implementation feedback were shared by many individuals on
1430	  the IETF IDR WG and the RIPE Routing Work Group and in NANOG and IEPG.

1432	  Thanks also to Rob Coltun (Fore Systems), Sanjay Wadhwa (Fore), John
1433	  Scudder (IENG), Eric Bennet (IENG) and Jayesh Bhatt (Bay Networks) for
1434	  pointing out errors in the math uncovered during coding of more recent
1435	  implementations.  These errors appeared in the details of the
1436	  implementation suggestion sections written after the first two
1437	  implementations were completed.

1439	References

1441	  [1]  P. Gross and Y. Rekhter.  Application of the border gateway
1442	       protocol in the internet.  Request for Comments (Draft Standard)
1443	       RFC 1268, Internet Engineering Task Force, October 1991.
1444	       (Obsoletes RFC1164); (Obsoleted by RFC1655).
1445	       ftp://ds.internic.net/rfc/rfc1268.txt.

1447	  [2]  ISO/IEC.  ISO/IEC 10747 - information technology -
1448	       telecommunications and information exchange between systems -
1449	       protocol for exchange of inter-domain routeing information among
1450	       intermediate systems to support forwarding of ISO 8473 PDUs.
1451	       Technical report, International Organization for
1452	       Standardization, August 1994.  ftp://merit.edu/pub/iso/idrp.ps.gz.

1454	  [3]  K. Lougheed and Y. Rekhter.  A border gateway protocol 3 (BGP-3).
1455	       Request for Comments (Draft Standard) RFC 1267, Internet
1456	       Engineering Task Force, October 1991.  (Obsoletes RFC1163).
1457	       ftp://ds.internic.net/rfc/rfc1267.txt.

1458	  [4]  Y. Rekhter and P. Gross.  Application of the border gateway
1459	       protocol in the internet.  Request for Comments (Draft Standard)
1460	       RFC 1772, Internet Engineering Task Force, March 1995.  (Obsoletes
1461	       RFC1655).  ftp://ds.internic.net/rfc/rfc1772.txt.

1463	  [5]  Y. Rekhter and T. Li.  A border gateway protocol 4 (BGP-4).
1464	       Request for Comments (Draft Standard) RFC 1771, Internet
1465	       Engineering Task Force, March 1995.  (Obsoletes RFC1654).
1466	       ftp://ds.internic.net/rfc/rfc1771.txt.

1468	  [6]  Y. Rekhter and C. Topolcic.  Exchanging routing information across
1469	       provider boundaries in the CIDR environment.  Request for Comments
1470	       (Informational) RFC 1520, Internet Engineering Task Force,
1471	       September 1993.  ftp://ds.internic.net/rfc/rfc1520.txt.

1473	  [7]  P. Traina.  BGP-4 protocol analysis.  Request for Comments
1474	       (Informational) RFC 1774, Internet Engineering Task Force, March
1475	       1995.  ftp://ds.internic.net/rfc/rfc1774.txt.

1477	  [8]  P. Traina.  Experience with the BGP-4 protocol.  Request for
1478	       Comments (Informational) RFC 1773, Internet Engineering Task
1479	       Force, March 1995.  (Obsoletes RFC1656).
1480	       ftp://ds.internic.net/rfc/rfc1773.txt.

1482	Security Considerations

1484	  The practices outlined in this document do not further weaken the
1485	  security of the routing protocols.  Denial of service is possible in
1486	  an already insecure routing environment but these practices only
1487	  contribute to the persistence of such attacks and do not impact the
1488	  methods of prevention and the methods of determining the source.

1490	Authors' Addresses

1492	  Curtis Villamizar
1493	  ANS Communications
1494	  <curtis@ans.net>

1496	  Ravi Chandra
1497	  Cisco Systems
1498	  <rchandra@cisco.com>

1500	  Ramesh Govindan
1501	  ISI
1502	  <govindan@isi.edu>