| < draft-ietf-rtgwg-backoff-algo-08.txt | draft-ietf-rtgwg-backoff-algo-09.txt > | |||
|---|---|---|---|---|
| Network Working Group B. Decraene | Network Working Group B. Decraene | |||
| Internet-Draft Orange | Internet-Draft Orange | |||
| Intended status: Standards Track S. Litkowski | Intended status: Standards Track S. Litkowski | |||
| Expires: August 30, 2018 Orange Business Service | Expires: August 31, 2018 Orange Business Service | |||
| H. Gredler | H. Gredler | |||
| RtBrick Inc | RtBrick Inc | |||
| A. Lindem | A. Lindem | |||
| Cisco Systems | Cisco Systems | |||
| P. Francois | P. Francois | |||
| C. Bowers | C. Bowers | |||
| Juniper Networks, Inc. | Juniper Networks, Inc. | |||
| February 26, 2018 | February 27, 2018 | |||
| SPF Back-off algorithm for link state IGPs | SPF Back-off Delay algorithm for link state IGPs | |||
| draft-ietf-rtgwg-backoff-algo-08 | draft-ietf-rtgwg-backoff-algo-09 | |||
| Abstract | Abstract | |||
| This document defines a standard algorithm to temporarily postpone or | This document defines a standard algorithm to temporarily postpone or | |||
| 'back-off' link-state IGP Shortest Path First (SPF) computations. | 'back-off' link-state IGP Shortest Path First (SPF) computations. | |||
| This reduces the computational load and churn on IGP nodes when | This reduces the computational load and churn on IGP nodes when | |||
| multiple temporally close network events trigger multiple SPF | multiple temporally close network events trigger multiple SPF | |||
| computations. | computations. | |||
| Having one standard algorithm improves interoperability by reducing | Having one standard algorithm improves interoperability by reducing | |||
| skipping to change at page 2, line 10 ¶ | skipping to change at page 2, line 10 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on August 30, 2018. | This Internet-Draft will expire on August 31, 2018. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 3, line 28 ¶ | skipping to change at page 3, line 28 ¶ | |||
| the network is experiencing multiple failures over a short period of | the network is experiencing multiple failures over a short period of | |||
| time, there is a conflicting desire to limit the frequency of SPF | time, there is a conflicting desire to limit the frequency of SPF | |||
| computations, which would allow a reduction in control plane | computations, which would allow a reduction in control plane | |||
| resources used by IGPs and all protocols/subsystems reacting on the | resources used by IGPs and all protocols/subsystems reacting on the | |||
| attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP | attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP | |||
| [RFC4271], Fast ReRoute computations (e.g., Loop Free Alternates | [RFC4271], Fast ReRoute computations (e.g., Loop Free Alternates | |||
| (LFA) [RFC5286]), FIB updates, etc. This also reduces network churn | (LFA) [RFC5286]), FIB updates, etc. This also reduces network churn | |||
| and, in particular, reduces the side effects such as micro-loops | and, in particular, reduces the side effects such as micro-loops | |||
| [RFC5715] that ensue during IGP convergence. | [RFC5715] that ensue during IGP convergence. | |||
| To allow for this, IGPs usually implement an SPF back-off algorithm | To allow for this, IGPs usually implement an SPF Back-off Delay | |||
| that postpones or backs-off the SPF computation. However, different | algorithm that postpones or backs-off the SPF computation. However, | |||
| implementations have chosen different algorithms. Hence, in a multi- | different implementations have chosen different algorithms. Hence, | |||
| vendor network, it's not possible to ensure that all routers trigger | in a multi-vendor network, it's not possible to ensure that all | |||
| their SPF computation after the same delay. This situation increases | routers trigger their SPF computation after the same delay. This | |||
| the average and maximum differential delay between routers completing | situation increases the average and maximum differential delay | |||
| their SPF computation. It also increases the probability that | between routers completing their SPF computation. It also increases | |||
| different routers compute their FIBs based on different LSDB | the probability that different routers compute their FIBs based on | |||
| versions. Both factors increase the probability and/or duration of | different LSDB versions. Both factors increase the probability and/ | |||
| micro-loops as discussed in Section 8. | or duration of micro-loops as discussed in Section 8. | |||
| To allow multi-vendor networks to have all routers delay their SPF | To allow multi-vendor networks to have all routers delay their SPF | |||
| computations for the same duration, this document specifies a | computations for the same duration, this document specifies a | |||
| standard algorithm. Optionally, implementations may also offer | standard algorithm. | |||
| alternative algorithms. | ||||
| 2. High level goals | 2. High level goals | |||
| The high level goals of this algorithm are the following: | The high level goals of this algorithm are the following: | |||
| o Very fast convergence for a single event (e.g., link failure). | o Very fast convergence for a single event (e.g., link failure). | |||
| o Paced fast convergence for multiple temporally close IGP events | o Paced fast convergence for multiple temporally close IGP events | |||
| while IGP stability is considered acceptable. | while IGP stability is considered acceptable. | |||
| skipping to change at page 5, line 37 ¶ | skipping to change at page 5, line 37 ¶ | |||
| (TIME_TO_LEARN_INTERVAL), we then assume that a single component | (TIME_TO_LEARN_INTERVAL), we then assume that a single component | |||
| failed, but that this failure requires the knowledge of multiple IGP | failed, but that this failure requires the knowledge of multiple IGP | |||
| events in order for IGP routing to converge. Under this assumption, | events in order for IGP routing to converge. Under this assumption, | |||
| we want fast convergence since this is a normal network situation. | we want fast convergence since this is a normal network situation. | |||
| However, there is a benefit in waiting for all IGP events related to | However, there is a benefit in waiting for all IGP events related to | |||
| this single component failure so that the IGP can compute the post- | this single component failure so that the IGP can compute the post- | |||
| failure routing table in a single additional route computation. In | failure routing table in a single additional route computation. In | |||
| this situation, we delay the routing computation by SHORT_SPF_DELAY. | this situation, we delay the routing computation by SHORT_SPF_DELAY. | |||
| If IGP events are still received after TIME_TO_LEARN_INTERVAL from | If IGP events are still received after TIME_TO_LEARN_INTERVAL from | |||
| the initial IGP event received in QUIET state Figure 1, then the | the initial IGP event received in QUIET state Section 5.1, then the | |||
| network is presumably experiencing multiple independent failures. In | network is presumably experiencing multiple independent failures. In | |||
| this case, while waiting for network stability, the computations are | this case, while waiting for network stability, the computations are | |||
| delayed for a longer time represented by LONG_SPF_DELAY. This SPF | delayed for a longer time represented by LONG_SPF_DELAY. This SPF | |||
| delay is kept until no IGP events are received for HOLDDOWN_INTERVAL. | delay is kept until no IGP events are received for HOLDDOWN_INTERVAL. | |||
| Note that in order to increase the consistency network wide, the | Note that in order to increase the consistency network wide, the | |||
| algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP | algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP | |||
| event, rather than the number of SPF computation performed. Indeed, | event, rather than the number of SPF computation performed. Indeed, | |||
| as all routers may receive the IGP events at different times, we | as all routers may receive the IGP events at different times, we | |||
| cannot assume that all routers will perform the same number of SPF | cannot assume that all routers will perform the same number of SPF | |||
| computations. For example, assuming that the SPF delay is 50 ms, | computations. For example, assuming that the SPF delay is 50 ms, | |||
| router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and | router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and | |||
| hence will perform a single routing computation. While another | hence will perform a single routing computation. While another | |||
| router R2 may only receive 2 events (E1, E2) in those 50 ms and hence | router R2 may only receive 2 events (E1, E2) in those 50 ms and hence | |||
| will schedule another routing computation when receiving E3. | will schedule another routing computation when receiving E3. | |||
| 5. Specification of the SPF delay state machine | 5. Specification of the SPF delay state machine | |||
| This section describes the abstract finite state machine (FSM) | This section specifies the finite state machine (FSM) intended to | |||
| intended to control the timing of the execution of SPF calculations | control the timing of the execution of SPF calculations in response | |||
| in response to IGP events. | to IGP events. | |||
| 5.1. State Machine | 5.1. State Machine | |||
| The FSM is initialized to the QUIET state with all three timers | The FSM is initialized to the QUIET state with all three timers | |||
| timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated. | timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated. | |||
| The events which may change the FSM states are an IGP event or the | The events which may change the FSM states are an IGP event or the | |||
| expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER). | expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER). | |||
| The following diagram briefly describes the state transitions. | The following diagram briefly describes the state transitions. | |||
| skipping to change at page 8, line 12 ¶ | skipping to change at page 8, line 12 ¶ | |||
| QUIET state. This state is meant to handle single component failure | QUIET state. This state is meant to handle single component failure | |||
| requiring multiple IGP events (e.g., node, SRLG). | requiring multiple IGP events (e.g., node, SRLG). | |||
| LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other | LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other | |||
| words, state reached after TIME_TO_LEARN_INTERVAL in state | words, state reached after TIME_TO_LEARN_INTERVAL in state | |||
| SHORT_WAIT. This state is meant to handle multiple independent | SHORT_WAIT. This state is meant to handle multiple independent | |||
| component failures during periods of IGP instability. | component failures during periods of IGP instability. | |||
| 5.3. Timers | 5.3. Timers | |||
| SPF_TIMER: The FSM abstract timer that uses the computed SPF delay. | SPF_TIMER: The FSM timer that uses the computed SPF delay. Upon | |||
| Upon expiration, the Route Table Computation (as defined in | expiration, the Route Table Computation (as defined in Section 3) is | |||
| Section 3) is performed. | performed. | |||
| HOLDDOWN_TIMER: The FSM abstract timer that is (re)started whan an | HOLDDOWN_TIMER: The FSM timer that is (re)started whan an IGP event | |||
| IGP event is received and set to HOLDDOWN_INTERVAL. Upon expiration, | is received and set to HOLDDOWN_INTERVAL. Upon expiration, the FSM | |||
| the FSM is moved to the QUIET state. | is moved to the QUIET state. | |||
| LEARN_TIMER: The FSM abstract timer that is started when an IGP event | LEARN_TIMER: The FSM timer that is started when an IGP event is | |||
| is recevied while the FSM is in the QUIET state. Upon expiration, | recevied while the FSM is in the QUIET state. Upon expiration, the | |||
| the FSM is moved to the LONG_WAIT state. | FSM is moved to the LONG_WAIT state. | |||
| 5.4. FSM Events | 5.4. FSM Events | |||
| This section describes the events and the actions performed in | This section describes the events and the actions performed in | |||
| response. | response. | |||
| Transition 1: IGP event, while in QUIET state. | Transition 1: IGP event, while in QUIET state. | |||
| Actions on event 1: | Actions on event 1: | |||
| skipping to change at page 11, line 9 ¶ | skipping to change at page 11, line 9 ¶ | |||
| the network. In case of doubt, it's RECOMMENDED that timer intervals | the network. In case of doubt, it's RECOMMENDED that timer intervals | |||
| should be chosen conservatively (i.e., longer timer values). | should be chosen conservatively (i.e., longer timer values). | |||
| For the standard algorithm to be effective in mitigating micro-loops, | For the standard algorithm to be effective in mitigating micro-loops, | |||
| it is RECOMMENDED that all routers in the IGP domain, or at least all | it is RECOMMENDED that all routers in the IGP domain, or at least all | |||
| the routers in the same area/level, have exactly the same configured | the routers in the same area/level, have exactly the same configured | |||
| values. | values. | |||
| 7. Partial Deployment | 7. Partial Deployment | |||
| In general, the SPF delay algorithm is only effective in mitigating | In general, the SPF Back-off Delay algorithm is only effective in | |||
| micro-loops if it is deployed, with the same parameters, on all | mitigating micro-loops if it is deployed, with the same parameters, | |||
| routers in the IGP domain or, at least, all routers in an IGP area/ | on all routers in the IGP domain or, at least, all routers in an IGP | |||
| level. The impact of partial deployment is dependent on the | area/level. The impact of partial deployment is dependent on the | |||
| particular event, topology, and the SPF algorithm(s) used on other | particular event, topology, and the algorithm(s) used on other | |||
| routers in the IGP area/level. In cases where the previous SPF | routers in the IGP area/level. In cases where the previous SPF Back- | |||
| algorithm was implemented uniformly, partial deployment will increase | off Delay algorithm was implemented uniformly, partial deployment | |||
| the frequency and duration of micro-loops. Hence, it is RECOMMENDED | will increase the frequency and duration of micro-loops. Hence, it | |||
| that all routers in the IGP domain or at least within the same area/ | is RECOMMENDED that all routers in the IGP domain or at least within | |||
| level be migrated to the SPF algorithm described herein at roughly | the same area/level be migrated to the SPF algorithm described herein | |||
| the same time. | at roughly the same time. | |||
| Note that this is not a new consideration as over times, network | Note that this is not a new consideration as over times, network | |||
| operators have changed SPF delay parameters in order to accommodate | operators have changed SPF delay parameters in order to accommodate | |||
| new customer requirements for fast convergence, as permitted by new | new customer requirements for fast convergence, as permitted by new | |||
| software and hardware. They may also have progressively replaced an | software and hardware. They may also have progressively replaced an | |||
| implementation with a given SPF delay algorithm by another | implementation with a given SPF Back-off Delay algorithm by another | |||
| implementation with a different one. | implementation with a different one. | |||
| 8. Impact on micro-loops | 8. Impact on micro-loops | |||
| Micro-loops during IGP convergence are due to a non-synchronized or | Micro-loops during IGP convergence are due to a non-synchronized or | |||
| non-ordered update of the forwarding information tables (FIB) | non-ordered update of the forwarding information tables (FIB) | |||
| [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs | [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs | |||
| are installed after multiple steps such as flooding of the IGP event | are installed after multiple steps such as flooding of the IGP event | |||
| across the network, SPF wait time, SPF computation, FIB distribution | across the network, SPF wait time, SPF computation, FIB distribution | |||
| across line cards, and FIB update. This document only addresses the | across line cards, and FIB update. This document only addresses the | |||
| End of changes. 13 change blocks. | ||||
| 42 lines changed or deleted | 41 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||