| < draft-ietf-rtgwg-backoff-algo-07.txt | draft-ietf-rtgwg-backoff-algo-08.txt > | |||
|---|---|---|---|---|
| Network Working Group B. Decraene | Network Working Group B. Decraene | |||
| Internet-Draft Orange | Internet-Draft Orange | |||
| Intended status: Standards Track S. Litkowski | Intended status: Standards Track S. Litkowski | |||
| Expires: June 13, 2018 Orange Business Service | Expires: August 30, 2018 Orange Business Service | |||
| H. Gredler | H. Gredler | |||
| RtBrick Inc | RtBrick Inc | |||
| A. Lindem | A. Lindem | |||
| Cisco Systems | Cisco Systems | |||
| P. Francois | P. Francois | |||
| C. Bowers | C. Bowers | |||
| Juniper Networks, Inc. | Juniper Networks, Inc. | |||
| December 10, 2017 | February 26, 2018 | |||
| SPF Back-off algorithm for link state IGPs | SPF Back-off algorithm for link state IGPs | |||
| draft-ietf-rtgwg-backoff-algo-07 | draft-ietf-rtgwg-backoff-algo-08 | |||
| Abstract | Abstract | |||
| This document defines a standard algorithm to back-off link-state IGP | This document defines a standard algorithm to temporarily postpone or | |||
| Shortest Path First (SPF) computations. | 'back-off' link-state IGP Shortest Path First (SPF) computations. | |||
| This reduces the computational load and churn on IGP nodes when | ||||
| multiple temporally close network events trigger multiple SPF | ||||
| computations. | ||||
| Having one standard algorithm improves interoperability by reducing | Having one standard algorithm improves interoperability by reducing | |||
| the probability and/or duration of transient forwarding loops during | the probability and/or duration of transient forwarding loops during | |||
| the IGP convergence when the IGP reacts to multiple temporally close | the IGP convergence when the IGP reacts to multiple temporally close | |||
| IGP events. | IGP events. | |||
| Requirements Language | Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| document are to be interpreted as described in [RFC2119]. | "OPTIONAL" in this document are to be interpreted as described in | |||
| [BCP14] [RFC2119] [RFC8174] when, and only when, they appear in all | ||||
| capitals, as shown here. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on June 13, 2018. | ||||
| This Internet-Draft will expire on August 30, 2018. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. High level goals . . . . . . . . . . . . . . . . . . . . . . 3 | 2. High level goals . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Definitions and parameters . . . . . . . . . . . . . . . . . 4 | 3. Definitions and parameters . . . . . . . . . . . . . . . . . 4 | |||
| 4. Principles of SPF delay algorithm . . . . . . . . . . . . . . 5 | 4. Principles of SPF delay algorithm . . . . . . . . . . . . . . 5 | |||
| 5. Specification of the SPF delay state machine . . . . . . . . 5 | 5. Specification of the SPF delay state machine . . . . . . . . 6 | |||
| 5.1. States . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 5.1. State Machine . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5.2. Timers . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 5.2. State . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 5.3. States Transitions . . . . . . . . . . . . . . . . . . . 6 | 5.3. Timers . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 5.4. FSM Events . . . . . . . . . . . . . . . . . . . . . . . 7 | 5.4. FSM Events . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 6. Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 6. Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 7. Partial Deployment . . . . . . . . . . . . . . . . . . . . . 10 | 7. Partial Deployment . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 8. Impact on micro-loops . . . . . . . . . . . . . . . . . . . . 10 | 8. Impact on micro-loops . . . . . . . . . . . . . . . . . . . . 11 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 10. Security considerations . . . . . . . . . . . . . . . . . . . 11 | 10. Security considerations . . . . . . . . . . . . . . . . . . . 11 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 12.1. Normative References . . . . . . . . . . . . . . . . . . 11 | 12.1. Normative References . . . . . . . . . . . . . . . . . . 12 | |||
| 12.2. Informative References . . . . . . . . . . . . . . . . . 11 | 12.2. Informative References . . . . . . . . . . . . . . . . . 12 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 1. Introduction | 1. Introduction | |||
| Link state IGPs, such as IS-IS [ISO10589-Second-Edition] and OSPF | Link state IGPs, such as IS-IS [ISO10589-Second-Edition], OSPF | |||
| [RFC2328], perform distributed route computation on all routers in | [RFC2328] and OSPFv3 [RFC5340], perform distributed route computation | |||
| the area/level. In order to have consistent routing tables across | on all routers in the area/level. In order to have consistent | |||
| the network, such distributed computation requires that all routers | routing tables across the network, such distributed computation | |||
| have the same version of the network topology (Link State DataBase | requires that all routers have the same version of the network | |||
| (LSDB)) and perform their computation at the same time. | topology (Link State DataBase (LSDB)) and perform their computation | |||
| essentially at the same time. | ||||
| In general, when the network is stable, there is a desire to compute | In general, when the network is stable, there is a desire to trigger | |||
| a new Shortest Path First (SPF) as soon as a failure is detected in | a new Shortest Path First (SPF) computation as soon as a failure is | |||
| order to quickly route around the failure. However, when the network | detected in order to quickly route around the failure. However, when | |||
| is experiencing multiple temporally close failures over a short | the network is experiencing multiple failures over a short period of | |||
| period of time, there is a conflicting desire to limit the frequency | time, there is a conflicting desire to limit the frequency of SPF | |||
| of SPF computations. Indeed, this allows a reduction in control | computations, which would allow a reduction in control plane | |||
| plane resources used by IGPs and all protocols/subsystems reacting on | resources used by IGPs and all protocols/subsystems reacting on the | |||
| the attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], | attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP | |||
| BGP [RFC4271], Fast ReRoute computations (e.g. Loop Free Alternates | [RFC4271], Fast ReRoute computations (e.g., Loop Free Alternates | |||
| (LFA) [RFC5286], FIB updates... This also reduces the churn on | (LFA) [RFC5286]), FIB updates, etc. This also reduces network churn | |||
| routers and in the network and, in particular, reduces the side | and, in particular, reduces the side effects such as micro-loops | |||
| effects such as micro-loops [RFC5715] that ensue during IGP | [RFC5715] that ensue during IGP convergence. | |||
| convergence. | ||||
| To allow for this, IGPs implement an SPF back-off algorithm. | To allow for this, IGPs usually implement an SPF back-off algorithm | |||
| However, different implementations have choosen different algorithms. | that postpones or backs-off the SPF computation. However, different | |||
| Hence, in a multi-vendor network, it's not possible to ensure that | implementations have chosen different algorithms. Hence, in a multi- | |||
| all routers trigger their SPF computation after the same delay. This | vendor network, it's not possible to ensure that all routers trigger | |||
| situation increases the average and maximum differential delay | their SPF computation after the same delay. This situation increases | |||
| between routers completing their SPF computation. It also increases | the average and maximum differential delay between routers completing | |||
| the probability that different routers compute their FIBs based on | their SPF computation. It also increases the probability that | |||
| different LSDB versions. Both factors increase the probability and/ | different routers compute their FIBs based on different LSDB | |||
| or duration of micro-loops as discussed in Section 8. | versions. Both factors increase the probability and/or duration of | |||
| micro-loops as discussed in Section 8. | ||||
| To allow multi-vendor networks to have all routers delay their SPF | To allow multi-vendor networks to have all routers delay their SPF | |||
| computations for the same duration, this document specifies a | computations for the same duration, this document specifies a | |||
| standard algorithm. Optionally, implementations may also offer | standard algorithm. Optionally, implementations may also offer | |||
| alternative algorithms. | alternative algorithms. | |||
| 2. High level goals | 2. High level goals | |||
| The high level goals of this algorithm are the following: | The high level goals of this algorithm are the following: | |||
| o Very fast convergence for a single event (e.g., link failure). | o Very fast convergence for a single event (e.g., link failure). | |||
| o Paced fast convergence for multiple temporally close IGP events | o Paced fast convergence for multiple temporally close IGP events | |||
| while IGP stability is considered acceptable. | while IGP stability is considered acceptable. | |||
| o Delayed convergence when IGP stability is problematic. This will | o Delayed convergence when IGP stability is problematic. This will | |||
| allow the IGP and related processes to conserve resources during | allow the IGP and related processes to conserve resources during | |||
| the period of instability. | the period of instability. | |||
| o Always try to avoid different SPF_DELAY timers values across | o Always try to avoid different SPF_DELAY Section 3 timer values | |||
| different routers in the area/level. Even though not all routers | across different routers in the area/level. This requires | |||
| will receive IGP messages at the same time, due to differences | specific consideration as different routers may receive IGP | |||
| messages at different interval or even order, due to differences | ||||
| both in the distance from the originator of the IGP event and in | both in the distance from the originator of the IGP event and in | |||
| flooding implementations. | flooding implementations. | |||
| 3. Definitions and parameters | 3. Definitions and parameters | |||
| IGP events: The reception or origination of an IGP LSDB change | IGP events: The reception or origination of an IGP LSDB change | |||
| requiring a new routing table computation. Examples are a topology | requiring a new routing table computation. Examples are a topology | |||
| change, a prefix change, a metric change on a link or prefix... Note | change, a prefix change and a metric change on a link or prefix. | |||
| that locally triggering a routing table computation is not considered | Note that locally triggering a routing table computation is not | |||
| as an IGP event since other IGP routers are unaware of this | considered as an IGP event since other IGP routers are unaware of | |||
| occurrence. | this occurrence. | |||
| Routing table computation: Computation of the routing table, by the | Routing table computation: Computation of the routing table, by the | |||
| IGP, using the IGP LSDB. No distinction is made between the type of | IGP implementation, using the IGP LSDB. No distinction is made | |||
| computation performed. e.g., full SPF, incremental SPF, Partial Route | between the type of computation performed. e.g., full SPF, | |||
| Computation (PRC). The type of computation is a local consideration. | incremental SPF, Partial Route Computation (PRC). The type of | |||
| This document may interchangeably use the terms routing table | computation is a local consideration. This document may | |||
| computation and SPF computation. | interchangeably use the terms routing table computation and SPF | |||
| computation. | ||||
| SPF_DELAY: The delay between the first IGP event triggering a new | SPF_DELAY: The delay between the first IGP event triggering a new | |||
| routing table computation and the start of that routing table | routing table computation and the start of that routing table | |||
| computation. It can take the following values: | computation. It can take the following values: | |||
| INITIAL_SPF_DELAY: A very small delay to quickly handle link | INITIAL_SPF_DELAY: A very small delay to quickly handle a single | |||
| failure, e.g., 0 milliseconds. | isolated link failure, e.g., 0 milliseconds. | |||
| SHORT_SPF_DELAY: A small delay to have a fast convergence in case of | SHORT_SPF_DELAY: A small delay to provide fast convergence in the | |||
| a single failure (node, SRLG..), e.g., 50-100 milliseconds. | case of a single component failure (node, Shared Risk Link Group | |||
| (SRLG)..) that leads to multiple IGP events, e.g., 50-100 | ||||
| milliseconds. | ||||
| LONG_SPF_DELAY: A long delay when the IGP is unstable, e.g., 2 | LONG_SPF_DELAY: A long delay when the IGP is unstable, e.g., 2 | |||
| seconds. Note that this allows the IGP network to stabilize. | seconds. Note that this allows the IGP network to stabilize. | |||
| TIME_TO_LEARN_INTERVAL: This is the maximum duration typically needed | TIME_TO_LEARN_INTERVAL: This is the maximum duration typically needed | |||
| to learn all the IGP events related to a single component failure | to learn all the IGP events related to a single component failure | |||
| (e.g., router failure, SRLG failure), e.g., 1 second. It's mostly | (e.g., router failure, SRLG failure), e.g., 1 second. It's mostly | |||
| dependent on failure detection time variation between all routers | dependent on failure detection time variation between all routers | |||
| that are adjacent to the failure. Additionally, it may depend on the | that are adjacent to the failure. Additionally, it may depend on the | |||
| different IGP implementations/parameters across the network, related | different IGP implementations/parameters across the network, related | |||
| to origination and flooding of their link state advertisements. | to origination and flooding of their link state advertisements. | |||
| HOLDDOWN_INTERVAL: The time required with no received IGP events | HOLDDOWN_INTERVAL: The time required with no received IGP events | |||
| before considering the IGP to be stable again and allowing the | before considering the IGP to be stable again and allowing the | |||
| SPF_DELAY to be restored to INITIAL_SPF_DELAY. e.g., 3 seconds. The | SPF_DELAY to be restored to INITIAL_SPF_DELAY. e.g. a | |||
| HOLDDOWN_INTERVAL MUST be defaulted or configured to be longer than | HOLDDOWN_INTERVAL of 3 seconds. The HOLDDOWN_INTERVAL MUST be | |||
| the TIME_TO_LEARN_INTERVAL. | defaulted and configured to be longer than the | |||
| TIME_TO_LEARN_INTERVAL. | ||||
| 4. Principles of SPF delay algorithm | 4. Principles of SPF delay algorithm | |||
| For this first IGP event, we assume that there has been a single | For this first IGP event, we assume that there has been a single | |||
| simple change in the network which can be taken into account using a | simple change in the network which can be taken into account using a | |||
| single routing computation (e.g., link failure, prefix (metric) | single routing computation (e.g., link failure, prefix (metric) | |||
| change) and we optimize for very fast convergence, delaying the | change) and we optimize for very fast convergence, delaying the | |||
| routing computation by INITIAL_SPF_DELAY. Under this assumption, | routing computation by INITIAL_SPF_DELAY. Under this assumption, | |||
| there is no benefit in delaying the routing computation. In a | there is no benefit in delaying the routing computation. In a | |||
| typical network, this is the most common type of IGP event. Hence, | typical network, this is the most common type of IGP event. Hence, | |||
| it makes sense to optimize this case. | it makes sense to optimize this case. | |||
| If subsequent IGP events are received in a short period of time | If subsequent IGP events are received in a short period of time | |||
| (TIME_TO_LEARN_INTERVAL), we then assume that a single component | (TIME_TO_LEARN_INTERVAL), we then assume that a single component | |||
| failed, but that this failure requires the knowledge of multiple IGP | failed, but that this failure requires the knowledge of multiple IGP | |||
| events in order for IGP routing to converge. Under this assumption, | events in order for IGP routing to converge. Under this assumption, | |||
| we want fast convergence since this is a normal network situation. | we want fast convergence since this is a normal network situation. | |||
| However, there is a benefit in waiting for all IGP events related to | However, there is a benefit in waiting for all IGP events related to | |||
| this single component failure so that the IGP can compute the post- | this single component failure so that the IGP can compute the post- | |||
| failure routing table in a single route computation. In this | failure routing table in a single additional route computation. In | |||
| situation, we delay the routing computation by SHORT_SPF_DELAY. | this situation, we delay the routing computation by SHORT_SPF_DELAY. | |||
| If IGP events are still received after TIME_TO_LEARN_INTERVAL from | If IGP events are still received after TIME_TO_LEARN_INTERVAL from | |||
| the initial IGP event received in QUIET state, then the network is | the initial IGP event received in QUIET state Figure 1, then the | |||
| presumably experiencing multiple independent failures. In this case, | network is presumably experiencing multiple independent failures. In | |||
| while waiting for network stability, the computations are delayed for | this case, while waiting for network stability, the computations are | |||
| a longer time represented by LONG_SPF_DELAY. This SPF delay is kept | delayed for a longer time represented by LONG_SPF_DELAY. This SPF | |||
| until no IGP events are received for HOLDDOWN_INTERVAL. | delay is kept until no IGP events are received for HOLDDOWN_INTERVAL. | |||
| Note that previously implemented SPF delay algorithms counted the | Note that in order to increase the consistency network wide, the | |||
| number of SPF computations. However, as all routers may receive the | algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP | |||
| IGP events at different times, we cannot assume that all routers will | event, rather than the number of SPF computation performed. Indeed, | |||
| perform the same number of SPF computations or that they will | as all routers may receive the IGP events at different times, we | |||
| schedule them at the same time. For example, assuming that the SPF | cannot assume that all routers will perform the same number of SPF | |||
| delay is 50 ms, router R1 may receive 3 IGP events (E1, E2, E3) in | computations. For example, assuming that the SPF delay is 50 ms, | |||
| those 50 ms and hence will perform a single routing computation. | router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and | |||
| While another router R2 may only receive 2 events (E1, E2) in those | hence will perform a single routing computation. While another | |||
| 50 ms and hence will schedule another routing computation when | router R2 may only receive 2 events (E1, E2) in those 50 ms and hence | |||
| receiving E3. That's why this document uses a time | will schedule another routing computation when receiving E3. | |||
| (TIME_TO_LEARN_INTERVAL) from the initial event detection/reception | ||||
| as opposed to counting the number of SPF computations to determine | ||||
| when the IGP is unstable. | ||||
| 5. Specification of the SPF delay state machine | 5. Specification of the SPF delay state machine | |||
| 5.1. States | ||||
| This section describes the state machine. The naming and semantics | This section describes the abstract finite state machine (FSM) | |||
| of each state corresponds directly to the SPF delay used for IGP | intended to control the timing of the execution of SPF calculations | |||
| events received in that state. Three states are defined: | in response to IGP events. | |||
| QUIET: This is the initial state, when no IGP events have occured for | ||||
| at least HOLDDOWN_INTERVAL since the previous routing table | ||||
| computation. The state is meant to handle link failures very | ||||
| quickly. | ||||
| SHORT_WAIT: State entered when an IGP event has been received in | ||||
| QUIET state. This state is meant to handle single component failure | ||||
| requiring multiple IGP events (e.g., node, SRLG). | ||||
| LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other | ||||
| words, state reached after TIME_TO_LEARN_INTERVAL in state | ||||
| SHORT_WAIT. This state is meant to handle multiple independent | ||||
| component failures during periods of IGP instability. | ||||
| 5.2. Timers | ||||
| SPF_TIMER: The Finite State Machine (FSM) abstract timer that uses | ||||
| the computed SPF delay. Upon expiration, the Route Table Computation | ||||
| (as defined in Section 3) is performed. | ||||
| HOLDDOWN_TIMER: The Finite State Machine (FSM) abstract timer that is | ||||
| (re)started whan an IGP event is received and set to | ||||
| HOLDDOWN_INTERVAL. Upon expiration, the FSM is moved to the QUIET | ||||
| state. | ||||
| LEARN_TIMER: The Finite State Machine (FSM) abstract timer that is | ||||
| started when an IGP event is recevied while the FSM is in the QUIET | ||||
| state. Upon expiration, the FSM is moved to the LONG_WAIT state. | ||||
| 5.3. States Transitions | 5.1. State Machine | |||
| The FSM is initialized to the QUIET state with all three timers | The FSM is initialized to the QUIET state with all three timers | |||
| timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated. | timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated. | |||
| The events which may change the FSM states are an IGP event or the | The events which may change the FSM states are an IGP event or the | |||
| expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER). | expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER). | |||
| The following diagram briefly describes the state transitions. | The following diagram briefly describes the state transitions. | |||
| +-------------------+ | +-------------------+ | |||
| skipping to change at page 7, line 37 ¶ | skipping to change at page 7, line 37 ¶ | |||
| +-------------------+ | | +-------------------+ | | |||
| +---->| | | | +---->| | | | |||
| | | LONG_WAIT |------------>-------+ | | | LONG_WAIT |------------>-------+ | |||
| +-----| | | +-----| | | |||
| 4: +-------------------+ 5: HOLDDOWN_TIMER | 4: +-------------------+ 5: HOLDDOWN_TIMER | |||
| IGP event expiration | IGP event expiration | |||
| 9: SPF_TIMER expiration | 9: SPF_TIMER expiration | |||
| Figure 1: State Machine | Figure 1: State Machine | |||
| 5.2. State | ||||
| The naming and semantics of each state corresponds directly to the | ||||
| SPF delay used for IGP events received in that state. Three states | ||||
| are defined: | ||||
| QUIET: This is the initial state, when no IGP events have occurred | ||||
| for at least HOLDDOWN_INTERVAL since the previous routing table | ||||
| computation. The state is meant to handle link failures very | ||||
| quickly. | ||||
| SHORT_WAIT: State entered when an IGP event has been received in | ||||
| QUIET state. This state is meant to handle single component failure | ||||
| requiring multiple IGP events (e.g., node, SRLG). | ||||
| LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other | ||||
| words, state reached after TIME_TO_LEARN_INTERVAL in state | ||||
| SHORT_WAIT. This state is meant to handle multiple independent | ||||
| component failures during periods of IGP instability. | ||||
| 5.3. Timers | ||||
| SPF_TIMER: The FSM abstract timer that uses the computed SPF delay. | ||||
| Upon expiration, the Route Table Computation (as defined in | ||||
| Section 3) is performed. | ||||
| HOLDDOWN_TIMER: The FSM abstract timer that is (re)started whan an | ||||
| IGP event is received and set to HOLDDOWN_INTERVAL. Upon expiration, | ||||
| the FSM is moved to the QUIET state. | ||||
| LEARN_TIMER: The FSM abstract timer that is started when an IGP event | ||||
| is recevied while the FSM is in the QUIET state. Upon expiration, | ||||
| the FSM is moved to the LONG_WAIT state. | ||||
| 5.4. FSM Events | 5.4. FSM Events | |||
| This section describes the events and the actions performed in | This section describes the events and the actions performed in | |||
| response. | response. | |||
| Transition 1: IGP event, while in QUIET_STATE. | Transition 1: IGP event, while in QUIET state. | |||
| Actions on event 1: | Actions on event 1: | |||
| o If SPF_TIMER is not already running, start it with value | o If SPF_TIMER is not already running, start it with value | |||
| INITIAL_SPF_DELAY. | INITIAL_SPF_DELAY. | |||
| o Start LEARN_TIMER with TIME_TO_LEARN_INTERVAL. | o Start LEARN_TIMER with TIME_TO_LEARN_INTERVAL. | |||
| o Start HOLDDOWN_TIMER with HOLDDOWN_INTERVAL. | o Start HOLDDOWN_TIMER with HOLDDOWN_INTERVAL. | |||
| skipping to change at page 9, line 31 ¶ | skipping to change at page 10, line 19 ¶ | |||
| Transition 9: SPF_TIMER expiration, while in LONG_WAIT. | Transition 9: SPF_TIMER expiration, while in LONG_WAIT. | |||
| Actions on event 9: | Actions on event 9: | |||
| o Compute SPF. | o Compute SPF. | |||
| o Remain in current state. | o Remain in current state. | |||
| 6. Parameters | 6. Parameters | |||
| All the parameters MUST be configurable [I-D.ietf-isis-yang-isis-cfg] | All the parameters MUST be configurable at the protocol instance | |||
| [I-D.ietf-ospf-yang] at the protocol instance granularity. They MAY | granularity. They MAY be configurable at the area/level granularity. | |||
| be configurable at the area/level granularity. All the delays | All the delays (INITIAL_SPF_DELAY, SHORT_SPF_DELAY, LONG_SPF_DELAY, | |||
| (INITIAL_SPF_DELAY, SHORT_SPF_DELAY, LONG_SPF_DELAY, | ||||
| TIME_TO_LEARN_INTERVAL, HOLDDOWN_INTERVAL) SHOULD be configurable at | TIME_TO_LEARN_INTERVAL, HOLDDOWN_INTERVAL) SHOULD be configurable at | |||
| the millisecond granularity. They MUST be configurable at least at | the millisecond granularity. They MUST be configurable at least at | |||
| the tenth of second granularity. The configurable range for all the | the tenth of second granularity. The configurable range for all the | |||
| parameters SHOULD at least be from 0 milliseconds to 60 seconds. | parameters SHOULD at least be from 0 milliseconds to 60 seconds. | |||
| This document does not propose default values for the parameters | This document does not propose default values for the parameters | |||
| because these values are expected to be context dependent. | because these values are expected to be context dependent. | |||
| Implementations are free to propose their own default values. | Implementations are free to propose their own default values. | |||
| However the HOLDDOWN_INTERVAL MUST be defaulted or configured to be | ||||
| longer than the TIME_TO_LEARN_INTERVAL. | ||||
| In order to satisfy the goals stated in Section 2, operators are | In order to satisfy the goals stated in Section 2, operators are | |||
| RECOMMENDED to configure delay intervals such that SPF_INITIAL_DELAY | RECOMMENDED to configure delay intervals such that INITIAL_SPF_DELAY | |||
| <= SPF_SHORT_DELAY and SPF_SHORT_DELAY <= SPF_LONG_DELAY. | <= SHORT_SPF_DELAY and SHORT_SPF_DELAY <= LONG_SPF_DELAY. | |||
| When setting (default) values, one SHOULD consider the customers and | When setting (default) values, one should consider the customers and | |||
| their application requirements, the computational power of the | their application requirements, the computational power of the | |||
| routers, the size of the network, and, in particular, the number of | routers, the size of the network, and, in particular, the number of | |||
| IP prefixes advertised in the IGP, the frequency and number of IGP | IP prefixes advertised in the IGP, the frequency and number of IGP | |||
| events, the number of protocols reactions/computations triggered by | events, the number of protocols reactions/computations triggered by | |||
| IGP SPF (e.g., BGP, PCEP, Traffic Engineering CSPF, Fast ReRoute | IGP SPF computation (e.g., BGP, PCEP, Traffic Engineering CSPF, Fast | |||
| computations). | ReRoute computations). | |||
| Note that some or all of these factors may change over the life of | Note that some or all of these factors may change over the life of | |||
| the network. In case of doubt, it's RECOMMENDED to play it safe and | the network. In case of doubt, it's RECOMMENDED that timer intervals | |||
| start with safe, i.e., longer timers. | should be chosen conservatively (i.e., longer timer values). | |||
| For the standard algorithm to be effective in mitigating micro-loops, | For the standard algorithm to be effective in mitigating micro-loops, | |||
| it is RECOMMENDED that all routers in the IGP domain, or at least all | it is RECOMMENDED that all routers in the IGP domain, or at least all | |||
| the routers in the same area/level, have exactly the same configured | the routers in the same area/level, have exactly the same configured | |||
| values. | values. | |||
| 7. Partial Deployment | 7. Partial Deployment | |||
| In general, the SPF delay algorithm is only effective in mitigating | In general, the SPF delay algorithm is only effective in mitigating | |||
| micro-loops if it is deployed, with the same parameters, on all | micro-loops if it is deployed, with the same parameters, on all | |||
| routers, in the IGP domain or, at least, all routers in an IGP area/ | routers in the IGP domain or, at least, all routers in an IGP area/ | |||
| level. The impact of partial deployment is based on the particular | level. The impact of partial deployment is dependent on the | |||
| event, topology, and the SPF algorithm(s) used on other routers in | particular event, topology, and the SPF algorithm(s) used on other | |||
| the IGP area/level. In cases where the previous SPF algorithm was | routers in the IGP area/level. In cases where the previous SPF | |||
| implemented uniformly, partial deployment will increase the frequency | algorithm was implemented uniformly, partial deployment will increase | |||
| and duration of micro-loops. Hence, it is RECOMMENDED that all | the frequency and duration of micro-loops. Hence, it is RECOMMENDED | |||
| routers in the IGP domain or at least within the same area/level be | that all routers in the IGP domain or at least within the same area/ | |||
| migrated to the SPF algorithm described herein at roughly the same | level be migrated to the SPF algorithm described herein at roughly | |||
| time. | the same time. | |||
| Note that this is not a new consideration as over times, network | Note that this is not a new consideration as over times, network | |||
| operators have changed SPF delay parameters in order to accommodate | operators have changed SPF delay parameters in order to accommodate | |||
| new customer requirements for fast convergence, as permitted by new | new customer requirements for fast convergence, as permitted by new | |||
| software and hardware. They may also have progressively replaced an | software and hardware. They may also have progressively replaced an | |||
| implementation with a given SPF delay algorithm by another | implementation with a given SPF delay algorithm by another | |||
| implementation with a different one. | implementation with a different one. | |||
| 8. Impact on micro-loops | 8. Impact on micro-loops | |||
| Micro-loops during IGP convergence are due to a non-synchronized or | Micro-loops during IGP convergence are due to a non-synchronized or | |||
| non-ordered update of the forwarding information tables (FIB) | non-ordered update of the forwarding information tables (FIB) | |||
| [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs | [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs | |||
| are installed after multiple steps such as flooding of the IGP event | are installed after multiple steps such as flooding of the IGP event | |||
| across the network, SPF wait time, SPF computation, FIB distribution | across the network, SPF wait time, SPF computation, FIB distribution | |||
| across line cards, and FIB update. This document only addresses the | across line cards, and FIB update. This document only addresses the | |||
| first contribution. This standardized procedure reduces the | contribution from the SPF wait time. This standardized procedure | |||
| probability and/or duration of micro-loops when IGPs experience | reduces the probability and/or duration of micro-loops when IGPs | |||
| multiple temporally close events. It does not prevent all micro- | experience multiple temporally close events. It does not prevent all | |||
| loops. However, it is beneficial and is less complex and costly to | micro-loops. However, it is beneficial and is less complex and | |||
| implement when compared to full solutions such as [RFC5715] or | costly to implement when compared to full solutions such as [RFC5715] | |||
| [RFC6976]. | or [RFC6976]. | |||
| 9. IANA Considerations | 9. IANA Considerations | |||
| No IANA actions required. | No IANA actions required. | |||
| 10. Security considerations | 10. Security considerations | |||
| The algorithm presented in this document does not compromise IGP | The algorithm presented in this document does not compromise IGP | |||
| security. An attacker having the ability to generate IGP events | security. An attacker having the ability to generate IGP events | |||
| would be able to delay the IGP convergence time. The LONG_SPF_DELAY | would be able to delay the IGP convergence time. The LONG_SPF_DELAY | |||
| skipping to change at page 11, line 34 ¶ | skipping to change at page 12, line 22 ¶ | |||
| 12. References | 12. References | |||
| 12.1. Normative References | 12.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| 12.2. Informative References | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
| [I-D.ietf-isis-yang-isis-cfg] | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| Litkowski, S., Yeung, D., Lindem, A., Zhang, Z., and L. | ||||
| Lhotka, "YANG Data Model for IS-IS protocol", draft-ietf- | ||||
| isis-yang-isis-cfg-19 (work in progress), November 2017. | ||||
| [I-D.ietf-ospf-yang] | 12.2. Informative References | |||
| Yeung, D., Qu, Y., Zhang, Z., Chen, I., and A. Lindem, | ||||
| "Yang Data Model for OSPF Protocol", draft-ietf-ospf- | ||||
| yang-09 (work in progress), October 2017. | ||||
| [I-D.ietf-rtgwg-spf-uloop-pb-statement] | [I-D.ietf-rtgwg-spf-uloop-pb-statement] | |||
| Litkowski, S., Decraene, B., and M. Horneffer, "Link State | Litkowski, S., Decraene, B., and M. Horneffer, "Link State | |||
| protocols SPF trigger and delay algorithm impact on IGP | protocols SPF trigger and delay algorithm impact on IGP | |||
| micro-loops", draft-ietf-rtgwg-spf-uloop-pb-statement-05 | micro-loops", draft-ietf-rtgwg-spf-uloop-pb-statement-06 | |||
| (work in progress), December 2017. | (work in progress), January 2018. | |||
| [ISO10589-Second-Edition] | [ISO10589-Second-Edition] | |||
| International Organization for Standardization, | International Organization for Standardization, | |||
| "Intermediate system to Intermediate system intra-domain | "Intermediate system to Intermediate system intra-domain | |||
| routeing information exchange protocol for use in | routeing information exchange protocol for use in | |||
| conjunction with the protocol for providing the | conjunction with the protocol for providing the | |||
| connectionless-mode Network Service (ISO 8473)", ISO/ | connectionless-mode Network Service (ISO 8473)", ISO/ | |||
| IEC 10589:2002, Second Edition, Nov 2002. | IEC 10589:2002, Second Edition, Nov 2002. | |||
| [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, | [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, | |||
| skipping to change at page 12, line 36 ¶ | skipping to change at page 13, line 19 ¶ | |||
| [RFC5036] Andersson, L., Ed., Minei, I., Ed., and B. Thomas, Ed., | [RFC5036] Andersson, L., Ed., Minei, I., Ed., and B. Thomas, Ed., | |||
| "LDP Specification", RFC 5036, DOI 10.17487/RFC5036, | "LDP Specification", RFC 5036, DOI 10.17487/RFC5036, | |||
| October 2007, <https://www.rfc-editor.org/info/rfc5036>. | October 2007, <https://www.rfc-editor.org/info/rfc5036>. | |||
| [RFC5286] Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for | [RFC5286] Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for | |||
| IP Fast Reroute: Loop-Free Alternates", RFC 5286, | IP Fast Reroute: Loop-Free Alternates", RFC 5286, | |||
| DOI 10.17487/RFC5286, September 2008, | DOI 10.17487/RFC5286, September 2008, | |||
| <https://www.rfc-editor.org/info/rfc5286>. | <https://www.rfc-editor.org/info/rfc5286>. | |||
| [RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF | ||||
| for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008, | ||||
| <https://www.rfc-editor.org/info/rfc5340>. | ||||
| [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free | [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free | |||
| Convergence", RFC 5715, DOI 10.17487/RFC5715, January | Convergence", RFC 5715, DOI 10.17487/RFC5715, January | |||
| 2010, <https://www.rfc-editor.org/info/rfc5715>. | 2010, <https://www.rfc-editor.org/info/rfc5715>. | |||
| [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., | [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., | |||
| Francois, P., and O. Bonaventure, "Framework for Loop-Free | Francois, P., and O. Bonaventure, "Framework for Loop-Free | |||
| Convergence Using the Ordered Forwarding Information Base | Convergence Using the Ordered Forwarding Information Base | |||
| (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July | (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July | |||
| 2013, <https://www.rfc-editor.org/info/rfc6976>. | 2013, <https://www.rfc-editor.org/info/rfc6976>. | |||
| End of changes. 39 change blocks. | ||||
| 163 lines changed or deleted | 175 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||