Network Working Group                                        B. Decraene
Internet-Draft                                                    Orange
Intended status: Standards Track                             May 4, 2015
Expires: November 5, 2015


               Back-off SPF algorithm for link state IGP
                    draft-ietf-rtgwg-backoff-algo-00

Abstract

   This document defines a standard algorithm to back off link-state
   IGP SPF computations.

   Having one standardized algorithm improves interoperability by
   reducing the probability and/or duration of transient forwarding
   loops during IGP convergence in the area/level when the network
   reacts to multiple consecutive events.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 5, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  High level goals
   3.  Definitions and parameters
   4.  Principle of SPF delay algorithm
   5.  Specification of SPF delay algorithm
   6.  Impact on micro-loops
   7.  IANA Considerations
   8.  Security considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   Link-state IGPs, such as IS-IS [ISO10589-Second-Edition] and OSPF
   [RFC2328], perform distributed computations on all nodes of the
   area/level.  In order to have consistent routing tables across the
   network, such distributed computations require that all routers have
   the same view of the network (the Link State DataBase (LSDB)) and
   perform their computations at the same time.

   In general, when the network is stable, there is a desire to compute
   the new SPF as soon as a failure is known, in order to quickly route
   around it.  However, when the network is experiencing multiple
   consecutive failures over a short period of time, there is a desire
   to limit the frequency of SPF computations.  Indeed, this reduces the
   control plane resources used by the IGP and by all the
   protocols/subsystems reacting to it, such as LDP, RSVP-TE, BGP, Fast
   ReRoute computations, and FIB updates.  It also reduces churn on
   nodes and in the network and, in particular, side effects such as
   micro-loops, which may occur during each IGP convergence.

   To allow for this, some back-off algorithms have been implemented.
   Different implementations choose different algorithms; hence, in a
   multi-vendor network, it is not possible to ensure that all routers
   trigger their SPF computations after the same waiting delay.  This
   situation increases the average differential delay between the times
   at which routers complete their RIB computations.  It also increases
   the probability that different routers compute their RIBs based on
   different LSDBs.  Both effects increase the probability and/or
   duration of micro-loops.

   To allow all the routers in a multi-vendor network to delay their SPF
   computations for the same duration, this document specifies a
   standardized algorithm.  Implementations may offer alternative,
   optional algorithms.

2.  High level goals

   The high level goals of this algorithm are the following:

   o  Very fast convergence for single simple events (e.g., link
      failure).

   o  Fast convergence in general while the IGP stability is considered
      under control.

   o  A long delay when the IGP stability is considered out of control,
      in order to let all related processes calm down.

   o  At any time, try to avoid using different SPF_DELAY values for
      nodes in the area/level, even though not all nodes will receive
      IGP messages at the same time (due to differences in distance from
      the source and to different flooding implementations on the path
      from the source).

3.  Definitions and parameters

   IGP events:  An LSDB change requiring a new RIB computation (topology
      change, prefix change, metric change).  No distinction is made
      between the types of computation performed (e.g., full SPF,
      incremental SPF, PRC).  The type of computation is a local
      consideration.

   The SPF_DELAY timer can take the following values:

   INITIAL_WAIT:  A very small delay to quickly handle a link failure,
      e.g., 0 milliseconds.

   FAST_WAIT:  A small delay to achieve fast convergence, e.g., 50-100
      milliseconds.
      Note: we want to be fast, but as such a failure generates
      multiple IGP events, being too fast increases the probability of
      receiving additional IGP events just after the RIB computation.

   LONG_WAIT:  A long delay used when the IGP is unstable, e.g., 2
      seconds.  Note: the aim is to let the IGP calm down.

   The TIME_TO_CONVERGE timer is the time needed to learn all the IGP
   events related to a single failure (e.g., node failure, SRLG
   failure), e.g., 1 second.  It mostly depends on the variation in
   failure detection times among all the nodes adjacent to the failure,
   and may also depend on the different flooding implementations of
   nodes in the network.

   The HOLD_DOWN timer is the period without any IGP event after which
   the IGP is considered quiet again and the SPF_DELAY can be set back
   to INITIAL_WAIT, e.g., 5 seconds.

4.  Principle of SPF delay algorithm

   The first IGP event is handled very quickly (INITIAL_WAIT), in order
   to react promptly when the change requires only one IGP event (e.g.,
   link failure, prefix change).

   If more IGP events are received shortly afterwards, they are
   considered to be related to the same single failure and are handled
   relatively quickly (FAST_WAIT) during the time needed to receive all
   the IGP events related to that failure (TIME_TO_CONVERGE).

   If IGP events are still received after this time, then the network
   is presumably experiencing multiple independent failures; while
   waiting for it to stabilize, the computations are delayed for a
   longer time (LONG_WAIT).

   Note: previous SPF delay algorithms used to count the number of RIB
   computations.  However, as nodes may receive the IGP events at
   different times, we cannot assume that all nodes will perform the
   same number of SPF computations or that they will schedule them at
   the same time.
   For example, assuming that the SPF delay is 50 ms, node R1 may
   receive three IGP events (E1, E2, E3) within those 50 ms and hence
   will perform a single routing computation, whereas another node, R2,
   may receive only two events (E1, E2) within those 50 ms and hence
   will schedule another routing computation when it later receives E3.
   That is why this document prefers to define a time limit since the
   first event (TIME_TO_CONVERGE), rather than a number of routing
   computations.

5.  Specification of SPF delay algorithm

   When the previous IGP event was received more than HOLD_DOWN ago:

   o  The IGP is set to the QUIET state.

   When the IGP is in the QUIET state and an IGP event is received:

   o  The time of this first IGP event is stored in FIRST_EVENT_TIME.

   o  The next RIB computation time is set to LSP receive time +
      INITIAL_WAIT.

   o  The IGP is set to the FAST_WAIT state.

   When the IGP is in the FAST_WAIT state and an IGP event is received:

   o  If more than TIME_TO_CONVERGE has passed since FIRST_EVENT_TIME,
      then the IGP is set to the HOLD_DOWN state.

   o  If the next RIB computation time is in the past, set the next RIB
      computation time to LSP receive time + FAST_WAIT.

   When the IGP is in the HOLD_DOWN state and an IGP event is received:

   o  If the next RIB computation time is in the past, set the next RIB
      computation time to LSP receive time + LONG_WAIT.

6.  Impact on micro-loops

   Micro-loops during IGP convergence are due to a non-synchronized or
   non-ordered update of the Forwarding Information Bases (FIBs)
   [RFC5715] [RFC6976] [I-D.litkowski-rtgwg-spf-uloop-pb-statement].
   FIBs are installed after multiple steps, such as the SPF wait time,
   SPF computation, FIB distribution, and FIB update.  This document
   addresses only the first contribution.  This standardized procedure
   reduces the probability and/or duration of micro-loops when the IGP
   experiences multiple consecutive events.
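   As a non-normative illustration of this effect, the following Python
   sketch models the state machine of Section 5 and replays the same IGP
   events at two routers, the second one receiving every event 10 ms
   later (for example, because it is further from the source).  It
   rests on several assumptions: the names SpfBackoff, IgpState,
   on_event, run, and rib_times are hypothetical; times are integer
   milliseconds; the parameters default to the example values of
   Section 3 (FAST_WAIT taken as 50 ms); the return to QUIET is checked
   when the next event arrives rather than by a timer; "in the past"
   means strictly earlier than the current event; and the two FAST_WAIT
   bullets are applied in the order written, so the event that moves
   the IGP to HOLD_DOWN is itself still scheduled with FAST_WAIT.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class IgpState(Enum):
    QUIET = "QUIET"
    FAST_WAIT = "FAST_WAIT"
    HOLD_DOWN = "HOLD_DOWN"


@dataclass
class SpfBackoff:
    """Hypothetical model of the Section 5 SPF delay state machine.

    All times are integer milliseconds; defaults are the Section 3 examples.
    """
    initial_wait: int = 0
    fast_wait: int = 50
    long_wait: int = 2000
    time_to_converge: int = 1000
    hold_down: int = 5000
    state: IgpState = field(default=IgpState.QUIET, init=False)
    first_event_time: Optional[int] = field(default=None, init=False)
    last_event_time: Optional[int] = field(default=None, init=False)
    next_rib_time: Optional[int] = field(default=None, init=False)
    # Every RIB computation time scheduled so far (assumed to run as scheduled).
    rib_times: List[int] = field(default_factory=list, init=False)

    def _schedule(self, when: int) -> None:
        self.next_rib_time = when
        self.rib_times.append(when)

    def _rib_in_past(self, now: int) -> bool:
        # True when no RIB computation is still pending at time `now`.
        return self.next_rib_time is None or self.next_rib_time < now

    def on_event(self, now: int) -> None:
        """Process one IGP event received at time `now` (LSP receive time)."""
        # Previous IGP event more than HOLD_DOWN ago: back to QUIET.
        if (self.last_event_time is not None
                and now - self.last_event_time > self.hold_down):
            self.state = IgpState.QUIET
        self.last_event_time = now

        if self.state is IgpState.QUIET:
            self.first_event_time = now
            self._schedule(now + self.initial_wait)
            self.state = IgpState.FAST_WAIT
        elif self.state is IgpState.FAST_WAIT:
            if now - self.first_event_time > self.time_to_converge:
                self.state = IgpState.HOLD_DOWN
            # Literal reading: this bullet still uses FAST_WAIT, even for
            # the event that has just moved the IGP to HOLD_DOWN.
            if self._rib_in_past(now):
                self._schedule(now + self.fast_wait)
        else:  # IgpState.HOLD_DOWN
            if self._rib_in_past(now):
                self._schedule(now + self.long_wait)


def run(event_times: List[int], **params: int) -> List[int]:
    """Replay IGP events (sorted receive times); return scheduled RIB times."""
    router = SpfBackoff(**params)
    for t in event_times:
        router.on_event(t)
    return router.rib_times


if __name__ == "__main__":
    # Three events for one failure, three later ones, then a quiet period.
    events = [0, 20, 30, 1500, 1600, 1700, 10000]
    r1 = run(events)
    r2 = run([t + 10 for t in events])  # R2 receives everything 10 ms later
    print(r1)  # [0, 70, 1550, 3600, 10000]
    print(r2)  # [10, 80, 1560, 3610, 10010]
```

   With identical parameters, the two routers' computation times differ
   only by the 10 ms flooding offset.  The run also shows the regimes of
   Section 4: an immediate computation for the first event, FAST_WAIT
   while within TIME_TO_CONVERGE, LONG_WAIT once in the HOLD_DOWN state,
   and a return to INITIAL_WAIT after HOLD_DOWN without events.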

   This procedure does not remove all micro-loops.  However, it is
   beneficial, and its cost seems limited compared to full solutions
   such as [RFC5715] or [RFC6976].

7.  IANA Considerations

   No IANA actions are required.

8.  Security considerations

   This document has no impact on the security of the IGP.

9.  Acknowledgements

   We would like to acknowledge Hannes Gredler, Les Ginsberg, and Pierre
   Francois for the discussions related to this document.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [I-D.litkowski-rtgwg-spf-uloop-pb-statement]
              Litkowski, S., "Link State protocols SPF trigger and delay
              algorithm impact on IGP microloops",
              draft-litkowski-rtgwg-spf-uloop-pb-statement-02 (work in
              progress), March 2015.

   [ISO10589-Second-Edition]
              International Organization for Standardization,
              "Intermediate system to Intermediate system intra-domain
              routeing information exchange protocol for use in
              conjunction with the protocol for providing the
              connectionless-mode Network Service (ISO 8473)", ISO/IEC
              10589:2002, Second Edition, November 2002.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [RFC5715]  Shand, M. and S. Bryant, "A Framework for Loop-Free
              Convergence", RFC 5715, January 2010.

   [RFC6976]  Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
              Francois, P., and O. Bonaventure, "Framework for Loop-Free
              Convergence Using the Ordered Forwarding Information Base
              (oFIB) Approach", RFC 6976, July 2013.

Author's Address

   Bruno Decraene
   Orange
   38 rue du General Leclerc
   Issy Moulineaux cedex 9  92794
   France

   Email: bruno.decraene@orange.com