idnits 2.17.1 draft-ietf-rtgwg-backoff-algo-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (December 17, 2015) is 3050 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-10) exists of draft-ietf-rtgwg-spf-uloop-pb-statement-02 Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group B. Decraene 3 Internet-Draft Orange 4 Intended status: Standards Track S. Litkowski 5 Expires: June 19, 2016 Orange Business Service 6 H. Gredler 7 Private Contributor 8 A. Lindem 9 P. Francois 10 Cisco Systems 11 December 17, 2015 13 SPF Back-off algorithm for link state IGPs 14 draft-ietf-rtgwg-backoff-algo-02 16 Abstract 18 This document defines a standard algorithm to back-off link-state IGP 19 SPF computations. 21 Having one standard algorithm improves interoperability by reducing 22 the probability and/or duration of transient forwarding loops during 23 the IGP convergence when the IGP reacts to multiple proximate IGP 24 events. 26 Requirements Language 28 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 29 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 30 document are to be interpreted as described in [RFC2119]. 32 Status of This Memo 34 This Internet-Draft is submitted in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF). Note that other groups may also distribute 39 working documents as Internet-Drafts. The list of current Internet- 40 Drafts is at http://datatracker.ietf.org/drafts/current/. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 This Internet-Draft will expire on June 19, 2016. 49 Copyright Notice 51 Copyright (c) 2015 IETF Trust and the persons identified as the 52 document authors. All rights reserved. 54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents 56 (http://trustee.ietf.org/license-info) in effect on the date of 57 publication of this document. Please review these documents 58 carefully, as they describe your rights and restrictions with respect 59 to this document. Code Components extracted from this document must 60 include Simplified BSD License text as described in Section 4.e of 61 the Trust Legal Provisions and are provided without warranty as 62 described in the Simplified BSD License. 64 Table of Contents 66 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 67 2. High level goals . . . . . . . . . . . . . . . . . . . . . . 3 68 3. Definitions and parameters . . . . . . . . . . . . . . . . . 3 69 4. Principles of SPF delay algorithm . . . . . . . . . . . . . . 4 70 5. Specification of the SPF delay algorithm . . . . . . . . . . 5 71 6. Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 6 72 7. Impact on micro-loops . . . . . . . . . . . . . . . . . . . . 6 73 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 74 9. Security considerations . . . . . . . . . . . . . . . . . . . 7 75 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 76 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 77 11.1. Normative References . . . . . . . . . . . . . . . . . . 7 78 11.2. Informative References . . . . . . . . . . . . . . . . . 7 79 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 81 1. Introduction 83 Link state IGPs, such as IS-IS [ISO10589-Second-Edition] and OSPF 84 [RFC2328], perform distributed route computation on all routers of 85 the area/level. In order to have consistent routing tables across 86 the network, such distributed computation requires that all routers 87 have the same version of the network topology (Link State DataBase 88 (LSDB)) and perform their computation at the same time. 90 In general, when the network is stable, there is a desire to compute 91 the new SPF as soon as the failure is detected in order to quickly 92 route around the failure. However, when the network is experiencing 93 multiple proximate failures over a short period of time, there is a 94 conflicting desire to limit the frequency of SPF computations. 95 Indeed, this allows a reduction in control plane resources used by 96 IGPs and all protocols/subsystem reacting on the attendant route 97 change, such as LDP, RSVP-TE, BGP, Fast ReRoute computations, FIB 98 updates... This will reduce the churn on routers and in the network 99 and, in particular, reduce the side effects such as micro-loops that 100 ensue during IGP convergence. 102 To allow for this, IGPs implement a SPF back-off algorithm. 103 Different implementations choose different algorithms. Hence, in a 104 multi-vendor network, it's not possible to ensure that all routers 105 trigger their SPF computation after the same delay. This situation 106 increases the average differential delay between routers completing 107 their SPF computation. It also increases the probability that 108 different routers compute their FIBs based on a different LSDB 109 versions. Both factors increase the probability and/or duration of 110 micro-loops. 112 To allow multi-vendor networks to have all routers delay their SPF 113 computations for the same duration, this document specifies a 114 standard algorithm. Optionally, implementations may offer 115 alternative algorithms. 117 2. High level goals 119 The high level goals of this algorithm are the following: 121 o Very fast convergence for a single event (e.g., link failure). 123 o Slightly paced fast convergence for multiple proximate IGP events 124 while IGP stability is considered acceptable. 126 o Delayed convergence when the IGP stability is problematic. This 127 will allow the IGP and related processes to conserve resources 128 during the period of instability. 130 o Always try to avoid different SPF_DELAY timers values across 131 different routers in the area/level. Even though not all routers 132 will receive IGP messages at the same time (due to differences 133 both in the distance from the originator of the IGP event and in 134 flooding implementations). 136 3. Definitions and parameters 138 IGP events: An IGP LSDB change requiring a new routing table 139 computation. Examples are a topology change, a prefix change, a 140 metric change on link or prefix... 142 Routing table computation: computation of the routing table, by the 143 IGP, using the IGP LSDB. No distinction is made between the type of 144 computation performed. e.g., full SPF, incremental SPF, Partial Route 145 Computation (PRC). The type of computation is a local consideration. 146 This document may indifferently use the terms routing table 147 computation or SPF computation. 149 The SPF_DELAY is the delay introduced between the IGP event and the 150 start of the routing table computation. It can take the following 151 values: 153 INITIAL_WAIT: a very small delay to quickly handle link failure, 154 e.g., 0 milliseconds. 156 FAST_WAIT: a small delay to have a fast convergence in case of 157 single component failure (node, SRLG..), e.g., 50-100 milliseconds. 158 Note: we want to be fast, but as this failure results in multiple 159 IGP events, being too fast increases the probability to receive 160 additional network events immediately after the SPF computation. 162 LONG_WAIT: a long delay when the IGP is unstable, e.g., 2 seconds. 163 Note: Allow the IGP network to stabilize. 165 The TIME_TO_LEARN timer is the maximum duration typically needed to 166 learn all the IGP events related to a single component failure (e.g., 167 router failure, SRLG failure), e.g., 1 second. It's mostly dependent 168 on variation of failure detection times between all routers that are 169 adjacent to the failure. Additionally, it may depend on the 170 different flooding implementations for routers in the network. 172 The HOLD_DOWN timer is the time needed with no received IGP events 173 before considering the IGP to be stable again, allowing the SPF_DELAY 174 to be restored to INITIAL_WAIT. e.g., 3 seconds. 176 4. Principles of SPF delay algorithm 178 For this first IGP event, we assume that there has been a single 179 simple change in the network which can be taken into account using a 180 single routing computation (e.g., link failure, prefix (metric) 181 change) and we optimize for very fast convergence, delaying the 182 routing computation by INITIAL_WAIT. Under this assumption, there is 183 no benefit in delaying the routing computation. In a typical 184 network, this is the most common type of IGP event. Hence, it makes 185 sense to optimize this case. 187 If subsequent IGP events are received in a short period of time 188 (TIME_TO_LEARN), we then assume that a single component failed, but 189 that this failure requires the knowledge of multiple IGP events in 190 order for the IGP routing to converge. Under this assumption, we 191 want fast convergence since this is a normal network situation. 192 However, there is a benefit in waiting for all IGP events related to 193 this single component failure so that the IGP can compute the post- 194 failure routing table in a single route computation. In this 195 situation, we delay the routing computation by FAST_WAIT. 197 If IGP events are still received after TIME_TO_LEARN seconds from the 198 initial IGP event, then the network is presumably experiencing 199 multiple independent failures and while waiting for network 200 stability, the computations are delayed for a longer time represented 201 by LONG_WAIT. This SPF_delay is kept until no IGP events are 202 received for HOLD_DOWN seconds. 204 Note: previous SPF delay algorithms used to count the number of SPF 205 computations. However, as all routers may receive the IGP events at 206 different times, we cannot assume that all routers will perform the 207 same number of SPF computations or that they will schedule them at 208 the same time. For example, assuming that the SPF delay is 50 ms, 209 router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and 210 hence will perform a single routing computation. While another 211 router R2 may only receive 2 events (E1, E2) in those 50 ms and hence 212 will schedule another routing computation when receiving E3. That's 213 why this document defines a time (TIME_TO_LEARN) from the initial 214 event detection/reception as opposed to defining the number of SPF 215 computations to determine when the IGP is unstable. 217 5. Specification of the SPF delay algorithm 219 When no IGP events have occurred during the HOLD_DOWN interval: 221 o The IGP is set to the QUIET state. 223 When the IGP is in the QUIET state and an IGP event is received: 225 o The time of this first IGP event is stored in FIRST_EVENT_TIME. 227 o The next routing table computation is scheduled at: this IGP event 228 received time + INITIAL_WAIT. 230 o The IGP is set to the FAST_WAIT state. 232 When the IGP is in the FAST_WAIT state and an IGP event is received: 234 o If more than the TIME_TO_LEARN interval has passed since 235 FIRST_EVENT_TIME, then the IGP is set to the HOLD_DOWN state. 237 o If a routing table computation is not already scheduled, one is 238 scheduled at: this IGP event received time + FAST_WAIT. 240 When the IGP is in the HOLD_DOWN state and an IGP event is received: 242 o If a routing table computation is not already scheduled, one is 243 scheduled at: this IGP event received time + LONG_WAIT. 245 6. Parameters 247 All the parameters MUST be configurable. All the delays 248 (INITIAL_WAIT, FAST_WAIT, LONG_WAIT, TIME_TO_LEARN, HOLD_DOWN) SHOULD 249 be configurable at the millisecond granularity. They MUST be 250 configurable at least at the tenth of second granularity. The 251 configurable range for all the parameters SHOULD be at least from 0 252 milliseconds to 60 seconds. 254 This document does not propose default values for the parameters 255 because these values are expected to be context dependent. 256 Implementations are free to propose their own default values. 258 When setting the (default) values, one SHOULD consider the customer's 259 or their applications' requirements, the computational power of the 260 routers, the size of the network, and, in particular, the number of 261 IP prefixes advertised in the IGP, the frequency and number of IGP 262 events, the number of protocols reactions/computations triggered by 263 IGP SPF (e.g., BGP, PCEP, Traffic Engineering CSPF, Fast ReRoute 264 computations). 266 Note that some or all of these factors may change over the life of 267 the network. In case of doubt, it's RECOMMENDED to play it safe and 268 start with safe, i.e., longer timers. 270 For the standard algorithm to be effective in mitigating micro-loops, 271 it is RECOMMENDED that all routers in the IGP domain, or at least all 272 the routers in the same area/level, have exactly the same configured 273 values. 275 7. Impact on micro-loops 277 Micro-loops during IGP convergence are due to a non-synchronized or 278 non-ordered update of the forwarding information tables (FIB) 279 [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs 280 are installed after multiple steps such as SPF wait time, SPF 281 computation, FIB distribution, and FIB update. This document only 282 addresses the first contribution. This standardized procedure 283 reduces the probability and/or duration of micro-loops when the IGP 284 experience multiple proximate events. It does not prevent all micro- 285 loops. However, it is beneficial and its cost seems limited compared 286 to full solutions such as [RFC5715] or [RFC6976]. 288 8. IANA Considerations 290 No IANA actions required. 292 9. Security considerations 294 This algorithm presented in this document does not in any way 295 compromise the security of the IGP. In fact, the HOLD_DOWN state may 296 mitigate the effects of Denial-of-Service (DOS) attacks generating 297 many IGP events. 299 10. Acknowledgements 301 We would like to acknowledge Les Ginsberg, Uma Chunduri, and Mike 302 Shand for the discussions and comments related to this document. 304 11. References 306 11.1. Normative References 308 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 309 Requirement Levels", BCP 14, RFC 2119, 310 DOI 10.17487/RFC2119, March 1997, 311 . 313 11.2. Informative References 315 [I-D.ietf-rtgwg-spf-uloop-pb-statement] 316 Litkowski, S., Decraene, B., and M. Horneffer, "Link State 317 protocols SPF trigger and delay algorithm impact on IGP 318 micro-loops", draft-ietf-rtgwg-spf-uloop-pb-statement-02 319 (work in progress), December 2015. 321 [ISO10589-Second-Edition] 322 International Organization for Standardization, 323 "Intermediate system to Intermediate system intra-domain 324 routeing information exchange protocol for use in 325 conjunction with the protocol for providing the 326 connectionless-mode Network Service (ISO 8473)", ISO/ 327 IEC 10589:2002, Second Edition, Nov 2002. 329 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, 330 DOI 10.17487/RFC2328, April 1998, 331 . 333 [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free 334 Convergence", RFC 5715, DOI 10.17487/RFC5715, January 335 2010, . 337 [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., 338 Francois, P., and O. Bonaventure, "Framework for Loop-Free 339 Convergence Using the Ordered Forwarding Information Base 340 (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July 341 2013, . 343 Authors' Addresses 345 Bruno Decraene 346 Orange 347 38 rue du General Leclerc 348 Issy Moulineaux cedex 9 92794 349 France 351 Email: bruno.decraene@orange.com 353 Stephane Litkowski 354 Orange Business Service 356 Email: stephane.litkowski@orange.com 358 Hannes Gredler 359 Private Contributor 361 Email: hannes@gredler.at 363 Acee Lindem 364 Cisco Systems 365 301 Midenhall Way 366 Cary, NC 27513 367 USA 369 Email: acee@cisco.com 371 Pierre Francois 372 Cisco Systems 374 Email: pifranco@cisco.com