idnits 2.17.1 draft-ietf-rtgwg-backoff-algo-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (June 19, 2015) is 3231 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-10) exists of draft-ietf-rtgwg-spf-uloop-pb-statement-00 Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group B. Decraene 3 Internet-Draft Orange 4 Intended status: Standards Track S. Litkowski 5 Expires: December 21, 2015 Orange Business Service 6 H. Gredler 7 Juniper Networks, Inc. 8 A. Lindem 9 Cisco Systems 10 P. Francois 11 IMDEA Networks Institute 12 June 19, 2015 14 SPF Back-off algorithm for link state IGPs 15 draft-ietf-rtgwg-backoff-algo-01 17 Abstract 19 This document defines a standard algorithm to back-off link-state IGP 20 SPF computations. 22 Having one standard algorithm improves interoperability by reducing 23 the probability and/or duration of transient forwarding loops during 24 the IGP convergence when the IGP reacts to multiple proximate IGP 25 events. 27 Requirements Language 29 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 30 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 31 document are to be interpreted as described in [RFC2119]. 33 Status of This Memo 35 This Internet-Draft is submitted in full conformance with the 36 provisions of BCP 78 and BCP 79. 38 Internet-Drafts are working documents of the Internet Engineering 39 Task Force (IETF). Note that other groups may also distribute 40 working documents as Internet-Drafts. The list of current Internet- 41 Drafts is at http://datatracker.ietf.org/drafts/current/. 43 Internet-Drafts are draft documents valid for a maximum of six months 44 and may be updated, replaced, or obsoleted by other documents at any 45 time. It is inappropriate to use Internet-Drafts as reference 46 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on December 21, 2015. 50 Copyright Notice 52 Copyright (c) 2015 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 68 2. High level goals . . . . . . . . . . . . . . . . . . . . . . 3 69 3. Definitions and parameters . . . . . . . . . . . . . . . . . 3 70 4. Principles of SPF delay algorithm . . . . . . . . . . . . . . 4 71 5. Specification of the SPF delay algorithm . . . . . . . . . . 5 72 6. Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 6 73 7. Impact on micro-loops . . . . . . . . . . . . . . . . . . . . 6 74 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 75 9. Security considerations . . . . . . . . . . . . . . . . . . . 7 76 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 77 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 78 11.1. Normative References . . . . . . . . . . . . . . . . . . 7 79 11.2. Informative References . . . . . . . . . . . . . . . . . 7 80 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 82 1. Introduction 84 Link state IGPs, such as IS-IS [ISO10589-Second-Edition] and OSPF 85 [RFC2328], perform distributed route computation on all routers of 86 the area/level. In order to have consistent routing tables across 87 the network, such distributed computation requires that all routers 88 have the same version of the network topology (Link State DataBase 89 (LSDB)) and perform their computation at the same time. 91 In general, when the network is stable, there is a desire to compute 92 the new SPF as soon as the failure is detected in order to quickly 93 route around the failure. However, when the network is experiencing 94 multiple proximate failures over a short period of time, there is a 95 conflicting desire to limit the frequency of SPF computations. 96 Indeed, this allows a reduction in control plane resources used by 97 IGPs and all protocols/subsystem reacting on the attendant route 98 change, such as LDP, RSVP-TE, BGP, Fast ReRoute computations, FIB 99 updates... This will reduce the churn on routers and in the network 100 and, in particular, reduce the side effects such as micro-loops that 101 ensue during IGP convergence. 103 To allow for this, IGPs implement a SPF back-off algorithm. 104 Different implementations choose different algorithms. Hence, in a 105 multi-vendor network, it's not possible to ensure that all routers 106 trigger their SPF computation after the same delay. This situation 107 increases the average differential delay between routers completing 108 their SPF computation. It also increases the probability that 109 different routers compute their FIBs based on a different LSDB 110 versions. Both factors increase the probability and/or duration of 111 micro-loops. 113 To allow multi-vendor networks to have all routers delay their SPF 114 computations for the same duration, this document specifies a 115 standard algorithm. Optionally, implementations may offer 116 alternative algorithms. 118 2. High level goals 120 The high level goals of this algorithm are the following: 122 o Very fast convergence for a single event (e.g., link failure). 124 o Slightly paced fast convergence for multiple proximate IGP events 125 while IGP stability is considered acceptable. 127 o Delayed convergence when the IGP stability is problematic. This 128 will allow the IGP and related processes to conserve resources 129 during the period of instability. 131 o Always try to avoid different SPF_DELAY timers values across 132 different routers in the area/level. Even though not all routers 133 will receive IGP messages at the same time (due to differences 134 both in the distance from the originator of the IGP event and in 135 flooding implementations). 137 3. Definitions and parameters 139 IGP events: An IGP LSDB change requiring a new routing table 140 computation. Examples are a topology change, a prefix change, a 141 metric change on link or prefix... 143 Routing table computation: computation of the routing table, by the 144 IGP, using the IGP LSDB. No distinction is made between the type of 145 computation performed. e.g., full SPF, incremental SPF, Partial Route 146 Computation (PRC). The type of computation is a local consideration. 147 This document may indifferently use the terms routing table 148 computation or SPF computation. 150 The SPF_DELAY is the delay introduced between the IGP event and the 151 start of the routing table computation. It can take the following 152 values: 154 INITIAL_WAIT: a very small delay to quickly handle link failure, 155 e.g., 0 milliseconds. 157 FAST_WAIT: a small delay to have a fast convergence in case of 158 single component failure (node, SRLG..), e.g., 50-100 milliseconds. 159 Note: we want to be fast, but as this failure results in multiple 160 IGP events, being too fast increases the probability to receive 161 additional network events immediately after the SPF computation. 163 LONG_WAIT: a long delay when the IGP is unstable, e.g., 2 seconds. 164 Note: Allow the IGP network to stabilize. 166 The TIME_TO_LEARN timer is the maximum duration typically needed to 167 learn all the IGP events related to a single component failure (e.g., 168 router failure, SRLG failure), e.g., 1 second. It's mostly dependent 169 on variation of failure detection times between all routers that are 170 adjacent to the failure. Additionally, it may depend on the 171 different flooding implementations for routers in the network. 173 The HOLD_DOWN timer is the time needed with no received IGP events 174 before considering the IGP to be stable again, allowing the SPF_DELAY 175 to be restored to INITIAL_WAIT. e.g., 3 seconds. 177 4. Principles of SPF delay algorithm 179 For this first IGP event, we assume that there has been a single 180 simple change in the network which can be taken into account using a 181 single routing computation (e.g., link failure, prefix (metric) 182 change) and we optimize for very fast convergence, delaying the 183 routing computation by INITIAL_WAIT. Under this assumption, there is 184 no benefit in delaying the routing computation. In a typical 185 network, this is the most common type of IGP event. Hence, it makes 186 sense to optimize this case. 188 If subsequent IGP events are received in a short period of time 189 (TIME_TO_LEARN), we then assume that a single component failed, but 190 that this failure requires the knowledge of multiple IGP events in 191 order for the IGP routing to converge. Under this assumption, we 192 want fast convergence since this is a normal network situation. 193 However, there is a benefit in waiting for all IGP events related to 194 this single component failure so that the IGP can compute the post- 195 failure routing table in a single route computation. In this 196 situation, we delay the routing computation by FAST_WAIT. 198 If IGP events are still received after TIME_TO_LEARN seconds from the 199 initial IGP event, then the network is presumably experiencing 200 multiple independent failures and while waiting for network 201 stability, the computations are delayed for a longer time represented 202 by LONG_WAIT. This SPF_delay is kept until no IGP events are 203 received for HOLD_DOWN seconds. 205 Note: previous SPF delay algorithms used to count the number of SPF 206 computations. However, as all routers may receive the IGP events at 207 different times, we cannot assume that all routers will perform the 208 same number of SPF computations or that they will schedule them at 209 the same time. For example, assuming that the SPF delay is 50 ms, 210 router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and 211 hence will perform a single routing computation. While another 212 router R2 may only receive 2 events (E1, E2) in those 50 ms and hence 213 will schedule another routing computation when receiving E3. That's 214 why this document defines a time (TIME_TO_LEARN) from the initial 215 event detection/reception as opposed to defining the number of SPF 216 computations to determine when the IGP is unstable. 218 5. Specification of the SPF delay algorithm 220 When no IGP events have occurred during the HOLD_DOWN interval: 222 o The IGP is set to the QUIET state. 224 When the IGP is in the QUIET state and an IGP event is received: 226 o The time of this first IGP event is stored in FIRST_EVENT_TIME. 228 o The next routing table computation is scheduled at: this IGP event 229 received time + INITIAL_WAIT. 231 o The IGP is set to the FAST_WAIT state. 233 When the IGP is in the FAST_WAIT state and an IGP event is received: 235 o If more than the TIME_TO_LEARN interval has passed since 236 FIRST_EVENT_TIME, then the IGP is set to the HOLD_DOWN state. 238 o If a routing table computation is not already scheduled, one is 239 scheduled at: this IGP event received time + FAST_WAIT. 241 When the IGP is in the HOLD_DOWN state and an IGP event is received: 243 o If a routing table computation is not already scheduled, one is 244 scheduled at: this IGP event received time + LONG_WAIT. 246 6. Parameters 248 All the parameters MUST be configurable. All the delays 249 (INITIAL_WAIT, FAST_WAIT, LONG_WAIT, TIME_TO_LEARN, HOLD_DOWN) SHOULD 250 be configurable at the millisecond granularity. They MUST be 251 configurable at least at the tenth of second granularity. The 252 configurable range for all the parameters SHOULD be at least from 0 253 milliseconds to 60 seconds. 255 This document does not propose default values for the parameters 256 because these values are expected to be context dependent. 257 Implementations are free to propose their own default values. 259 When setting the (default) values, one SHOULD consider the customer's 260 or their applications' requirements, the computational power of the 261 routers, the size of the network, and, in particular, the number of 262 IP prefixes advertised in the IGP, the frequency and number of IGP 263 events, the number of protocols reactions/computations triggered by 264 IGP SPF (e.g., BGP, PCEP, Traffic Engineering CSPF, Fast ReRoute 265 computations). 267 Note that some or all of these factors may change over the life of 268 the network. In case of doubt, it's RECOMMENDED to play it safe and 269 start with safe, i.e., longer timers. 271 For the standard algorithm to be effective in mitigating micro-loops, 272 it is RECOMMENDED that all routers in the IGP domain, or at least all 273 the routers in the same area/level, have exactly the same configured 274 values. 276 7. Impact on micro-loops 278 Micro-loops during IGP convergence are due to a non-synchronized or 279 non-ordered update of the forwarding information tables (FIB) 280 [RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs 281 are installed after multiple steps such as SPF wait time, SPF 282 computation, FIB distribution, and FIB update. This document only 283 addresses the first contribution. This standardized procedure 284 reduces the probability and/or duration of micro-loops when the IGP 285 experience multiple proximate events. It does not prevent all micro- 286 loops. However, it is beneficial and its cost seems limited compared 287 to full solutions such as [RFC5715] or [RFC6976]. 289 8. IANA Considerations 291 No IANA actions required. 293 9. Security considerations 295 This algorithm presented in this document does not in any way 296 compromise the security of the IGP. In fact, the HOLD_DOWN state may 297 mitigate the effects of Denial-of-Service (DOS) attacks generating 298 many IGP events. 300 10. Acknowledgements 302 We would like to acknowledge Les Ginsberg, Uma Chunduri, and Mike 303 Shand for the discussions and comments related to this document. 305 11. References 307 11.1. Normative References 309 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 310 Requirement Levels", BCP 14, RFC 2119, March 1997. 312 11.2. Informative References 314 [I-D.ietf-rtgwg-spf-uloop-pb-statement] 315 Litkowski, S., "Link State protocols SPF trigger and delay 316 algorithm impact on IGP microloops", draft-ietf-rtgwg-spf- 317 uloop-pb-statement-00 (work in progress), May 2015. 319 [ISO10589-Second-Edition] 320 International Organization for Standardization, 321 "Intermediate system to Intermediate system intra-domain 322 routeing information exchange protocol for use in 323 conjunction with the protocol for providing the 324 connectionless-mode Network Service (ISO 8473)", ISO/IEC 325 10589:2002, Second Edition, Nov 2002. 327 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. 329 [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free 330 Convergence", RFC 5715, January 2010. 332 [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., 333 Francois, P., and O. Bonaventure, "Framework for Loop-Free 334 Convergence Using the Ordered Forwarding Information Base 335 (oFIB) Approach", RFC 6976, July 2013. 337 Authors' Addresses 339 Bruno Decraene 340 Orange 341 38 rue du General Leclerc 342 Issy Moulineaux cedex 9 92794 343 France 345 Email: bruno.decraene@orange.com 347 Stephane Litkowski 348 Orange Business Service 350 Email: stephane.litkowski@orange.com 352 Hannes Gredler 353 Juniper Networks, Inc. 354 1194 N. Mathilda Ave. 355 Sunnyvale, CA 94089 356 US 358 Email: hannes@juniper.net 360 Acee Lindem 361 Cisco Systems 362 301 Midenhall Way 363 Cary, NC 27513 364 USA 366 Email: acee@cisco.com 368 Pierre Francois 369 IMDEA Networks Institute 370 1194 N. Mathilda Ave. 371 Leganes 372 ES 374 Email: pierre.francois@imdea.org