Routing Area Working Group                                  S. Litkowski
Internet-Draft                                   Orange Business Service
Intended status: Informational                               B. Decraene
Expires: November 24, 2017                                        Orange
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                            May 23, 2017


   Link State protocols SPF trigger and delay algorithm impact on IGP
                              micro-loops
               draft-ietf-rtgwg-spf-uloop-pb-statement-04

Abstract

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.

   In this document, we analyze the impact of using different Link
   State IGP implementations in a single network with regard to
   micro-loops.  The analysis focuses on the SPF triggers and the SPF
   delay algorithm.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 24, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem statement
   3.  SPF trigger strategies
   4.  SPF delay strategies
     4.1.  Two steps SPF delay
     4.2.  Exponential backoff
   5.  Mixing strategies
   6.  Proposed work items
   7.  Security Considerations
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Link State IGP protocols are based on a topology database on which an
   SPF (Shortest Path First) algorithm, such as Dijkstra's, is run to
   find the optimal routing paths.

   Specifications like IS-IS ([RFC1195]) propose some optimizations of
   the route computation (see Appendix C.1 of [RFC1195]), but not all
   implementations follow those non-mandatory optimizations.

   We will call "SPF trigger" the events that lead to a new SPF
   computation based on the topology.

   Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS
   ([RFC1195]), use multiple timers to control the router behavior in
   case of churn: SPF delay, PRC delay, LSP generation delay, LSP
   flooding delay, LSP retransmission interval...

   Some of those timers are standardized in protocol specifications;
   others are not, especially the timers related to SPF computation.

   For non-standardized timers, implementations are free to implement
   them in any way.
   For some standardized timers, we can also see that, rather than
   using static configurable values, implementations may offer
   dynamically adjusted timers to help control the churn.

   We will call "SPF delay" the timer, present in most implementations,
   that specifies the required delay before running an SPF computation
   after an SPF trigger is received.

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.
   Micro-loops form when two routers do not update their Forwarding
   Information Base (FIB) for a given prefix at the same time.  The
   micro-loop phenomenon is described in
   [I-D.ietf-rtgwg-microloop-analysis].

   Some micro-loop mitigation techniques have been defined by the IETF
   (e.g., [RFC6976], [I-D.ietf-rtgwg-uloop-delay]), but they are either
   not implemented due to their complexity or do not provide a complete
   mitigation.

   In multi-vendor networks, using different implementations of a link
   state protocol may favor the creation of micro-loops during the
   convergence process due to timer discrepancies.  Service providers
   already know that using similar timers across the whole network is a
   best practice, but this is sometimes not possible due to limitations
   of implementations.

   This document explains why it is important for service providers to
   have consistent implementations of Link State protocols across
   vendors.  In particular, we analyze the impact of using different
   Link State IGP implementations in a single network with regard to
   micro-loops.  The analysis focuses on the SPF triggers and the SPF
   delay algorithm.

   This document only states the problem and defines some work items;
   it is not intended to provide a solution.

2.  Problem statement

                 A ---- B
                 |      |
              10 |      | 10
                 |      |
                 C ---- D
                 |  2   |
                 Px     Px

        Figure 1 - Network topology suffering from micro-loops

   In Figure 1, A primarily uses the AC link to reach C.  When the AC
   link fails, IGP convergence occurs.  If A converges before B, A will
   forward the traffic to C through B, but as B has not converged yet,
   B will loop the traffic back to A, leading to a micro-loop.

   The micro-loop appears due to the asynchronous convergence of nodes
   in a network when an event occurs.

   Multiple factors (and combinations of these factors) may increase
   the probability for a micro-loop to appear:

   o  the delay of failure notification: the later B is notified of the
      failure compared to A, the more likely a micro-loop is to appear.

   o  the SPF delay: most implementations support a delay for the SPF
      computation to try to catch as many events as possible.  If A
      uses an SPF delay timer of x msec and B uses an SPF delay timer
      of y msec and x < y, B would start converging after A, leading to
      a potential micro-loop.

   o  the SPF computation time: mostly a matter of CPU power and
      optimizations like incremental SPF.  If A computes its SPF faster
      than B, there is a chance for a micro-loop to appear.  CPUs are
      today fast enough to consider the SPF computation time as
      negligible (on the order of msec in a large network).

   o  the RIB and FIB prefix insertion speed or ordering: highly
      implementation dependent.

   This document focuses on the analysis of the SPF delay (and
   associated triggers).

3.  SPF trigger strategies

   Depending on the change advertised in the LSP/LSA, the topology may
   or may not be affected.  An implementation may avoid running the SPF
   computation (and may only run the IP reachability computation
   instead) if the advertised change does not affect the topology.

   Different strategies exist to trigger the SPF computation:

   1.  An implementation may always run a full SPF, whatever the change
       to process.

   2.  An implementation may run a full SPF only when required: e.g., if
       a link fails, a local node will run an SPF for its local LSP
       update.  If the LSP from the neighbor (describing the same
       failure) is received after the SPF has started, the local node
       can decide that a new full SPF is not required as the topology
       has not changed.

   3.  If the topology does not change, an implementation may only
       recompute the IP reachability.

   As pointed out in Section 1, SPF optimizations are not mandatory in
   specifications, leading to multiple strategies being implemented.

4.  SPF delay strategies

   Implementations of link state routing protocols use different
   strategies to delay the SPF computation.  We usually see the
   following:

   1.  Two steps delay.

   2.  Exponential backoff delay.

   These behaviors are explained in the next sections.

4.1.  Two steps SPF delay

   The SPF delay is managed by four parameters:

   o  Rapid delay: amount of time to wait before running SPF.

   o  Rapid runs: number of consecutive SPF runs that can use the rapid
      delay.  When this number is exceeded, the delay moves to the slow
      delay value.

   o  Slow delay: amount of time to wait before running SPF.

   o  Wait time: amount of time to wait without events before going back
      to the rapid delay.

   Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec,
   Wait time = 2sec

          SPF delay time
              ^
              |
              |
      SD-     |             x xx x
              |
              |
              |
      RD-     |   x  x   x                    x
              |
              +---------------------------------> Events
                  |  |   |  | || |            |
                                  < wait time >

                 Figure 2 - Two steps delay algorithm

4.2.  Exponential backoff

   The algorithm has two modes: the fast mode and the backoff mode.  In
   the fast mode, the SPF computation is usually delayed by a very
   small amount of time (fast reaction).
   When an SPF computation has run in the fast mode, the algorithm
   automatically moves to the backoff mode (a single SPF run is
   authorized in the fast mode).  In the backoff mode, the SPF delay
   increases exponentially at each run.  When the network becomes
   stable, the algorithm moves back to the fast mode.  The SPF delay is
   managed by four parameters:

   o  First delay: amount of time to wait before running SPF.  This
      delay is used only when SPF is in fast mode.

   o  Incremental delay: amount of time to wait before running SPF.
      This delay is used only when SPF is in backoff mode and increases
      exponentially at each SPF run.

   o  Maximum delay: maximum amount of time to wait before running SPF.

   o  Wait time: amount of time to wait without events before going back
      to the fast mode.

   Example: First delay = 50msec, Incremental delay = 50msec, Maximum
   delay = 1sec, Wait time = 2sec

          SPF delay time
              ^
      MD-     |               xx x
              |
              |
              |
              |
              |
              |             x
              |
              |
              |
              |          x
              |
      FD-     |   x  x                        x
      ID      |
              +---------------------------------> Events
                  |  |   |  | || |            |
                                  < wait time >
                  FM->BM -------------------->FM

                Figure 3 - Exponential delay algorithm

5.  Mixing strategies

                 S ---- E
                 |      |
              10 |      | 10
                 |      |
                 D ---- A
                 |  2
                 Px

                               Figure 4

   In Figure 4, we consider a flow of packets from S to D.  We consider
   that S uses optimized SPF triggering (a full SPF is triggered only
   when necessary) and a two steps SPF delay (rapid=150ms,
   rapid-runs=3, slow=1s).  As the implementation of S is optimized,
   Partial Reachability Computation (PRC) is available.  We consider
   the same timers as SPF for delaying PRC.
   We consider that E uses an SPF trigger strategy that always
   computes a full SPF and an exponential backoff strategy for the SPF
   delay (start=150ms, inc=150ms, max=1s).

   We also consider the following sequence of events (note: the time
   scale is not intended to represent a real router time scale, where
   jitter is introduced to all timers):

   o  t0=0 ms: a prefix is declared down in the network.  We consider
      this event to happen at time=0.

   o  200ms: the prefix is declared as up.

   o  400ms: a prefix is declared down in the network.

   o  1000ms: S-D link fails.

   +--------+--------------------+------------------+------------------+
   | Time   | Network Event      | Router S events  | Router E events  |
   +--------+--------------------+------------------+------------------+
   | t0=0   | Prefix DOWN        |                  |                  |
   | 10ms   |                    | Schedule PRC (in | Schedule SPF (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 160ms  |                    | PRC starts       | SPF starts       |
   | 161ms  |                    | PRC ends         |                  |
   | 162ms  |                    | RIB/FIB starts   |                  |
   | 163ms  |                    |                  | SPF ends         |
   | 164ms  |                    |                  | RIB/FIB starts   |
   | 175ms  |                    | RIB/FIB ends     |                  |
   | 178ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 200ms  | Prefix UP          |                  |                  |
   | 212ms  |                    | Schedule PRC (in |                  |
   |        |                    | 150ms)           |                  |
   | 214ms  |                    |                  | Schedule SPF (in |
   |        |                    |                  | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 370ms  |                    | PRC starts       |                  |
   | 372ms  |                    | PRC ends         |                  |
   | 373ms  |                    |                  | SPF starts       |
   | 373ms  |                    | RIB/FIB starts   |                  |
   | 375ms  |                    |                  | SPF ends         |
   | 376ms  |                    |                  | RIB/FIB starts   |
   | 383ms  |                    | RIB/FIB ends     |                  |
   | 385ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 400ms  | Prefix DOWN        |                  |                  |
   | 410ms  |                    | Schedule PRC (in | Schedule SPF (in |
   |        |                    | 300ms)           | 300ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 710ms  |                    | PRC starts       | SPF starts       |
   | 711ms  |                    | PRC ends         |                  |
   | 712ms  |                    | RIB/FIB starts   |                  |
   | 713ms  |                    |                  | SPF ends         |
   | 714ms  |                    |                  | RIB/FIB starts   |
   | 716ms  |                    | RIB/FIB ends     | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 1000ms | S-D link DOWN      |                  |                  |
   | 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 600ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1160ms |                    | SPF starts       |                  |
   | 1161ms |                    | SPF ends         |                  |
   | 1162ms | Micro-loop may     | RIB/FIB starts   |                  |
   |        | start from here    |                  |                  |
   | 1175ms |                    | RIB/FIB ends     |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1612ms |                    |                  | SPF starts       |
   | 1615ms |                    |                  | SPF ends         |
   | 1616ms |                    |                  | RIB/FIB starts   |
   | 1626ms | Micro-loop ends    |                  | RIB/FIB ends     |
   +--------+--------------------+------------------+------------------+

                  Route computation event time scale

   In the table above, we can see that, due to discrepancies in SPF
   management, after multiple events (of different types), the SPF
   delay values are completely misaligned between nodes, leading to the
   creation of long micro-loops.

   The same issue can also appear with only a single type of event, as
   displayed below:

   +--------+--------------------+------------------+------------------+
   | Time   | Network Event      | Router S events  | Router E events  |
   +--------+--------------------+------------------+------------------+
   | t0=0   | Link DOWN          |                  |                  |
   | 10ms   |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 160ms  |                    | SPF starts       | SPF starts       |
   | 161ms  |                    | SPF ends         |                  |
   | 162ms  |                    | RIB/FIB starts   |                  |
   | 163ms  |                    |                  | SPF ends         |
   | 164ms  |                    |                  | RIB/FIB starts   |
   | 175ms  |                    | RIB/FIB ends     |                  |
   | 178ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 200ms  | Link DOWN          |                  |                  |
   | 212ms  |                    | Schedule SPF (in |                  |
   |        |                    | 150ms)           |                  |
   | 214ms  |                    |                  | Schedule SPF (in |
   |        |                    |                  | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 370ms  |                    | SPF starts       |                  |
   | 372ms  |                    | SPF ends         |                  |
   | 373ms  |                    |                  | SPF starts       |
   | 373ms  |                    | RIB/FIB starts   |                  |
   | 375ms  |                    |                  | SPF ends         |
   | 376ms  |                    |                  | RIB/FIB starts   |
   | 383ms  |                    | RIB/FIB ends     |                  |
   | 385ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 400ms  | Link DOWN          |                  |                  |
   | 410ms  |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 300ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 560ms  |                    | SPF starts       |                  |
   | 561ms  |                    | SPF ends         |                  |
   | 562ms  | Micro-loop may     | RIB/FIB starts   |                  |
   |        | start from here    |                  |                  |
   | 568ms  |                    | RIB/FIB ends     |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 710ms  |                    |                  | SPF starts       |
   | 713ms  |                    |                  | SPF ends         |
   | 714ms  |                    |                  | RIB/FIB starts   |
   | 716ms  | Micro-loop ends    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 1000ms | Link DOWN          |                  |                  |
   | 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 1s)              | 600ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1612ms |                    |                  | SPF starts       |
   | 1615ms |                    |                  | SPF ends         |
   | 1616ms | Micro-loop may     |                  | RIB/FIB starts   |
   |        | start from here    |                  |                  |
   | 1626ms |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 2012ms |                    | SPF starts       |                  |
   | 2014ms |                    | SPF ends         |                  |
   | 2015ms |                    | RIB/FIB starts   |                  |
   | 2025ms | Micro-loop ends    | RIB/FIB ends     |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   +--------+--------------------+------------------+------------------+

                  Route computation event time scale

6.  Proposed work items

   In order to enhance the current Link State IGP behavior, the authors
   encourage working on the standardization of some behaviors.

   The authors propose the following work items:

   o  Standardize the SPF trigger strategy.

   o  Standardize the computation timer scope: a single timer for all
      computation operations, separate timers...

   o  Standardize the "slowdown" timer algorithm, including its
      association with a particular timer: the authors of this document
      do not presume that the same algorithm must be used for all
      timers.

   Using the same event sequence as in the first table of Section 5, we
   may expect fewer and/or shorter micro-loops when using standardized
   implementations.

   +--------+--------------------+------------------+------------------+
   | Time   | Network Event      | Router S events  | Router E events  |
   +--------+--------------------+------------------+------------------+
   | t0=0   | Prefix DOWN        |                  |                  |
   | 10ms   |                    | Schedule PRC (in | Schedule PRC (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 160ms  |                    | PRC starts       | PRC starts       |
   | 161ms  |                    | PRC ends         |                  |
   | 162ms  |                    | RIB/FIB starts   | PRC ends         |
   | 163ms  |                    |                  | RIB/FIB starts   |
   | 175ms  |                    | RIB/FIB ends     |                  |
   | 176ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 200ms  | Prefix UP          |                  |                  |
   | 212ms  |                    | Schedule PRC (in |                  |
   |        |                    | 150ms)           |                  |
   | 213ms  |                    |                  | Schedule PRC (in |
   |        |                    |                  | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 370ms  |                    | PRC starts       | PRC starts       |
   | 372ms  |                    | PRC ends         |                  |
   | 373ms  |                    | RIB/FIB starts   | PRC ends         |
   | 374ms  |                    |                  | RIB/FIB starts   |
   | 383ms  |                    | RIB/FIB ends     |                  |
   | 384ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 400ms  | Prefix DOWN        |                  |                  |
   | 410ms  |                    | Schedule PRC (in | Schedule PRC (in |
   |        |                    | 300ms)           | 300ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 710ms  |                    | PRC starts       | PRC starts       |
   | 711ms  |                    | PRC ends         | PRC ends         |
   | 712ms  |                    | RIB/FIB starts   |                  |
   | 713ms  |                    |                  | RIB/FIB starts   |
   | 716ms  |                    | RIB/FIB ends     | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 1000ms | S-D link DOWN      |                  |                  |
   | 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1160ms |                    | SPF starts       |                  |
   | 1161ms |                    | SPF ends         | SPF starts       |
   | 1162ms | Micro-loop may     | RIB/FIB starts   | SPF ends         |
   |        | start from here    |                  |                  |
   | 1163ms |                    |                  | RIB/FIB starts   |
   | 1175ms |                    | RIB/FIB ends     |                  |
   | 1177ms | Micro-loop ends    |                  | RIB/FIB ends     |
   +--------+--------------------+------------------+------------------+

                  Route computation event time scale

   As displayed above, other parameters, such as router computation
   power or flooding timers, may also influence micro-loops.  In
   Figure 4, we consider E to be a bit slower than S, leading to
   micro-loop creation.  Despite this, we expect that by aligning
   implementations at least on SPF trigger and SPF delay, service
   providers may reduce the number and the duration of micro-loops.

7.  Security Considerations

   This document does not introduce any security considerations.

8.  Acknowledgements

   The authors would like to thank Mike Shand for his useful comments.

9.  IANA Considerations

   This document has no actions for IANA.

10.  References

10.1.  Normative References

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

10.2.  Informative References

   [I-D.ietf-rtgwg-microloop-analysis]
              Zinin, A., "Analysis and Minimization of Microloops in
              Link-state Routing Protocols",
              draft-ietf-rtgwg-microloop-analysis-01 (work in
              progress), October 2005.

   [I-D.ietf-rtgwg-uloop-delay]
              Litkowski, S., Decraene, B., Filsfils, C., and P.
              Francois, "Micro-loop prevention by introducing a local
              convergence delay", draft-ietf-rtgwg-uloop-delay-04 (work
              in progress), April 2017.

   [RFC6976]  Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
              Francois, P., and O. Bonaventure, "Framework for Loop-Free
              Convergence Using the Ordered Forwarding Information Base
              (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July
              2013.
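
Appendix A.  Illustrative sketch of the SPF delay algorithms

   As an illustration only, the two SPF delay strategies of Sections
   4.1 and 4.2 can be sketched in Python as shown here.  This is a
   minimal model under stated assumptions, not a description of any
   vendor implementation.  The class names, the next_delay() method and
   the helper functions are hypothetical.  We also assume that each
   event leads to exactly one scheduled computation, that the quiet
   period is measured between consecutive events, that the backoff
   delay doubles at each run (consistent with the 150ms, 300ms, 600ms
   values of Section 5), and that the wait time is 2s as in the
   examples of Section 4, since Section 5 does not give one.  The
   sketch feeds both models the four link events of the second table of
   Section 5.

```python
from dataclasses import dataclass
from typing import Optional


def quiet_period_elapsed(last_ms, now_ms, wait_ms):
    """True if at least wait_ms passed since the previous event."""
    return last_ms is not None and now_ms - last_ms >= wait_ms


@dataclass
class TwoStepDelay:
    """Two steps SPF delay (Section 4.1); all times in milliseconds."""
    rapid_ms: int
    rapid_runs: int
    slow_ms: int
    wait_ms: int
    runs: int = 0                        # runs since the last quiet period
    last_event_ms: Optional[int] = None

    def next_delay(self, now_ms: int) -> int:
        if quiet_period_elapsed(self.last_event_ms, now_ms, self.wait_ms):
            self.runs = 0                # back to the rapid delay
        self.last_event_ms = now_ms
        self.runs += 1
        return self.rapid_ms if self.runs <= self.rapid_runs else self.slow_ms


@dataclass
class ExponentialBackoffDelay:
    """Exponential backoff SPF delay (Section 4.2); times in milliseconds."""
    first_ms: int
    incremental_ms: int
    max_ms: int
    wait_ms: int
    runs: int = 0                        # runs since the last quiet period
    last_event_ms: Optional[int] = None

    def next_delay(self, now_ms: int) -> int:
        if quiet_period_elapsed(self.last_event_ms, now_ms, self.wait_ms):
            self.runs = 0                # network stable: back to fast mode
        self.last_event_ms = now_ms
        if self.runs == 0:
            delay = self.first_ms        # a single run is allowed in fast mode
        else:
            # Backoff mode (assumed doubling), capped at the maximum delay.
            delay = min(self.incremental_ms * 2 ** (self.runs - 1), self.max_ms)
        self.runs += 1
        return delay


def delays(algo, event_times_ms):
    """Return the delay chosen for each event, in order."""
    return [algo.next_delay(t) for t in event_times_ms]


events = [0, 200, 400, 1000]             # link events, second table of Section 5
s_delays = delays(TwoStepDelay(150, 3, 1000, 2000), events)
e_delays = delays(ExponentialBackoffDelay(150, 150, 1000, 2000), events)
print(s_delays)  # [150, 150, 150, 1000]
print(e_delays)  # [150, 150, 300, 600]
```

   With these parameters, S and E choose the same delay for the first
   two events and then diverge (150ms versus 300ms, then 1s versus
   600ms), which is the misalignment that opens the micro-loop windows
   shown in Section 5.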

Authors' Addresses

   Stephane Litkowski
   Orange Business Service

   Email: stephane.litkowski@orange.com


   Bruno Decraene
   Orange

   Email: bruno.decraene@orange.com


   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de