Routing Area Working Group                                  S. Litkowski
Internet-Draft                                   Orange Business Service
Intended status: Standards Track                             B. Decraene
Expires: September 28, 2017                                       Orange
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                          March 27, 2017


   Link State protocols SPF trigger and delay algorithm impact on IGP
                              micro-loops
               draft-ietf-rtgwg-spf-uloop-pb-statement-03

Abstract

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.

   In this document, we analyze the impact of using different Link
   State IGP implementations in a single network with regard to
   micro-loops.  The analysis focuses on the SPF triggers and the SPF
   delay algorithm.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 28, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem statement
   3.  SPF trigger strategies
   4.  SPF delay strategies
     4.1.  Two step SPF delay
     4.2.  Exponential backoff
   5.  Mixing strategies
   6.  Proposed work items
   7.  Security Considerations
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Link State IGP protocols are based on a topology database on which
   an SPF (Shortest Path First) algorithm, such as Dijkstra's, is run
   to find the optimal routing paths.

   Specifications like IS-IS ([RFC1195]) propose some optimizations of
   the route computation (see Appendix C.1 of [RFC1195]), but not all
   implementations follow these optional optimizations.

   In this document, "SPF trigger" refers to any event that leads to a
   new SPF computation based on the topology.

   Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS
   ([RFC1195]), use many timers to control the router behavior in case
   of churn: SPF delay, PRC delay, LSP generation delay, LSP flooding
   delay, LSP retransmission interval, and so on.

   Some of these timers are standardized in the protocol
   specifications; others, notably the timers related to the SPF
   computation, are not.

   Implementations are free to implement non-standardized timers in
   any way.  Even for some standardized timers, implementations may
   offer dynamically adjusted timers, rather than static configurable
   values, to help control the churn.

   In this document, "SPF delay" refers to the timer, present in most
   implementations, that specifies the delay before running an SPF
   computation after an SPF trigger is received.

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.
   These micro-loops form when two routers do not update their
   Forwarding Information Base (FIB) for a given prefix at the same
   time.  The micro-loop phenomenon is described in
   [I-D.ietf-rtgwg-microloop-analysis].

   Some micro-loop mitigation techniques have been defined by the IETF
   (e.g., [RFC6976], [I-D.ietf-rtgwg-uloop-delay]), but they are either
   not implemented because of their complexity, or they do not provide
   complete mitigation.

   In multi-vendor networks, using different implementations of a link
   state protocol may favor the creation of micro-loops during the
   convergence process because of timer discrepancies.  Service
   providers are already aware that using similar timers across the
   whole network is a best practice, but sometimes this is not possible
   due to implementation limitations.

   This document explains why it is important for service providers to
   have consistent implementations of Link State protocols across
   vendors.
   In particular, we analyze the impact of using different Link State
   IGP implementations in a single network with regard to micro-loops.
   The analysis focuses on the SPF triggers and the SPF delay
   algorithm.

   This document only states the problem and defines some work items;
   it is not intended to provide a solution.

2.  Problem statement

                              A ---- B
                              |      |
                           10 |      | 10
                              |      |
                              C ---- D
                              |  2   |
                              Px     Px

                               Figure 1

   In Figure 1, A primarily uses the A-C link to reach C.  When the A-C
   link fails, IGP convergence occurs.  If A converges before B, A
   forwards the traffic to C through B; but as B has not converged yet,
   B loops the traffic back to A, leading to a micro-loop.

   The micro-loop appears because of the asynchronous convergence of
   the nodes in a network when an event occurs.

   Multiple factors (and combinations of these factors) may increase
   the probability for a micro-loop to appear:

   o  the delay of failure notification: the later B is notified of the
      failure compared to A, the more likely a micro-loop is to appear.

   o  the SPF delay: most implementations support a delay for the SPF
      computation, in order to catch as many events as possible before
      computing.  If A uses an SPF delay timer of x msec and B uses an
      SPF delay timer of y msec, with x < y, B starts converging after
      A, potentially leading to a micro-loop.

   o  the SPF computation time: mostly a matter of CPU power and of
      optimizations like incremental SPF.  If A computes its SPF faster
      than B, a micro-loop may appear.  Today's CPUs are fast enough for
      the SPF computation time to be considered negligible (on the
      order of milliseconds in a large network).

   o  the RIB and FIB prefix insertion speed or ordering: highly
      implementation dependent.

   This document focuses on the analysis of the SPF delay (and the
   associated triggers).

3.  SPF trigger strategies

   Depending on the change advertised in an LSP/LSA, the topology may
   or may not be affected.  An implementation may avoid running the SPF
   computation (and may run only the IP reachability computation
   instead) if the advertised change does not affect the topology.

   Different strategies exist to trigger the SPF computation:

   1.  An implementation may always run a full SPF, whatever the change
       to process.

   2.  An implementation may run a full SPF only when required: e.g.,
       if a link fails, a local node runs an SPF for its local LSP
       update.  If the LSP from the neighbor (describing the same
       failure) is received after the SPF has started, the local node
       can decide that a new full SPF is not required, as the topology
       has not changed.

   3.  If the topology does not change, an implementation may only
       recompute the IP reachability.

   As pointed out in Section 1, SPF optimizations are not mandatory in
   the specifications, which has led to multiple strategies being
   implemented.

4.  SPF delay strategies

   Implementations of link state routing protocols use different
   strategies to delay the SPF computation.  The following are
   commonly seen:

   1.  Two step delay.

   2.  Exponential backoff delay.

   These behaviors are explained in the following sections.

4.1.  Two step SPF delay

   The SPF delay is managed by four parameters:

   o  Rapid delay: amount of time to wait before running SPF, as long as
      the number of consecutive SPF runs does not exceed Rapid runs.

   o  Rapid runs: number of consecutive SPF runs that can use the rapid
      delay.  When this number is exceeded, the delay moves to the slow
      delay value.

   o  Slow delay: amount of time to wait before running SPF once Rapid
      runs has been exceeded.

   o  Wait time: amount of time to wait without events before going
      back to the rapid delay.
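
   The four parameters above can be read as a small state machine.  The
   Python sketch below is a hypothetical, non-authoritative model: the
   TwoStepSpfDelay class and the replay() helper are illustrative names,
   not taken from any implementation.  It assumes that every SPF trigger
   leads to its own SPF run (no coalescing of triggers that arrive while
   an SPF is already scheduled) and that the wait time is measured from
   the previous trigger; real implementations may differ on both points.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TwoStepSpfDelay:
    """Hypothetical model of the two-step SPF delay; times in milliseconds."""

    rapid_delay: int  # delay used for the first `rapid_runs` consecutive runs
    rapid_runs: int   # number of consecutive runs allowed to use rapid_delay
    slow_delay: int   # delay used once rapid_runs has been exceeded
    wait_time: int    # quiet period after which we return to rapid_delay
    _runs: int = field(default=0, init=False)
    _last_trigger: Optional[int] = field(default=None, init=False)

    def on_trigger(self, now: int) -> int:
        """Return the delay applied to the SPF run caused by a trigger at `now`."""
        # Assumption: the quiet period is measured from the previous trigger.
        if (self._last_trigger is not None
                and now - self._last_trigger >= self.wait_time):
            self._runs = 0  # no event for wait_time: back to the rapid delay
        self._last_trigger = now
        # Assumption: each trigger causes its own SPF run (no coalescing).
        if self._runs < self.rapid_runs:
            delay = self.rapid_delay
        else:
            delay = self.slow_delay
        self._runs += 1
        return delay


def replay(timer: TwoStepSpfDelay, triggers: List[int]) -> List[int]:
    """Feed ascending trigger times (ms) to the timer and collect the delays."""
    return [timer.on_trigger(t) for t in triggers]


if __name__ == "__main__":
    # Parameters of the example in this section (50msec, 3, 1sec, 2sec);
    # the trigger times themselves are hypothetical.
    timer = TwoStepSpfDelay(rapid_delay=50, rapid_runs=3,
                            slow_delay=1000, wait_time=2000)
    print(replay(timer, [0, 100, 300, 800, 1500, 1600, 2200, 5000]))
```

   With the parameters of the example that follows and eight
   hypothetical trigger times, the sketch prints
   [50, 50, 50, 1000, 1000, 1000, 1000, 50]: three runs use the rapid
   delay, the next four use the slow delay, and the last trigger, which
   arrives after a quiet period longer than the wait time, uses the
   rapid delay again.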

   Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec,
   Wait time = 2sec

   SPF delay time
         ^
         |
         |
   SD-   |                 x   xx  x
         |
         |
         |
   RD-   |   x  x  x                            x
         |
         +----------------------------------------> Events
             |  |  |       |   ||  |            |
                                   < wait time >

4.2.  Exponential backoff

   The algorithm has two modes: the fast mode and the backoff mode.  In
   the backoff mode, the SPF delay increases exponentially with each
   run.  The SPF delay is managed by four parameters:

   o  First delay: amount of time to wait before running SPF.  This
      delay is used only when SPF is in fast mode.

   o  Incremental delay: amount of time to wait before running SPF.
      This delay is used only when SPF is in backoff mode and increases
      exponentially with each SPF run.

   o  Maximum delay: maximum amount of time to wait before running SPF.

   o  Wait time: amount of time to wait without events before going
      back to the fast mode.

   Example: First delay = 50msec, Incremental delay = 50msec, Maximum
   delay = 1sec, Wait time = 2sec

   SPF delay time
         ^
   MD-   |                     xx  x
         |
         |
         |
         |
         |
         |                 x
         |
         |
         |
         |         x
         |
   FD-   |   x  x                               x
   ID    |
         +----------------------------------------> Events
             |  |  |       |   ||  |            |
                                   < wait time >
             FM->BM --------------------------->FM

5.  Mixing strategies

                              S ---- E
                              |      |
                           10 |      | 10
                              |      |
                              D ---- A
                              |  2
                              Px

                               Figure 2

   In Figure 2, we consider a flow of packets from S to D.  We assume
   that S uses optimized SPF triggering (a full SPF is triggered only
   when necessary) and the two-step SPF delay (rapid=150ms,
   rapid-runs=3, slow=1s).  As the implementation of S is optimized,
   Partial Reachability Computation (PRC) is available.  We assume that
   PRC is delayed with the same timer settings as SPF.
   We assume that E uses an SPF trigger strategy that always computes
   a full SPF, and the exponential backoff strategy for the SPF delay
   (start=150ms, inc=150ms, max=1s).

   We also consider the following sequence of events (note: the time
   scale is not intended to represent a real router time scale, where
   jitter is applied to all timers):

   o  t0=0 ms: a prefix is declared down in the network.  We consider
      this event to happen at time=0.

   o  200ms: the prefix is declared up.

   o  400ms: a prefix is declared down in the network.

   o  1000ms: the S-D link fails.

   +--------+--------------------+------------------+------------------+
   | Time   | Network Event      | Router S events  | Router E events  |
   +--------+--------------------+------------------+------------------+
   | t0=0   | Prefix DOWN        |                  |                  |
   | 10ms   |                    | Schedule PRC (in | Schedule SPF (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 160ms  |                    | PRC starts       | SPF starts       |
   | 161ms  |                    | PRC ends         |                  |
   | 162ms  |                    | RIB/FIB starts   |                  |
   | 163ms  |                    |                  | SPF ends         |
   | 164ms  |                    |                  | RIB/FIB starts   |
   | 175ms  |                    | RIB/FIB ends     |                  |
   | 178ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 200ms  | Prefix UP          |                  |                  |
   | 212ms  |                    | Schedule PRC (in |                  |
   |        |                    | 150ms)           |                  |
   | 214ms  |                    |                  | Schedule SPF (in |
   |        |                    |                  | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 370ms  |                    | PRC starts       |                  |
   | 372ms  |                    | PRC ends         |                  |
   | 373ms  |                    |                  | SPF starts       |
   | 373ms  |                    | RIB/FIB starts   |                  |
   | 375ms  |                    |                  | SPF ends         |
   | 376ms  |                    |                  | RIB/FIB starts   |
   | 383ms  |                    | RIB/FIB ends     |                  |
   | 385ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 400ms  | Prefix DOWN        |                  |                  |
   | 410ms  |                    | Schedule PRC (in | Schedule SPF (in |
   |        |                    | 300ms)           | 300ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 710ms  |                    | PRC starts       | SPF starts       |
   | 711ms  |                    | PRC ends         |                  |
   | 712ms  |                    | RIB/FIB starts   |                  |
   | 713ms  |                    |                  | SPF ends         |
   | 714ms  |                    |                  | RIB/FIB starts   |
   | 716ms  |                    | RIB/FIB ends     | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 1000ms | S-D link DOWN      |                  |                  |
   | 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 600ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1160ms |                    | SPF starts       |                  |
   | 1161ms |                    | SPF ends         |                  |
   | 1162ms | Micro-loop may     | RIB/FIB starts   |                  |
   |        | start from here    |                  |                  |
   | 1175ms |                    | RIB/FIB ends     |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1612ms |                    |                  | SPF starts       |
   | 1615ms |                    |                  | SPF ends         |
   | 1616ms |                    |                  | RIB/FIB starts   |
   | 1626ms | Micro-loop ends    |                  | RIB/FIB ends     |
   +--------+--------------------+------------------+------------------+

                   Route computation event time scale

   The table above shows that, because of discrepancies in SPF
   management, after multiple events of different types, the SPF delay
   values become completely misaligned between the nodes, which leads
   to long micro-loops (here, a potential micro-loop from 1162ms to
   1626ms).

   The same issue can also appear with a single type of event, as shown
   below:

   +--------+--------------------+------------------+------------------+
   | Time   | Network Event      | Router S events  | Router E events  |
   +--------+--------------------+------------------+------------------+
   | t0=0   | Link DOWN          |                  |                  |
   | 10ms   |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 160ms  |                    | SPF starts       | SPF starts       |
   | 161ms  |                    | SPF ends         |                  |
   | 162ms  |                    | RIB/FIB starts   |                  |
   | 163ms  |                    |                  | SPF ends         |
   | 164ms  |                    |                  | RIB/FIB starts   |
   | 175ms  |                    | RIB/FIB ends     |                  |
   | 178ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 200ms  | Link DOWN          |                  |                  |
   | 212ms  |                    | Schedule SPF (in |                  |
   |        |                    | 150ms)           |                  |
   | 214ms  |                    |                  | Schedule SPF (in |
   |        |                    |                  | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 370ms  |                    | SPF starts       |                  |
   | 372ms  |                    | SPF ends         |                  |
   | 373ms  |                    |                  | SPF starts       |
   | 373ms  |                    | RIB/FIB starts   |                  |
   | 375ms  |                    |                  | SPF ends         |
   | 376ms  |                    |                  | RIB/FIB starts   |
   | 383ms  |                    | RIB/FIB ends     |                  |
   | 385ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 400ms  | Link DOWN          |                  |                  |
   | 410ms  |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 300ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 560ms  |                    | SPF starts       |                  |
   | 561ms  |                    | SPF ends         |                  |
   | 562ms  | Micro-loop may     | RIB/FIB starts   |                  |
   |        | start from here    |                  |                  |
   | 568ms  |                    | RIB/FIB ends     |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 710ms  |                    |                  | SPF starts       |
   | 713ms  |                    |                  | SPF ends         |
   | 714ms  |                    |                  | RIB/FIB starts   |
   | 716ms  | Micro-loop ends    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 1000ms | Link DOWN          |                  |                  |
   | 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 1s)              | 600ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1612ms |                    |                  | SPF starts       |
   | 1615ms |                    |                  | SPF ends         |
   | 1616ms | Micro-loop may     |                  | RIB/FIB starts   |
   |        | start from here    |                  |                  |
   | 1626ms |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 2012ms |                    | SPF starts       |                  |
   | 2014ms |                    | SPF ends         |                  |
   | 2015ms |                    | RIB/FIB starts   |                  |
   | 2025ms | Micro-loop ends    | RIB/FIB ends     |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   +--------+--------------------+------------------+------------------+

                   Route computation event time scale

6.  Proposed work items

   To improve the current Link State IGP behavior, the authors
   encourage work on the standardization of some behaviors.

   The authors propose the following work items:

   o  Standardize the SPF trigger strategy.

   o  Standardize the computation timer scope: a single timer for all
      computation operations, separate timers, etc.

   o  Standardize the "slowdown" timer algorithm, including its
      association with a particular timer: the authors of this document
      do not presume that the same algorithm must be used for all
      timers.

   Using the topology of Figure 2 and the same sequence of events as in
   the first table of Section 5, we may expect fewer and/or shorter
   micro-loops with standardized implementations.

   +--------+--------------------+------------------+------------------+
   | Time   | Network Event      | Router S events  | Router E events  |
   +--------+--------------------+------------------+------------------+
   | t0=0   | Prefix DOWN        |                  |                  |
   | 10ms   |                    | Schedule PRC (in | Schedule PRC (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 160ms  |                    | PRC starts       | PRC starts       |
   | 161ms  |                    | PRC ends         |                  |
   | 162ms  |                    | RIB/FIB starts   | PRC ends         |
   | 163ms  |                    |                  | RIB/FIB starts   |
   | 175ms  |                    | RIB/FIB ends     |                  |
   | 176ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 200ms  | Prefix UP          |                  |                  |
   | 212ms  |                    | Schedule PRC (in |                  |
   |        |                    | 150ms)           |                  |
   | 213ms  |                    |                  | Schedule PRC (in |
   |        |                    |                  | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 370ms  |                    | PRC starts       | PRC starts       |
   | 372ms  |                    | PRC ends         |                  |
   | 373ms  |                    | RIB/FIB starts   | PRC ends         |
   | 374ms  |                    |                  | RIB/FIB starts   |
   | 383ms  |                    | RIB/FIB ends     |                  |
   | 384ms  |                    |                  | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 400ms  | Prefix DOWN        |                  |                  |
   | 410ms  |                    | Schedule PRC (in | Schedule PRC (in |
   |        |                    | 300ms)           | 300ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 710ms  |                    | PRC starts       | PRC starts       |
   | 711ms  |                    | PRC ends         | PRC ends         |
   | 712ms  |                    | RIB/FIB starts   |                  |
   | 713ms  |                    |                  | RIB/FIB starts   |
   | 716ms  |                    | RIB/FIB ends     | RIB/FIB ends     |
   |        |                    |                  |                  |
   | 1000ms | S-D link DOWN      |                  |                  |
   | 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
   |        |                    | 150ms)           | 150ms)           |
   |        |                    |                  |                  |
   |        |                    |                  |                  |
   | 1160ms |                    | SPF starts       |                  |
   | 1161ms |                    | SPF ends         | SPF starts       |
   | 1162ms | Micro-loop may     | RIB/FIB starts   | SPF ends         |
   |        | start from here    |                  |                  |
   | 1163ms |                    |                  | RIB/FIB starts   |
   | 1175ms |                    | RIB/FIB ends     |                  |
   | 1177ms | Micro-loop ends    |                  | RIB/FIB ends     |
   +--------+--------------------+------------------+------------------+

                   Route computation event time scale

   As the table above shows, other parameters, such as router
   computation power and flooding timers, may also influence
   micro-loops: here, E is considered to be a bit slower than S, which
   still leads to the creation of a micro-loop (from 1162ms to 1177ms).
   Despite this, we expect that by aligning implementations at least on
   SPF trigger and SPF delay, service providers may reduce the number
   and the duration of micro-loops.

7.  Security Considerations

   This document does not introduce any security considerations.

8.  Acknowledgements

   The authors would like to thank Mike Shand for his useful comments.

9.  IANA Considerations

   This document has no actions for IANA.

10.  References

10.1.  Normative References

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

10.2.  Informative References

   [I-D.ietf-rtgwg-microloop-analysis]
              Zinin, A., "Analysis and Minimization of Microloops in
              Link-state Routing Protocols",
              draft-ietf-rtgwg-microloop-analysis-01 (work in progress),
              October 2005.

   [I-D.ietf-rtgwg-uloop-delay]
              Litkowski, S., Decraene, B., Filsfils, C., and P.
              Francois, "Micro-loop prevention by introducing a local
              convergence delay", draft-ietf-rtgwg-uloop-delay-03 (work
              in progress), November 2016.

   [RFC6976]  Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
              Francois, P., and O. Bonaventure, "Framework for Loop-Free
              Convergence Using the Ordered Forwarding Information Base
              (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July
              2013.

Authors' Addresses

   Stephane Litkowski
   Orange Business Service

   Email: stephane.litkowski@orange.com


   Bruno Decraene
   Orange

   Email: bruno.decraene@orange.com


   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de