idnits 2.17.1

draft-ietf-rtgwg-spf-uloop-pb-statement-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (June 23, 2015) is 3223 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-04) exists of
     draft-litkowski-rtgwg-uloop-delay-03

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Routing Area Working Group                                  S. Litkowski
Internet-Draft                                   Orange Business Service
Intended status: Standards Track                             B. Decraene
Expires: December 25, 2015                                        Orange
                                                                      M.
Horneffer
                                                        Deutsche Telekom
                                                           June 23, 2015

   Link State protocols SPF trigger and delay algorithm impact on IGP
                              micro-loops
               draft-ietf-rtgwg-spf-uloop-pb-statement-01

Abstract

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.

   This document analyzes the impact of using different Link State IGP
   implementations in a single network with regard to micro-loops.  The
   analysis focuses on the SPF triggers and the SPF delay algorithm.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 25, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem statement
   3.  SPF trigger strategies
   4.  SPF delay strategies
     4.1.  Two step SPF delay
     4.2.  Exponential backoff
   5.  Mixing strategies
   6.  Proposed work items
   7.  Security Considerations
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Link State IGPs are based on a topology database on which an SPF
   (Shortest Path First) algorithm such as Dijkstra's is run to find
   the optimal routing paths.

   Specifications like IS-IS ([RFC1195]) propose some optimizations of
   the route computation (see Appendix C.1), but not all
   implementations follow these optional optimizations.

   We call "SPF trigger" any event that leads to a new SPF computation
   based on the topology.

   Link State IGPs, like OSPF ([RFC2328]) and IS-IS ([RFC1195]), use
   many timers to control the router behavior in case of churn: SPF
   delay, PRC delay, LSP generation delay, LSP flooding delay, LSP
   retransmission interval, etc.

   Some of these timers are standardized in the protocol specifications;
   others are not, in particular the timers related to SPF computation.

   For non-standardized timers, implementations are free to implement
   them in any way.  For some standardized timers, we can also see that,
   rather than using static configurable values, implementations may
   offer dynamically adjusted timers to help control the churn.

   We call "SPF delay" the timer, present in most implementations, that
   specifies the required delay before running an SPF computation after
   an SPF trigger is received.

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.
   These micro-loops are formed when two routers do not update their
   Forwarding Information Base (FIB) for a certain prefix at the same
   time.  The micro-loop phenomenon is described in
   [I-D.ietf-rtgwg-microloop-analysis].

   Some micro-loop mitigation techniques have been defined by the IETF
   (e.g. [RFC6976], [I-D.litkowski-rtgwg-uloop-delay]), but they are
   either not implemented due to their complexity or do not provide a
   complete mitigation.

   In multi-vendor networks, using different implementations of a link
   state protocol may favor the creation of micro-loops during
   convergence due to discrepancies between timers.  Service providers
   already know that using similar timers across the whole network is a
   best practice, but this is sometimes not possible due to
   implementation limitations.

   This document explains why it is important for service providers to
   have consistent implementations of Link State protocols across
   vendors.
   We analyze in particular the impact of using different Link State IGP
   implementations in a single network with regard to micro-loops.  As
   a first step, the analysis focuses on the SPF triggers and the SPF
   delay algorithm.

   This document only states the problem and defines some work items;
   it is not intended to provide a solution.

2.  Problem statement

              A ---- B
              |      |
           10 |      | 10
              |      |
              C ---- D
              |  2   |
              Px     Px

                 Figure 1

   In the figure above, A primarily uses the AC link to reach C.  When
   the AC link fails, IGP convergence occurs.  If A converges before B,
   A will forward traffic to C through B, but as B has not converged
   yet, B will loop the traffic back to A, leading to a micro-loop.

   The micro-loop appears due to the asynchronous convergence of nodes
   in a network when an event occurs.

   Multiple factors (and combinations of these factors) may increase the
   probability of a micro-loop appearing:

   o  Delay of failure notification: the later B is notified of the
      failure relative to A, the more likely a micro-loop is to appear.

   o  SPF delay: most implementations support a delay for the SPF
      computation to try to catch as many events as possible.  If A
      uses an SPF delay timer of x msec, B uses an SPF delay timer of y
      msec, and x < y, B would start converging after A, leading to a
      potential micro-loop.

   o  SPF computation time: mostly a matter of CPU power and
      optimizations like incremental SPF.  If A computes SPF faster than
      B, there is a chance for a micro-loop to appear.  CPUs are now
      fast enough that SPF computation time can be considered negligible
      (on the order of milliseconds in a large network).

   o  RIB and FIB prefix insertion speed or ordering: highly
      implementation dependent.

   This document focuses on the analysis of SPF delay (and associated
   triggers).

3.
SPF trigger strategies

   Depending on the change advertised in the LSP/LSA, the topology may
   or may not be affected.  An implementation can decide not to run SPF
   (and to run only the IP reachability computation) if the advertised
   change does not affect the topology.

   Different strategies exist to trigger SPF:

   1.  Always run a full SPF, whatever the change to process.

   2.  Run a full SPF only when required: e.g. if a link fails, a local
       node will run an SPF for its local LSP update.  If the LSP from
       the neighbor (describing the same failure) is received after SPF
       has started, the local node can decide that a new full SPF is not
       required, as the topology has not changed.

   3.  If the topology does not change, only recompute reachability.

   As pointed out in Section 1, SPF optimizations are not mandatory in
   the specifications, which leads to multiple strategies being
   implemented.

4.  SPF delay strategies

   Implementations of link state routing protocols use different
   strategies to delay SPF:

   1.  Two steps.

   2.  Exponential backoff.

4.1.  Two step SPF delay

   The SPF delay is managed by four parameters:

   o  Rapid delay: amount of time to wait before running SPF.

   o  Rapid runs: number of consecutive SPF runs that can use the rapid
      delay.  When this number is exceeded, the router moves to the
      slow delay.

   o  Slow delay: amount of time to wait before running SPF once the
      number of rapid runs has been exceeded.

   o  Wait time: amount of time to wait without events before going
      back to the rapid delay.

   Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec,
   Wait time = 2sec

          SPF delay time
              ^
              |
              |
        SD-   |            x xx x
              |
              |
              |
        RD-   |   x  x  x                   x
              |
              +---------------------------------> Events
                  |  |  |  | || |           |
                                < wait time >

4.2.  Exponential backoff

   The algorithm has two modes: fast mode and backoff mode.  In backoff
   mode, the SPF delay increases exponentially at each run.
   The SPF delay is managed by four parameters:

   o  First delay: amount of time to wait before running SPF.  This
      delay is used only when SPF is in fast mode.

   o  Incremental delay: amount of time to wait before running SPF.
      This delay is used only when SPF is in backoff mode and increases
      exponentially at each SPF run.

   o  Maximum delay: maximum amount of time to wait before running SPF.

   o  Wait time: amount of time to wait without events before going
      back to fast mode.

   Example: First delay = 50msec, Incremental delay = 50msec, Maximum
   delay = 1sec, Wait time = 2sec

          SPF delay time
              ^
        MD-   |              xx x
              |
              |
              |
              |
              |
              |            x
              |
              |
              |
              |         x
              |
        FD-   |   x  x                      x
        ID    |
              +---------------------------------> Events
                  |  |  |  | || |           |
                                < wait time >
                  FM->BM -------------------->FM

5.  Mixing strategies

              S ---- E
              |      |
           10 |      | 10
              |      |
              D ---- A
              |  2
              Px

                 Figure 2

   In the diagram above, we consider a flow of packets from S to D.  We
   consider that S uses optimized SPF triggering (a full SPF is
   triggered only when necessary) and a two-step SPF delay (rapid=150ms,
   rapid-runs=3, slow=1s).  As the implementation of S is optimized,
   Partial Reachability Computation (PRC) is available.  We consider the
   same timers as SPF for delaying PRC.  We consider that E uses an SPF
   trigger strategy that always computes a full SPF and an exponential
   backoff strategy for SPF delay (start=150ms, inc=150ms, max=1s).

   We also consider the following sequence of events (note: the
   timescale is not intended to represent a real router timescale, where
   jitter is introduced to all timers):

   o  t0=0 ms: a prefix is declared down in the network.  We consider
      this event to happen at time=0.

   o  200ms: the prefix is declared up.

   o  400ms: a prefix is declared down in the network.

   o  1000ms: the S-D link fails.
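
   As a rough illustration of how the two delay strategies of Section 4
   drift apart, the Python sketch below computes the delay applied to
   each of four consecutive SPF triggers, using the parameters given
   above for S and E.  It is not taken from any implementation: the
   function names are hypothetical, the wait time is assumed never to
   elapse between triggers, and doubling of the incremental delay at
   each run is an assumption chosen to be consistent with the 150ms,
   300ms and 600ms values that appear for E in this section.

```python
def two_step_delay(run_index, rapid_delay=150, rapid_runs=3, slow_delay=1000):
    """Delay in ms before the run_index-th consecutive SPF run (0-based)."""
    # The first `rapid_runs` runs use the rapid delay, later runs the slow delay.
    return rapid_delay if run_index < rapid_runs else slow_delay


def backoff_delay(run_index, first_delay=150, incremental_delay=150,
                  max_delay=1000):
    """Delay in ms before the run_index-th consecutive SPF run (0-based)."""
    if run_index == 0:
        return first_delay  # fast mode
    # Backoff mode: doubling per run is an assumption, capped at max_delay.
    return min(incremental_delay * 2 ** (run_index - 1), max_delay)


# Router S (two-step) and router E (exponential backoff), four triggers.
s_delays = [two_step_delay(i) for i in range(4)]
e_delays = [backoff_delay(i) for i in range(4)]
# Absolute difference between the two scheduling delays for each trigger.
skew = [abs(s - e) for s, e in zip(s_delays, e_delays)]

print(s_delays)  # [150, 150, 150, 1000]
print(e_delays)  # [150, 150, 300, 600]
print(skew)      # [0, 0, 150, 400]
```

   Under these assumptions the two routers schedule SPF identically for
   the first two triggers and diverge from the third one onward.  The
   values match the "Schedule SPF" entries of the link-failure-only
   table later in this section.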

+--------+--------------------+------------------+------------------+
|  Time  |   Network Event    | Router S events  | Router E events  |
+--------+--------------------+------------------+------------------+
| t0=0   | Prefix DOWN        |                  |                  |
| 10ms   |                    | Schedule PRC (in | Schedule SPF (in |
|        |                    | 150ms)           | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 160ms  |                    | PRC starts       | SPF starts       |
| 161ms  |                    | PRC ends         |                  |
| 162ms  |                    | RIB/FIB starts   |                  |
| 163ms  |                    |                  | SPF ends         |
| 164ms  |                    |                  | RIB/FIB starts   |
| 175ms  |                    | RIB/FIB ends     |                  |
| 178ms  |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 200ms  | Prefix UP          |                  |                  |
| 212ms  |                    | Schedule PRC (in |                  |
|        |                    | 150ms)           |                  |
| 214ms  |                    |                  | Schedule SPF (in |
|        |                    |                  | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 370ms  |                    | PRC starts       |                  |
| 372ms  |                    | PRC ends         |                  |
| 373ms  |                    |                  | SPF starts       |
| 373ms  |                    | RIB/FIB starts   |                  |
| 375ms  |                    |                  | SPF ends         |
| 376ms  |                    |                  | RIB/FIB starts   |
| 383ms  |                    | RIB/FIB ends     |                  |
| 385ms  |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 400ms  | Prefix DOWN        |                  |                  |
| 410ms  |                    | Schedule PRC (in | Schedule SPF (in |
|        |                    | 300ms)           | 300ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 710ms  |                    | PRC starts       | SPF starts       |
| 711ms  |                    | PRC ends         |                  |
| 712ms  |                    | RIB/FIB starts   |                  |
| 713ms  |                    |                  | SPF ends         |
| 714ms  |                    |                  | RIB/FIB starts   |
| 716ms  |                    | RIB/FIB ends     | RIB/FIB ends     |
|        |                    |                  |                  |
| 1000ms | S-D link DOWN      |                  |                  |
| 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
|        |                    | 150ms)           | 600ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 1160ms |                    | SPF starts       |                  |
| 1161ms |                    | SPF ends         |                  |
| 1162ms | Micro-loop may     | RIB/FIB starts   |                  |
|        | start from here    |                  |                  |
| 1175ms |                    | RIB/FIB ends     |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 1612ms |                    |                  | SPF starts       |
| 1615ms |                    |                  | SPF ends         |
| 1616ms |                    |                  | RIB/FIB starts   |
| 1626ms | Micro-loop ends    |                  | RIB/FIB ends     |
+--------+--------------------+------------------+------------------+

                  Route computation event time scale

   In the table above, we can see that, due to discrepancies in SPF
   management, SPF delays become completely misaligned between nodes
   after multiple events of different types, which leads to a long
   micro-loop.

   The same issue can also appear with a single type of event, as
   displayed below:

+--------+--------------------+------------------+------------------+
|  Time  |   Network Event    | Router S events  | Router E events  |
+--------+--------------------+------------------+------------------+
| t0=0   | Link DOWN          |                  |                  |
| 10ms   |                    | Schedule SPF (in | Schedule SPF (in |
|        |                    | 150ms)           | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 160ms  |                    | SPF starts       | SPF starts       |
| 161ms  |                    | SPF ends         |                  |
| 162ms  |                    | RIB/FIB starts   |                  |
| 163ms  |                    |                  | SPF ends         |
| 164ms  |                    |                  | RIB/FIB starts   |
| 175ms  |                    | RIB/FIB ends     |                  |
| 178ms  |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 200ms  | Link DOWN          |                  |                  |
| 212ms  |                    | Schedule SPF (in |                  |
|        |                    | 150ms)           |                  |
| 214ms  |                    |                  | Schedule SPF (in |
|        |                    |                  | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 370ms  |                    | SPF starts       |                  |
| 372ms  |                    | SPF ends         |                  |
| 373ms  |                    |                  | SPF starts       |
| 373ms  |                    | RIB/FIB starts   |                  |
| 375ms  |                    |                  | SPF ends         |
| 376ms  |                    |                  | RIB/FIB starts   |
| 383ms  |                    | RIB/FIB ends     |                  |
| 385ms  |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 400ms  | Link DOWN          |                  |                  |
| 410ms  |                    | Schedule SPF (in | Schedule SPF (in |
|        |                    | 150ms)           | 300ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 560ms  |                    | SPF starts       |                  |
| 561ms  |                    | SPF ends         |                  |
| 562ms  | Micro-loop may     | RIB/FIB starts   |                  |
|        | start from here    |                  |                  |
| 568ms  |                    | RIB/FIB ends     |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 710ms  |                    |                  | SPF starts       |
| 713ms  |                    |                  | SPF ends         |
| 714ms  |                    |                  | RIB/FIB starts   |
| 716ms  | Micro-loop ends    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 1000ms | Link
DOWN          |                  |                  |
| 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
|        |                    | 1s)              | 600ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 1612ms |                    |                  | SPF starts       |
| 1615ms |                    |                  | SPF ends         |
| 1616ms | Micro-loop may     |                  | RIB/FIB starts   |
|        | start from here    |                  |                  |
| 1626ms |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 2012ms |                    | SPF starts       |                  |
| 2014ms |                    | SPF ends         |                  |
| 2015ms |                    | RIB/FIB starts   |                  |
| 2025ms | Micro-loop ends    | RIB/FIB ends     |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
+--------+--------------------+------------------+------------------+

                  Route computation event time scale

6.  Proposed work items

   To improve the current Link State IGP behavior, the authors
   encourage work on the standardization of some behaviors.

   The authors propose the following work items:

   o  Standardize the SPF trigger strategy.

   o  Standardize the computation timer scope: a single timer for all
      computation operations, separate timers, etc.

   o  Standardize the "slowdown" timer algorithm, including its
      association with a particular timer: the authors of this document
      do not presume that the same algorithm must be used for all
      timers.

   Using the same event sequence as in the first table of Section 5, we
   may expect fewer and/or shorter micro-loops with standardized
   implementations.
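
   To put rough numbers on that expectation, the Python sketch below
   adds up the time between each "Micro-loop may start from here" row
   and the matching "Micro-loop ends" row in the two tables of Section 5
   and in the table that follows.  The timestamps are copied from those
   tables; the dictionary keys and the helper name are only
   illustrative, and the result is an upper bound on exposure for these
   example timelines rather than a measurement.

```python
# (start_ms, end_ms) pairs copied from the "Micro-loop may start from here"
# and "Micro-loop ends" rows of the tables in this document.
windows = {
    "mixed strategies, mixed events": [(1162, 1626)],
    "mixed strategies, link events only": [(562, 716), (1616, 2025)],
    "aligned strategies, mixed events": [(1162, 1177)],
}


def total_exposure(intervals):
    """Sum of the window lengths, in ms."""
    return sum(end - start for start, end in intervals)


exposure = {name: total_exposure(iv) for name, iv in windows.items()}

for name, ms in exposure.items():
    print(f"{name}: {ms} ms")
# mixed strategies, mixed events: 464 ms
# mixed strategies, link events only: 563 ms
# aligned strategies, mixed events: 15 ms
```

   With aligned strategies, the remaining 15 ms window comes only from
   the small processing-speed difference between the two routers in the
   example.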

+--------+--------------------+------------------+------------------+
|  Time  |   Network Event    | Router S events  | Router E events  |
+--------+--------------------+------------------+------------------+
| t0=0   | Prefix DOWN        |                  |                  |
| 10ms   |                    | Schedule PRC (in | Schedule PRC (in |
|        |                    | 150ms)           | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 160ms  |                    | PRC starts       | PRC starts       |
| 161ms  |                    | PRC ends         |                  |
| 162ms  |                    | RIB/FIB starts   | PRC ends         |
| 163ms  |                    |                  | RIB/FIB starts   |
| 175ms  |                    | RIB/FIB ends     |                  |
| 176ms  |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 200ms  | Prefix UP          |                  |                  |
| 212ms  |                    | Schedule PRC (in |                  |
|        |                    | 150ms)           |                  |
| 213ms  |                    |                  | Schedule PRC (in |
|        |                    |                  | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 370ms  |                    | PRC starts       | PRC starts       |
| 372ms  |                    | PRC ends         |                  |
| 373ms  |                    | RIB/FIB starts   | PRC ends         |
| 374ms  |                    |                  | RIB/FIB starts   |
| 383ms  |                    | RIB/FIB ends     |                  |
| 384ms  |                    |                  | RIB/FIB ends     |
|        |                    |                  |                  |
| 400ms  | Prefix DOWN        |                  |                  |
| 410ms  |                    | Schedule PRC (in | Schedule PRC (in |
|        |                    | 300ms)           | 300ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 710ms  |                    | PRC starts       | PRC starts       |
| 711ms  |                    | PRC ends         | PRC ends         |
| 712ms  |                    | RIB/FIB starts   |                  |
| 713ms  |                    |                  | RIB/FIB starts   |
| 716ms  |                    | RIB/FIB ends     | RIB/FIB ends     |
|        |                    |                  |                  |
| 1000ms | S-D link DOWN      |                  |                  |
| 1010ms |                    | Schedule SPF (in | Schedule SPF (in |
|        |                    | 150ms)           | 150ms)           |
|        |                    |                  |                  |
|        |                    |                  |                  |
| 1160ms |                    | SPF starts       |                  |
| 1161ms |                    | SPF ends         | SPF starts       |
| 1162ms | Micro-loop may     | RIB/FIB starts   | SPF ends         |
|        | start from here    |                  |                  |
| 1163ms |                    |                  | RIB/FIB starts   |
| 1175ms |                    | RIB/FIB ends     |                  |
| 1177ms | Micro-loop ends    |                  | RIB/FIB ends     |
+--------+--------------------+------------------+------------------+

                  Route computation event time scale

   As displayed above, there could be some other parameters like router
   computation
   power or flooding timers that may also influence micro-loops.  In
   the table above, we consider E to be a bit slower than S, leading to
   micro-loop creation.  Despite this, we expect that by aligning
   implementations at least on SPF trigger and SPF delay, service
   providers may reduce the number or duration of micro-loops.

7.  Security Considerations

   This document does not introduce any security considerations.

8.  Acknowledgements

   The authors would like to thank Mike Shand for his useful comments.

9.  IANA Considerations

   This document has no action for IANA.

10.  References

10.1.  Normative References

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

10.2.  Informative References

   [I-D.ietf-rtgwg-microloop-analysis]
              Zinin, A., "Analysis and Minimization of Microloops in
              Link-state Routing Protocols", draft-ietf-rtgwg-microloop-
              analysis-01 (work in progress), October 2005.

   [I-D.litkowski-rtgwg-uloop-delay]
              Litkowski, S., Decraene, B., Filsfils, C., and P.
              Francois, "Microloop prevention by introducing a local
              convergence delay", draft-litkowski-rtgwg-uloop-delay-03
              (work in progress), February 2014.

   [RFC6976]  Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
              Francois, P., and O. Bonaventure, "Framework for Loop-Free
              Convergence Using the Ordered Forwarding Information Base
              (oFIB) Approach", RFC 6976, July 2013.

Authors' Addresses

   Stephane Litkowski
   Orange Business Service

   Email: stephane.litkowski@orange.com


   Bruno Decraene
   Orange

   Email: bruno.decraene@orange.com


   Martin Horneffer
   Deutsche Telekom

   Email: martin.horneffer@telekom.de