idnits 2.17.1

draft-ietf-rtgwg-spf-uloop-pb-statement-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 15 instances of too long lines in the document, the longest
     one being 23 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (May 4, 2015) is 3279 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Normative reference to a draft: ref.
     'I-D.ietf-rtgwg-microloop-analysis'

     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Routing Area Working Group                                  S. Litkowski
Internet-Draft                                   Orange Business Service
Intended status: Standards Track                             May 4, 2015
Expires: November 5, 2015

   Link State protocols SPF trigger and delay algorithm impact on IGP
                               microloops
               draft-ietf-rtgwg-spf-uloop-pb-statement-00

Abstract

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.

   This document analyzes the impact of using different Link State IGP
   implementations in a single network with regard to microloops.  The
   analysis focuses on the SPF triggers and the SPF delay algorithm.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 5, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem statement
   3.  SPF trigger strategies
   4.  SPF delay strategies
     4.1.  Two step SPF delay
     4.2.  Exponential backoff
   5.  Mixing strategies
   6.  Proposed work items
   7.  Security Considerations
   8.  Acknowledgements
   9.  IANA Considerations
   10. Normative References
   Author's Address

1.  Introduction

   Link State IGP protocols are based on a topology database on which
   an SPF (Shortest Path First) algorithm such as Dijkstra's is run to
   find the optimal routing paths.

   Specifications like IS-IS ([RFC1195]) propose some optimizations of
   the route computation (see Appendix C.1), but these optimizations
   are not mandatory and not all implementations follow them.

   We will call SPF trigger the events that lead to a new SPF
   computation based on the topology.

   Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS
   ([RFC1195]), use many timers to control the router behavior in case
   of churn: SPF delay, PRC delay, LSP generation delay, LSP flooding
   delay, LSP retransmission interval, ...

   Some of those timers are standardized in protocol specifications;
   others are not, especially the timers related to SPF computation.

   For non-standardized timers, implementations are free to implement
   them in any way.  For some standardized timers, we can also see that
   rather than using static configurable values, implementations may
   offer dynamically adjusted timers to help control the churn.

   We will call SPF delay the delay timer, present in most
   implementations, that makes the code wait before running an SPF
   computation after an SPF trigger is received.

   A micro-loop is a packet forwarding loop that may occur transiently
   among two or more routers in a hop-by-hop packet forwarding paradigm.
   We can observe that these micro-loops are formed when two routers do
   not update their Forwarding Information Base (FIB) for a certain
   prefix at the same time.  The micro-loop phenomenon is described in
   [I-D.ietf-rtgwg-microloop-analysis].

   Routers have more and more powerful control planes and data planes,
   which reduces the control-plane-to-forwarding-plane overhead during
   the convergence process.  Even if the FIB update is still the
   largest contributor to convergence time in large networks, its
   duration keeps decreasing and may become comparable to protocol
   timers.  This is particularly true in small and medium networks.

   In multi-vendor networks, using different implementations of a link
   state protocol may favor micro-loop creation during convergence
   due to discrepancies between timers.
   Service Providers already know that using similar timers across the
   whole network is a best practice, but this is sometimes not possible
   due to implementation limitations.

   This document presents why it is important for service providers to
   have consistent implementations of Link State protocols across
   vendors.  We particularly analyze the impact of using different Link
   State IGP implementations in a single network with regard to
   microloops.  As a first step, the analysis focuses on the SPF
   triggers and the SPF delay algorithm.

   This document only states the problem and defines some work items;
   it is not intended to provide a solution.

2.  Problem statement

           A ---- B
           |      |
        10 |      | 10
           |      |
           C ---- D
           |  2   |
           Px     Px

                               Figure 1

   In the figure above, A primarily uses the AC link to reach C.  When
   the AC link fails, IGP convergence occurs.  If A converges before B,
   A will forward traffic to C through B, but as B has not converged
   yet, B will loop the traffic back to A, leading to a microloop.

   The micro-loop appears due to the asynchronous convergence of nodes
   in a network when an event occurs.

   Multiple factors (and combinations of these factors) may increase
   the probability for a micro-loop to appear:

   o  Delay of failure notification: the later B is notified of the
      failure relative to A, the more likely a micro-loop is to appear.

   o  SPF delay: most implementations support a delay for the SPF
      computation to try to catch as many events as possible.  If A
      uses an SPF delay timer of x msec and B uses an SPF delay timer
      of y msec and x < y, B would start converging after A, leading to
      a potential microloop.

   o  SPF computation time: mostly a matter of CPU power and
      optimizations like incremental SPF.  If A computes SPF faster
      than B, there is a chance for a microloop to appear.
      CPUs today are fast enough for the SPF computation time to be
      considered negligible (on the order of msec in a large network).

   o  RIB and FIB prefix insertion speed or ordering: highly
      implementation dependent.

   This document focuses on the analysis of the SPF delay (and
   associated triggers).

3.  SPF trigger strategies

   Depending on the change advertised in the LSP/LSA, the topology may
   or may not be affected.  An implementation can decide not to run SPF
   (and only run the IP reachability computation) if the advertised
   change does not affect the topology.

   Different strategies exist to trigger SPF:

   1.  Always run a full SPF, whatever the change to process.

   2.  Run a full SPF only when required: e.g., if a link fails, a
       local node will run an SPF for its local LSP update.  If the LSP
       from the neighbor (describing the same failure) is received
       after the SPF has started, the local node can decide that a new
       full SPF is not required, as the topology has not changed.

   3.  If the topology does not change, only recompute reachability.

   As pointed out in Section 1, SPF optimizations are not mandatory in
   specifications, leading to multiple strategies being implemented.

4.  SPF delay strategies

   Implementations of link state routing protocols use different
   strategies to delay SPF:

   1.  Two steps.

   2.  Exponential backoff.

4.1.  Two step SPF delay

   The SPF delay is managed by four parameters:

   o  Rapid delay: amount of time to wait before running SPF while
      rapid runs remain.

   o  Rapid runs: number of consecutive SPF runs that can use the rapid
      delay.  When this number is exceeded, the router moves to the
      slow delay.

   o  Slow delay: amount of time to wait before running SPF once the
      rapid runs are exhausted.

   o  Wait time: amount of time to wait without events before going
      back to the rapid delay.

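
   The four parameters above can be sketched as a small state machine.
   This is a minimal sketch and is not taken from any implementation:
   the class and method names are hypothetical, it assumes that every
   trigger results in exactly one SPF run (triggers received while a
   delay is pending are not batched), and it assumes that the wait time
   is measured from the most recent trigger.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TwoStepSpfDelay:
    """Hypothetical model of the two-step SPF delay described above."""

    rapid_delay_ms: int = 50
    rapid_runs: int = 3
    slow_delay_ms: int = 1000
    wait_time_ms: int = 2000
    # Consecutive SPF runs since the last quiet period (internal state).
    _runs: int = field(default=0, init=False)
    # Time of the most recent trigger, None before the first one.
    _last_trigger_ms: Optional[float] = field(default=None, init=False)

    def delay_for_trigger(self, now_ms: float) -> int:
        """Return the SPF delay (ms) applied to a trigger seen at now_ms."""
        # Assumption: a gap of at least wait_time_ms without triggers
        # brings the timer back to the rapid delay.
        if (self._last_trigger_ms is not None
                and now_ms - self._last_trigger_ms >= self.wait_time_ms):
            self._runs = 0
        self._last_trigger_ms = now_ms
        self._runs += 1
        if self._runs <= self.rapid_runs:
            return self.rapid_delay_ms
        return self.slow_delay_ms


timer = TwoStepSpfDelay(rapid_delay_ms=50, rapid_runs=3,
                        slow_delay_ms=1000, wait_time_ms=2000)
delays = [timer.delay_for_trigger(t) for t in (0, 100, 200, 300, 5000)]
# delays == [50, 50, 50, 1000, 50]
```

   In this sketch the first three triggers use the rapid delay, the
   fourth falls back to the slow delay, and the trigger received after
   a quiet period longer than the wait time uses the rapid delay again.
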
   Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec,
   Wait time = 2sec

   SPF delay time
       ^
       |
       |
   SD- |             x xx x
       |
       |
       |
   RD- |   x  x   x                      x
       |
       +---------------------------------------> Events
           |  |   |  | || |              |
                           < wait time >

4.2.  Exponential backoff

   The algorithm has two modes: fast mode and backoff mode.  In backoff
   mode, the SPF delay increases exponentially at each run.  The SPF
   delay is managed by four parameters:

   o  First delay: amount of time to wait before running SPF.  This
      delay is used when SPF is in fast mode.

   o  Incremental delay: amount of time to wait before running SPF.
      This delay is used when SPF is in backoff mode and increases
      exponentially at each SPF run.

   o  Maximum delay: maximum amount of time to wait before running SPF.

   o  Wait time: amount of time to wait without events before going
      back to fast mode.

   Example: First delay = 50msec, Incremental delay = 50msec, Maximum
   delay = 1sec, Wait time = 2sec

   SPF delay time
       ^
   MD- |               xx x
       |
       |
       |
       |
       |
       |             x
       |
       |
       |
       |          x
       |
   FD- |   x  x                          x
   ID  |
       +---------------------------------------> Events
           |  |   |  | || |              |
                           < wait time >
          FM->BM ---------------------->FM

5.  Mixing strategies

           S ---- E
           |      |
        10 |      | 10
           |      |
           D ---- A
           |  2
           Px

                               Figure 2

   In the diagram above, we consider a flow of packets from S to D.  We
   consider that S uses optimized SPF triggering (a full SPF is
   triggered only when necessary) and a two-step SPF delay
   (rapid=150ms, rapid-runs=3, slow=1s).  As the implementation of S is
   optimized, Partial Reachability Computation (PRC) is available.  We
   consider the same timers as SPF for delaying PRC.
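
   For comparison with the two-step delay used by S, the exponential
   backoff of Section 4.2 can be sketched in the same way.  This is
   only an illustration under stated assumptions: the class and method
   names are hypothetical, "increases exponentially" is assumed to mean
   doubling at each run, every trigger is assumed to cause exactly one
   SPF run, and the wait time is assumed to be measured from the most
   recent trigger.

```python
from typing import Optional


class ExponentialBackoffSpfDelay:
    """Hypothetical model of the exponential backoff SPF delay."""

    def __init__(self, first_delay_ms: int = 50,
                 incremental_delay_ms: int = 50,
                 max_delay_ms: int = 1000,
                 wait_time_ms: int = 2000) -> None:
        self.first_delay_ms = first_delay_ms
        self.incremental_delay_ms = incremental_delay_ms
        self.max_delay_ms = max_delay_ms
        self.wait_time_ms = wait_time_ms
        # None means fast mode; otherwise the delay of the next backoff run.
        self._next_backoff_ms: Optional[int] = None
        self._last_trigger_ms: Optional[float] = None

    def delay_for_trigger(self, now_ms: float) -> int:
        """Return the SPF delay (ms) applied to a trigger seen at now_ms."""
        # Assumption: a quiet period of wait_time_ms returns to fast mode.
        if (self._last_trigger_ms is not None
                and now_ms - self._last_trigger_ms >= self.wait_time_ms):
            self._next_backoff_ms = None
        self._last_trigger_ms = now_ms

        if self._next_backoff_ms is None:
            # Fast mode: use the first delay, then enter backoff mode.
            self._next_backoff_ms = self.incremental_delay_ms
            return self.first_delay_ms

        # Backoff mode: use the current delay, then double it (assumed
        # growth factor), never exceeding the maximum delay.
        delay = min(self._next_backoff_ms, self.max_delay_ms)
        self._next_backoff_ms = min(delay * 2, self.max_delay_ms)
        return delay


timer = ExponentialBackoffSpfDelay(first_delay_ms=50, incremental_delay_ms=50,
                                   max_delay_ms=1000, wait_time_ms=2000)
delays = [timer.delay_for_trigger(t) for t in range(0, 700, 100)]
# delays == [50, 50, 100, 200, 400, 800, 1000]
```

   Under these assumptions the delay grows from the incremental delay
   up to the maximum delay while triggers keep arriving, whereas the
   two-step timer jumps directly from the rapid delay to the slow
   delay.
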
   We consider that E uses an SPF trigger strategy that always computes
   a full SPF and an exponential backoff strategy for the SPF delay
   (start=150ms, inc=150ms, max=1s).

   We also consider the following sequence of events (note: the
   timescale is not intended to represent a real router timescale,
   where jitter is introduced into all timers):

   o  t0: a prefix is declared down in the network.

   o  t0+200ms: the prefix is declared up.

   o  t0+400ms: a prefix is declared down in the network.

   o  t0+1000ms: the S-D link fails.

   S timescale            E timescale            Event timescale
   |                      |                      |
   |                      |                      | <- t0 Event
   | Schedule PRC (150ms) | Schedule SPF (150ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | PRC starts           | SPF starts           |
   | PRC ends             |                      |
   | RIB/FIB starts       | SPF ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+180ms
   |                      |                      |
   |                      |                      | <- t0+200ms Event
   | Schedule PRC (150ms) | Schedule SPF (150ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | PRC starts           | SPF starts           |
   | PRC ends             |                      |
   | RIB/FIB starts       | SPF ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+380ms
   |                      |                      | <- t0+400ms Event
   | Schedule PRC (300ms) | Schedule SPF (300ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | PRC starts           | SPF starts           |
   | PRC ends             |                      |
   | RIB/FIB starts       | SPF ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+730ms
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      | <- t0+1000ms Event
   | Schedule SPF (150ms) | Schedule SPF (600ms) |
   |                      |                      |
   |                      |                      |
   | SPF starts           |                      |
   |                      |                      |
   | SPF ends             |                      |
   | RIB/FIB starts       |                      |
   |                      |                      | }
   | RIB/FIB ends         |                      | }
   |                      |                      | }
   |                      |                      | }
   |                      |                      | }
   |                      |                      | }
   |                      |                      | } Micro-loop creation
   |                      |                      | }
   |                      | SPF starts           | }
   |                      |                      | }
   |                      | SPF ends             | }
   |                      | RIB/FIB starts       | }
   |                      |                      | }
   |                      | RIB/FIB ends         | }

                               Figure 3

   In the figure above, we can see that, due to
   discrepancies in SPF management, after multiple events (of different
   types), SPF delays are completely misaligned between nodes, leading
   to the creation of long microloops.

   The same issue can also appear with only a single type of event, as
   displayed below:

   S timescale            E timescale            Event timescale
   |                      |                      |
   |                      |                      | <- t0 Event remote link down
   | Schedule SPF (150ms) | Schedule SPF (150ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | SPF starts           | SPF starts           |
   | SPF ends             |                      |
   | RIB/FIB starts       | SPF ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+180ms
   |                      |                      |
   |                      |                      | <- t0+200ms Event remote link down
   | Schedule SPF (150ms) | Schedule SPF (150ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | SPF starts           | SPF starts           |
   | SPF ends             |                      |
   | RIB/FIB starts       | SPF ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+380ms
   |                      |                      | <- t0+400ms Event remote link change
   | Schedule SPF (150ms) | Schedule SPF (300ms) |
   |                      |                      |
   |                      |                      |
   | SPF starts           |                      |
   |                      |                      |
   | SPF ends             |                      |
   | RIB/FIB starts       |                      |
   |                      | SPF starts           | }
   | RIB/FIB ends         |                      | }
   |                      | SPF ends             | } micro-loop creation
   |                      | RIB/FIB starts       | }
   |                      |                      | }
   |                      | RIB/FIB ends         | t0+730ms
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      | <- t0+1000ms Event
   | Schedule SPF (1s)    | Schedule SPF (600ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      | SPF starts           |
   |                      |                      |
   |                      | SPF ends             |
   |                      | RIB/FIB starts       |
   |                      |                      | }
   |                      | RIB/FIB ends         | }
   |                      |                      | }
   |                      |                      | }
   |                      |                      | } microloop creation
   |                      |                      | }
   |                      |                      | }
   |                      |                      | }
   | SPF starts           |                      | }
   |                      |                      | }
   | SPF ends             |                      | }
   | RIB/FIB starts       |                      | }
   |                      |                      | }
   | RIB/FIB ends         |                      | t0+2030ms

                               Figure 4

6.  Proposed work items

   In order to enhance the current Link State IGP behavior, the authors
   encourage working on the standardization of some behaviors.

   The authors propose the following work items:

   o  Standardize the SPF trigger strategy.

   o  Standardize the computation timer scope: a single timer for all
      computation operations, separate timers, ...

   o  Standardize the "slowdown" timer algorithm, including its
      association with a particular timer: the authors of this document
      do not presume that the same algorithm must be used for all
      timers.

   Using the same event sequence as in Figure 3, we may expect fewer
   and/or shorter microloops using standardized implementations.

   S timescale            E timescale            Event timescale
   |                      |                      |
   |                      |                      | <- t0 Event
   | Schedule PRC (150ms) | Schedule PRC (150ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | PRC starts           | PRC starts           |
   | PRC ends             |                      |
   | RIB/FIB starts       | PRC ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+180ms
   |                      |                      |
   |                      |                      | <- t0+200ms Event
   | Schedule PRC (150ms) | Schedule PRC (150ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | PRC starts           | PRC starts           |
   | PRC ends             |                      |
   | RIB/FIB starts       | PRC ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+380ms
   |                      |                      | <- t0+400ms Event
   | Schedule PRC (300ms) | Schedule PRC (300ms) |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   | PRC starts           | PRC starts           |
   | PRC ends             |                      |
   | RIB/FIB starts       | PRC ends             |
   |                      | RIB/FIB starts       |
   | RIB/FIB ends         |                      |
   |                      | RIB/FIB ends         | t0+730ms
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      |
   |                      |                      | <- t0+1000ms Event
   | Schedule SPF (150ms) | Schedule SPF (150ms) |
   |                      |                      |
   |                      |                      |
   | SPF starts           | SPF starts           |
   |                      |                      |
   | SPF ends             |                      |
   | RIB/FIB starts       | SPF ends             |
   |                      | RIB/FIB starts       | } microloop creation
   | RIB/FIB ends         |                      | }
   |                      | RIB/FIB ends         |
   |                      |                      |
   |                      |                      |

                               Figure 5

   As displayed above,
   there could be other parameters, like router computation power or
   flooding timers, that may also influence microloops.  In Figure 5,
   we consider E to be a bit slower than S, leading to microloop
   creation.  Despite this, we expect that by aligning implementations
   at least on SPF trigger and SPF delay, service providers may reduce
   the number or duration of microloops.

7.  Security Considerations

   This document does not introduce any security considerations.

8.  Acknowledgements

9.  IANA Considerations

   This document has no action for IANA.

10.  Normative References

   [I-D.ietf-rtgwg-microloop-analysis]
              Zinin, A., "Analysis and Minimization of Microloops in
              Link-state Routing Protocols", draft-ietf-rtgwg-microloop-
              analysis-01 (work in progress), October 2005.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, December 1990.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

Author's Address

   Stephane Litkowski
   Orange Business Service

   Email: stephane.litkowski@orange.com