idnits 2.17.1 draft-ietf-rtgwg-ordered-fib-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 20, 2011) is 4753 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: 'AAH' on line 849 -- Looks like a reference, but probably isn't: 'Hold' on line 842 -- Looks like a reference, but probably isn't: 'Q' on line 856 -- Looks like a reference, but probably isn't: 'CC' on line 853 -- Looks like a reference, but probably isn't: 'AAH-hold' on line 854 -- Looks like a reference, but probably isn't: 'IDLE' on line 931 -- Looks like a reference, but probably isn't: 'TX-AAH' on line 938 Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Pierre Francois 3 Internet-Draft Olivier Bonaventure 4 Intended status: Experimental Universite catholique de Louvain 5 Expires: October 22, 2011 Mike Shand 6 Stewart Bryant 7 Stefano Previdi 8 Clarence Filsfils 9 Cisco Systems 10 April 20, 2011 12 Loop-free convergence using oFIB 13 draft-ietf-rtgwg-ordered-fib-05 15 Abstract 17 This document describes a mechanism for use in conjunction with link 18 state routing protocols which prevents the transient loops which 19 would otherwise occur during topology changes. It does this by 20 correctly sequencing the FIB updates on the routers. 22 This mechanism can be used in the case of non-urgent link or node 23 shutdowns and restarts or link metric changes. It can also be used 24 in conjunction with a fast re-route mechanism which converts a sudden 25 link or node failure into a non-urgent topology change. This is 26 possible where a complete repair path is provided for all affected 27 destinations. 29 After a non-urgent topology change, each router computes a rank that 30 defines the time at which it can safely update its FIB. A method for 31 accelerating this loop-free convergence process by the use of 32 completion messages is also described. 34 The technology described in this document has been subject to 35 extensive simulation using real network topologies and costs, and 36 pathological convergence behaviour. A variant of the technology 37 described here has been experimentally deployed in a production 38 network. 40 Status of this Memo 42 This Internet-Draft is submitted in full conformance with the 43 provisions of BCP 78 and BCP 79. 45 Internet-Drafts are working documents of the Internet Engineering 46 Task Force (IETF). Note that other groups may also distribute 47 working documents as Internet-Drafts. The list of current Internet- 48 Drafts is at http://datatracker.ietf.org/drafts/current/. 50 Internet-Drafts are draft documents valid for a maximum of six months 51 and may be updated, replaced, or obsoleted by other documents at any 52 time. It is inappropriate to use Internet-Drafts as reference 53 material or to cite them other than as "work in progress." 55 This Internet-Draft will expire on October 22, 2011. 57 Copyright Notice 59 Copyright (c) 2011 IETF Trust and the persons identified as the 60 document authors. All rights reserved. 62 This document is subject to BCP 78 and the IETF Trust's Legal 63 Provisions Relating to IETF Documents 64 (http://trustee.ietf.org/license-info) in effect on the date of 65 publication of this document. Please review these documents 66 carefully, as they describe your rights and restrictions with respect 67 to this document. Code Components extracted from this document must 68 include Simplified BSD License text as described in Section 4.e of 69 the Trust Legal Provisions and are provided without warranty as 70 described in the Simplified BSD License. 72 Table of Contents 74 1. Conventions used in the document . . . . . . . . . . . . . . . 4 75 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 76 3. The required FIB update order . . . . . . . . . . . . . . . . 5 77 3.1. Single Link Events . . . . . . . . . . . . . . . . . . . . 5 78 3.1.1. Link Down / Metric Increase . . . . . . . . . . . . . 5 79 3.1.2. Link Up / Metric Decrease . . . . . . . . . . . . . . 6 80 3.2. Multi-link events . . . . . . . . . . . . . . . . . . . . 7 81 3.2.1. Router Down events . . . . . . . . . . . . . . . . . . 7 82 3.2.2. Router Up events . . . . . . . . . . . . . . . . . . . 7 83 3.2.3. Linecard Failure/Restoration Events . . . . . . . . . 7 84 4. Applying ordered FIB updates . . . . . . . . . . . . . . . . . 8 85 4.1. Deducing the topology change . . . . . . . . . . . . . . . 8 86 4.2. Deciding if ordered FIB updates applies . . . . . . . . . 8 87 5. Computation of the ordering . . . . . . . . . . . . . . . . . 9 88 5.1. Link or Router Down or Metric Increase . . . . . . . . . . 9 89 5.2. Link or Router Up or Metric Decrease . . . . . . . . . . . 10 90 6. Acceleration of Ordered Convergence . . . . . . . . . . . . . 10 91 6.1. Construction of the waiting list and notification list . . 11 92 6.1.1. Down events . . . . . . . . . . . . . . . . . . . . . 11 93 6.1.2. Up Events . . . . . . . . . . . . . . . . . . . . . . 11 94 6.2. Format of Completion Messages . . . . . . . . . . . . . . 11 95 7. Fall back to Conventional Convergence . . . . . . . . . . . . 12 96 8. oFIB state machine . . . . . . . . . . . . . . . . . . . . . . 12 97 8.1. OFIB_STABLE . . . . . . . . . . . . . . . . . . . . . . . 12 98 8.2. OFIB_HOLDING_DOWN . . . . . . . . . . . . . . . . . . . . 13 99 8.3. OFIB_HOLDING_UP . . . . . . . . . . . . . . . . . . . . . 14 100 8.4. OFIB_ONGOING . . . . . . . . . . . . . . . . . . . . . . . 15 101 8.5. OFIB_ABANDONED . . . . . . . . . . . . . . . . . . . . . . 15 102 9. IANA considerations . . . . . . . . . . . . . . . . . . . . . 16 103 10. Security considerations . . . . . . . . . . . . . . . . . . . 16 104 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 16 105 12. Informative References . . . . . . . . . . . . . . . . . . . . 16 106 Appendix A. Mechanisms for Safely Abandoning Loop-Free 107 Convergence (AAH) . . . . . . . . . . . . . . . . . . 17 108 A.1. Possible Solutions . . . . . . . . . . . . . . . . . . . . 17 109 A.2. Hold-down timer only . . . . . . . . . . . . . . . . . . . 17 110 A.3. AAH messages . . . . . . . . . . . . . . . . . . . . . . . 18 111 A.3.1. Per Router State Machine . . . . . . . . . . . . . . . 19 112 A.3.2. Per Neighbor State Machine . . . . . . . . . . . . . . 21 113 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 115 1. Conventions used in the document 117 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 118 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 119 document are to be interpreted as described in RFC2119 [4]. 121 2. Introduction 123 With link-state protocols, such as IS-IS [1] and OSPF [5], each time 124 the network topology changes, some routers need to modify their 125 Forwarding Information Base (FIB) to take into account the new 126 topology. Each topology change causes a convergence phase. During 127 this phase, routers may transiently have inconsistent FIBs, which may 128 lead to packet loops and losses, even if the reachability of the 129 destinations is not compromised after the topology change. Packet 130 losses and transient loops can also occur in the case of a link down 131 event implied by a maintenance operation, even if this operation is 132 predictable and not urgent. When the link state change is a metric 133 update and when a new link is brought up in the network, there is no 134 direct loss of connectivity, but transient packet loops and loss can 135 still occur. 137 For example, in Figure 1, if the link between X and Y is shut down by 138 an operator, packets destined to X can loop between R and Y when Y 139 has updated its FIB while R has not yet updated its FIB, and packets 140 destined to Y can loop between X and S if X updates its FIB before S. 141 According to the current behaviour of ISIS and OSPF, this scenario 142 will happen most of the time because X and Y are the first routers to 143 be aware of the failure, so that they will update their FIBs first. 145 1 146 X-------------/-------------Y 147 | | 148 | | 149 | | 150 | | 151 1 | | 1 152 | | 153 | | 154 | | 155 | | 156 S---------------------------R 157 2 159 Figure 1: A simple topology 161 It should be noted that the loops can occur remotely from the 162 failure, not just adjacent to it. 164 The goal of this document is to define a mechanism which sequences 165 the router FIB updates to maintain consistency throughout the 166 network. By correctly setting the FIB change order no looping or 167 packet loss can occur. This mechanism may be applied to the case of 168 managed link-state changes, i.e. link metric change, manual link 169 down/up, manual router down/up, and managed state changes of a set of 170 links attached to one router. It may also be applied to the case 171 where one or more network elements are protected by a fast re-route 172 mechanism [7] [6]. The mechanisms that are used in the failure case 173 are exactly the same as those used for managed changes. For 174 simplicity this document makes no further distinction between managed 175 and unplanned changes. 177 The technology described in this document has been subject to 178 extensive simulation using real network topologies and costs and 179 pathological convergence behaviour. A variant of the technology 180 described here has been experimentally deployed in a production 181 network. 183 3. The required FIB update order 185 This section provides an overview of the required ordering of the FIB 186 updates. A more detailed analysis of the rerouting dynamics and 187 correctness proofs of the mechanism can be found in [3]. 189 3.1. Single Link Events 191 For simplicity the correct ordering for single link changes are 192 described first. The document then builds on this to demonstrate 193 that the same principles can be applied to more complex scenarios 194 such as line card or node changes. 196 3.1.1. Link Down / Metric Increase 198 First consider the non-urgent failure of a link or the increase of a 199 link metric. In this case, a router R MUST NOT update its FIB until 200 all other routers that send traffic via R and the affected link have 201 first updated their FIBs. 203 The following argument shows that this rule ensures the correct order 204 of FIB change when the link X->Y is shut down or its metric is 205 increased. 207 An "outdated" FIB entry for a destination is defined as being a FIB 208 entry that still reflects the shortest path(s) in use before the 209 topology change. Once a packet reaches a router R that has an 210 outdated FIB entry for the packet destination, then, provided the 211 oFIB ordering is respected, the packet will continue to X only 212 traversing routers that also have an outdated FIB entry for the 213 destination. The packet thus reaches X without looping and will be 214 forwarded to Y via X->Y (or in the case of FRR, the X->Y repair path) 215 and hence reach its destination. 217 Since it can be assumed that the original topology was loop-free, Y 218 will never use the link Y->X to reach the destination and hence the 219 path(s) between Y and the destination are guaranteed to be unaffected 220 by the topology change. It therefore follows that the packet 221 arriving at Y will reach its destination without looping. 223 Since it can also be assumed that the new topology is loop-free, by 224 definition a packet cannot loop while being forwarded exclusively by 225 routers with an updated FIB entry. 227 In other words, when the oFIB ordering is respected, if a packet 228 reaches an outdated router, it can never subsequently reach an 229 updated router, and cannot loop because from this point on it will 230 only be forwarded on the consistent path that was used before the 231 event. If it does not reach an outdated router, it will only be 232 forwarded on the loop free path that will be used after the 233 convergence. 235 According to the proposed ordering, X will be the last router to 236 update its FIB. Once it has updated its FIB, the link X->Y can 237 actually be shut down (or the repair removed). 239 If the link X-Y is bidirectional a similar process must be run to 240 order the FIB update for destinations using the link in the direction 241 Y->X. As has already been shown, no packet ever traverses the X-Y 242 link in both directions, and hence the operation of the two ordering 243 processes is orthogonal. 245 3.1.2. Link Up / Metric Decrease 247 In the case of link up events or metric decreases, a router R MUST 248 update its FIB BEFORE all other routers that WILL use R to reach the 249 affected link. 251 The following argument shows that this rule ensures the correct order 252 of FIB change when the link X->Y is brought into service or its 253 metric is decreased. 255 Firstly, when a packet reaches a router R that has already updated 256 its FIB, all the routers on the path from R to X will also have 257 updated their FIB, so that the packet will reach X and be forwarded 258 along X->Y, ultimately reaching its destination. 260 Secondly, a packet cannot loop between routers that have not yet 261 updated their FIB. This proves that no packet can loop. 263 3.2. Multi-link events 265 The following sections describe the required ordering for single 266 events which may be manifest as multiple link events. For example, 267 the failure of a router may be notified to the rest of the network as 268 the individual failure of all its attached links. The means of 269 identifying the event type from the collection of received link 270 events is described in Section 4.1. 272 3.2.1. Router Down events 274 In the case of the non-urgent shut-down of a router, a router R MUST 275 NOT update its FIB until all other routers that send traffic via R 276 and the affected router have first updated their FIBs. 278 Using a proof similar to that for link failure, it can be shown that 279 no loops will occur if this ordering is respected [3]. 281 3.2.2. Router Up events 283 In the case of a router being brought into service, a router R MUST 284 update its FIB BEFORE all other routers that WILL use R to reach the 285 affected router. 287 A proof similar to that for link up, shows that no loops will occur 288 if this ordering is respected [3]. 290 3.2.3. Linecard Failure/Restoration Events 292 The failure of a line card involves the failure of a set of links all 293 of which have a single node in common, i.e. the parent router. The 294 ordering to be applied is the same as if it were the failure of the 295 parent router. 297 In a similar way, the restoration of an entire linecard to service as 298 a single event can be treated as if the parent router were returning 299 to service. 301 4. Applying ordered FIB updates 303 4.1. Deducing the topology change 305 As has been described, a single event such as the failure or 306 restoration of a single link, single router or a linecard may be 307 notified to the rest of the network as a set of individual link 308 change events. It is necessary to deduce from this collection of 309 link state notifications the type of event that has occurred in the 310 network and hence the required ordering. 312 When a link change event is received which impacts the receiving 313 router's FIB, the routers at the near and far end of the link are 314 noted. 316 If all events received within some hold-down period have a single 317 router (R) in common, then it is assumed that the change reflects an 318 event (line-card or router change) concerning the common router (R). 320 In the case of a link change event, the router at the far end of the 321 link is deemed to be the common router (R). 323 All ordering computations are based on treating the common router R 324 as the root for both link and node events. 326 4.2. Deciding if ordered FIB updates applies 328 There are some events (for example a subsequent failure with 329 conflicting repair requirements occurring before the ordered FIB 330 process has completed) that cannot be correctly processed by this 331 mechanism. In these cases it is necessary to ensure that convergence 332 falls back to the conventional mode of operation (see Section 7). 334 In all cases it is necessary to wait some hold-down period after 335 receiving the first notification to ensure that all routers have 336 received the complete set of link state notifications associated with 337 the single event. 339 At any time, if a link change notification is received which would 340 have no effect on the receiving router's FIB, then it may be ignored. 342 If no other event is received during the hold-down time, the event is 343 treated as a link event. Note that the reverse connectivity check 344 means that only the first failure event, or second up event have an 345 effect on the FIB. 347 If an event is received within the hold down period which does NOT 348 reference the common router (R) then in this version of the 349 specification normal convergence is invoked immediately (see 350 Section 7). 352 The sudden failure of a link or a set of links that are not protected 353 using a FRR mechanism must be processed using the conventional mode 354 of operation. 356 In summary an ordered FIB process is applicable if the set of link 357 state notifications received between the first event and the hold 358 down period reference a common router R, and one of the following 359 assertions is verified : 360 . The set of notifications refer to link down events concerning 361 protected links and metric increase events 362 . The set of notifications refer to link up events and metric 363 decrease events. 365 5. Computation of the ordering 367 This section describes how the required ordering is computed. 369 5.1. Link or Router Down or Metric Increase 371 To respect the proposed ordering, routers compute a rank that will be 372 used to determine the time at which they are permitted to perform 373 their FIB update. In the case of a failure event rooted at router Y 374 or an increase of the metric of link X->Y, router R computes the 375 reverse Shortest Path Tree (rSPT) in the topology before the failure 376 (rSPT_OLD) rooted at Y. This rSPT gives the shortest paths to reach Y 377 before the failure. The branch of the reverse SPT that is below R 378 corresponds to the set of shortest paths to R that are used by the 379 routers that reach Y via R. 381 The rank of router R is defined as the depth (in number of hops) of 382 this branch. In the case of ECMP, the maximum depth of the ECMP path 383 set is used. 385 Router R is required to update its FIB at time 387 T0 + H + rank * MAX_FIB 389 where T0 is the arrival time of the link-state packet containing the 390 topology change, H is the hold-down time and MAX_FIB is a network- 391 wide constant that reflects the maximum time required to update a FIB 392 irrespective of the change required. The value of MAX_FIB is network 393 specific and its determination is out of the scope of this document. 394 This value must be agreed by all the routers in the network. This 395 agreement can be performed by using a capability TLV as defined in 397 [9]. 399 All the routers that use R to reach Y will compute a lower rank than 400 R, and hence the correct order will be respected. It should be noted 401 that only the routers that used Y before the event need to compute 402 their rank. 404 5.2. Link or Router Up or Metric Decrease 406 In the case of a link or router up event rooted at Y or a link metric 407 decrease affecting link Y->W, a router R must have a rank that is 408 higher than the rank of the routers that it will use to reach Y, 409 according to the rule described in Section 3. The rank of R is thus 410 the number of hops between R and Y in its renewed Shortest Path Tree. 411 When R has multiple equal cost paths to Y, the rank is the length in 412 hops of the longest ECMP path to Y. 414 Router R is required to update its FIB at time 416 T0 + H + rank * MAX_FIB 418 It should be noted that only the routers that use Y after the event 419 have to compute a rank, i.e. only the routers that have Y in their 420 SPT after the link-state change. 422 6. Acceleration of Ordered Convergence 424 The mechanism described above is conservative, and hence may be 425 relatively slow. The purpose of this section is to describe a method 426 of accelerating the controlled convergence in such a way that ordered 427 loop-free convergence is still guaranteed. 429 In many cases a router will complete its required FIB changes in a 430 time much shorter than MAX_FIB and in many other cases, a router will 431 not have to perform any FIB change at all. 433 This section describes the use of completion messages to speed up the 434 convergence by providing a means for a router to inform those routers 435 waiting for it, that it has completed any required FIB changes. When 436 a router has been advised of completion by all the routers for which 437 it is waiting, it can safely update its own FIB without further 438 delay. In most cases this can result in a sub-second re-convergence 439 time comparable with that of normal convergence. 441 Routers maintain a waiting list of the neighbours from which a 442 completion message must be received. Upon reception of a completion 443 message from a neighbour, a router removes this neighbour from its 444 waiting list. Once its waiting list becomes empty, the router is 445 allowed to update its FIB immediately even if its ranking timer has 446 not yet expired. Once this is done, the router sends a completion 447 message to the neighbours that are waiting for it to complete. Those 448 routers are listed in a list called the Notification List. 449 Completion messages contain an identification of the event to which 450 they refer. 452 Note that, since this is only an optimization, any loss of completion 453 messages will result in the routers waiting their defined ranking 454 time and hence the loop-free properties will be preserved. 456 6.1. Construction of the waiting list and notification list 458 6.1.1. Down events 460 Consider a link or node down event rooted at router Y or the cost 461 increase of the link X->Y. A router R will compute rSPT_OLD(Y) to 462 determine its rank. When doing this, R also computes the set of 463 neighbors that R uses to reach the failing node or link, and the set 464 of neighbors that are using R to reach the failing node or link. The 465 Notification list of R is equal to the former set and the Waiting 466 list of R is equal to the latter. 468 Note that R could include all its neighbors except those in the 469 Waiting list in the Notification list, this has no impact on the 470 correctness of the protocol, but would be unnecessarily inefficient. 472 6.1.2. Up Events 474 Consider a link or node up event rooted at router Y or the cost 475 decrease of the link Y->X. A router R will compute its new SPT 476 (SPT_new(R)). The Waiting list is the set of next hop routers that R 477 uses to reach Y in SPT_new(R). 479 In a simple implementation the notification list of R is all the 480 neighbours of R excluding those in the Waiting list. This may be 481 further optimized by computing rSPT_new(Y) to determine those routers 482 that are waiting for R to complete. 484 6.2. Format of Completion Messages 486 The format of completion messages and means of their delivery is 487 routing protocol dependent and is outside the scope of this document. 488 An encoding of completion message for IS-IS is proposed in [2]. 490 The following information is required: 492 . Identity of the sender. 493 . A list of routing notifications being considered in the 494 associated FIB change. Each notification is defined as : 495 . Node ID of the near end of the link 496 . Node ID of the far end of the link 497 . Old Metric 498 . New Metric 500 7. Fall back to Conventional Convergence 502 In circumstances where a router detects that it is dealing with 503 incomplete or inconsistent link state information, or when a further 504 topology event is received before completion of the current ordered 505 FIB update process, it may be expedient to abandon the controlled 506 convergence process. A number of possible fall back mechanisms are 507 described in Appendix A. The state machine defined in the body of 508 this document does not make any assumption about which fall back 509 mechanism will be used. 511 8. oFIB state machine 513 An oFIB capable router maintains an oFIB state value which can be one 514 of : OFIB_STABLE, OFIB_HOLDING_DOWN, OFIB_HOLDING_UP, OFIB_ABANDONED, 515 OFIB_ONGOING. 517 An oFIB capable router maintains a timer, Hold_down_timer. An oFIB 518 capable router is configured with a value referred to as 519 HOLD_DOWN_DURATION. This configuration can be performed manually or 520 using [9]. 522 An oFIB capable router maintains a timer, rank_timer. 524 8.1. OFIB_STABLE 526 OFIB_STABLE is the state of a router which is not currently involved 527 in any convergence process. This router is ready to process an event 528 by applying oFIB. 530 EVENT : Reception of a link-state packet describing an event of the 531 type link X--Y down or metric increase to be processed using oFIB. 533 ACTION : Set state to OFIB_HOLDING_DOWN. Start Hold_down_timer. 534 ofib_current_common_set = {X,Y}. Compute rank with respect to the 535 event, as defined in Section 5. Store Waiting List and Notification 536 List for X--Y obtained from the rank computation. 538 EVENT : Reception of a link-state packet describing an event of the 539 type link X--Y up or metric decrease which to be processed using 540 oFIB. 542 ACTION : 544 Set state to OFIB_HOLDING_UP. 545 Start Hold_down_timer. 546 ofib_current_common_set = {X,Y} 547 Compute rank with respect to the event, as defined in section 548 Section 5 . 549 Store Waiting List and Notification List for X--Y obtained from 550 the rank computation. 552 8.2. OFIB_HOLDING_DOWN 554 OFIB_HOLDING_DOWN is the state of a router that is collecting a set 555 of link down or metric increase link-state packets to be processed 556 together using controlled convergence. 558 EVENT : Reception of a link-state packet describing an event of the 559 type link up or metric decrease which in itself can be processed 560 using oFIB. 562 ACTION : 564 Set state to OFIB_ABANDONED. 565 Reset Hold_down_timer. 566 Trigger AAH mechanism 568 EVENT : Reception of a link-state packet describing an event of the 569 type link A--B down or metric increase which in itself can be 570 processed using oFIB. 572 ACTION : 574 ofib_current_common_set = 575 intersection(ofib_current_common_set,{A,B}). 576 If ofib_current_common_set is empty, then there is no longer a 577 node in common in all the pending link-state changes. 578 Set state to OFIB_ABANDONED 579 Reset Hold_down_timer 580 Trigger AAH mechanism. 581 If ofib_current_common set is not empty, update waiting list and 582 notification list as defined in Section 5. Note that in the 583 case of a single link event, the link-state packet received when 584 the router is in this state describes the state change of the 585 other direction of the link, hence no changes will be made to 586 the waiting and notification lists. 588 EVENT : Hold_down_timer expires. 590 ACTION : 592 Set state to OFIB_ONGOING. 593 Start rank_timer with computed rank. 595 EVENT : Reception of a completion message 597 ACTION : Remove the sender from waiting list associated with the 598 event identified in the completion message. 600 8.3. OFIB_HOLDING_UP 602 OFIB_HOLDING_UP is the state of a router that is collecting a set of 603 link up or metric decrease link-state packets to be processed 604 together using controlled convergence. 606 EVENT : Reception of a link-state packet describing an event of the 607 type link down or metric increase to be processed using oFIB. 609 ACTION : 611 Set state to OFIB_ABANDONED. 612 Reset Hold_down_timer. 613 Trigger AAH mechanism. 615 EVENT : Reception of a link-state packet describing an event of the 616 type link A--B up or metric decrease to be processed using oFIB. 618 ACTION : 620 ofib_current_common_set = 621 intersection(ofib_current_common_set,{A,B}). 622 If ofib_current_common_set is empty, then there is no longer a 623 common node in the set of pending link-state changes. 624 Set state to OFIB_ABANDONED. 625 Reset Hold_down_timer. 626 Trigger AAH mechanism. 627 If ofib_current_common set is not empty, update waiting list and 628 notification list as defined in Section 5. Note that in the 629 case of a single link event, the link-state packet received when 630 the router is in this state describes the state change of the 631 other direction of the link, hence no changes will be made to 632 the waiting and notification lists. 634 EVENT : Reception of a completion message 636 ACTION : Remove the sender from the waiting list associated with the 637 event identified in the completion message. 639 EVENT : Hold_down_timer expires. 641 ACTION : 643 Set state to OFIB_ONGOING. 644 Start rank_timer with computed rank. 646 8.4. OFIB_ONGOING 648 OFIB_ONGOING is the state of a router that is applying the ordering 649 mechanism w.r.t. the set of LSP collected when in OFIB_HOLDING_DOWN 650 or OFIB_HOLDING_UP state. 652 EVENT : rank_timer expires or waiting list becomes empty. 654 ACTION : 656 Perform FIB updates according to the change. 657 Send completion message to each member of the notification list. 658 Set State to OFIB_STABLE. 660 EVENT : Reception of a completion message 662 ACTION : Remove the sender from the waiting list. 664 EVENT : Reception of a link-state packet describing a link state 665 change event. 667 ACTION : 669 Set state to OFIB_ABANDONED. 670 Trigger AAH. 671 Start Hold_down_timer. 673 8.5. OFIB_ABANDONED 675 OFIB_ABANDONED is the state of a router that has fallen back to fast 676 convergence due to the reception of link-state packets that cannot be 677 dealt together using oFIB. 679 EVENT : Reception of a link-state packet describing a link-state 680 change event. 682 ACTION : Trigger AAH, reset Hold_down_timer. 684 EVENT : Hold_down_timer expires. 686 ACTION : Set state to OFIB_STABLE 688 9. IANA considerations 690 There are no IANA considerations which arise from this document. Any 691 such considerations will be called out in protocol specific documents 692 such as [9]and [2] 694 10. Security considerations 696 This document requires only minor modifications to existing routing 697 protocols and therefore does not add significant additional security 698 risks. However a full security analysis would need to be provided 699 within the protocol specific specifications proposed for deployment. 701 11. Acknowledgments 703 We would like to thank Jean-Philippe Vasseur and Les Ginsberg for 704 their useful suggestions and comments. 706 12. Informative References 708 [1] International Organization for Standardization, "Intermediate 709 system to Intermediate system intra-domain routeing information 710 exchange protocol for use in conjunction with the protocol for 711 providing the connectionless-mode Network Service (ISO 8473)", 712 ISO/IEC 10589:2002, Second Edition, Nov 2002. 714 [2] Bonaventure, O., "ISIS extensions for ordered FIB updates", 715 draft-bonaventure-isis-ordered-00 (work in progress), 716 February 2006. 718 [3] P. Francois and O. Bonaventure, "Avoiding transient loops during 719 IGP convergence in IP Networks", in IEEE/ACM Transactions on 720 Networking, http://inl.info.ucl.ac.be/publications, 721 December 2007. 723 [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement 724 Levels", BCP 14, RFC 2119, March 1997. 726 [5] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. 728 [6] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute Extensions to 729 RSVP-TE for LSP Tunnels", RFC 4090, May 2005. 731 [7] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714, 732 January 2010. 734 [8] Shand, M. and S. Bryant, "A Framework for Loop-Free 735 Convergence", RFC 5715, January 2010. 737 [9] K, A. and S. Bryant, "Synchronisation of Loop Free Timer 738 Values", draft-atlas-bryant-shand-lf-timers-04 (work in 739 progress), February 2008. 741 Appendix A. Mechanisms for Safely Abandoning Loop-Free Convergence 742 (AAH) 744 IPFRR[7] and loop-free convergence techniques [8] can deal with 745 single topology change events, multiple correlated change events, and 746 in some cases even certain uncorrelated events. However, in all 747 cases there are events which cannot be dealt with and the mechanism 748 needs to quickly revert to normal convergence. This is known as 749 "Abandoning All Hope" (AAH). 751 This appendix describes the outcome of a design study into the AAH 752 problem, and is included here to trigger discussion on the trade-offs 753 between complexity and robustness in the AAH solution-space. 755 A.1. Possible Solutions 757 Two approaches to this problem have been proposed: 759 1. Hold-down timer only. 761 2. Synchronization of AAH state using AAH messages. 763 These are described below. 765 A.2. Hold-down timer only 767 This method uses a hold-down to acquire a set of LSPs which should be 768 processed together. On expiry of the local hold-down timer, the 769 router begins processing the batch of LSPs according to the loop free 770 prevention algorithm. 772 There are a number of problems with this simple approach. In some 773 cases the timer value will be too short to ensure that all the 774 related events have arrived at all routers (perhaps because there was 775 some unexpected propagation delay, or one or more of the events are 776 slow in being detected). In other cases, a completely unrelated 777 event may occur after the timer has expired, but before the 778 processing is complete. In addition, since the timer is started at 779 each router on reception of the first LSP announcing a topology 780 change, the actual starting time is dependant upon the propagation 781 time of the first LSP. So, for a subsequent event occurring around 782 the time of the timer expiry, because of variations in propagation 783 delay it may reach some routers before the timer expires and others 784 after it has expired. In the former case this LSP will be included 785 in the set of changes to be considered, while in the latter it will 786 be excluded leasing to serious routing inconsistency. In such cases 787 continuing to operate the loop-free convergence protocol may 788 exacerbate the situation. 790 The simple approach to this would be to revert to normal convergence 791 (AAH) whenever an LSP is received after the timer has expired. 792 However this also has problems for the reasons above and therefore 793 AAH must be a synchronous operation, i.e. it is necessary to arrange 794 that an AAH invoked anywhere in the network causes ALL routers to 795 AAH. 797 It is also necessary to consider the means of exiting the AAH state. 798 Again the simplest method is to use a timer. However while in AAH 799 state any topology changes previously received, or which are 800 subsequently received, should be processed immediately using the 801 traditional convergence algorithms i.e. without invoking controlled 802 convergence. If the exit from the AAH state is not correctly 803 synchronized, a new event may be processed by some routers 804 immediately (as AAH), while those which have already left AAH state 805 will treat it as the first of a new batch of changes and attempt 806 controlled convergence. Thus both entry and exit from the AAH state 807 needs to be synchronised. A method of achieving this is described in 808 Appendix A.3. 810 A.3. AAH messages 812 Like the simple timer AAH method, this method uses a hold-down to 813 acquire a set of LSPs which should be processed together. On expiry 814 of the local hold-down timer, the router begins processing the batch 815 of LSPs according to the loop free prevention algorithm. This is the 816 same behaviour as the hold-down timer only method. However, if any 817 router, having started the loop-free convergence process receives an 818 LSP which would trigger a topology change, it locally abandons the 819 controlled convergence process, and sends an AAH message to all its 820 neighbors. This eventually triggers all routers to abandon the 821 controlled convergence. The routers remain in AAH state (i.e. 822 processing topology changes using normal "fast" convergence), until a 823 period of quiescence has elapsed. The exit from AAH state is 824 synchronized by using a two step process. To achieve the required 825 synchronization, two additional messages are required, AAH and AAH 826 ACK. The AAH message is reliably exchanged between neighbours using 827 the AAH ACK message. These could be implemented as a new message 828 within the routing protocol or carried in existing routing hello 829 messages. Two types of state machines are needed. A per-router AAH 830 state machine and a per neighbour AAH state machine(PNSM). These are 831 described below. 833 A.3.1. Per Router State Machine 835 Per Router State Table 836 +-------------+-----------+---------+--------+------------+----------+ 837 | EVENT | Q | Hold | CC | AAH | AAH-hold | 838 +=============+===========+=========+========+============+==========+ 839 | RX LSP | Start | - | TX-AAH | Re-start | TX-AAH | 840 | triggering | hold-down | | Start | AAH timer. | Start | 841 | change | timer | | AAH | [AAH] | AAH | 842 | | [Hold] | | timer. | | timer. | 843 | | | | [AAH] | | [AAH] | 844 +-------------+-----------+---------+--------+------------+----------+ 845 | RX AAH | TX-AAH | TX-AAH | TX-AAH | [AAH] | TX-AAH | 846 | (Neighbor's | Start AAH | Start | Start | | Start | 847 | PNSM | timer. | AAH | AAH | | AAH | 848 | processes | [AAH] | timer | timer. | | timer. | 849 | RX AAH.) | | [AAH] | [AAH] | | [AAH] | 850 +-------------+-----------+---------+--------+------------+----------+ 851 | Timer | - | Trigger | - | Start | [Q] | 852 | expiry | | CC. | | AAH-hold | | 853 | | | [CC] | | timer. | | 854 | | | | | [AAH-hold] | | 855 +-------------+-----------+---------+--------+------------+----------+ 856 | Controlled | - | - | [Q] | - | - | 857 | convergence | | | | | | 858 | completed | | | | | | 859 +-------------+-----------+---------+--------+------------+----------+ 860 TX-AAH = Send "goto TX-AAH" to all other PNSMs. 862 Operation of the per-router state machine is as follows: 864 Operation of this state machine under normal topology change involves 865 only states: Quiescent (Q), Hold-down (Hold) and Controlled 866 Convergence (CC). The remaining states are associated with an AAH 867 event. 869 The resting state is Quiescent. When the router in the Quiescent 870 state receives an LSP indicating a topology change, which would 871 normally trigger an SPF, it starts the Hold-down timer and changes 872 state to Hold-down. It normally remains in this state, collecting 873 additional LSPs until the Hold-down timer expires. Note that all 874 routers MUST use a common value for the Hold-down timer. When the 875 Hold-down timer expires the router then enters Controlled Convergence 876 (CC) state and executes the CC mechanism to re-converge the topology. 877 When the CC process has completed on the router, the router re-enters 878 the Quiescent state. 880 If this router receives a topology changing LSP whilst it is in the 881 CC state, it enters AAH state, and sends a "goto TX-AAH" command to 882 all per neighbour state machines which causes each per-neighbour 883 state machine to signal this state change to its neighbour. 884 Alternatively, if this router receives an AAH message from any of its 885 neighbors whilst in any state except AAH, it starts the AAH timer and 886 enters the AAH state. The per neighbor state machine corresponding 887 to the neighbor from which the AAH was received executes the RX AAH 888 action (which causes it to send an AAH ACK), while the remainder are 889 sent the "goto TX-AAH" command. The result is that the AAH is 890 acknowledged to the neighbor from which it was received and 891 propagated to all other neighbors. On entering AAH state, all CC 892 timers are expired and normal convergence takes place. 894 Whilst in the AAH state, LSPs are processed in the traditional 895 manner. Each time an LSP is received, the AAH timer is restarted. 896 In an unstable network ALL routers will remain in this state for some 897 time and the network will behave in the traditional uncontrolled 898 convergence manner. 900 When the AAH timer expires, the router enters AAH-hold state and 901 starts the AAH hold timer. The purpose of the AAH-hold state is to 902 synchronize the transition of the network from AAH to Quiescent. The 903 additional state ensures that the network cannot contain a mixture of 904 routers in both AAH and Quiescent states. If, whilst in AAH-Hold 905 state the router receives a topology changing LSP, it re-enters AAH 906 state and commands all per neighbour state machines to "goto TX-AAH". 907 If, whilst in AAH-Hold state the router receives an AAH message from 908 one of its neighbours, it re-enters the AAH state and commands all 909 other per neighbour state machines to "goto TX-AAH". Note that the 910 per-neighbor state machine receiving the AAH message will 911 autonomously acknowledge receipt of the AAH message. Commanding the 912 per-neighbour state machine to "goto TX-AAH" is necessary, because 913 routers may be in a mixture of Quiescent, Hold-down and AAH-hold 914 state, and it is necessary to rendezvous the entire network back to 915 AAH state. 917 When the AAH Hold timer expires the router changes to state Quiescent 918 and is ready for loop free convergence. 920 A.3.2. Per Neighbor State Machine 922 Per Neighbor State Table 923 +----------------------------+--------------+------------------------+ 924 | EVENT | Idle | TX-AAH | 925 +============================+==============+========================+ 926 | RX AAH | Send ACK. | Send ACK. | 927 | | | Cancel timer. | 928 | | [IDLE] | [IDLE] | 929 +----------------------------+--------------+------------------------+ 930 | RX ACK | ignore | Cancel timer. | 931 | | | [IDLE] | 932 +----------------------------+--------------+------------------------+ 933 | RX "goto TX-AAH" from | Send AAH | ignore | 934 | Router State Machine | [TX-AAH] | | 935 +----------------------------+--------------+------------------------+ 936 | Timer expires | impossible | Send AAH | 937 | | | Restart timer. | 938 | | | [TX-AAH] | 939 +----------------------------+--------------+------------------------+ 941 There is one instance of the per-neighbour (PN) state machine for 942 each neighbour within the convergence control domain. 944 The normal state is IDLE. 946 On command ("goto TX-AAH") from the router state machine, the state 947 machine enters TX-AAH state, transmits an AAH message to its 948 neighbour and starts a timer. 950 On receipt of an AAH ACK in state TX-AAH the state machine cancels 951 the timer and enters IDLE state. 953 In states IDLE, any AAH ACK message received is ignored. 955 On expiry of the timer in state TX-AAH the state machine transmits an 956 AAH message to the neighbour and restarts the timer. (The timer 957 cannot expire in any other state.) 959 In any state, receipt of an AAH causes the state machine to transmit 960 an AAH ACK and enter the IDLE state. 962 Note that for correct operation the state machine MUST remain in 963 state TX-AAH, until an AAH ACK or an AAH is received, or the state 964 machine is deleted. Deletion of the per neighbor state machine 965 occurs when routing determines that the neighbour has gone away, or 966 when the interface goes away. 968 When routing detects a new neighbour it creates a new instance of the 969 per-neighbour state machine in state Idle. The consequent generation 970 of the router's own LSP will then cause the router state machine to 971 execute the LSP receipt actions, which will if necessary result in 972 the new per-neighbour state machine receiving a "goto TX-AAH" command 973 and transitioning to TX-AAH state. 975 Authors' Addresses 977 Pierre Francois 978 Universite catholique de Louvain 979 Place Ste Barbe, 2 980 Louvain-la-Neuve 1348 981 BE 983 URI: http://inl.info.ucl.ac.be/ 985 Olivier Bonaventure 986 Universite catholique de Louvain 987 Place Ste Barbe, 2 988 Louvain-la-Neuve 1348 989 BE 991 URI: http://inl.info.ucl.ac.be/ 993 Mike Shand 994 Cisco Systems 995 Green Park, 250, Longwater Avenue, 996 Reading RG2 6GB 997 UK 999 Email: mshand@cisco.com 1001 Stewart Bryant 1002 Cisco Systems 1003 Green Park, 250, Longwater Avenue, 1004 Reading RG2 6GB 1005 UK 1007 Email: stbryant@cisco.com 1008 Stefano Previdi 1009 Cisco Systems 1010 Via Del Serafico 200 1011 00142 Roma 1012 Italy 1014 Email: sprevidi@cisco.com 1016 Clarence Filsfils 1017 Cisco Systems 1018 Brussels, 1019 Belgium 1021 Email: cfilsfil@cisco.com