idnits 2.17.1 draft-ietf-rtgwg-ordered-fib-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 7, 2012) is 4248 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'AAH' is mentioned on line 855, but not defined == Missing Reference: 'Hold' is mentioned on line 848, but not defined == Missing Reference: 'Q' is mentioned on line 862, but not defined == Missing Reference: 'CC' is mentioned on line 859, but not defined == Missing Reference: 'AAH-hold' is mentioned on line 860, but not defined == Missing Reference: 'IDLE' is mentioned on line 937, but not defined == Missing Reference: 'TX-AAH' is mentioned on line 944, but not defined Summary: 0 errors (**), 0 flaws (~~), 8 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Shand 3 Internet-Draft Individual Contributor 4 Intended status: Informational S. Bryant 5 Expires: March 11, 2013 S. Previdi 6 C. Filsfils 7 Cisco Systems 8 P. Francois 9 Institute IMDEA Networks 10 O. Bonaventure 11 Universite catholique de Louvain 12 September 7, 2012 14 Loop-free convergence using oFIB 15 draft-ietf-rtgwg-ordered-fib-07 17 Abstract 19 This document describes a mechanism for use in conjunction with link 20 state routing protocols which prevents the transient loops which 21 would otherwise occur during topology changes. It does this by 22 correctly sequencing the forwarding information base (FIB) updates on 23 the routers. 25 This mechanism can be used in the case of non-urgent link or node 26 shutdowns and restarts or link metric changes. It can also be used 27 in conjunction with a fast re-route mechanism which converts a sudden 28 link or node failure into a non-urgent topology change. This is 29 possible where a complete repair path is provided for all affected 30 destinations. 32 After a non-urgent topology change, each router computes a rank that 33 defines the time at which it can safely update its FIB. A method for 34 accelerating this loop-free convergence process by the use of 35 completion messages is also described. 37 The technology described in this document has been subject to 38 extensive simulation using real network topologies and costs, and 39 pathological convergence behaviour. 41 Status of this Memo 43 This Internet-Draft is submitted in full conformance with the 44 provisions of BCP 78 and BCP 79. 46 Internet-Drafts are working documents of the Internet Engineering 47 Task Force (IETF). Note that other groups may also distribute 48 working documents as Internet-Drafts. The list of current Internet- 49 Drafts is at http://datatracker.ietf.org/drafts/current/. 51 Internet-Drafts are draft documents valid for a maximum of six months 52 and may be updated, replaced, or obsoleted by other documents at any 53 time. It is inappropriate to use Internet-Drafts as reference 54 material or to cite them other than as "work in progress." 56 This Internet-Draft will expire on March 11, 2013. 58 Copyright Notice 60 Copyright (c) 2012 IETF Trust and the persons identified as the 61 document authors. All rights reserved. 63 This document is subject to BCP 78 and the IETF Trust's Legal 64 Provisions Relating to IETF Documents 65 (http://trustee.ietf.org/license-info) in effect on the date of 66 publication of this document. Please review these documents 67 carefully, as they describe your rights and restrictions with respect 68 to this document. Code Components extracted from this document must 69 include Simplified BSD License text as described in Section 4.e of 70 the Trust Legal Provisions and are provided without warranty as 71 described in the Simplified BSD License. 73 Table of Contents 75 1. Conventions used in the document . . . . . . . . . . . . . . . 4 76 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 77 3. The required FIB update order . . . . . . . . . . . . . . . . 5 78 3.1. Single Link Events . . . . . . . . . . . . . . . . . . . . 5 79 3.1.1. Link Down / Metric Increase . . . . . . . . . . . . . 5 80 3.1.2. Link Up / Metric Decrease . . . . . . . . . . . . . . 6 81 3.2. Multi-link events . . . . . . . . . . . . . . . . . . . . 7 82 3.2.1. Router Down events . . . . . . . . . . . . . . . . . . 7 83 3.2.2. Router Up events . . . . . . . . . . . . . . . . . . . 7 84 3.2.3. Linecard Failure/Restoration Events . . . . . . . . . 7 85 4. Applying ordered FIB updates . . . . . . . . . . . . . . . . . 8 86 4.1. Deducing the topology change . . . . . . . . . . . . . . . 8 87 4.2. Deciding if ordered FIB updates applies . . . . . . . . . 8 88 5. Computation of the ordering . . . . . . . . . . . . . . . . . 9 89 5.1. Link or Router Down or Metric Increase . . . . . . . . . . 9 90 5.2. Link or Router Up or Metric Decrease . . . . . . . . . . . 10 91 6. Acceleration of Ordered Convergence . . . . . . . . . . . . . 10 92 6.1. Construction of the waiting list and notification list . . 11 93 6.1.1. Down events . . . . . . . . . . . . . . . . . . . . . 11 94 6.1.2. Up Events . . . . . . . . . . . . . . . . . . . . . . 11 95 6.2. Format of Completion Messages . . . . . . . . . . . . . . 11 96 7. Fall back to Conventional Convergence . . . . . . . . . . . . 12 97 8. oFIB state machine . . . . . . . . . . . . . . . . . . . . . . 12 98 8.1. OFIB_STABLE . . . . . . . . . . . . . . . . . . . . . . . 12 99 8.2. OFIB_HOLDING_DOWN . . . . . . . . . . . . . . . . . . . . 13 100 8.3. OFIB_HOLDING_UP . . . . . . . . . . . . . . . . . . . . . 14 101 8.4. OFIB_ONGOING . . . . . . . . . . . . . . . . . . . . . . . 15 102 8.5. OFIB_ABANDONED . . . . . . . . . . . . . . . . . . . . . . 15 103 9. IANA considerations . . . . . . . . . . . . . . . . . . . . . 16 104 10. Security considerations . . . . . . . . . . . . . . . . . . . 16 105 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 16 106 12. Informative References . . . . . . . . . . . . . . . . . . . . 16 107 Appendix A. Mechanisms for Safely Abandoning Loop-Free 108 Convergence (AAH) . . . . . . . . . . . . . . . . . . 17 109 A.1. Possible Solutions . . . . . . . . . . . . . . . . . . . . 17 110 A.2. Hold-down timer only . . . . . . . . . . . . . . . . . . . 17 111 A.3. AAH messages . . . . . . . . . . . . . . . . . . . . . . . 18 112 A.3.1. Per Router State Machine . . . . . . . . . . . . . . . 19 113 A.3.2. Per Neighbor State Machine . . . . . . . . . . . . . . 21 114 Appendix B. Synchronisation of Loop Free Timer Values . . . . . . 22 115 B.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 22 116 B.2. Required Properties . . . . . . . . . . . . . . . . . . . 22 117 B.3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . 23 118 B.4. Security Considerations . . . . . . . . . . . . . . . . . 24 119 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24 121 1. Conventions used in the document 123 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 124 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 125 document are to be interpreted as described in RFC2119 [RFC2119]. 127 2. Introduction 129 With link-state protocols, such as IS-IS [ISO10589] and OSPF 130 [RFC2328], each time the network topology changes, some routers need 131 to modify their forwarding information base (FIB) to take into 132 account the new topology. Each topology change causes a convergence 133 phase. During this phase, routers may transiently have inconsistent 134 FIBs, which may lead to packet loops and losses, even if the 135 reachability of the destinations is not compromised after the 136 topology change. Packet losses and transient loops can also occur in 137 the case of a link down event implied by a maintenance operation, 138 even if this operation is predictable and not urgent. When the link 139 state change is a metric update and when a new link is brought up in 140 the network, there is no direct loss of connectivity, but transient 141 packet loops and loss can still occur. 143 For example, in Figure 1, if the link between X and Y is shut down by 144 an operator, packets destined to X can loop between R and Y when Y 145 has updated its FIB while R has not yet updated its FIB, and packets 146 destined to Y can loop between X and S if X updates its FIB before S. 147 According to the current behaviour of ISIS and OSPF, this scenario 148 will happen most of the time because X and Y are the first routers to 149 be aware of the failure, so that they will update their FIBs first. 151 1 152 X-------------/-------------Y 153 | | 154 | | 155 | | 156 | | 157 1 | | 1 158 | | 159 | | 160 | | 161 | | 162 S---------------------------R 163 2 165 Figure 1: A simple topology 167 It should be noted that the loops can occur remotely from the 168 failure, not just adjacent to it. 170 The goal of this document is to define a mechanism which sequences 171 the router FIB updates to maintain consistency throughout the 172 network. By correctly setting the FIB change order no looping or 173 packet loss can occur. This mechanism may be applied to the case of 174 managed link-state changes, i.e. link metric change, manual link 175 down/up, manual router down/up, and managed state changes of a set of 176 links attached to one router. It may also be applied to the case 177 where one or more network elements are protected by a fast re-route 178 mechanism [RFC5714] [RFC4090]. The mechanisms that are used in the 179 failure case are exactly the same as those used for managed changes. 180 For simplicity this document makes no further distinction between 181 managed and unplanned changes. 183 The technology described in this document has been subject to 184 extensive simulation using real network topologies and costs and 185 pathological convergence behaviour. A variant of the technology 186 described here has been experimentally deployed in a production 187 network. 189 3. The required FIB update order 191 This section provides an overview of the required ordering of the FIB 192 updates. A more detailed analysis of the rerouting dynamics and 193 correctness proofs of the mechanism can be found in [refs.PFOB07]. 195 3.1. Single Link Events 197 For simplicity the correct ordering for single link changes are 198 described first. The document then builds on this to demonstrate 199 that the same principles can be applied to more complex scenarios 200 such as line card or node changes. 202 3.1.1. Link Down / Metric Increase 204 First consider the non-urgent failure of a link or the increase of a 205 link metric. In this case, a router R MUST NOT update its FIB until 206 all other routers that send traffic via R and the affected link have 207 first updated their FIBs. 209 The following argument shows that this rule ensures the correct order 210 of FIB change when the link X->Y is shut down or its metric is 211 increased. 213 An "outdated" FIB entry for a destination is defined as being a FIB 214 entry that still reflects the shortest path(s) in use before the 215 topology change. Once a packet reaches a router R that has an 216 outdated FIB entry for the packet destination, then, provided the 217 oFIB ordering is respected, the packet will continue to X only 218 traversing routers that also have an outdated FIB entry for the 219 destination. The packet thus reaches X without looping and will be 220 forwarded to Y via X->Y (or in the case of FRR, the X->Y repair path) 221 and hence reach its destination. 223 Since it can be assumed that the original topology was loop-free, Y 224 will never use the link Y->X to reach the destination and hence the 225 path(s) between Y and the destination are guaranteed to be unaffected 226 by the topology change. It therefore follows that the packet 227 arriving at Y will reach its destination without looping. 229 Since it can also be assumed that the new topology is loop-free, by 230 definition a packet cannot loop while being forwarded exclusively by 231 routers with an updated FIB entry. 233 In other words, when the oFIB ordering is respected, if a packet 234 reaches an outdated router, it can never subsequently reach an 235 updated router, and cannot loop because from this point on it will 236 only be forwarded on the consistent path that was used before the 237 event. If it does not reach an outdated router, it will only be 238 forwarded on the loop free path that will be used after the 239 convergence. 241 According to the proposed ordering, X will be the last router to 242 update its FIB. Once it has updated its FIB, the link X->Y can 243 actually be shut down (or the repair removed). 245 If the link X-Y is bidirectional a similar process must be run to 246 order the FIB update for destinations using the link in the direction 247 Y->X. As has already been shown, no packet ever traverses the X-Y 248 link in both directions, and hence the operation of the two ordering 249 processes is orthogonal. 251 3.1.2. Link Up / Metric Decrease 253 In the case of link up events or metric decreases, a router R MUST 254 update its FIB BEFORE all other routers that WILL use R to reach the 255 affected link. 257 The following argument shows that this rule ensures the correct order 258 of FIB change when the link X->Y is brought into service or its 259 metric is decreased. 261 Firstly, when a packet reaches a router R that has already updated 262 its FIB, all the routers on the path from R to X will also have 263 updated their FIB, so that the packet will reach X and be forwarded 264 along X->Y, ultimately reaching its destination. 266 Secondly, a packet cannot loop between routers that have not yet 267 updated their FIB. This proves that no packet can loop. 269 3.2. Multi-link events 271 The following sections describe the required ordering for single 272 events which may be manifest as multiple link events. For example, 273 the failure of a router may be notified to the rest of the network as 274 the individual failure of all its attached links. The means of 275 identifying the event type from the collection of received link 276 events is described in Section 4.1. 278 3.2.1. Router Down events 280 In the case of the non-urgent shut-down of a router, a router R MUST 281 NOT update its FIB until all other routers that send traffic via R 282 and the affected router have first updated their FIBs. 284 Using a proof similar to that for link failure, it can be shown that 285 no loops will occur if this ordering is respected [refs.PFOB07]. 287 3.2.2. Router Up events 289 In the case of a router being brought into service, a router R MUST 290 update its FIB BEFORE all other routers that WILL use R to reach the 291 affected router. 293 A proof similar to that for link up, shows that no loops will occur 294 if this ordering is respected [refs.PFOB07]. 296 3.2.3. Linecard Failure/Restoration Events 298 The failure of a line card involves the failure of a set of links all 299 of which have a single node in common, i.e. the parent router. The 300 ordering to be applied is the same as if it were the failure of the 301 parent router. 303 In a similar way, the restoration of an entire linecard to service as 304 a single event can be treated as if the parent router were returning 305 to service. 307 4. Applying ordered FIB updates 309 4.1. Deducing the topology change 311 As has been described, a single event such as the failure or 312 restoration of a single link, single router or a linecard may be 313 notified to the rest of the network as a set of individual link 314 change events. It is necessary to deduce from this collection of 315 link state notifications the type of event that has occurred in the 316 network and hence the required ordering. 318 When a link change event is received which impacts the receiving 319 router's FIB, the routers at the near and far end of the link are 320 noted. 322 If all events received within some hold-down period have a single 323 router (R) in common, then it is assumed that the change reflects an 324 event (line-card or router change) concerning the common router (R). 326 In the case of a link change event, the router at the far end of the 327 link is deemed to be the common router (R). 329 All ordering computations are based on treating the common router R 330 as the root for both link and node events. 332 4.2. Deciding if ordered FIB updates applies 334 There are some events (for example a subsequent failure with 335 conflicting repair requirements occurring before the ordered FIB 336 process has completed) that cannot be correctly processed by this 337 mechanism. In these cases it is necessary to ensure that convergence 338 falls back to the conventional mode of operation (see Section 7). 340 In all cases it is necessary to wait some hold-down period after 341 receiving the first notification to ensure that all routers have 342 received the complete set of link state notifications associated with 343 the single event. 345 At any time, if a link change notification is received which would 346 have no effect on the receiving router's FIB, then it may be ignored. 348 If no other event is received during the hold-down time, the event is 349 treated as a link event. Note that the reverse connectivity check 350 means that only the first failure event, or second up event have an 351 effect on the FIB. 353 If an event is received within the hold down period which does NOT 354 reference the common router (R) then in this version of the 355 specification normal convergence is invoked immediately (see 356 Section 7). 358 The sudden failure of a link or a set of links that are not protected 359 using a FRR mechanism must be processed using the conventional mode 360 of operation. 362 In summary an ordered FIB process is applicable if the set of link 363 state notifications received between the first event and the hold 364 down period reference a common router R, and one of the following 365 assertions is verified : 366 o The set of notifications refer to link down events concerning 367 protected links and metric increase events 368 o The set of notifications refer to link up events and metric 369 decrease events. 371 5. Computation of the ordering 373 This section describes how the required ordering is computed. 375 5.1. Link or Router Down or Metric Increase 377 To respect the proposed ordering, routers compute a rank that will be 378 used to determine the time at which they are permitted to perform 379 their FIB update. In the case of a failure event rooted at router Y 380 or an increase of the metric of link X->Y, router R computes the 381 reverse Shortest Path Tree (rSPT) in the topology before the failure 382 (rSPT_OLD) rooted at Y. This rSPT gives the shortest paths to reach Y 383 before the failure. The branch of the reverse SPT that is below R 384 corresponds to the set of shortest paths to R that are used by the 385 routers that reach Y via R. 387 The rank of router R is defined as the depth (in number of hops) of 388 this branch. In the case of ECMP, the maximum depth of the ECMP path 389 set is used. 391 Router R is required to update its FIB at time 393 T0 + H + rank * MAX_FIB 395 where T0 is the arrival time of the link-state packet containing the 396 topology change, H is the hold-down time and MAX_FIB is a network- 397 wide constant that reflects the maximum time required to update a FIB 398 irrespective of the change required. The value of MAX_FIB is network 399 specific and its determination is out of the scope of this document. 400 This value must be agreed by all the routers in the network. This 401 agreement can be performed by using a capability TLV as defined in 402 Appendix B. 404 All the routers that use R to reach Y will compute a lower rank than 405 R, and hence the correct order will be respected. It should be noted 406 that only the routers that used Y before the event need to compute 407 their rank. 409 5.2. Link or Router Up or Metric Decrease 411 In the case of a link or router up event rooted at Y or a link metric 412 decrease affecting link Y->W, a router R must have a rank that is 413 higher than the rank of the routers that it will use to reach Y, 414 according to the rule described in Section 3. The rank of R is thus 415 the number of hops between R and Y in its renewed Shortest Path Tree. 416 When R has multiple equal cost paths to Y, the rank is the length in 417 hops of the longest ECMP path to Y. 419 Router R is required to update its FIB at time 421 T0 + H + rank * MAX_FIB 423 It should be noted that only the routers that use Y after the event 424 have to compute a rank, i.e. only the routers that have Y in their 425 SPT after the link-state change. 427 6. Acceleration of Ordered Convergence 429 The mechanism described above is conservative, and hence may be 430 relatively slow. The purpose of this section is to describe a method 431 of accelerating the controlled convergence in such a way that ordered 432 loop-free convergence is still guaranteed. 434 In many cases a router will complete its required FIB changes in a 435 time much shorter than MAX_FIB and in many other cases, a router will 436 not have to perform any FIB change at all. 438 This section describes the use of completion messages to speed up the 439 convergence by providing a means for a router to inform those routers 440 waiting for it, that it has completed any required FIB changes. When 441 a router has been advised of completion by all the routers for which 442 it is waiting, it can safely update its own FIB without further 443 delay. In most cases this can result in a sub-second re-convergence 444 time comparable with that of normal convergence. 446 Routers maintain a waiting list of the neighbours from which a 447 completion message must be received. Upon reception of a completion 448 message from a neighbour, a router removes this neighbour from its 449 waiting list. Once its waiting list becomes empty, the router is 450 allowed to update its FIB immediately even if its ranking timer has 451 not yet expired. Once this is done, the router sends a completion 452 message to the neighbours that are waiting for it to complete. Those 453 routers are listed in a list called the Notification List. 454 Completion messages contain an identification of the event to which 455 they refer. 457 Note that, since this is only an optimization, any loss of completion 458 messages will result in the routers waiting their defined ranking 459 time and hence the loop-free properties will be preserved. 461 6.1. Construction of the waiting list and notification list 463 6.1.1. Down events 465 Consider a link or node down event rooted at router Y or the cost 466 increase of the link X->Y. A router R will compute rSPT_OLD(Y) to 467 determine its rank. When doing this, R also computes the set of 468 neighbors that R uses to reach the failing node or link, and the set 469 of neighbors that are using R to reach the failing node or link. The 470 Notification list of R is equal to the former set and the Waiting 471 list of R is equal to the latter. 473 Note that R could include all its neighbors except those in the 474 Waiting list in the Notification list, this has no impact on the 475 correctness of the protocol, but would be unnecessarily inefficient. 477 6.1.2. Up Events 479 Consider a link or node up event rooted at router Y or the cost 480 decrease of the link Y->X. A router R will compute its new SPT 481 (SPT_new(R)). The Waiting list is the set of next hop routers that R 482 uses to reach Y in SPT_new(R). 484 In a simple implementation the notification list of R is all the 485 neighbours of R excluding those in the Waiting list. This may be 486 further optimized by computing rSPT_new(Y) to determine those routers 487 that are waiting for R to complete. 489 6.2. Format of Completion Messages 491 The format of completion messages and means of their delivery is 492 routing protocol dependent and is outside the scope of this document. 494 The following information is required: 496 o Identity of the sender. 497 o List of routing notifications being considered in the associated 498 FIB change. Each notification is defined as : 499 Node ID of the near end of the link 500 Node ID of the far end of the link 501 Old Metric 502 New Metric 504 7. Fall back to Conventional Convergence 506 In circumstances where a router detects that it is dealing with 507 incomplete or inconsistent link state information, or when a further 508 topology event is received before completion of the current ordered 509 FIB update process, it may be expedient to abandon the controlled 510 convergence process. A number of possible fall back mechanisms are 511 described in Appendix A. The state machine defined in the body of 512 this document does not make any assumption about which fall back 513 mechanism will be used. 515 8. oFIB state machine 517 An oFIB capable router maintains an oFIB state value which can be one 518 of : OFIB_STABLE, OFIB_HOLDING_DOWN, OFIB_HOLDING_UP, OFIB_ABANDONED, 519 OFIB_ONGOING. 521 An oFIB capable router maintains a timer, Hold_down_timer. An oFIB 522 capable router is configured with a value referred to as 523 HOLD_DOWN_DURATION. This configuration can be performed manually or 524 using Appendix B. 526 An oFIB capable router maintains a timer, rank_timer. 528 8.1. OFIB_STABLE 530 OFIB_STABLE is the state of a router which is not currently involved 531 in any convergence process. This router is ready to process an event 532 by applying oFIB. 534 EVENT : Reception of a link-state packet describing an event of the 535 type link X--Y down or metric increase to be processed using oFIB. 537 ACTION : Set state to OFIB_HOLDING_DOWN. Start Hold_down_timer. 538 ofib_current_common_set = {X,Y}. Compute rank with respect to the 539 event, as defined in Section 5. Store Waiting List and Notification 540 List for X--Y obtained from the rank computation. 542 EVENT : Reception of a link-state packet describing an event of the 543 type link X--Y up or metric decrease which to be processed using 544 oFIB. 546 ACTION : 548 Set state to OFIB_HOLDING_UP. 549 Start Hold_down_timer. 550 ofib_current_common_set = {X,Y} 551 Compute rank with respect to the event, as defined in section 552 Section 5 . 553 Store Waiting List and Notification List for X--Y obtained from 554 the rank computation. 556 8.2. OFIB_HOLDING_DOWN 558 OFIB_HOLDING_DOWN is the state of a router that is collecting a set 559 of link down or metric increase link-state packets to be processed 560 together using controlled convergence. 562 EVENT : Reception of a link-state packet describing an event of the 563 type link up or metric decrease which in itself can be processed 564 using oFIB. 566 ACTION : 568 Set state to OFIB_ABANDONED. 569 Reset Hold_down_timer. 570 Trigger AAH mechanism 572 EVENT : Reception of a link-state packet describing an event of the 573 type link A--B down or metric increase which in itself can be 574 processed using oFIB. 576 ACTION : 578 ofib_current_common_set = 579 intersection(ofib_current_common_set,{A,B}). 580 If ofib_current_common_set is empty, then there is no longer a 581 node in common in all the pending link-state changes. 582 Set state to OFIB_ABANDONED 583 Reset Hold_down_timer 584 Trigger AAH mechanism. 585 If ofib_current_common set is not empty, update waiting list and 586 notification list as defined in Section 5. Note that in the 587 case of a single link event, the link-state packet received when 588 the router is in this state describes the state change of the 589 other direction of the link, hence no changes will be made to 590 the waiting and notification lists. 592 EVENT : Hold_down_timer expires. 594 ACTION : 596 Set state to OFIB_ONGOING. 597 Start rank_timer with computed rank. 599 EVENT : Reception of a completion message 601 ACTION : Remove the sender from waiting list associated with the 602 event identified in the completion message. 604 8.3. OFIB_HOLDING_UP 606 OFIB_HOLDING_UP is the state of a router that is collecting a set of 607 link up or metric decrease link-state packets to be processed 608 together using controlled convergence. 610 EVENT : Reception of a link-state packet describing an event of the 611 type link down or metric increase to be processed using oFIB. 613 ACTION : 615 Set state to OFIB_ABANDONED. 616 Reset Hold_down_timer. 617 Trigger AAH mechanism. 619 EVENT : Reception of a link-state packet describing an event of the 620 type link A--B up or metric decrease to be processed using oFIB. 622 ACTION : 624 ofib_current_common_set = 625 intersection(ofib_current_common_set,{A,B}). 626 If ofib_current_common_set is empty, then there is no longer a 627 common node in the set of pending link-state changes. 628 Set state to OFIB_ABANDONED. 629 Reset Hold_down_timer. 630 Trigger AAH mechanism. 631 If ofib_current_common set is not empty, update waiting list and 632 notification list as defined in Section 5. Note that in the 633 case of a single link event, the link-state packet received when 634 the router is in this state describes the state change of the 635 other direction of the link, hence no changes will be made to 636 the waiting and notification lists. 638 EVENT : Reception of a completion message 640 ACTION : Remove the sender from the waiting list associated with the 641 event identified in the completion message. 643 EVENT : Hold_down_timer expires. 645 ACTION : 647 Set state to OFIB_ONGOING. 648 Start rank_timer with computed rank. 650 8.4. OFIB_ONGOING 652 OFIB_ONGOING is the state of a router that is applying the ordering 653 mechanism w.r.t. the set of LSP collected when in OFIB_HOLDING_DOWN 654 or OFIB_HOLDING_UP state. 656 EVENT : rank_timer expires or waiting list becomes empty. 658 ACTION : 660 Perform FIB updates according to the change. 661 Send completion message to each member of the notification list. 662 Set State to OFIB_STABLE. 664 EVENT : Reception of a completion message 666 ACTION : Remove the sender from the waiting list. 668 EVENT : Reception of a link-state packet describing a link state 669 change event. 671 ACTION : 673 Set state to OFIB_ABANDONED. 674 Trigger AAH. 675 Start Hold_down_timer. 677 8.5. OFIB_ABANDONED 679 OFIB_ABANDONED is the state of a router that has fallen back to fast 680 convergence due to the reception of link-state packets that cannot be 681 dealt together using oFIB. 683 EVENT : Reception of a link-state packet describing a link-state 684 change event. 686 ACTION : Trigger AAH, reset Hold_down_timer. 688 EVENT : Hold_down_timer expires. 690 ACTION : Set state to OFIB_STABLE 692 9. IANA considerations 694 There are no IANA considerations which arise from this document. Any 695 such considerations will be called out in protocol specific documents 696 defining the modification to any routing protocol that is to be 697 enhanced to support loop-free convergence using ordered FIB. 699 10. Security considerations 701 This document requires only minor modifications to existing routing 702 protocols and therefore does not add significant additional security 703 risks. However a full security analysis would need to be provided 704 within the protocol specific specifications proposed for deployment. 706 11. Acknowledgments 708 We would like to thank Jean-Philippe Vasseur and Les Ginsberg for 709 their useful suggestions and comments. 711 12. Informative References 713 [ISO10589] 714 International Organization for Standardization, 715 "Intermediate system to Intermediate system intra-domain 716 routeing information exchange protocol for use in 717 conjunction with the protocol for providing the 718 connectionless-mode Network Service (ISO 8473)", ISO/ 719 IEC 10589:2002, Second Edition, Nov 2002. 721 [refs.PFOB07] 722 P. Francois and O. Bonaventure, "Avoiding transient loops 723 during IGP convergence in IP Networks", in IEEE/ACM 724 Transactions on Networking, 725 http://inl.info.ucl.ac.be/publications, December 2007. 727 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 728 Requirement Levels", BCP 14, RFC 2119, March 1997. 730 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. 732 [RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute 733 Extensions to RSVP-TE for LSP Tunnels", RFC 4090, 734 May 2005. 736 [RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", 737 RFC 5714, January 2010. 739 [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free 740 Convergence", RFC 5715, January 2010. 742 [I-D.atlas-bryant-shand-lf-timers] 743 K, A. and S. Bryant, "Synchronisation of Loop Free Timer 744 Values", draft-atlas-bryant-shand-lf-timers-04 (work in 745 progress), February 2008. 747 Appendix A. Mechanisms for Safely Abandoning Loop-Free Convergence 748 (AAH) 750 IPFRR[RFC5714] and loop-free convergence techniques [RFC5715] can 751 deal with single topology change events, multiple correlated change 752 events, and in some cases even certain uncorrelated events. However, 753 in all cases there are events which cannot be dealt with and the 754 mechanism needs to quickly revert to normal convergence. This is 755 known as "Abandoning All Hope" (AAH). 757 This appendix describes the outcome of a design study into the AAH 758 problem, and is included here to trigger discussion on the trade-offs 759 between complexity and robustness in the AAH solution-space. 761 A.1. Possible Solutions 763 Two approaches to this problem have been proposed: 765 1. Hold-down timer only. 767 2. Synchronization of AAH state using AAH messages. 769 These are described below. 771 A.2. Hold-down timer only 773 This method uses a hold-down to acquire a set of LSPs which should be 774 processed together. On expiry of the local hold-down timer, the 775 router begins processing the batch of LSPs according to the loop free 776 prevention algorithm. 778 There are a number of problems with this simple approach. In some 779 cases the timer value will be too short to ensure that all the 780 related events have arrived at all routers (perhaps because there was 781 some unexpected propagation delay, or one or more of the events are 782 slow in being detected). In other cases, a completely unrelated 783 event may occur after the timer has expired, but before the 784 processing is complete. In addition, since the timer is started at 785 each router on reception of the first LSP announcing a topology 786 change, the actual starting time is dependant upon the propagation 787 time of the first LSP. So, for a subsequent event occurring around 788 the time of the timer expiry, because of variations in propagation 789 delay it may reach some routers before the timer expires and others 790 after it has expired. In the former case this LSP will be included 791 in the set of changes to be considered, while in the latter it will 792 be excluded leasing to serious routing inconsistency. In such cases 793 continuing to operate the loop-free convergence protocol may 794 exacerbate the situation. 796 The simple approach to this would be to revert to normal convergence 797 (AAH) whenever an LSP is received after the timer has expired. 798 However this also has problems for the reasons above and therefore 799 AAH must be a synchronous operation, i.e. it is necessary to arrange 800 that an AAH invoked anywhere in the network causes ALL routers to 801 AAH. 803 It is also necessary to consider the means of exiting the AAH state. 804 Again the simplest method is to use a timer. However while in AAH 805 state any topology changes previously received, or which are 806 subsequently received, should be processed immediately using the 807 traditional convergence algorithms i.e. without invoking controlled 808 convergence. If the exit from the AAH state is not correctly 809 synchronized, a new event may be processed by some routers 810 immediately (as AAH), while those which have already left AAH state 811 will treat it as the first of a new batch of changes and attempt 812 controlled convergence. Thus both entry and exit from the AAH state 813 needs to be synchronised. A method of achieving this is described in 814 Appendix A.3. 816 A.3. AAH messages 818 Like the simple timer AAH method, this method uses a hold-down to 819 acquire a set of LSPs which should be processed together. On expiry 820 of the local hold-down timer, the router begins processing the batch 821 of LSPs according to the loop free prevention algorithm. This is the 822 same behaviour as the hold-down timer only method. However, if any 823 router, having started the loop-free convergence process receives an 824 LSP which would trigger a topology change, it locally abandons the 825 controlled convergence process, and sends an AAH message to all its 826 neighbors. This eventually triggers all routers to abandon the 827 controlled convergence. The routers remain in AAH state (i.e. 828 processing topology changes using normal "fast" convergence), until a 829 period of quiescence has elapsed. The exit from AAH state is 830 synchronized by using a two step process. To achieve the required 831 synchronization, two additional messages are required, AAH and AAH 832 ACK. The AAH message is reliably exchanged between neighbours using 833 the AAH ACK message. These could be implemented as a new message 834 within the routing protocol or carried in existing routing hello 835 messages. Two types of state machines are needed. A per-router AAH 836 state machine and a per neighbour AAH state machine(PNSM). These are 837 described below. 839 A.3.1. Per Router State Machine 841 Per Router State Table 842 +-------------+-----------+---------+--------+------------+----------+ 843 | EVENT | Q | Hold | CC | AAH | AAH-hold | 844 +=============+===========+=========+========+============+==========+ 845 | RX LSP | Start | - | TX-AAH | Re-start | TX-AAH | 846 | triggering | hold-down | | Start | AAH timer. | Start | 847 | change | timer | | AAH | [AAH] | AAH | 848 | | [Hold] | | timer. | | timer. | 849 | | | | [AAH] | | [AAH] | 850 +-------------+-----------+---------+--------+------------+----------+ 851 | RX AAH | TX-AAH | TX-AAH | TX-AAH | [AAH] | TX-AAH | 852 | (Neighbor's | Start AAH | Start | Start | | Start | 853 | PNSM | timer. | AAH | AAH | | AAH | 854 | processes | [AAH] | timer | timer. | | timer. | 855 | RX AAH.) | | [AAH] | [AAH] | | [AAH] | 856 +-------------+-----------+---------+--------+------------+----------+ 857 | Timer | - | Trigger | - | Start | [Q] | 858 | expiry | | CC. | | AAH-hold | | 859 | | | [CC] | | timer. | | 860 | | | | | [AAH-hold] | | 861 +-------------+-----------+---------+--------+------------+----------+ 862 | Controlled | - | - | [Q] | - | - | 863 | convergence | | | | | | 864 | completed | | | | | | 865 +-------------+-----------+---------+--------+------------+----------+ 866 TX-AAH = Send "goto TX-AAH" to all other PNSMs. 868 Operation of the per-router state machine is as follows: 870 Operation of this state machine under normal topology change involves 871 only states: Quiescent (Q), Hold-down (Hold) and Controlled 872 Convergence (CC). The remaining states are associated with an AAH 873 event. 875 The resting state is Quiescent. When the router in the Quiescent 876 state receives an LSP indicating a topology change, which would 877 normally trigger an SPF, it starts the Hold-down timer and changes 878 state to Hold-down. It normally remains in this state, collecting 879 additional LSPs until the Hold-down timer expires. Note that all 880 routers MUST use a common value for the Hold-down timer. When the 881 Hold-down timer expires the router then enters Controlled Convergence 882 (CC) state and executes the CC mechanism to re-converge the topology. 883 When the CC process has completed on the router, the router re-enters 884 the Quiescent state. 886 If this router receives a topology changing LSP whilst it is in the 887 CC state, it enters AAH state, and sends a "goto TX-AAH" command to 888 all per neighbour state machines which causes each per-neighbour 889 state machine to signal this state change to its neighbour. 890 Alternatively, if this router receives an AAH message from any of its 891 neighbors whilst in any state except AAH, it starts the AAH timer and 892 enters the AAH state. The per neighbor state machine corresponding 893 to the neighbor from which the AAH was received executes the RX AAH 894 action (which causes it to send an AAH ACK), while the remainder are 895 sent the "goto TX-AAH" command. The result is that the AAH is 896 acknowledged to the neighbor from which it was received and 897 propagated to all other neighbors. On entering AAH state, all CC 898 timers are expired and normal convergence takes place. 900 Whilst in the AAH state, LSPs are processed in the traditional 901 manner. Each time an LSP is received, the AAH timer is restarted. 902 In an unstable network ALL routers will remain in this state for some 903 time and the network will behave in the traditional uncontrolled 904 convergence manner. 906 When the AAH timer expires, the router enters AAH-hold state and 907 starts the AAH hold timer. The purpose of the AAH-hold state is to 908 synchronize the transition of the network from AAH to Quiescent. The 909 additional state ensures that the network cannot contain a mixture of 910 routers in both AAH and Quiescent states. If, whilst in AAH-Hold 911 state the router receives a topology changing LSP, it re-enters AAH 912 state and commands all per neighbour state machines to "goto TX-AAH". 913 If, whilst in AAH-Hold state the router receives an AAH message from 914 one of its neighbours, it re-enters the AAH state and commands all 915 other per neighbour state machines to "goto TX-AAH". Note that the 916 per-neighbor state machine receiving the AAH message will 917 autonomously acknowledge receipt of the AAH message. Commanding the 918 per-neighbour state machine to "goto TX-AAH" is necessary, because 919 routers may be in a mixture of Quiescent, Hold-down and AAH-hold 920 state, and it is necessary to rendezvous the entire network back to 921 AAH state. 923 When the AAH Hold timer expires the router changes to state Quiescent 924 and is ready for loop free convergence. 926 A.3.2. Per Neighbor State Machine 928 Per Neighbor State Table 929 +----------------------------+--------------+------------------------+ 930 | EVENT | Idle | TX-AAH | 931 +============================+==============+========================+ 932 | RX AAH | Send ACK. | Send ACK. | 933 | | | Cancel timer. | 934 | | [IDLE] | [IDLE] | 935 +----------------------------+--------------+------------------------+ 936 | RX ACK | ignore | Cancel timer. | 937 | | | [IDLE] | 938 +----------------------------+--------------+------------------------+ 939 | RX "goto TX-AAH" from | Send AAH | ignore | 940 | Router State Machine | [TX-AAH] | | 941 +----------------------------+--------------+------------------------+ 942 | Timer expires | impossible | Send AAH | 943 | | | Restart timer. | 944 | | | [TX-AAH] | 945 +----------------------------+--------------+------------------------+ 947 There is one instance of the per-neighbour (PN) state machine for 948 each neighbour within the convergence control domain. 950 The normal state is IDLE. 952 On command ("goto TX-AAH") from the router state machine, the state 953 machine enters TX-AAH state, transmits an AAH message to its 954 neighbour and starts a timer. 956 On receipt of an AAH ACK in state TX-AAH the state machine cancels 957 the timer and enters IDLE state. 959 In states IDLE, any AAH ACK message received is ignored. 961 On expiry of the timer in state TX-AAH the state machine transmits an 962 AAH message to the neighbour and restarts the timer. (The timer 963 cannot expire in any other state.) 965 In any state, receipt of an AAH causes the state machine to transmit 966 an AAH ACK and enter the IDLE state. 968 Note that for correct operation the state machine MUST remain in 969 state TX-AAH, until an AAH ACK or an AAH is received, or the state 970 machine is deleted. Deletion of the per neighbor state machine 971 occurs when routing determines that the neighbour has gone away, or 972 when the interface goes away. 974 When routing detects a new neighbour it creates a new instance of the 975 per-neighbour state machine in state Idle. The consequent generation 976 of the router's own LSP will then cause the router state machine to 977 execute the LSP receipt actions, which will if necessary result in 978 the new per-neighbour state machine receiving a "goto TX-AAH" command 979 and transitioning to TX-AAH state. 981 Appendix B. Synchronisation of Loop Free Timer Values 983 The Appendix provided the reader with access to the design 984 considerations originally described in 985 [I-D.atlas-bryant-shand-lf-timers]. 987 B.1. Introduction 989 Most of the loop-free convergence mechanisms [RFC5715] require one or 990 more convergence delay timers that MUST have a duration that is 991 consistent throughout the routing domain. This time is the worst 992 case time that any router will take to calculate the new topology, 993 and to make the necessary changes to the FIB. The timer is used by 994 the routers to know when it is safe to transition between the loop- 995 free convergence states. The time taken by a router to complete each 996 phase of the loop-free transition will be dependent on the size of 997 the network and the design and implementation of the router. It can 998 therefore be expected that the optimum delay will need to be tuned 999 from time to time as the network evolves. Manual configuration of 1000 the timer is fraught for two reasons, firstly it is always difficult 1001 to ensure that the correct value is installed in all of the routers, 1002 and secondly, if any change is introduced into the network that 1003 results in a need to change the timer, for example due to a change in 1004 hardware or software version, then all of the routers need to be 1005 reconfigured to use the new timer value. It is therefore desirable 1006 that a means be provided by which the convergence delay timer can be 1007 automatically synchronized throughout the network. 1009 B.2. Required Properties 1011 The timer synchronization mechanism MUST have the following 1012 properties: 1014 o The convergence delay time must be consistent amongst all routers 1015 that are converging on the new topology. 1016 o The convergence delay time must be the highest delay required by 1017 any router in the new topology. 1018 o The mechanism must increase the delay when a new router in 1019 introduced to the network that requires a higher delay than is 1020 currently in use. 1021 o When the router that had the longest delay requirements is removed 1022 from the topology, the convergence delay timer value must, within 1023 some reasonable time, be reduced to the longest delay required by 1024 the remaining routers. 1025 o It must be possible for a router to change the convergence delay 1026 timer value that it requires. 1027 o A router which is in multiple routing areas, or is running 1028 multiple routing protocols may signal a different loop-free 1029 convergence delay for each area, and for each protocol. 1030 How a router determines the time that it needs to execute each 1031 convergence phase is an implementation issue, and outside the scope 1032 of this specification. However a router that dynamically determines 1033 its proposed timer value must do so in such a way that it does not 1034 cause the synchronized value to continually fluctuate. 1036 B.3. Mechanism 1038 The following mechanism is proposed. 1040 A new information element is introduced into the routing protocol 1041 that specifies the maximum time (in milliseconds) that the router 1042 will take to calculate the new topology and to update its FIB as a 1043 result of any topology change. 1045 When a topology change occurs, the largest convergence delay time 1046 required by any router in the new topology is used by the loop-free 1047 convergence mechanism. 1049 If a routing protocol message is issued that changes the convergence 1050 delay timer value, but does not change the topology, the new timer 1051 value MUST be taken into consideration during the next loop-free 1052 transition, but MUST NOT instigate a loop-free transition. 1054 If a routing protocol message is issued that changes the convergence 1055 timer value and changes the topology, a loop-free transition is 1056 instigated and the new timer value is taken into consideration. 1058 The loop-free convergence mechanism should specify the action to be 1059 taken if a timer change (only) message and a topology change message 1060 are independently generated during the hold-off time. A suitable 1061 action would be to take the same action that would be taken if two 1062 uncorrelated topology changes occurred in the network. 1064 All routers that support loop-free convergence MUST advertise a loop- 1065 free convergence delay time. The loop-free convergence mechanism 1066 MUST specify the action to be taken if a router does not advertise a 1067 convergence delay time. 1069 B.4. Security Considerations 1071 If an abnormally large timer value is proposed by a router, the there 1072 is a danger that the loop-free convergence process will take an 1073 excessive time. If during that time the routing protocol signals the 1074 need for another transition, the loop-free transition will be 1075 abandoned and the default best case (traditional) convergence 1076 mechanism used. 1078 It is still undesirable that the routers select a convergence delay 1079 time that has an excessive value. The maximum value that can be 1080 specified in the LSP/LSA is limited through the use of a 16 bit field 1081 to about 65 seconds. When sufficient implementation experience is 1082 gained, an architectural constant will be specified which sets the 1083 upper limit of the convergence delay timer. 1085 Authors' Addresses 1087 Mike Shand 1088 Individual Contributor 1090 Email: imc.shand@googlemail.com 1092 Stewart Bryant 1093 Cisco Systems 1094 Green Park, 250, Longwater Avenue, 1095 Reading RG2 6GB 1096 UK 1098 Email: stbryant@cisco.com 1099 Stefano Previdi 1100 Cisco Systems 1101 Via Del Serafico 200 1102 00142 Roma 1103 Italy 1105 Email: sprevidi@cisco.com 1107 Clarence Filsfils 1108 Cisco Systems 1109 Brussels, 1110 Belgium 1112 Email: cfilsfil@cisco.com 1114 Pierre Francois 1115 Institute IMDEA Networks 1116 Avda. del Mar Mediterraneo, 22 1117 Leganese 28918 1118 ES 1120 Email: pierre.francois@imdea.org 1122 Olivier Bonaventure 1123 Universite catholique de Louvain 1124 Place Ste Barbe, 2 1125 Louvain-la-Neuve 1348 1126 BE 1128 Email: Olivier.Bonaventure@uclouvain.be 1129 URI: http://inl.info.ucl.ac.be/