idnits 2.17.1 draft-ietf-rtgwg-ordered-fib-12.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 24, 2013) is 3989 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'AAH' is mentioned on line 960, but not defined == Missing Reference: 'Hold' is mentioned on line 953, but not defined == Missing Reference: 'Q' is mentioned on line 967, but not defined == Missing Reference: 'CC' is mentioned on line 964, but not defined == Missing Reference: 'AAH-hold' is mentioned on line 965, but not defined == Missing Reference: 'IDLE' is mentioned on line 1043, but not defined == Missing Reference: 'TX-AAH' is mentioned on line 1050, but not defined Summary: 0 errors (**), 0 flaws (~~), 8 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group M. Shand 3 Internet-Draft Individual Contributor 4 Intended status: Informational S. Bryant 5 Expires: November 25, 2013 S. Previdi 6 C. Filsfils 7 Cisco Systems 8 P. Francois 9 Institute IMDEA Networks 10 O. Bonaventure 11 Universite catholique de Louvain 12 May 24, 2013 14 Framework for Loop-free convergence using oFIB 15 draft-ietf-rtgwg-ordered-fib-12 17 Abstract 19 This document describes an illustrative framework of a mechanism for 20 use in conjunction with link state routing protocols which prevents 21 the transient loops which would otherwise occur during topology 22 changes. It does this by correctly sequencing the forwarding 23 information base (FIB) updates on the routers. 25 This mechanism can be used in the case of non-urgent (management 26 action) link or node shutdowns and restarts or link metric changes. 27 It can also be used in conjunction with a fast re-route mechanism 28 which converts a sudden link or node failure into a non-urgent 29 topology change. This is possible where a complete repair path is 30 provided for all affected destinations. 32 After a non-urgent topology change, each router computes a rank that 33 defines the time at which it can safely update its FIB. A method for 34 accelerating this loop-free convergence process by the use of 35 completion messages is also described. 37 The technology described in this document has been subject to 38 extensive simulation using real network topologies and costs, and 39 pathological convergence behaviour. However the mechanism described 40 in this document are purely illustrative of the general approach and 41 do not constitute a protocol specification. The document represents 42 a snapshot of the work of the Routing Area Working Group at the time 43 of publication and is published as a document of record. Further 44 work is needed before implementation or deployment. 46 Status of This Memo 47 This Internet-Draft is submitted in full conformance with the 48 provisions of BCP 78 and BCP 79. 50 Internet-Drafts are working documents of the Internet Engineering 51 Task Force (IETF). Note that other groups may also distribute 52 working documents as Internet-Drafts. The list of current Internet- 53 Drafts is at http://datatracker.ietf.org/drafts/current/. 55 Internet-Drafts are draft documents valid for a maximum of six months 56 and may be updated, replaced, or obsoleted by other documents at any 57 time. It is inappropriate to use Internet-Drafts as reference 58 material or to cite them other than as "work in progress." 60 This Internet-Draft will expire on November 25, 2013. 62 Copyright Notice 64 Copyright (c) 2013 IETF Trust and the persons identified as the 65 document authors. All rights reserved. 67 This document is subject to BCP 78 and the IETF Trust's Legal 68 Provisions Relating to IETF Documents 69 (http://trustee.ietf.org/license-info) in effect on the date of 70 publication of this document. Please review these documents 71 carefully, as they describe your rights and restrictions with respect 72 to this document. Code Components extracted from this document must 73 include Simplified BSD License text as described in Section 4.e of 74 the Trust Legal Provisions and are provided without warranty as 75 described in the Simplified BSD License. 77 Table of Contents 79 1. The Purpose of this Document . . . . . . . . . . . . . . . . 3 80 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 81 3. The required FIB update order . . . . . . . . . . . . . . . . 5 82 3.1. Single Link Events . . . . . . . . . . . . . . . . . . . 5 83 3.1.1. Link Down / Metric Increase . . . . . . . . . . . . . 5 84 3.1.2. Link Up / Metric Decrease . . . . . . . . . . . . . . 7 85 3.2. Multi-link events . . . . . . . . . . . . . . . . . . . . 7 86 3.2.1. Router Down events . . . . . . . . . . . . . . . . . 7 87 3.2.2. Router Up events . . . . . . . . . . . . . . . . . . 7 88 3.2.3. Linecard Failure/Restoration Events . . . . . . . . . 7 89 4. Applying ordered FIB updates . . . . . . . . . . . . . . . . 8 90 4.1. Deducing the topology change . . . . . . . . . . . . . . 8 91 4.2. Deciding if ordered FIB updates applies . . . . . . . . . 8 92 5. Computation of the ordering . . . . . . . . . . . . . . . . . 9 93 5.1. Link or Router Down or Metric Increase . . . . . . . . . 9 94 5.2. Link or Router Up or Metric Decrease . . . . . . . . . . 10 96 6. Acceleration of Ordered Convergence . . . . . . . . . . . . . 10 97 6.1. Construction of the waiting list and notification list . 11 98 6.1.1. Down events . . . . . . . . . . . . . . . . . . . . . 11 99 6.1.2. Up Events . . . . . . . . . . . . . . . . . . . . . . 11 100 6.2. Format of Completion Messages . . . . . . . . . . . . . . 12 101 7. Fall back to Conventional Convergence . . . . . . . . . . . . 12 102 8. oFIB state machine . . . . . . . . . . . . . . . . . . . . . 12 103 8.1. OFIB_STABLE . . . . . . . . . . . . . . . . . . . . . . . 13 104 8.2. OFIB_HOLDING_DOWN . . . . . . . . . . . . . . . . . . . . 14 105 8.3. OFIB_HOLDING_UP . . . . . . . . . . . . . . . . . . . . . 15 106 8.4. OFIB_ONGOING . . . . . . . . . . . . . . . . . . . . . . 16 107 8.5. OFIB_ABANDONED . . . . . . . . . . . . . . . . . . . . . 17 108 9. Management Considerations . . . . . . . . . . . . . . . . . . 17 109 10. IANA considerations . . . . . . . . . . . . . . . . . . . . . 17 110 11. Security considerations . . . . . . . . . . . . . . . . . . . 17 111 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 112 13. Informative References . . . . . . . . . . . . . . . . . . . 18 113 Appendix A. Candidate Methods of Safely Abandoning Loop-Free 114 Convergence (AAH) . . . . . . . . . . . . . . . . . 18 115 A.1. Possible Solutions . . . . . . . . . . . . . . . . . . . 19 116 A.2. Hold-down timer only . . . . . . . . . . . . . . . . . . 19 117 A.3. AAH messages . . . . . . . . . . . . . . . . . . . . . . 20 118 A.3.1. Per Router State Machine . . . . . . . . . . . . . . 20 119 A.3.2. Per Neighbor State Machine . . . . . . . . . . . . . 22 120 Appendix B. Synchronisation of Loop Free Timer Values . . . . . 24 121 B.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 24 122 B.2. Required Properties . . . . . . . . . . . . . . . . . . . 24 123 B.3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . 25 124 B.4. Security Considerations . . . . . . . . . . . . . . . . . 25 125 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 127 1. The Purpose of this Document 129 This document describes an illustrative framework of a mechanism for 130 use in conjunction with link state routing protocols which prevents 131 the transient loops which would otherwise occur during topology 132 changes. It does this by correctly sequencing the forwarding 133 information base (FIB) updates on the routers. 135 At the time of publication there is no demand to deploy this 136 technology, however in view of the subtleties involved in the design 137 of loop-free convergence routing protocol extensions the Routing Area 138 Working Group considered it desirable to publish this document to 139 place on record the design consideration of the ordered FIB 140 (oFIB)approach. 142 The mechanisms presented in this document are purely illustrative of 143 the general approach and do not constitute a protocol specification. 145 The document represents a snapshot of the work of the working group 146 at the time of publication and is published as a document of record. 147 Additional work is needed to specify the necessary routing protocol 148 extensions necessary to support this IP fast re-route (IPFRR) method 149 before implementation or deployment. 151 2. Introduction 153 With link-state protocols, such as IS-IS [ISO10589] and OSPF 154 [RFC2328], each time the network topology changes, some routers need 155 to modify their forwarding information bases (FIBs) to take into 156 account the new topology. Each topology change causes a convergence 157 phase. During this phase, routers may transiently have inconsistent 158 FIBs, which may lead to packet loops and losses, even if the 159 reachability of the destinations is not compromised after the 160 topology change. Packet losses and transient loops can also occur in 161 the case of a link down event implied by a maintenance operation, 162 even if this operation is predictable and not urgent. When the link 163 state change is a metric update and when a new link is brought up in 164 the network, there is no direct loss of connectivity, but transient 165 packet loops and loss can still occur. 167 For example, in Figure 1, if the link between X and Y is shut down by 168 an operator, packets destined to X can loop between R and Y when Y 169 has updated its FIB while R has not yet updated its FIB, and packets 170 destined to Y can loop between X and S if X updates its FIB before S. 171 According to the current behaviour of ISIS and OSPF, this scenario 172 will happen most of the time because X and Y are the first routers to 173 be aware of the failure, so that they will update their FIBs first. 175 1 176 X-------------/-------------Y 177 | | 178 | | 179 | | 180 | | 181 1 | | 1 182 | | 183 | | 184 | | 185 | | 186 S---------------------------R 187 2 189 Figure 1: A simple topology 191 It should be noted that the loops can occur remotely from the 192 failure, not just adjacent to it. 194 [RFC5715] provides an introduction to a number of loop-free 195 convergence methods and readers unfamiliar with this technology are 196 recommended to read before studying this document in detail. Note 197 that in common with other loop-free convergence methods, oFIB is only 198 capable of providing loop free convergence in the presence of a 199 single failure. 201 The goal of this document is to describe a mechanism which sequences 202 the router FIB updates to maintain consistency throughout the 203 network. By correctly setting the FIB change order, no looping or 204 packet loss can occur. This mechanism may be applied to the case of 205 managed link-state changes, i.e. link metric change, manual link 206 down/up, manual router down/up, and managed state changes of a set of 207 links attached to one router. It may also be applied to the case 208 where one or more network elements are protected by a fast re-route 209 mechanism (FRR) [RFC5714] [RFC4090]. The mechanisms that are used in 210 the failure case are exactly the same as those used for managed 211 changes. For simplicity this document makes no further distinction 212 between managed and unplanned changes. 214 It is assumed in the description that follows that all routers in the 215 routing domain are oFIB capable. This can be verified in an 216 operation network by the routers reporting oFIB capability using the 217 IGP in use. Where non-oFIB capable routers exist in the network, 218 normal convergence would be used by all routers. The operation of 219 mixed-mode networks is for further study. 221 The technology described in this document has been subject to 222 extensive simulation using real network topologies and costs and 223 pathological convergence behaviour. A variant of the technology 224 described here has been experimentally deployed in a production 225 network. 227 3. The required FIB update order 229 This section provides an overview of the required ordering of the FIB 230 updates. A more detailed analysis of the rerouting dynamics and 231 correctness proofs of the mechanism can be found in [refs.PFOB07]. 233 3.1. Single Link Events 235 For simplicity the correct ordering for single link changes are 236 described first. The document then builds on this to demonstrate 237 that the same principles can be applied to more complex scenarios 238 such as line card or node changes. 240 3.1.1. Link Down / Metric Increase 241 First consider the non-urgent failure of a link (i.e. where an 242 operator or a network management system (NMS) shuts down a link 243 thereby removing it from the currently active topology) or the 244 increase of a link metric by the operator or NMS . In this case, a 245 router R must not update its FIB until all other routers that send 246 traffic via R and the affected link have first updated their FIBs. 248 The following argument shows that this rule ensures the correct order 249 of FIB change when the link X->Y is shut down or its metric is 250 increased. 252 An "outdated" FIB entry for a destination is defined as being a FIB 253 entry that still reflects the shortest path(s) in use before the 254 topology change. Once a packet reaches a router R that has an 255 outdated FIB entry for the packet destination, then, provided the 256 oFIB ordering is respected, the packet will continue to X only 257 traversing routers that also have an outdated FIB entry for the 258 destination. The packet thus reaches X without looping and will be 259 forwarded to Y via X->Y (or in the case of FRR, the X->Y repair path) 260 and hence reach its destination. 262 Since it can be assumed that the original topology was loop-free, Y 263 will never use the link Y->X to reach the destination and hence the 264 path(s) between Y and the destination are guaranteed to be unaffected 265 by the topology change. It therefore follows that the packet 266 arriving at Y will reach its destination without looping. 268 Since it can also be assumed that the new topology is loop-free, by 269 definition a packet cannot loop while being forwarded exclusively by 270 routers with an updated FIB entry. 272 In other words, when the oFIB ordering is respected, if a packet 273 reaches an outdated router, it can never subsequently reach an 274 updated router, and cannot loop because from this point on it will 275 only be forwarded on the consistent path that was used before the 276 event. If it does not reach an outdated router, it will only be 277 forwarded on the loop free path that will be used after the 278 convergence. 280 According to the proposed ordering, X will be the last router to 281 update its FIB. Once it has updated its FIB, the link X->Y can 282 actually be shut down (or the repair removed). 284 If the link X-Y is bidirectional a similar process must be run to 285 order the FIB update for destinations using the link in the direction 286 Y->X. As has already been shown, no packet ever traverses the X-Y 287 link in both directions, and hence the operation of the two ordering 288 processes is orthogonal. 290 3.1.2. Link Up / Metric Decrease 292 In the case of link up events or metric decreases, a router R must 293 update its FIB before all other routers that will use R to reach the 294 affected link. 296 The following argument shows that this rule ensures the correct order 297 of FIB change when the link X->Y is brought into service or its 298 metric is decreased. 300 Firstly, when a packet reaches a router R that has already updated 301 its FIB, all the routers on the path from R to X will also have 302 updated their FIB, so that the packet will reach X and be forwarded 303 along X->Y, ultimately reaching its destination. 305 Secondly, a packet cannot loop between routers that have not yet 306 updated their FIB. This proves that no packet can loop. 308 3.2. Multi-link events 310 The following sections describe the required ordering for single 311 events which may manifest as multiple link events. For example, the 312 failure of a router may be notified to the rest of the network as the 313 individual failure of all its attached links. The means of 314 identifying the event type from the collection of received link 315 events is described in Section 4.1. 317 3.2.1. Router Down events 319 In the case of the non-urgent shut-down of a router, a router R must 320 not update its FIB until all other routers that send traffic via R 321 and the affected router have first updated their FIBs. 323 Using a proof similar to that for link failure, it can be shown that 324 no loops will occur if this ordering is respected [refs.PFOB07]. 326 3.2.2. Router Up events 328 In the case of a router being brought into service, a router R must 329 update its FIB BEFORE all other routers that WILL use R to reach the 330 affected router. 332 A proof similar to that for link up, shows that no loops will occur 333 if this ordering is respected [refs.PFOB07]. 335 3.2.3. Linecard Failure/Restoration Events 336 The failure of a line card involves the failure of a set of links all 337 of which have a single node in common, i.e. the parent router. The 338 ordering to be applied is the same as if it were the failure of the 339 parent router. 341 In a similar way, the restoration of an entire linecard to service as 342 a single event can be treated as if the parent router were returning 343 to service. 345 4. Applying ordered FIB updates 347 4.1. Deducing the topology change 349 As has been described, a single event such as the failure or 350 restoration of a single link, single router or a linecard may be 351 notified to the rest of the network as a set of individual link 352 change events. It is necessary to deduce from this collection of 353 link state notifications the type of event that has occurred in the 354 network and hence the required ordering. 356 When a link change event is received which impacts the receiving 357 router's FIB, the routers at the near and far end of the link are 358 noted. 360 If all events received within some hold-down period (the time that a 361 router waits to acquire a set of LSPs which should be processed 362 together) have a single router in common, then it is assumed that the 363 change reflects an event (line-card or router change) concerning that 364 router. 366 In the case of a link change event, the router at the far end of the 367 link is deemed to be the common router. 369 All ordering computations are based on treating the common router as 370 the root for both link and node events. 372 4.2. Deciding if ordered FIB updates applies 374 There are some events (for example, a subsequent failure with 375 conflicting repair requirements occurring before the ordered FIB 376 process has completed) that cannot be correctly processed by this 377 mechanism. In these cases it is necessary to ensure that convergence 378 falls back to the conventional mode of operation (see Section 7). 380 In all cases it is necessary to wait some hold-down period after 381 receiving the first notification to ensure that all routers have 382 received the complete set of link state notifications associated with 383 the single event. 385 At any time, if a link change notification is received which would 386 have no effect on the receiving router's FIB, then it may be ignored. 388 If no other event is received during the hold-down time, the event is 389 treated as a link event. Note that the IGP reverse connectivity 390 check means that only the first failure event, or second up event 391 have an effect on the FIB. 393 If an event is received within the hold down period which does NOT 394 reference the common router (R) then in this version of the 395 specification normal convergence is invoked immediately (see 396 Section 7). 398 Network reconvergence under ordered FIB takes longer than the normal 399 reconvergence process. Where the failure is protected by an FRR 400 mechanism, this additional delay in convergence causes no packet 401 loss. When the sudden failure of a link or a set of links that are 402 not protected using a FRR mechanism occurs this must be processed 403 using the conventional (faster) mode of operation to minimise packet 404 loss during re-convergence. 406 In summary an ordered FIB process is applicable if the set of link 407 state notifications received between the first event and the hold 408 down period reference a common router R, and one of the following 409 assertions is verified : 411 o The set of notifications refer to link down events concerning 412 protected links and metric increase events 414 o The set of notifications refer to link up events and metric 415 decrease events. 417 5. Computation of the ordering 419 This section describes how the required ordering is computed. 421 This computation required the introduction of the concept of a 422 reverse Shortest Path Tree (rSPT). The rSPT uses the cost towards 423 the root rather than from it and yields the best paths towards the 424 root from other nodes in the network[I-D.bryant-ipfrr-tunnels]. 426 5.1. Link or Router Down or Metric Increase 428 To respect the proposed ordering, routers compute a rank that will be 429 used to determine the time at which they are permitted to perform 430 their FIB update. In the case of a failure event rooted at router Y 431 or an increase of the metric of link X->Y, router R computes the rSPT 432 in the topology before the failure (rSPT_OLD) rooted at Y. This rSPT 433 gives the shortest paths to reach Y before the failure. The branch 434 of the reverse SPT that is below R corresponds to the set of shortest 435 paths to R that are used by the routers that reach Y via R. 437 The rank of router R is defined as the depth (in number of hops) of 438 this branch. In the case of Equal Cost Multi-path (ECMP), the 439 maximum depth of the ECMP path set is used. 441 Router R is required to update its FIB at time 443 T0 + H + (rank * MAX_FIB) 445 where T0 is the arrival time of the link-state packet containing the 446 topology change, H is the hold-down time and MAX_FIB is a network- 447 wide constant that reflects the maximum time required to update a FIB 448 irrespective of the change required. The value of MAX_FIB is network 449 specific and its determination is out of the scope of this document. 450 This value must be agreed by all the routers in the network. This 451 agreement can be performed by using a capability TLV as defined in 452 Appendix B. 454 All the routers that use R to reach Y will compute a lower rank than 455 R, and hence the correct order will be respected. It should be noted 456 that only the routers that used Y before the event need to compute 457 their rank. 459 5.2. Link or Router Up or Metric Decrease 461 In the case of a link or router up event rooted at Y or a link metric 462 decrease affecting link Y->W, a router R must have a rank that is 463 higher than the rank of the routers that it will use to reach Y, 464 according to the rule described in Section 3. The rank of R is thus 465 the number of hops between R and Y in its renewed Shortest Path Tree. 466 When R has multiple equal cost paths to Y, the rank is the length in 467 hops of the longest ECMP path to Y. 469 Router R is required to update its FIB at time 471 T0 + H + (rank * MAX_FIB) 473 It should be noted that only the routers that use Y after the event 474 have to compute a rank, i.e. only the routers that have Y in their 475 SPT after the link-state change. 477 6. Acceleration of Ordered Convergence 479 The mechanism described above is conservative, and hence may be 480 relatively slow. The purpose of this section is to describe a method 481 of accelerating the controlled convergence in such a way that ordered 482 loop-free convergence is still guaranteed. 484 In many cases a router will complete its required FIB changes in a 485 time much shorter than MAX_FIB and in many other cases, a router will 486 not have to perform any FIB change at all. 488 This section describes the use of completion messages to speed up the 489 convergence by providing a means for a router to inform those routers 490 waiting for it, that it has completed any required FIB changes. When 491 a router has been advised of completion by all the routers for which 492 it is waiting, it can safely update its own FIB without further 493 delay. In most cases this can result in a sub-second re-convergence 494 time comparable with that of normal convergence. 496 Routers maintain a waiting list of the neighbours from which a 497 completion message must be received. Upon reception of a completion 498 message from a neighbour, a router removes this neighbour from its 499 waiting list. Once its waiting list becomes empty, the router is 500 allowed to update its FIB immediately even if its ranking timer has 501 not yet expired. Once this is done, the router sends a completion 502 message to the neighbours that are waiting for it to complete. Those 503 routers are listed in a list called the Notification List. 504 Completion messages contain an identification of the event to which 505 they refer. 507 Note that, since this is only an optimization, any loss of completion 508 messages will result in the routers waiting their defined ranking 509 time and hence the loop-free properties will be preserved. 511 6.1. Construction of the waiting list and notification list 513 6.1.1. Down events 515 Consider a link or node down event rooted at router Y or the cost 516 increase of the link X->Y. A router R will compute rSPT_OLD(Y) to 517 determine its rank. When doing this, R also computes the set of 518 neighbours that R uses to reach the failing node or link, and the set 519 of neighbours that are using R to reach the failing node or link. 520 The Notification list of R is equal to the former set and the Waiting 521 list of R is equal to the latter. 523 Note that R could include all its neighbours except those in the 524 Waiting list in the Notification list, this has no impact on the 525 correctness of the protocol, but would be unnecessarily inefficient. 527 6.1.2. Up Events 528 Consider a link or node up event rooted at router Y or the cost 529 decrease of the link Y->X. A router R will compute its new SPT 530 (SPT_new(R)). The Waiting list is the set of next hop routers that R 531 uses to reach Y in SPT_new(R). 533 In a simple implementation the notification list of R is all the 534 neighbours of R excluding those in the Waiting list. This may be 535 further optimized by computing rSPT_new(Y) to determine those routers 536 that are waiting for R to complete. 538 6.2. Format of Completion Messages 540 The format of completion messages and means of their delivery is 541 routing protocol dependent and is outside the scope of this document. 543 The following information is required: 545 o Identity of the sender. 547 o List of routing notifications being considered in the associated 548 FIB change. Each notification is defined as : 550 Node ID of the near end of the link 552 Node ID of the far end of the link 554 Inclusion or removal of link. 556 Old Metric 558 New Metric 560 7. Fall back to Conventional Convergence 562 In circumstances where a router detects that it is dealing with 563 incomplete or inconsistent link state information, or when a further 564 topology event is received before completion of the current ordered 565 FIB update process, it may be expedient to abandon the controlled 566 convergence process. A number of possible fall back mechanisms are 567 described in Appendix A. This mechanism is referred to as 568 "Abandoning All Hope" (AAH). The state machine defined in the body 569 of this document does not make any assumption about which fall back 570 mechanism will be used. 572 8. oFIB state machine 574 This section describes a model of an oFIB state machine which an 575 implementation must be capable of interworking with. 577 An oFIB capable router maintains an oFIB state value which can be one 578 of : OFIB_STABLE, OFIB_HOLDING_DOWN, OFIB_HOLDING_UP, OFIB_ABANDONED, 579 OFIB_ONGOING. 581 An oFIB capable router maintains a timer, Hold_down_timer. An oFIB 582 capable router is configured with a value referred to as 583 HOLD_DOWN_DURATION. This configuration can be performed manually or 584 using Appendix B. 586 An oFIB capable router maintains a timer, rank_timer. 588 8.1. OFIB_STABLE 590 OFIB_STABLE is the state of a router which is not currently involved 591 in any convergence process. This router is ready to process an event 592 by applying oFIB. 594 EVENT : Reception of a link-state packet describing an event of the 595 type link X--Y down or metric increase to be processed using oFIB. 597 ACTION : 599 Set state to OFIB_HOLDING_DOWN. 601 Start Hold_down_timer. 603 ofib_current_common_set = {X,Y}. 605 Compute rank with respect to the event, as defined in Section 5. 607 Store Waiting List and Notification List for X--Y obtained from 608 the rank computation. 610 EVENT : Reception of a link-state packet describing an event of the 611 type link X--Y up or metric decrease which to be processed using 612 oFIB. 614 ACTION : 616 Set state to OFIB_HOLDING_UP. 618 Start Hold_down_timer. 620 ofib_current_common_set = {X,Y}. 622 Compute rank with respect to the event, as defined in Section 5. 624 Store Waiting List and Notification List for X--Y obtained from 625 the rank computation. 627 8.2. OFIB_HOLDING_DOWN 629 OFIB_HOLDING_DOWN is the state of a router that is collecting a set 630 of link down or metric increase link-state packets to be processed 631 together using controlled convergence. 633 EVENT : Reception of a link-state packet describing an event of the 634 type link up or metric decrease which in itself can be processed 635 using oFIB. 637 ACTION : 639 Set state to OFIB_ABANDONED. 641 Reset Hold_down_timer. 643 Trigger AAH mechanism 645 EVENT : Reception of a link-state packet describing an event of the 646 type link A--B down or metric increase which in itself can be 647 processed using oFIB. 649 ACTION : 651 ofib_current_common_set = 652 intersection(ofib_current_common_set,{A,B}). 654 If ofib_current_common_set is empty, then there is no longer a 655 node in common in all the pending link-state changes. 657 Set state to OFIB_ABANDONED. 659 Reset Hold_down_timer. 661 Trigger AAH mechanism. 663 If ofib_current_common set is not empty, update waiting list and 664 notification list as defined in Section 5. Note that in the case 665 of a single link event, the link-state packet received when the 666 router is in this state describes the state change of the other 667 direction of the link, hence no changes will be made to the 668 waiting and notification lists. 670 EVENT : Hold_down_timer expires. 672 ACTION : 674 Set state to OFIB_ONGOING. 676 Start rank_timer with computed rank. 678 EVENT : Reception of a completion message 680 ACTION : Remove the sender from waiting list associated with the 681 event identified in the completion message. 683 8.3. OFIB_HOLDING_UP 685 OFIB_HOLDING_UP is the state of a router that is collecting a set of 686 link up or metric decrease link-state packets to be processed 687 together using controlled convergence. 689 EVENT : Reception of a link-state packet describing an event of the 690 type link down or metric increase to be processed using oFIB. 692 ACTION : 694 Set state to OFIB_ABANDONED. 696 Reset Hold_down_timer. 698 Trigger AAH mechanism. 700 EVENT : Reception of a link-state packet describing an event of the 701 type link A--B up or metric decrease to be processed using oFIB. 703 ACTION : 705 ofib_current_common_set = 706 intersection(ofib_current_common_set,{A,B}). 708 If ofib_current_common_set is empty, then there is no longer a 709 common node in the set of pending link-state changes. 711 Set state to OFIB_ABANDONED. 713 Reset Hold_down_timer. 715 Trigger AAH mechanism. 717 If ofib_current_common set is not empty, update waiting list and 718 notification list as defined in Section 5. Note that in the case 719 of a single link event, the link-state packet received when the 720 router is in this state describes the state change of the other 721 direction of the link, hence no changes will be made to the 722 waiting and notification lists. 724 EVENT : Reception of a completion message 726 ACTION : Remove the sender from the waiting list associated with the 727 event identified in the completion message. 729 EVENT : Hold_down_timer expires. 731 ACTION : 733 Set state to OFIB_ONGOING. 735 Start rank_timer with computed rank. 737 8.4. OFIB_ONGOING 739 OFIB_ONGOING is the state of a router that is applying the ordering 740 mechanism w.r.t. the set of Link State Packets (LSP) collected when 741 in OFIB_HOLDING_DOWN or OFIB_HOLDING_UP state. 743 EVENT : rank_timer expires or waiting list becomes empty. 745 ACTION : 747 Perform FIB updates according to the change. 749 Send completion message to each member of the notification list. 751 Set State to OFIB_STABLE. 753 EVENT : Reception of a completion message 755 ACTION : Remove the sender from the waiting list. 757 EVENT : Reception of a link-state packet describing a link state 758 change event. 760 ACTION : 762 Set state to OFIB_ABANDONED. 764 Trigger AAH. 766 Start Hold_down_timer. 768 8.5. OFIB_ABANDONED 770 OFIB_ABANDONED is the state of a router that has fallen back to fast 771 convergence due to the reception of link-state packets that cannot be 772 dealt together using oFIB. 774 EVENT : Reception of a link-state packet describing a link-state 775 change event. 777 ACTION : Trigger AAH, reset AAH_Hold_down_timer. 779 EVENT : AAH_Hold_down_timer expires. 781 ACTION : Set state to OFIB_STABLE 783 9. Management Considerations 785 A system for recording the dynamics of the convergence process needs 786 to be deployed in order to post hoc diagnose the re-convergence. The 787 sensitivity of applications to the any packet re-order introduced by 788 the delayed convergence process will need to be studied, however both 789 of these considerations apply to any loop-free convergence method and 790 are not specific to the ordered FIB method described in this 791 document. 793 10. IANA considerations 795 There are no IANA considerations which arise from this document. Any 796 such considerations will be called out in protocol specific documents 797 defining the modification to any routing protocol that is to be 798 enhanced to support loop-free convergence using ordered FIB. 800 11. Security considerations 802 This document requires only minor modifications to existing routing 803 protocols and therefore does not add significant additional security 804 risks. However a full security analysis would need to be provided 805 within the protocol specific specifications proposed for deployment. 806 Additional security considerations are noted in Appendix B.4. 808 12. Acknowledgments 810 We would like to thank Jean-Philippe Vasseur and Les Ginsberg for 811 their useful suggestions and comments. 813 13. Informative References 815 [ISO10589] 816 International Organization for Standardization, 817 "Intermediate system to Intermediate system intra-domain 818 routing information exchange protocol for use in 819 conjunction with the protocol for providing the 820 connectionless-mode Network Service (ISO 8473)", ISO/IEC 821 10589:2002, Second Edition, Nov 2002. 823 [refs.PFOB07] 824 P. Francois, and O. Bonaventure, "Avoiding transient loops 825 during IGP convergence in IP Networks", in IEEE/ACM 826 Transactions on Networking, http://inl.info.ucl.ac.be/ 827 system/files/pfr-obo-ofib-ton.pdf, December 2007. 829 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. 831 [RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute 832 Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 833 2005. 835 [RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 836 5714, January 2010. 838 [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free 839 Convergence", RFC 5715, January 2010. 841 [I-D.atlas-bryant-shand-lf-timers] 842 K, A. and S. Bryant, "Synchronisation of Loop Free Timer 843 Values", draft-atlas-bryant-shand-lf-timers-04 (work in 844 progress), February 2008. 846 [I-D.bryant-ipfrr-tunnels] 847 Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP 848 Fast Reroute using tunnels", draft-bryant-ipfrr-tunnels-03 849 (work in progress), November 2007. 851 Appendix A. Candidate Methods of Safely Abandoning Loop-Free 852 Convergence (AAH) 854 IPFRR[RFC5714] and loop-free convergence techniques [RFC5715] can 855 deal with single topology change events, multiple correlated change 856 events, and in some cases even certain uncorrelated events. However, 857 in all cases there are events which cannot be dealt with and the 858 mechanism needs to quickly revert to normal convergence. This is 859 known as "Abandoning All Hope" (AAH). 861 This appendix describes the outcome of a design study into the AAH 862 problem, and is included here to trigger discussion on the trade-offs 863 between complexity and robustness in the AAH solution-space. 865 A.1. Possible Solutions 867 Two approaches to this problem have been proposed: 869 1. Hold-down timer only. 871 2. Synchronization of AAH state using AAH messages. 873 These are described below. 875 A.2. Hold-down timer only 877 The "hold-down timer only" AAH method uses a hold-down to acquire a 878 set of LSPs which should be processed together. On expiry of the 879 local hold-down timer, the router begins processing the batch of LSPs 880 according to the loop free prevention algorithm. 882 There are a number of problems with this simple approach. In some 883 cases the timer value will be too short to ensure that all the 884 related events have arrived at all routers (perhaps because there was 885 some unexpected propagation delay, or one or more of the events are 886 slow in being detected). In other cases, a completely unrelated 887 event may occur after the timer has expired, but before the 888 processing is complete. In addition, since the timer is started at 889 each router on reception of the first LSP announcing a topology 890 change, the actual starting time is dependant upon the propagation 891 time of the first LSP. So, for a subsequent event occurring around 892 the time of the timer expiry, because of variations in propagation 893 delay it may reach some routers before the timer expires and others 894 after it has expired. In the former case this LSP will be included 895 in the set of changes to be considered, while in the latter it will 896 be excluded leasing to serious routing inconsistency. In such cases 897 continuing to operate the loop-free convergence protocol may 898 exacerbate the situation. 900 The simple approach to this would be to revert to normal convergence 901 (AAH) whenever an LSP is received after the timer has expired. 902 However this also has problems for the reasons above and therefore 903 AAH must be a synchronous operation, i.e. it is necessary to arrange 904 that an AAH invoked anywhere in the network causes ALL routers to 905 AAH. 907 It is also necessary to consider the means of exiting the AAH state. 908 Again the simplest method is to use a timer. However while in AAH 909 state any topology changes previously received, or which are 910 subsequently received, should be processed immediately using the 911 traditional convergence algorithms, i.e. without invoking controlled 912 convergence. If the exit from the AAH state is not correctly 913 synchronized, a new event may be processed by some routers 914 immediately (as AAH), while those which have already left AAH state 915 will treat it as the first of a new batch of changes and attempt 916 controlled convergence. Thus both entry and exit from the AAH state 917 needs to be synchronised. A method of achieving this is described in 918 Appendix A.3. 920 A.3. AAH messages 922 Like the simple timer AAH method, the "AAH messages" AAH method uses 923 a hold-down to acquire a set of LSPs which should be processed 924 together. On expiry of the local hold-down timer, the router begins 925 processing the batch of LSPs according to the loop free prevention 926 algorithm. This is the same behaviour as the hold-down timer only 927 method. However, if any router, having started the loop-free 928 convergence process receives an LSP which would trigger a topology 929 change, it locally abandons the controlled convergence process, and 930 sends an AAH message to all its neighbours. This eventually triggers 931 all routers to abandon the controlled convergence. The routers 932 remain in AAH state (i.e. processing topology changes using normal 933 "fast" convergence), until a period of quiescence has elapsed. The 934 exit from AAH state is synchronized by using a two step process. To 935 achieve the required synchronization, two additional messages are 936 required, AAH and AAH ACK. The AAH message is reliably exchanged 937 between neighbours using the AAH ACK message. These could be 938 implemented as a new message within the routing protocol or carried 939 in existing routing hello messages. Two types of state machines are 940 needed. A per-router AAH state machine and a per neighbour AAH state 941 machine(PNSM). These are described below. 943 A.3.1. Per Router State Machine 945 Per Router State Table 947 +-------------+----------+---------+--------+------------+----------+ 948 | EVENT | Q | Hold | CC | AAH | AAH-hold | 949 +=============+==========+=========+========+============+==========+ 950 | RX LSP | Start | - | TX-AAH | Re-start | TX-AAH | 951 | triggering |hold-down | | Start | AAH timer. | Start | 952 | change | timer | | AAH | [AAH] | AAH | 953 | | [Hold] | | timer. | | timer. | 954 | | | | [AAH] | | [AAH] | 955 +-------------+----------+---------+--------+------------+----------+ 956 | RX AAH | TX-AAH | TX-AAH | TX-AAH | [AAH] | TX-AAH | 957 |(Neighbour's |Start AAH | Start | Start | | Start | 958 | PNSM | timer. | AAH | AAH | | AAH | 959 | processes | [AAH] | timer | timer. | | timer. | 960 | RX AAH.) | | [AAH] | [AAH] | | [AAH] | 961 +-------------+----------+---------+--------+------------+----------+ 962 | Timer | - | Trigger | - | Start | [Q] | 963 | expiry | | CC. | | AAH-hold | | 964 | | | [CC] | | timer. | | 965 | | | | | [AAH-hold] | | 966 +-------------+----------+---------+--------+------------+----------+ 967 | Controlled | - | - | [Q] | - | - | 968 | convergence | | | | | | 969 | completed | | | | | | 970 +-------------+----------+---------+--------+------------+----------+ 971 TX-AAH = Send "goto TX-AAH" to all other PNSMs. 973 Operation of the per-router state machine is as follows: 975 Operation of this state machine under normal topology change involves 976 only states: Quiescent (Q), Hold-down (Hold) and Controlled 977 Convergence (CC). The remaining states are associated with an AAH 978 event. 980 The resting state is Quiescent. When the router in the Quiescent 981 state receives an LSP indicating a topology change, which would 982 normally trigger an SPF, it starts the Hold-down timer and changes 983 state to Hold-down. It normally remains in this state, collecting 984 additional LSPs until the Hold-down timer expires. Note that all 985 routers must use a common value for the Hold-down timer. When the 986 Hold-down timer expires the router then enters Controlled Convergence 987 (CC) state and executes the CC mechanism to re-converge the topology. 988 When the CC process has completed on the router, the router re-enters 989 the Quiescent state. 991 If this router receives a topology changing LSP whilst it is in the 992 CC state, it enters AAH state, and sends a "goto TX-AAH" command to 993 all per neighbour state machines which causes each per-neighbour 994 state machine to signal this state change to its neighbour. 995 Alternatively, if this router receives an AAH message from any of its 996 neighbours whilst in any state except AAH, it starts the AAH timer 997 and enters the AAH state. The per neighbour state machine 998 corresponding to the neighbour from which the AAH was received 999 executes the RX AAH action (which causes it to send an AAH ACK), 1000 while the remainder are sent the "goto TX-AAH" command. The result 1001 is that the AAH is acknowledged to the neighbour from which it was 1002 received and propagated to all other neighbours. On entering AAH 1003 state, all CC timers are expired and normal convergence takes place. 1005 Whilst in the AAH state, LSPs are processed in the traditional 1006 manner. Each time an LSP is received, the AAH timer is restarted. 1007 In an unstable network ALL routers will remain in this state for some 1008 time and the network will behave in the traditional uncontrolled 1009 convergence manner. 1011 When the AAH timer expires, the router enters AAH-hold state and 1012 starts the AAH hold timer. The purpose of the AAH-hold state is to 1013 synchronize the transition of the network from AAH to Quiescent. The 1014 additional state ensures that the network cannot contain a mixture of 1015 routers in both AAH and Quiescent states. If, whilst in AAH-Hold 1016 state the router receives a topology changing LSP, it re-enters AAH 1017 state and commands all per neighbour state machines to "goto TX-AAH". 1018 If, whilst in AAH-Hold state the router receives an AAH message from 1019 one of its neighbours, it re-enters the AAH state and commands all 1020 other per neighbour state machines to "goto TX-AAH". Note that the 1021 per-neighbour state machine receiving the AAH message will 1022 autonomously acknowledge receipt of the AAH message. Commanding the 1023 per-neighbour state machine to "goto TX-AAH" is necessary, because 1024 routers may be in a mixture of Quiescent, Hold-down and AAH-hold 1025 state, and it is necessary to rendezvous the entire network back to 1026 AAH state. 1028 When the AAH Hold timer expires the router changes to state Quiescent 1029 and is ready for loop free convergence. 1031 A.3.2. Per Neighbor State Machine 1033 Per Neighbour State Table 1035 +----------------------------+--------------+-----------------------+ 1036 | EVENT | Idle | TX-AAH | 1037 +============================+==============+=======================+ 1038 | RX AAH | Send ACK. | Send ACK. | 1039 | | | Cancel timer. | 1040 | | [IDLE] | [IDLE] | 1041 +----------------------------+--------------+-----------------------+ 1042 | RX ACK | ignore | Cancel timer. | 1043 | | | [IDLE] | 1044 +----------------------------+--------------+-----------------------+ 1045 | RX "goto TX-AAH" from | Send AAH | ignore | 1046 | Router State Machine | [TX-AAH] | | 1047 +----------------------------+--------------+-----------------------+ 1048 | Timer expires | impossible | Send AAH | 1049 | | | Restart timer. | 1050 | | | [TX-AAH] | 1051 +----------------------------+--------------+-----------------------+ 1053 There is one instance of the per-neighbour state machine(PNSM) for 1054 each neighbour within the convergence control domain. 1056 The normal state is IDLE. 1058 On command ("goto TX-AAH") from the router state machine, the state 1059 machine enters TX-AAH state, transmits an AAH message to its 1060 neighbour and starts a timer. 1062 On receipt of an AAH ACK in state TX-AAH the state machine cancels 1063 the timer and enters IDLE state. 1065 In states IDLE, any AAH ACK message received is ignored. 1067 On expiry of the timer in state TX-AAH the state machine transmits an 1068 AAH message to the neighbour and restarts the timer. (The timer 1069 cannot expire in any other state.) 1071 In any state, receipt of an AAH causes the state machine to transmit 1072 an AAH ACK and enter the IDLE state. 1074 Note that for correct operation the state machine must remain in 1075 state TX-AAH, until an AAH ACK or an AAH is received, or the state 1076 machine is deleted. Deletion of the per neighbour state machine 1077 occurs when routing determines that the neighbour has gone away, or 1078 when the interface goes away. 1080 When routing detects a new neighbour it creates a new instance of the 1081 per-neighbour state machine in state Idle. The consequent generation 1082 of the router's own LSP will then cause the router state machine to 1083 execute the LSP receipt actions, which will if necessary result in 1084 the new per-neighbour state machine receiving a "goto TX-AAH" command 1085 and transitioning to TX-AAH state. 1087 Appendix B. Synchronisation of Loop Free Timer Values 1089 The Appendix provided the reader with access to the design 1090 considerations originally described in 1091 [I-D.atlas-bryant-shand-lf-timers] . 1093 B.1. Introduction 1095 Most of the loop-free convergence mechanisms [RFC5715] require one or 1096 more convergence delay timers that must have a duration that is 1097 consistent throughout the routing domain. This time is the worst 1098 case time that any router will take to calculate the new topology, 1099 and to make the necessary changes to the FIB. The timer is used by 1100 the routers to know when it is safe to transition between the loop- 1101 free convergence states. The time taken by a router to complete each 1102 phase of the loop-free transition will be dependent on the size of 1103 the network and the design and implementation of the router. It can 1104 therefore be expected that the optimum delay will need to be tuned 1105 from time to time as the network evolves. Manual configuration of 1106 the timer is fraught for two reasons. Firstly it is always difficult 1107 to ensure that the correct value is installed in all of the routers. 1108 Secondly, if any change is introduced into the network that results 1109 in a need to change the timer, for example, due to a change in 1110 hardware or software version, then all of the routers need to be 1111 reconfigured to use the new timer value. It is therefore desirable 1112 that a means be provided by which the convergence delay timer can be 1113 automatically synchronized throughout the network. 1115 B.2. Required Properties 1117 The timer synchronization mechanism must have the following 1118 properties: 1120 o The convergence delay time must be consistent amongst all routers 1121 that are converging on the new topology. 1123 o The convergence delay time must be the highest delay required by 1124 any router in the new topology. 1126 o The mechanism must increase the delay when a new router in 1127 introduced to the network that requires a higher delay than is 1128 currently in use. 1130 o When the router that had the longest delay requirements is removed 1131 from the topology, the convergence delay timer value must, within 1132 some reasonable time, be reduced to the longest delay required by 1133 the remaining routers. 1135 o It must be possible for a router to change the convergence delay 1136 timer value that it requires. 1138 o A router which is in multiple routing areas, or is running 1139 multiple routing protocols may signal a different loop-free 1140 convergence delay for each area, and for each protocol. 1142 How a router determines the time that it needs to execute each 1143 convergence phase is an implementation issue, and outside the scope 1144 of this specification. However a router that dynamically determines 1145 its proposed timer value must do so in such a way that it does not 1146 cause the synchronized value to continually fluctuate. 1148 B.3. Mechanism 1150 The following mechanism is proposed. 1152 A new information element is introduced into the routing protocol 1153 that specifies the maximum time (in milliseconds) that the router 1154 will take to calculate the new topology and to update its FIB as a 1155 result of any topology change. 1157 When a topology change occurs, the largest convergence delay time 1158 required by any router in the new topology is used by the loop-free 1159 convergence mechanism. 1161 If a routing protocol message is issued that changes the convergence 1162 delay timer value, but does not change the topology, the new timer 1163 value must be taken into consideration during the next loop-free 1164 transition, but must not instigate a loop-free transition. 1166 If a routing protocol message is issued that changes the convergence 1167 timer value and changes the topology, a loop-free transition is 1168 instigated and the new timer value is taken into consideration. 1170 The loop-free convergence mechanism should specify the action to be 1171 taken if a timer change (only) message and a topology change message 1172 are independently generated during the hold-off time. A suitable 1173 action would be to take the same action that would be taken if two 1174 uncorrelated topology changes occurred in the network. 1176 All routers that support loop-free convergence must advertise a loop- 1177 free convergence delay time. The loop-free convergence mechanism 1178 must specify the action to be taken if a router does not advertise a 1179 convergence delay time. 1181 B.4. Security Considerations 1182 If an abnormally large timer value is proposed by a router, the there 1183 is a danger that the loop-free convergence process will take an 1184 excessive time. If during that time the routing protocol signals the 1185 need for another transition, the loop-free transition will be 1186 abandoned and the default best case (traditional) convergence 1187 mechanism used. 1189 It is still undesirable that the routers select a convergence delay 1190 time that has an excessive value. The maximum value that can be 1191 specified in the LSP/LSA is limited through the use of a 16 bit field 1192 to about 65 seconds. When sufficient implementation experience is 1193 gained, an architectural constant will be specified which sets the 1194 upper limit of the convergence delay timer. 1196 Authors' Addresses 1198 Mike Shand 1199 Individual Contributor 1201 Email: imc.shand@googlemail.com 1203 Stewart Bryant 1204 Cisco Systems 1205 Green Park, 250, Longwater Avenue, 1206 Reading RG2 6GB 1207 UK 1209 Email: stbryant@cisco.com 1211 Stefano Previdi 1212 Cisco Systems 1213 Via Del Serafico 200 1214 00142 Roma 1215 Italy 1217 Email: sprevidi@cisco.com 1219 Clarence Filsfils 1220 Cisco Systems 1221 Brussels 1222 Belgium 1224 Email: cfilsfil@cisco.com 1225 Pierre Francois 1226 Institute IMDEA Networks 1227 Avda. del Mar Mediterraneo, 22 1228 Leganese 28918 1229 ES 1231 Email: pierre.francois@imdea.org 1233 Olivier Bonaventure 1234 Universite catholique de Louvain 1235 Place Ste Barbe, 2 1236 Louvain-la-Neuve 1348 1237 BE 1239 Email: Olivier.Bonaventure@uclouvain.be 1240 URI: http://inl.info.ucl.ac.be/