idnits 2.17.1 draft-corson-fastrestore-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 848 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 448 has weird spacing: '...blishes links...' == Line 452 has weird spacing: '...lt, the route...' == Line 457 has weird spacing: '...started route...' == Line 508 has weird spacing: '...nts (or equiv...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (November 2000) is 8562 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'MobileIP' is defined on line 731, but no explicit reference was found in the text == Outdated reference: A later version (-04) exists of draft-ramachandra-bgp-restart-00 ** Obsolete normative reference: RFC 2002 (ref. 'MobileIP') (Obsoleted by RFC 3220) == Outdated reference: A later version (-11) exists of draft-ietf-manet-olsr-01 == Outdated reference: A later version (-04) exists of draft-ietf-manet-tora-spec-02 Summary: 7 errors (**), 0 flaws (~~), 9 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Personal M. Scott Corson 2 Ansible Systems 3 Internet Draft A. O'Neill 4 BT 5 G. Tsirtsis 6 Flarion 7 Document: draft-corson-fastrestore-00.txt 8 Category: Informational November 2000 10 IP Fast Restoration 11 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of RFC2026. Internet-Drafts are working 17 documents of the Internet Engineering Task Force (IETF), its areas, 18 and its working groups. Note that other groups may also distribute 19 working documents as Internet-Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six 22 months and may be updated, replaced, or obsoleted by other documents 23 at any time. It is inappropriate to use Internet-Drafts as 24 reference material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 Abstract 34 This memo describes a generalized approach for achieving IP fast 35 restoration in response to link and router failures by using IP 36 routing to support high network availability. The approach relies 37 on the utilization of flat, yet highly-localized routing technology 38 to reduce far-reaching control message propagation, thereby limiting 39 the effects of a failure and enabling sub-second restoration. It 40 then outlines how the Temporally-Ordered Routing Algorithm (TORA) (a 41 Mobile Ad hoc Network (MANET) routing protocol), when suitably 42 modified for use in fixed networks, can be used within this 43 framework. The general approach to routing topology management 44 within the fast restoration framework is immediate reaction on link 45 failure (to preserve existing flows) and gradually integration of 46 new network elements on activation into the domain (to further 47 assure stability). 49 INDEX 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . 2 52 1.1 Terminology. . . . . . . . . . . . . . . . . . . 2 53 2. Motivation . . . . . . . . . . . . . . . . . . . . . . 3 54 3. The Fast Restoration Framework . . . . . . . . . . . . 4 55 3.1 Highly Localized Routing Protocol. . . . . . . . 5 56 3.2 Link Status Assessment Protocol. . . . . . . . . 5 57 3.3 Reaction to Failure. . . . . . . . . . . . . . . 6 58 4. The Temporally-Ordered Routing Algorithm . . . . . . . 7 59 4.1 Overview . . . . . . . . . . . . . . . . . . . . 7 60 4.2 Router Activation (Route Creation) . . . . . . . 8 61 4.3 Link Failure . . . . . . . . . . . . . . . . . . 9 62 4.4 Link Activation. . . . . . . . . . . . . . . . . 9 63 4.5 Router Failure (Route Erasure) . . . . . . . . . 10 64 5. Detecting Router Failures with Single-Homed Prefixes . 10 65 5.1 Neighbor Advertisement Protocol. . . . . . . . . 11 66 5.2 Neighbor Reachability Protocol . . . . . . . . . 12 67 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . 14 68 7. References . . . . . . . . . . . . . . . . . . . . . . 15 70 1. Introduction 72 This memo describes a generalized approach for achieving IP fast 73 restoration in response to link and router failures by using IP 74 routing to support high network availability. The approach relies 75 on the utilization of flat, yet highly-localized routing technology 76 to reduce far-reaching control message propagation, thereby limiting 77 the effects of a failure and enabling sub-second restoration. It 78 then outlines how the Temporally-Ordered Routing Algorithm (TORA) (a 79 Mobile Ad hoc Network (MANET) routing protocol), when suitably 80 modified for use in fixed networks, can be used within this 81 framework. The general approach to routing topology management 82 within the fast restoration framework is an immediate reaction on 83 link failure (to preserve existing flows) and gradually integration 84 of new network elements on activation into the domain (to further 85 assure stability). 87 The draft first discusses the motivation for this draft and its 88 relevance to a recent IAB report [IAB99] on routing convergence. It 89 then outlines a framework of protocol mechanisms, and then gives a 90 fuller description of the main mechanisms. It then describes a 91 candidate routing algorithm that can be used within this framework. 92 It then additionally describes the mechanisms within the framework 93 to address efficiency and localization issues associated with the 94 failure of routers advertising single-homed prefixes. The draft 95 concludes with a discussion of the benefits of this fast restoration 96 framework. 98 1.1 Terminology 99 Here are some new terms defined in this draft. 101 RID Router ID. A unique ID used by a routing protocol to identify 102 a router. 104 RPLF Routing Protocol Link Failure. A RPLF packet is generated as 105 a routing protocol's reaction to a link failure for each 106 destination prefix affected by the failure 108 RPDF Routing Protocol Destination Failure. RPDF is a packet 109 generated to eliminate routing to a destination that has been 110 partitioned from the rest of the network 112 LFDT Link Failure Detection Timer. Failure to receive an FRLS 113 packet with the neighbor heard flag set within the LFDT 114 setting indicates link failure. 116 LFUB Link Failure Upper Bound. This is the upper bounds of the 117 time 118 required to detect failure. 120 FRNA Fast Restore Neighbor Advertisement. A router's FRNA 121 packet also contains the RIDs and corresponding prefixes of 122 the router's one-hop neighbors. 124 FRLS Fast Restore Link Status. A FRLS packet contains a router's 125 RID and an indication of whether or not the router has 126 recently heard its neighbor. 128 2. Motivation 130 The report from the recent IAB Network Layer Workshop [IAB99] raised 131 several routing issues. Regarding routing convergence, it noted 132 that "at least one backbone operator has concerns about the 133 convergence time of internetwork-wide routing during a failover. 134 This operator believes that current convergence times are on the 135 order of half a minute, and possibly getting worse. Others in the 136 routing community did not believe that the convergence times are a 137 current issue. Some, who believe that real-time applications (e.g. 138 telephony) require sub-second convergence, are concerned about the 139 implications of convergence times of a half minute on such 140 applications. [IAB99]" 142 It is agreed that existing Internet intra-domain routing protocols 143 (e.g. [RIP,OSPF]) have limited responsiveness in reacting to 144 topological changes (i.e. slow convergence) and consequently have 145 limitations on the size of domains they may support. These 146 limitations stem directly from the use of shortest-path routing 147 based on global topological information exchange. 149 For any premium service with critically tight availability 150 requirements, operators are left with the use of hot standby routers 151 and link level restoration to improve restoration times. Both 152 mechanisms are however very expensive, and applicable mainly to 153 large-scale core elements. However, these cannot be cost-effectively 154 applied in edge networks such as IP enabled Radio Access Networks 155 and base stations and flat enterprise networks. These will likely be 156 composed of large numbers of small and necessarily cheap IP routers 157 which may be interconnected by ad-hoc, non-resilient or unprotected 158 link layers such as ethernet and various serial lines. The need for 159 a fast restoring IGP is therefore apparent. In addition, if a cost- 160 effective IP layer solution could be found which could be made 161 sympathetic to traffic engineering requirements then the requirement 162 for hot standby and link layer restoration could be reduced 163 resulting in cost savings. The traffic engineering implications are 164 not however considered in this draft. 166 By relaxing the requirement for shortest path routing only slightly 167 (with acceptable impact on forwarding efficiency), one can 168 approximate shortest-path routing while enabling IP fast restoration 169 and higher levels of domain scalability. This can be achieved by 170 utilizing adaptive routing algorithms that localize their response 171 to topological changes while ensuring loop-free paths. It is 172 localization that yields the desired benefits of reduced restoration 173 time and increased domain size. 175 The proposed fast restoration approach therefore offers a potential 176 intra-domain solution to this convergence problem in reaction to 177 failure. It is presently unclear whether the fast restoration 178 techniques discussed in this memo have potential applicability to 179 the associated inter-domain routing problem. 181 3. Fast Restoration Framework 183 Four general protocol mechanisms are required in the proposed fast 184 restoration framework. The first two are mandatory in that they 185 provide the basic fault detection and reaction mechanisms. The 186 second two are optional and illustrate one way to enable the 187 reaction to be tailored to whether the failure point is a link or a 188 neighbor router. 190 1) A highly localized routing protocol 191 This is required to enable the protocol to be able to react 192 extremely quickly to failures, with minimal impact on other 193 routers within the domain. 195 2) A link status assessment protocol 196 This is required to enable a router to rapidly detect link or 197 router failures. 199 3) A neighbor advertisement protocol 200 This is used to enable a router to learn about its local two 201 hop topology including its neighbors' neighbors and the degree 202 of connectivity (single/multi-homed) of subnets. 204 4) A neighbor reachability protocol 205 This is used to enable a router to identify whether it is the 206 link or the neighbor router that has failed through 207 collaboration with its two hop neighbors. 209 3.1 Highly Localized Routing 211 The general approach to routing topology management within the fast 212 restoration framework is immediate reaction on link failure (to 213 preserve existing flows) and gradually integration of new network 214 elements on activation into the domain (to further assure 215 stability). 217 The method by which the aforementioned route adjustment occurs is a 218 function of the routing algorithm. It is essential that a routing 219 algorithm have the following properties to operate within this fast 220 restoration framework. 222 1. The protocol reaction to a link failure that does not partition 223 the network be localized for all destination prefixes. 225 2. The protocol reaction to a router failure that does not partition 226 the network should be localized for all destination prefixes. 227 (i.e. it should be treated as a link failure with respect to 228 these other prefixes). 230 3. The protocol reaction to a link failure that does partition the 231 network be capable of detecting the partition (and erasing routes 232 if desired). 234 4. The protocol reaction to a router failure that does partition the 235 network be capable of erasing routes for these prefixes (if 236 desired). Single-homed prefixes on the failed router will by 237 definition create a partition. 239 The preceding description assumes that a routing protocol reacts to 240 a link or router failure by sending generic Routing Protocol Link 241 Failure (RPLF). Routing Protocol Destination Failure (RPDF) packets 242 as generated as necessary in response to partitions. 244 3.2 Link Status Assessment 246 The link status assessment protocol is principally concerned with 247 fast detection of link failures. Work within the IETF on a 248 generalized IP level link status assessment algorithm was recently 249 proposed [FLIP]. A similar capability is required here where a link 250 consists of a pair of Router IDs (RIDs). After discovery of a 251 neighbor (identified by its RID) via the neighbor discovery 252 protocol, the objective of a link status assessment algorithm would 253 be to monitor with high resolution (in time) the bi-directional 254 connectivity of a link with an adjacent neighbor router. 256 It is desired that a link failure be detectable on millisecond time 257 scales enabling the higher level mechanisms (to be described) to 258 quickly react thereby preserving ongoing sessions. The protocol 259 operates by periodically sending Fast Restore Link Status (FRLS) 260 packets between adjacent interfaces to reconfirm link bi- 261 directionality. The FRLS retransmission period is set to 262 LINK_PROBE_INTERVAL time units. A FRLS packet contains a router's 263 RID and an indication of whether or not the router has recently 264 heard its neighbor. 266 FRLS packets are initially sent indicating that the sending router 267 has not heard its neighbor. Reception of a FRLS packet at a router 268 from a neighbor causes the router to set the "neighbor heard" flag 269 to TRUE for ROUTER_HEARD TIME (perhaps 3.5 times 270 LINK_PROBE_INTERVAL) during which time it sends FRLS packets back 271 towards its neighbor indicating that it has heard its neighbor. 272 Reception of any packet from its neighbor causes the router to reset 273 the ROUTER_HEARD_TIMER associated with this flag setting. Reception 274 of a FRLS packet with the neighbor heard flag set to TRUE causes the 275 router to consider the link with its neighbor as bi-directional. At 276 that time the Link Failure Detection Timer (LFDT) is also set to 277 ROUTER_HEARD_TIME. Reception of a subsequent FRLS packet with the 278 neighbor heard flag set resets the LFDT. Failure to receive an FRLS 279 packet with the neighbor heard flag set within the LFDT setting 280 indicates link failure. Also, if the router considers the link to 281 be bi-directional, subsequent reception of a FRLS packet without the 282 neighbor heard flag set immediately indicates link failure (i.e. 283 this indicates that its neighbor no longer hears the router). Thus 284 the sum of ROUTER_HEARD_TIME plus LINK_PROBE_INTERVAL equals Link 285 Failure Upper Bound (LFUB) and effectively upper bounds the time 286 required to detect failure. 288 Alternatively, the protocol may also rely on a link layer 289 notification of link failure (e.g. loss of carrier) if available. 290 Such a signal may provide a more accurate and timely indication of 291 link status. 293 3.3 Reaction to Failure 295 After the link status assessment protocol detects a link failure 296 (possible router failure) at a router X with a neighbor Y, the 297 following processing steps occur at router X: 299 1) Router X initially drops (or buffers) data packets for 300 destination routes whose next hop was over the failed link with 301 neighbor Y. This clearly includes any single-homed or multi- 302 homed destination prefixes on the (potentially failed) 303 neighboring router Y. 305 2) Router X adjusts routes for all affected destination prefixes 306 aiming to locally route around the failed link and/or router. Let 307 us assume that the routing protocol's reaction to a link failure 308 generates a Routing Protocol Link Failure (RPLF) packet for each 309 destination prefix. Note that the RPLF activity for a particular 310 destination will definitely fail in either of two cases. 312 3) The first would be if the link failure had generated a partition 313 in the topology towards that destination which places an obvious 314 requirement that restoration requires a sufficient level of 315 connectivity between nodes to guarantee the availability of an 316 alternate path. Partition detection would lead to the generation 317 of a Routing Protocol Destination Failure (RPDF) packet that 318 would be used to eliminate the routing for that destination 319 throughout the domain. 321 4) The second more specific example occurs if the destination prefix 322 is single-homed on a router, and that router has itself failed, 323 rendering restoration around the failure futile. Therefore, for 324 singly homed prefixes it may be useful to react differently to 325 the failure of the router. This firstly requires a method to 326 discover which routers have single-homed prefixes, a method to 327 discover when such routers fail, and an alternative strategy to 328 react to this type of failure. These are discussed in section 5. 329 One such strategy though would be to hold back on pointless RPLF 330 processing to give the router time to re-boot. This avoids the 331 RPLF processing causing partition detection and subsequent RPDF 332 generation, resulting in the elimination of the routing for that 333 destination throughout the domain. In this case, the domain 334 relies on interim ICMP unreachable messages to inform 335 applications of the loss of connectivity to the destination. If 336 re-boot does not occur within a fixed period then the routes to 337 that destination can be then be reasonably removed from the 338 domain through RPDF propagation. This is somewhat similar to the 339 `graceful' strategy for avoiding routing flaps in BGP 340 [BGPrestart] by suppressing route erasure on router failure in 341 anticipation of fast reboot and graceful BGP restart. 343 4 The Temporally-Ordered Routing Algorithm 345 4.1 Overview 347 We now map these generic packets to a specific routing protocol --- 348 the Temporally-Ordered Routing Algorithm (TORA) [TORA, TORAdraft]. 349 In TORA, RPLF packets map to Update (UPD) packets while RPDF packets 350 map to Erase (ERS) packets. This latter packet type does not yet 351 appear in [TORA, TORAdraft]. 353 TORA is not a shortest-path routing protocol. It is a "link 354 reversal" routing protocol that differs significantly from existing 355 routing technology. TORA is similar to distance vector protocols 356 (e.g. RIP) in that a separate version of the protocol runs 357 independently for each destination prefix. Thus its storage 358 requirements scale with the number of destination prefixes. But its 359 similarity ends there. In its original usage mode, TORA is an on- 360 demand protocol that builds routes reactively in response to route 361 requests from a MANET IP kernel. However, for usage in fixed 362 networks it is transformed into a more traditional, proactive 363 protocol. 365 The key attribute of the protocol remains the same for both 366 versions, and this is the protocol's aggressive, optimistic reaction 367 to link failures which is critical within the MANET environment but 368 is becoming equally important in fixed network scenario's. On link 369 failure, when necessary, the protocol immediately reverses the 370 directions of links in the area near the failure effectively 371 searching for alternative routes to the destination. If such exist, 372 the protocol typically finds them in a time and communication- 373 efficient, single-pass reaction. Otherwise (in case of partition), 374 the reaction detects the partition enabling route erasure if 375 desired. 377 We now briefly describe how TORA would function within the fast 378 restoration framework. 380 4.2 Router Activation (Route Creation) 382 When a router boots up, it establishes new links with its neighbors. 383 The router proactively initiates a localized process of height 384 database synchronization with each of its new neighbors. Its 385 neighbors transmit the contents of their height databases to the new 386 router. From this and any received routing advertisements (if any), 387 a router can determine which of its own prefixes are single or 388 multi-homed. 390 It then generates an Optimization (OPT) packet for each destination 391 prefix it wishes to advertise. If the prefix is single-homed and 392 heights exist in the neighboring routers, the OPT packet stops at 393 these neighbors and is not propagated further. Otherwise, the OPT 394 packet floods fully (or partially) throughout a domain installing 395 (or reorienting) routing entries for its destination prefix. The 396 collection of routing entries forms a Directed Acyclic Graph (DAG). 397 From an individual router's perspective, this means that it will 398 have one or more feasible next hops for the destination, all of 399 which are guaranteed to be loop-free. 401 Loop-free multipath routing is achieved through the use of 402 "heights". Each router maintains a height with respect to the 403 destination, and the destination has the lowest height. A link is 404 directed from a higher height to a lower height, and packets may 405 only be forwarded from routers with higher heights to routers with 406 lower heights. Thus packets flow "downhill" towards the 407 destination. Each height is guaranteed to be unique and thus the 408 collection of heights forms a total order. The protocol is 409 sometimes also referred to as the "Totally-Ordered Routing 410 Algorithm". 412 From a router's perspective, a neighbor that is higher is referred 413 to as an "upstream" neighbor while a neighbor that is lower is 414 referred to as a "downstream" neighbor. A destination router always 415 has only upstream neighbors, and no other router is allowed to have 416 upstream neighbors without having at least one downstream neighbor. 417 It should be understood that all heights and associated link 418 directions are per IP destination prefix DAG. 420 4.3 Link Failure 422 A router may lose an upstream neighbor (due to a link failure) at 423 any time without requiring protocol reaction. Similarly, it may 424 lose a downstream neighbor provided it has other downstream 425 neighbors, or has no downstream neighbors and no upstream neighbors, 426 without requiring protocol reaction. However, if due to a link 427 failure a router finds itself with at least one upstream neighbor 428 without having any downstream neighbors (i.e. it just lost its last 429 downstream neighbor and is a black hole), the router must adjust its 430 height so that it has at least one downstream neighbor. The manner 431 in which this occurs is described in [TORA,TORAdraft]. The 432 adjustment results in transmission of an UPD (RPLF) packet to the 433 router's neighbors advertising its new height. 435 The effect of this height adjustment is to reverse the direction of 436 some or all of the router's links; hence the name "link reversal" 437 algorithm. Reception of an UPD (RPLF) packet at a neighbor may 438 cause that neighbor to lose its last downstream link, which is cause 439 for subsequent height modification and UPD packet transmission. 440 Examination of the algorithm's reaction on mesh-like topologies 441 reveals that the number of subsequent UPD (RPLF) transmissions is 442 typically quite low; often only one UPD (RPLF) transmission is 443 required. The resultant reaction is thus highly localized. It is 444 also strongly de-correlated with the size of the domain. 446 4.4 Link Activation 448 When a router restarts, it re-establishes links with its neighbors. 449 The router initiates a localized process of height database 450 synchronization with each of its new neighbors. Its neighbors 451 transmit the contents of their height databases to the new router. 452 By default, the router's new heights are NULL for each destination. 453 (Note: A NULL height is higher than any non-NULL height, and packets 454 may be forwarded from NULL to non-NULL heights. A link between two 455 routers with NULL heights for a destination is marked as 456 "unassigned", and packet forwarding over such links is disallowed.) 457 The re-started router then performs its own non-NULL height 458 selection with respect to the existing heights of its neighbors. 460 The algorithm for height selection is not discussed here, but it 461 will be described in an upcoming draft detailing modification of 462 TORA for use in fixed networks. The procedure only involves a 463 router's one-hop neighbors. The height selection procedure occurs 464 slowly, and a link is gradually brought back into active usage 465 within the routing domain. 467 4.5 Router Failure (Route Erasure) 469 When a router fails, then the default behavior is that each neighbor 470 individually reacts to the router failure as a link failure, sending 471 UPD (RPLF) packets as described in 3.3 above. This will restore 472 paths around the failed router if sufficient physical connectivity 473 exists. 475 The failed router itself may have been advertising destination 476 prefixes into the network and in the case of multi-homed 477 subnets/prefixes the TORA reaction will result in the paths towards 478 the failed routing being redirected towards the nearest alternate 479 router also advertising that prefix. 481 In the case of single-homed prefixes, the restoration attempt will 482 clearly fail as those subnets are now partitioned from the core 483 network. The collective UPD (RPLF) reaction of the routers connected 484 to the failed router, followed by partition detection via the normal 485 TORA reflection, would eventually clear the network of routing state 486 for these single-homed destination prefixes. This is wasteful if 487 the router is simply undergoing a re-boot and therefore additional 488 protocol mechanisms may be warranted as highlighted in 3.3 and 489 described in section5 below. The aim of these mechanisms in TORA 490 would be to avoid sending UPD (RPLF) packets for the single-homed 491 destination prefixes, wait to see if the associated router recovers 492 in a set time, and if not, only then use ERS (RPDF) packets to 493 efficiently clear this routing state from the network. 495 5. Detecting Router Failures with Single-Homed Prefixes 497 As has previously been discussed, it may be useful to use additional 498 protocol mechanisms to be able to distinguish router failures and 499 associated single-homed destinations so that futile restoration 500 processing and premature route elimination can be avoided. One such 501 approach is outlined below, but the necessity and some of the 502 mechanics for solving this problem may be critically dependent on 503 the particular IGP. 505 In this approach, a bootstrapping neighbor advertisement protocol 506 enables a router to learn its one-hop neighbors, as well as its 507 neighbors' one-hop neighbors, called the two-hop neighbor set. It 508 works in combination with IPv6 router advertisements (or equivalent 509 ICMP router messages in IPv4 (RFC1256)) to enable a router to assess 510 its routing neighbors. Once a link failure is detected, the neighbor 511 reachability protocol attempts to determine whether the router at 512 the other end of the failed link is still reachable, and whether it 513 is advertising single-homed and/or multi-homed destination prefixes. 514 With input from the preceding protocols, the highly localized 515 routing protocol then reacts differently for router failures 516 involving single-homed prefixes. 518 5.1 Neighbor Advertisement Protocol 520 This protocol is principally concerned with neighbor discovery and 521 protocol bootstrapping. As with OSPF, each router is identified by a 522 Router ID (RID). It also is associated with at least one IP 523 destination prefix by which it is globally reachable. Routes for 524 this prefix are advertised out of all its interfaces and built by 525 the routing algorithm. It may also advertise other destination 526 prefixes. These prefixes may be single-homed (i.e. only this router 527 advertises itself as the last hop for these prefixes) or multi-homed 528 (i.e. they are advertised by multiple routers as being last hop 529 routers). 531 The neighbor advertisement protocol requires that each router 532 periodically advertises its RID and destination prefix(es) in Fast 533 Restore Neighbor Advertisement (FRNA) packets to its one-hop routing 534 neighbors out interfaces on which routing is activated. FRNA packets 535 would likely be multicast (TTL scope == 1) to a well-known multicast 536 address. The transmission period would be relatively long (on the 537 order of 30 seconds). A router's FRNA packet also contains the RIDs 538 and corresponding prefixes of the router's one-hop neighbors that it 539 has learned from their previous FRNA transmissions. In this way, by 540 receiving its neighbors' FRNA packets, a router learns the 541 identities of its neighbors' neighbors in a fashion very similar to 542 the neighbor advertisement algorithm of [OLSR]. Associated with 543 each link a router has with a routing neighbor is an instance of a 544 link status assessment protocol (see section 2.2). For each 545 instance a router has a Link Failure Detection Timer (LFDT) for that 546 link. This is the time after which a router will declare a link to 547 its neighbor down if no packets are received from its neighbor. A 548 router also computes an Link Failure Upper Bound (LFUB) on the 549 overall time required to detect a failure on each link, which is 550 equal to the LFDT value plus a link polling period (to be 551 described). A router X also advertises in its FRNA transmissions the 552 maximum computed value over all its links of these bounds, known as 553 the maximum LFUB (LFUBmax(X)). 555 The neighbor advertisement protocol operates in conjunction with 556 IPv6 Router Advertisement (RA) messages (or an equivalent in IPv4) 557 which are sent out interfaces on which routing is not activated 558 (e.g. over subnets with attached hosts). Router advertisement 559 messages convey the destination prefixes being advertised by other 560 routers on a given subnet. By listening to these advertisements, a 561 router can learn if other routers are advertising destination 562 prefixes that itself is also advertising. It thus determines which 563 of its own last-hop prefixes are single-homed or multi-homed. 565 5.2 Neighbor Reachability Protocol 567 We now describe the basic operation of a neighbor reachability 568 protocol. The protocol is assumed to run at a router X, and is 569 triggered by link failure detection on a link to router Y. After a 570 short time (estimated sufficient for localized route re- 571 computation), router X executes a distributed neighbor reachability 572 protocol to determine reachability of the single-homed destination 573 prefixes advertised by neighbor Y as follows. 575 The (potentially failed) neighbor Y's maximum LFUB (LFUBmax) is 576 known from its previous FRNA transmissions. With this information a 577 router X creates a RPLF packet as it would if this were a link 578 failure, but delays transmission of the packet for a time T somewhat 579 greater than its neighbor Y's LFUBmax; thus T > LFUBmax(Y). In the 580 meantime it immediately sends a Fast Restore Link Failure (FRLF) 581 packet to each of the (potentially failed) neighbor Y's one-hop 582 neighbors. Reception of a FRLF packet at such a neighbor Z 583 instructs that neighbor to mark the link between the indicated 584 router Y and the sender X as "failed". But the receiver Z keeps the 585 sender X in its list of the neighbor Y's one-hop neighbors. This is 586 to ensure that Z will still send a FRLF packet to X if it also 587 detects a link failure with Y. Z will change its notion of Y's 588 neighbors only on reception of a FRNA message from Y itself. 589 Subsequent reception of a FRLS packet at router Z from neighbor Y 590 indicates to the receiver Z that the link between routers X and Y 591 may indeed have failed, but that router Y itself is still 592 functioning. 594 Let us assume that router Y has indeed failed, and that its neighbor 595 X was the first neighbor to detect this failure, thus sending FRLF 596 packets to router Y's previous one-hop neighbors. Subsequently 597 others of Y's previous neighbor set begin to declare link failures 598 with Y, each scheduling delayed RPLF transmissions and immediately 599 sending an FRLF packet to each of Y's past neighbors. Let us also 600 assume that Y's neighbor Z is the last neighbor to detect a link 601 failure with router Y. It also schedules a RPLF packet for 602 transmission (suitably delayed) and sends FRLF packets to each of 603 Y's previous neighbors. By this time (or very shortly thereafter) Z 604 will likely to be the first of Y's past neighbors to both declare 605 link failure with Y and have received FRLF messages from all of Y's 606 previous neighbors (without having heard a subsequent message from 607 Y). 609 At that point router Z declares "router failure" of previously 610 adjacent neighbor Y, and cancels its pending RPLF transmission for 611 Y. The router Z then queues a Routing Protocol Destination Failure 612 (RPDF) packet for each of neighbor Y's single-homed destination 613 prefixes (Note that for efficiency, multiple RPDF packets may be 614 aggregated and sent within a single aggregate RPDF packet). 616 Assuming T (T > LFUBmax(Y)) is set appropriately at each of Y's 617 previous neighbors, none of Y's previous neighbors would have yet 618 transmitted their pending RPLF packets. The router also sets a 619 NEIGHBOR_RESTART_TIMER to some value (possibly 30 seconds) while it 620 waits for neighbor Y to reboot. During this time packets sent to 621 Y's single-homed prefixes will be undeliverable resulting in "host 622 unreachable" ICMP message being returned to the sources. Assuming 623 neighbor Y reboots within this time, the queued RPDF transmission is 624 cancelled. Otherwise the queued RPDF packet is transmitted when the 625 timer expires and the routes for Y's single-homed destination 626 prefixes are erased from the domain. The objective of this timer is 627 to suppress route erasure in the case of fast router reboot in a 628 fashion similar to that proposed in [BGPrestart]. 630 If the NEIGHBOR_RESTART_TIMER expires and T (T > LFUBmax(Y)) is not 631 set appropriately at each of Y's previous neighbors, possibly others 632 may also have queued and then issued RPDF packets prior to receiving 633 an FRDF message originating from router Z. This is not detrimental. 634 RPDF packets are destination-specific and source-independent. Their 635 collective propagation results in only a single aggregated flood, 636 regardless of the number of routers originating RPDF packets. An 637 RPDF packet floods throughout the network immediately, efficiently 638 clearing state information regarding the effected destinations. An 639 RPDF packet is sent to the ALL_ROUTERS multicast address. 641 For maximal efficiency the aforementioned protocol assumes a domain 642 topology with the characteristic that, after any single router or 643 link failure, the remaining domain remains well connected. Thus 644 router failure does not partition the remaining domain permitting 645 the failed neighbors to exchange FRLF messages. This is required so 646 fast restoration can be achieved. 648 Nevertheless, if the network becomes partitioned the above algorithm 649 need not result in undo harm provided the routing protocol is 650 capable of detecting partitions. A router/link failure, in such a 651 case, will result in multiple link failures being detected since the 652 failed router Y's neighbors on one side of the partitioned network 653 will not be able to reach Y's one hop neighbor's on the other side 654 of the partitioned domain (i.e. FRLF will fail to reach some 655 neighbors) to reach a more definite conclusion. Consequently each 656 such neighbor will transmit a RPLF packet for each single-homed 657 destination prefix on the potentially failed router. Each packet 658 will initiate a distributed computation that performs "distributed 659 partition detection" and eventually clears the domain of routing 660 state for its destination. This approach is less efficient than 661 that using RPDF packets. 663 The approach also assumes the FRLF messages are not frequently 664 dropped. Loss of at least one FRLF message destined for each of the 665 failed destination's neighbors would ensure that no neighbor 666 generates an RPDF packet. If there are k such neighbors with each 667 needing to receive all k-1 FRLF packets from the other neighbors, 668 and the probability of packet dropping is p, then an upper bound on 669 the probability of this undesirable, joint packet dropping event is 670 p^k. That is, each of the k neighbors must have missed at least one 671 packet from some other neighbor. If control packets are given 672 reasonable priority (suitably small p), then this will not occur 673 very frequently (i.e. only with probability p^k). 675 If it does occur, each of these neighbors would start to 676 independently transmit their queued RPLF packets (which is what this 677 router failure detection mechanism intends to avoid). This results 678 in the aforementioned distributed partition detection process and 679 the worst-case consequences regarding convergence in this case 680 depend on the particular routing algorithm. 682 6. Conclusions 684 The fast restoration processing occurs entirely at layer 3 and does 685 not require special layer 2 considerations, although a layer 2 link 686 activation/failure notification trigger may be used if available. 688 The overall restoration approach is constrained only by the speed at 689 which a router can create, transmit and process the localized 690 routing updates. Increases in processing and transmission speeds 691 will only improve restoration performance over time. 693 It is anticipated that restoration can frequently be achieved with 694 little or no noticeable effect on voice over IP sessions. Some 695 degradation would likely be experienced by higher rate sessions 696 given current hardware capabilities. 698 The approach offers immunity to link instability (such as that 699 triggered by malfunctioning interface cards). A link that flip- 700 flops between UP and DOWN will effectively be removed from the 701 forwarding paths of its adjacent routers without triggering domain- 702 wide routing updates or routing re-convergence computations. Thus 703 the network will not melt down due to localized link problems. 705 The approach offers reasonable immunity to router instability as 706 well. A router that frequently goes UP and DOWN will not create 707 domain-wide routing messages (route building followed by erasure) if 708 it can quickly reboot. If it cannot, then these messages still only 709 effect its own prefixes. With regards to the remaining network 710 prefixes, the routing protocol will treat such router instability as 711 a set of individually unstable links, effectively removing them from 712 the forwarding paths for these prefixes and isolating the network 713 from harm. 715 The net effect then is that one can deploy much larger domains with 716 greatly reduced risk of domain melt down while enabling fast 717 restoration of IP flows when necessary. 719 7. References 721 [BGPrestart] S. Ramachandra et al., "Graceful Restart mechanism for 722 BGP", Internet-Draft, draft-ramachandra-bgp-restart-00.txt, (work in 723 progress), August 2000. 725 [FLIP] H.Sandick et al., "Fast LIveness Protocol (FLIP)", Internet 726 Draft, draft-sandiick-flip-00.txt, February 2000. 728 [IAB99] M. Kaat, "Overview of 1999 IAB Network Layer Workshop", 729 Internet-Draft, draft-iab-ntwlyrws-over-03.txt, July 2000. 731 [MobileIP] C.E. Perkins, ``IP Mobility Support," Internet RFC 2002, 732 Oct 1996. 734 [OLSR] P. Jacquet, P. Muhlethaler and A. Qayyum, "Optimized Link 735 State Routing Protocol", Internet-Draft, draft-ietf-manet-olsr- 736 01.txt, February 2000. 738 [OSPF] J. Moy, "OSPF Version 2", Internet RFC 2328, April 1998. 740 [RIP] G. Malkin, "RIP Version 2", Internet RFC 2453, November 1998. 742 [TORA] V. Park, M. S. Corson, "The Temporally-Ordered Routing 743 Algorithm", Proc. IEEE INFOCOM '97, Kobe, Japan, April 1997. 745 [TORAdraft] V. Park, M. S. Corson, "Temporally-Ordered Routing 746 Algorithm (TORA) Version 1 Functional Specification", Internet-Draft 747 (work in progress), draft-ietf-manet-tora-spec-02.txt, October 1999. 749 Author's Addresses 751 M. Scott Corson 752 University of Maryland 753 Ansible Systems 754 (+1) 301-405-6630 755 corson@isr.umd.edu 756 mscorson@ix.netcom.com 758 Alan O'Neill 759 BT 760 (+44) 1473-646459 761 alan.w.oneill@bt.com 763 George Tsirtsis 764 Flarion Technologies 765 (+44) 020 88260073 766 g.tsirtsis@Flarion.com 767 gtsirt@hotmail.com 768 Copyright Notice 770 Placeholder for ISOC copyright.