Routing Area Working Group                                 A. Atlas, Ed.
Internet-Draft                                                 R. Kebler
Intended status: Standards Track                      M. Konstantynowicz
Expires: July 29, 2012                                  Juniper Networks
                                                               G. Enyedi
                                                              A. Csaszar
                                                                Ericsson
                                                                R. White
                                                           Cisco Systems
                                                                M. Shand
                                                        January 26, 2012


 An Architecture for IP/LDP Fast-Reroute Using Maximally Redundant Trees
                draft-ietf-rtgwg-mrt-frr-architecture-00

Abstract

   As IP and LDP Fast-Reroute are increasingly deployed, the coverage
   limitations of Loop-Free Alternates are seen as a problem that
   requires a straightforward and consistent solution for IP and LDP,
   for unicast and multicast.  This draft describes an architecture
   based on redundant backup trees where a single failure can cut a
   point-of-local-repair from the destination only on one of the pair of
   redundant trees.

   One innovative algorithm to compute such topologies is maximally
   disjoint backup trees.  Each router can compute its next-hops for
   each pair of maximally disjoint trees rooted at each node in the IGP
   area with computational complexity similar to that required by
   Dijkstra.

   The additional state, address and computation requirements are
   believed to be significantly less than the Not-Via architecture
   requires.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 29, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Goals for Extending IP Fast-Reroute coverage beyond LFA
   2.  Terminology
   3.  Maximally Redundant Trees (MRT)
   4.  Maximally Redundant Trees (MRT) and Fast-Reroute
     4.1.  Multi-homed Prefixes
     4.2.  Unicast Forwarding with MRT Fast-Reroute
       4.2.1.  LDP Unicast Forwarding - Avoid Tunneling
         4.2.1.1.  Protocol Extensions and Considerations: LDP
       4.2.2.  IP Unicast Traffic
         4.2.2.1.  Protocol Extensions and Considerations: OSPF and ISIS
       4.2.3.  Inter-Area and ABR Forwarding Behavior
       4.2.4.  Issues with Area Abstraction
       4.2.5.  Partial Deployment and Islands of Compatible MRT FRR
               routers
       4.2.6.  Network Convergence and Preparing for the Next Failure
         4.2.6.1.  Micro-forwarding loop prevention and MRTs
         4.2.6.2.  MRT Recalculation
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   There is still work required to completely provide IP and LDP
   Fast-Reroute [RFC5714] for unicast and multicast traffic.  This draft
   proposes an architecture to provide 100% coverage.

   Loop-free alternates (LFAs) [RFC5286] provide a useful mechanism for
   link and node protection but getting complete coverage is quite hard.
   [LFARevisited] defines sufficient conditions to determine if a
   network provides link-protecting LFAs and also proves that augmenting
   a network to provide better coverage is NP-hard.
   [I-D.ietf-rtgwg-lfa-applicability] discusses the applicability of LFA
   to different topologies with a focus on common PoP architectures.

   While Not-Via [I-D.ietf-rtgwg-ipfrr-notvia-addresses] is defined as
   an architecture, in practice, it has proved too complicated and
   stateful to spark substantial interest in implementation or
   deployment.  Academic implementations [LightweightNotVia] exist and
   have found the address management complexity high (but no
   standardization has been done to reduce this).

   A different approach is needed, and that is what is described here.
   It is based on the idea of using disjoint backup topologies as
   realized by Maximally Redundant Trees (described in
   [LightweightNotVia]); the general architecture could also apply to
   future improved redundant tree algorithms.

1.1.  Goals for Extending IP Fast-Reroute coverage beyond LFA

   Any scheme proposed for extending IPFRR network topology coverage
   beyond LFA, apart from attaining basic IPFRR properties, should also
   aim to achieve the following usability goals:

   o  ensure maximum physically feasible link and node disjointness
      regardless of topology,

   o  automatically compute backup next-hops based on the topology
      information distributed by the link-state IGP,

   o  do not require any signaling in the case of failure and use
      pre-programmed backup next-hops for forwarding,

   o  introduce a minimal amount of additional addressing and state on
      routers,

   o  enable gradual introduction of the new scheme and backward
      compatibility,

   o  and do not impose requirements for external computation.

2.  Terminology

   2-connected:  A graph that has no cut-vertices.  This is a graph
      that requires two nodes to be removed before the network is
      partitioned.

   2-connected cluster:  A maximal set of nodes that are 2-connected.

   2-edge-connected:  A network graph where at least two links must be
      removed to partition the network.

   ADAG:  Almost Directed Acyclic Graph - a graph that, if all links
      incoming to the root were removed, would be a DAG.

   block:  Either a 2-connected cluster, a cut-edge, or an isolated
      vertex.

   cut-link:  A link whose removal partitions the network.  A cut-link
      by definition must be connected between two cut-vertices.  If
      there are multiple parallel links, then they are referred to as
      cut-links in this document if removing the set of parallel links
      would partition the network.

   cut-vertex:  A vertex whose removal partitions the network.

   DAG:  Directed Acyclic Graph - a graph where all links are directed
      and there are no cycles in it.

   GADAG:  Generalized ADAG - a graph that is the combination of the
      ADAGs of all blocks.

   Maximally Redundant Trees (MRT):  A pair of trees where the path
      from any node X to the root R along the first tree and the path
      from the same node X to the root along the second tree share the
      minimum number of nodes and the minimum number of links.  Each
      such shared node is a cut-vertex.  Any shared links are cut-links.
      Any RT is an MRT but many MRTs are not RTs.

   network graph:  A graph that reflects the network topology where all
      links connect exactly two nodes and broadcast links have been
      transformed into the standard pseudo-node representation.

   Redundant Trees (RT):  A pair of trees where the path from any node
      X to the root R along the first tree is node-disjoint with the
      path from the same node X to the root along the second tree.
      These can be computed in 2-connected graphs.

3.  Maximally Redundant Trees (MRT)

   In the last few years, there has been substantial research on how to
   compute and use redundant trees.  Redundant trees are directed
   spanning trees that provide disjoint paths towards their common root.
   These redundant trees only exist and provide link protection if the
   network is 2-edge-connected and node protection if the network is
   2-connected.
   Such connectivity may not be present in real networks, either due
   to architecture or due to a previous failure.  The work on maximally
   redundant trees has added two useful pieces that make them ready for
   use in a real network.

   o  Computable regardless of network topology: The maximally redundant
      trees are computed so that only the cut-edges or cut-vertices are
      shared between the multiple trees.

   o  Computationally practical: The algorithm is based on a common
      network topology database.  Algorithm variants can compute the
      MRTs in O(e) or O(e + n log n) time, where e is the number of
      links and n is the number of nodes, as given in
      [I-D.enyedi-rtgwg-mrt-frr-algorithm].

   There is, of course, significantly more in the literature related to
   redundant trees and even fast-reroute, but the formulation of the
   Maximally Redundant Trees (MRT) algorithm makes it very well suited
   for use in routers.

   A known disadvantage of MRT, and redundant trees in general, is that
   the trees do not necessarily provide shortest detour paths.  The use
   of the shortest-path-first algorithm in tree-building and including
   all links in the network as possibilities for one path or another
   should improve this.  Modeling is underway to investigate and compare
   the MRT alternates to the optimal alternates
   [I-D.enyedi-rtgwg-mrt-frr-algorithm].  Providing shortest detour
   paths would require failure-specific detour paths to the
   destinations, but the state-reduction advantage of MRT lies in the
   detour being established per destination (root) instead of per
   destination AND per failure.

   The specific algorithm to compute MRTs as well as the logic behind
   that algorithm and alternative computational approaches are given in
   detail in [I-D.enyedi-rtgwg-mrt-frr-algorithm].  Interested readers
   are strongly encouraged to read that document.  This document
   describes how the MRTs can be used and not how to compute them.
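
   As a rough illustration of the state argument above, assume (for
   this example only; the accounting is not taken from
   [I-D.enyedi-rtgwg-mrt-frr-algorithm]) that a router keeps one
   alternate next-hop entry per destination per alternate topology,
   with n destinations (MRT roots) and f distinct link or node failures
   to protect against.  MRT needs one Blue and one Red entry per
   destination, while failure-specific detours need up to one entry
   per destination per failure:

```latex
% n: destinations (MRT roots); f: protected link/node failures
S_{\mathrm{MRT}} = 2n
\qquad \text{vs.} \qquad
S_{\mathrm{fs}} \le n \, f
```

   For example, with n = 100 and f = 150, MRT needs 200 entries, whereas
   failure-specific detours could need up to 15,000.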

   The most important thing to understand about MRTs is that for each
   pair of destination-rooted MRTs, there is a path from every node X to
   the destination D on the Blue MRT that is as disjoint as possible
   from the path on the Red MRT.  The two paths along the two MRTs to a
   given destination-root of a 2-connected graph are node-disjoint,
   while in any non-2-connected graph, only the cut-vertices and
   cut-edges can be contained by both of the paths.

   For example, in Figure 1, there is a network graph that is
   2-connected in (a) and associated MRTs in (b) and (c).  One can
   consider the paths from B to R; on the Blue MRT, the paths are
   B->F->D->E->R or B->C->D->E->R.  On the Red MRT, the path is B->A->R.
   These are clearly link and node-disjoint.  These MRTs are redundant
   trees because the paths are disjoint.

      [E]---[D]---|         [E]<--[D]<--|         [E]-->[D]---|
       |     |    |          |     ^    |                |    |
       |     |    |          V     |    |                V    V
      [R]   [F]  [C]        [R]   [F]  [C]        [R]   [F]  [C]
       |     |    |                ^    ^          ^     |    |
       |     |    |                |    |          |     V    |
      [A]---[B]---|         [A]-->[B]---|         [A]---[B]<--|

            (a)                   (b)                   (c)
   a 2-connected graph    Blue MRT towards R     Red MRT towards R

                    Figure 1: A 2-connected Network

   By contrast, in Figure 2, the network in (a) is not 2-connected.  If
   F, G or the link F<->G failed, then the network would be partitioned.
   It is clearly impossible to have two link-disjoint or node-disjoint
   paths from G, I or J to R.  The MRTs given in (b) and (c) offer paths
   that are as disjoint as possible.  For instance, the paths from B to
   R are the same as in Figure 1 and the path from G to R on the Blue
   MRT is G->F->D->E->R and on the Red MRT is G->F->B->A->R.
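
   The disjointness claim for Figure 1 can be checked mechanically.  The
   sketch below is an illustration only: the BLUE and RED tables are
   this example's reading of the arrows in Figure 1(b) and (c), written
   as per-node next-hop lists towards R, and the helper names are
   hypothetical.  It enumerates every forwarding path to R on each MRT
   and confirms that no Blue path shares a transit node with a Red path.

```python
from itertools import product

# Next-hops towards the root R, read off Figure 1(b) and 1(c).  A node
# lists two next-hops where the drawing shows two arrows leaving it.
BLUE = {"A": ["B"], "B": ["F", "C"], "C": ["D"], "F": ["D"],
        "D": ["E"], "E": ["R"]}
RED = {"E": ["D"], "D": ["F", "C"], "F": ["B"], "C": ["B"],
       "B": ["A"], "A": ["R"]}


def paths_to_root(next_hops, src, root="R"):
    """Enumerate every forwarding path from src to root."""
    if src == root:
        return [[root]]
    paths = []
    for nh in next_hops[src]:
        for tail in paths_to_root(next_hops, nh, root):
            paths.append([src] + tail)
    return paths


def transit_disjoint(p, q):
    """True if paths p and q share no node other than their endpoints."""
    return not set(p[1:-1]) & set(q[1:-1])


def mrt_paths_disjoint(src):
    """Check every Blue path from src against every Red path from src."""
    blue = paths_to_root(BLUE, src)
    red = paths_to_root(RED, src)
    return all(transit_disjoint(b, r) for b, r in product(blue, red))


print(paths_to_root(BLUE, "B"))  # the two Blue paths quoted in the text
print(paths_to_root(RED, "B"))   # the single Red path B->A->R
print(all(mrt_paths_disjoint(x) for x in "ABCDEF"))
```

   Running the same check on every node, not just B, shows that the
   property holds for the whole 2-connected graph, as the text states.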

                  [E]---[D]---|
                   |     |    |     |----[I]
                   |     |    |     |     |
                  [R]---[C]  [F]---[G]    |
                   |     |    |     |     |
                   |     |    |     |----[J]
                  [A]---[B]---|

                             (a)
                   a non-2-connected graph

     [E]<--[D]<--|                   [E]-->[D]---|
      |     ^    |          [I]             |    |          [I]
      V     |    |           ^              V    V           |
     [R]<--[C]  [F]<--[G]    |       [R]---[C]  [F]<--[G]    |
            ^    ^     |     |        ^     |    |     ^     V
            |    |     |--->[J]       |     V    |     |----[J]
     [A]-->[B]---|                   [A]<--[B]<--|

                (b)                             (c)
        Blue MRT towards R              Red MRT towards R

                  Figure 2: A non-2-connected network

4.  Maximally Redundant Trees (MRT) and Fast-Reroute

   In normal IGP routing, each router has its shortest-path-tree to all
   destinations.  From the perspective of a particular destination, D,
   this looks like a reverse SPT (rSPT).  To use maximally redundant
   trees, in addition, each destination D has two MRTs associated with
   it; by convention these will be called the blue and red MRTs.

   MRTs are a practical way to maintain redundancy even after a single
   link or node failure.  If a pair of MRTs is computed rooted at each
   destination, all the destinations remain reachable along one of the
   MRTs in the case of a single link or node failure.

   When there is a link or node failure affecting the rSPT, each node
   will still have at least one path via one of the MRTs to reach the
   destination D.  For example, in Figure 2, C would normally forward
   traffic to R across the C<->R link.  If that C<->R link fails, then C
   could use either the Blue MRT path C->D->E->R or the Red MRT path
   C->B->A->R.

   As is always the case with fast-reroute technologies, forwarding does
   not change until a local failure is detected.  Until then, packets
   are forwarded along the shortest path; the appropriate alternate to
   use is pre-computed.  [I-D.enyedi-rtgwg-mrt-frr-algorithm] describes
   exactly how to determine whether the Blue MRT next-hops or the Red
   MRT next-hops should be the MRT alternate next-hops for a particular
   primary next-hop N to a particular destination D.
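
   A minimal, hypothetical sketch of this pre-computed behavior at
   router C of Figure 2 follows.  The FibEntry and Router classes are
   illustrative only, not a specified data model.  The alternate color
   is simply stored as a field: choosing between the Blue and Red
   next-hops for a given primary next-hop is done by the rules in
   [I-D.enyedi-rtgwg-mrt-frr-algorithm], which this sketch does not
   implement.

```python
from dataclasses import dataclass, field


@dataclass
class FibEntry:
    """Per-destination forwarding state (hypothetical layout)."""
    primary: str          # next-hop on the shortest-path tree
    alternate_color: str  # "blue" or "red", chosen ahead of time
    alternate: str        # next-hop on that MRT


@dataclass
class Router:
    name: str
    fib: dict = field(default_factory=dict)
    failed_neighbors: set = field(default_factory=set)

    def link_down(self, neighbor):
        """Local failure detection; no signaling to other routers."""
        self.failed_neighbors.add(neighbor)

    def forward(self, dest):
        """Return (next_hop, topology) for a packet towards dest."""
        entry = self.fib[dest]
        if entry.primary not in self.failed_neighbors:
            return entry.primary, "spt"
        return entry.alternate, entry.alternate_color


# Router C in Figure 2: the primary next-hop to R is R itself; the Red
# MRT alternate (C->B->A->R) is stored in advance.
c = Router("C", {"R": FibEntry(primary="R", alternate_color="red",
                               alternate="B")})
before = c.forward("R")   # ("R", "spt")
c.link_down("R")
after = c.forward("R")    # ("B", "red")
```

   The only event that changes forwarding is the local link_down() call;
   no table is recomputed at that moment, which mirrors the
   fast-reroute behavior described above.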

   MRT alternates are always available to use, unless the network has
   been partitioned.  It is a local decision whether to use an MRT
   alternate, a Loop-Free Alternate or some other type of alternate.
   When a network needs to use a micro-loop prevention mechanism
   [RFC5715] such as Ordered FIB [I-D.ietf-rtgwg-ordered-fib] or Farside
   Tunneling [RFC5715], then the whole IGP area needs to have alternates
   available so that the micro-loop prevention mechanism, which requires
   slower network convergence, can take the necessary time without
   impacting traffic badly.

   As described in [RFC5286], when a worse failure than is anticipated
   happens, using LFAs that are not downstream neighbors can cause
   micro-looping.  An example is given of link-protecting alternates
   causing a loop on node failure.  Even if a worse failure than
   anticipated happened, the use of MRT alternates will not cause
   looping.  Therefore, while node-protecting LFAs may be preferred,
   there are advantages to using MRT alternates when such a
   node-protecting LFA is not a downstream path.

4.1.  Multi-homed Prefixes

   One advantage of LFAs that is necessary to preserve is the ability to
   protect multi-homed prefixes against ABR failure.  For instance, if a
   prefix from the backbone is available via both ABR A and ABR B and A
   fails, then the traffic should be redirected to B.  This can also be
   done for backups via MRT.

   This generalizes to any multi-homed prefix.  A multi-homed prefix
   could be:

   o  An out-of-area prefix announced by more than one ABR,

   o  An AS-External route announced by 2 or more ASBRs,

   o  A prefix with iBGP multipath to different ASBRs,

   o  etc.

   For each prefix, the two lowest total cost ABRs are selected and a
   proxy-node is created connected to those two ABRs.
   If there exist multiple multi-homed prefixes that share the same two
   lowest-cost ABRs, then a single proxy-node can be used to represent
   the set.  An example of this is shown in Figure 3.

                2    2                    2    2
              A----B----C               A----B----C
            2 |         | 2           2 |         | 2
              |         |               |         |
           [ABR1]    [ABR2]          [ABR1]    [ABR2]
              |         |               |         |
            p,10      p,15           10 |---[P]---| 15

            (a) Initial topology      (b) with proxy-node

              A<---B<---C               A--->B--->C
              |         ^               ^         |
              V         |               |         V
           [ABR1]    [ABR2]          [ABR1]    [ABR2]
              |                                   |
              |-->[P]                       [P]<--|

               (c) Blue MRT              (d) Red MRT

                 Figure 3: Prefixes Advertised by Multiple ABRs

   The proxy-nodes and associated links are added to the network
   topology after all real links have been assigned to a direction and
   before the actual MRTs are computed.  Proxy-nodes cannot be transited
   when computing the MRTs.  In addition to computing the pair of MRTs
   associated with each router destination D in the area, a pair of MRTs
   can be computed for each such proxy-node to fully protect against ABR
   failure.

   Each ABR or attaching router must remove the MRT marking (see
   Section 4.2) and then forward the traffic outside of the area (or
   island of MRT-fast-reroute-supporting routers).

   When directing traffic along an MRT towards a multi-homed prefix, if
   a topology-identifier label (see Section 4.2.1) is not used, then the
   proxy-node must be named and either additional LDP labels or IP
   addresses associated with it.

4.2.  Unicast Forwarding with MRT Fast-Reroute

   With LFA, there is no need to tunnel unicast traffic, whether IP or
   LDP.  The traffic is simply sent to an alternate.  The behavior with
   MRT Fast-Reroute is different depending upon whether IP or LDP
   unicast traffic is considered.

   Logically, one could use the same IP address or LDP FEC and then also
   use 2 bits to express the topology to use.  The topology options are
   (00) IGP/SPT, (01) blue MRT, (10) red MRT.
   Unfortunately, there are no 2 spare bits available in the IPv4 or
   IPv6 header.  This has different consequences for IP and LDP, because
   LDP can add a topology label on top or take 2 spare bits from the
   label space.

   Once the MRTs are computed, the two sets of MRTs are seen by the
   forwarding plane as essentially two additional topologies.  The same
   considerations apply for forwarding along the MRTs as for handling
   multiple topologies.

4.2.1.  LDP Unicast Forwarding - Avoid Tunneling

   For LDP, it is very desirable to avoid tunneling because, for at
   least node protection, tunneling requires knowledge of remote LDP
   label mappings and thus requires targeted LDP sessions and the
   associated management complexity.  There are two different mechanisms
   that can be used.

   1.  Option A - Encode Topology in Labels: In addition to sending a
       single label for a FEC, a router would provide two additional
       labels with their associated MRT colors.  This is simple, but
       reduces the label space for other uses.  It also increases the
       memory to store the labels and the communication required by LDP.

   2.  Option B - Create Topology-Identification Labels: Use the
       label-stacking ability of MPLS and specify only two additional
       labels - one for each associated MRT color - by a new FEC type.
       When sending a packet onto an MRT, first swap the LDP label and
       then push the topology-identification label for that MRT color.
       When receiving a packet with a topology-identification label, pop
       it and use it to guide the next-hop selection in combination with
       the next label in the stack; then swap the remaining label, if
       appropriate, and push the topology-identification label for the
       next-hop.  This has minimal usage of additional labels, memory
       and LDP communication.  It does increase the size of packets and
       the complexity of the required label operations and look-ups.
       This can use the same mechanisms as are needed for context-aware
       label spaces.

   Note that with LDP unicast forwarding, regardless of whether a
   topology-identification label or a topology encoded in the label is
   used, no additional loopbacks per router are required, unlike in the
   IP unicast forwarding case.  This is because LDP labels are used on a
   hop-by-hop basis to identify MRT-blue and MRT-red forwarding trees.

   For greatest hardware compatibility, routers should support Option A
   of encoding the topology in the labels.

4.2.1.1.  Protocol Extensions and Considerations: LDP

   This captures an initial understanding of what may need to be
   specified.

   1.  Specify Topology in Label: When sending a Label Mapping, have the
       ability to send a Label TLV and multiple Topology-Label TLVs.
       The Topology-Label TLV would specify MRT and the associated MRT
       color.

   2.  Topology-Identification Labels: Define a new FEC type that
       describes the topology for MRT and the associated MRT color.

4.2.2.  IP Unicast Traffic

   For IP, there is no currently practical alternative except tunneling.
   The tunnel egress could be the original destination in the area, the
   next-next-hop, etc.  If the tunnel egress is the original destination
   router, then the traffic remains on the redundant tree with
   sub-optimal routing.  If the tunnel egress is the next-next-hop, then
   protection of multi-homed prefixes and node-failure for ABRs is not
   available.  Selection of the tunnel egress is a router-local
   decision.

   There are three options available for marking IP packets with the MRT
   on which they should be forwarded.

   1.  Tunnel IP packets via an LDP LSP.  This has the advantage that
       more installed routers can do line-rate encapsulation and
       decapsulation.  Also, no additional IP addresses would need to be
       allocated or signaled.

       A.  Option A - LDP Destination-Topology Label: Use a label that
           indicates both destination and MRT.  This method allows easy
           tunneling to the next-next-hop as well as to the IGP-area
           destination.  For multi-homed prefixes, this requires that
           additional labels be advertised for each proxy-node.

       B.  Option B - LDP Topology Label: Use a Topology-Identifier
           label on top of the IP packet.  This is very simple and
           doesn't require additional labels for proxy-nodes.  If
           tunneling to a next-next-hop is desired, then a two-deep
           label stack can be used with [ Topology-ID label,
           Next-Next-Hop Label ].

   2.  Tunnel IP packets in IP.  Each router supporting this option
       would announce two additional loopback addresses and their
       associated MRT color.  Those addresses are used as destination
       addresses for MRT-blue and MRT-red IP tunnels respectively.  They
       allow the transit nodes to identify the traffic as being
       forwarded along either MRT-blue or MRT-red tree topology to reach
       the tunnel destination.  Announcements of these two additional
       loopback addresses per router with their MRT color require IGP
       extensions.

   For proxy-nodes associated with one or more multi-homed prefixes, the
   problem is harder because there is no router associated with the
   proxy-node, so its loopbacks can't be known or used.  In this case,
   each router attached to the proxy-node could announce two common IP
   addresses with their associated MRT colors.  This would require
   configuration as well as the previously mentioned IGP extensions.
   Similarly, in the LDP case, two additional FEC bindings could be
   announced.

4.2.2.1.  Protocol Extensions and Considerations: OSPF and ISIS

   This captures an initial understanding of what may need to be
   specified.

   o  Capabilities: Does a router support MRT?  Does the router do MRT
      tunneling with LDP or IP or GRE or...?

   o  Topology Association: A router needs to advertise a loopback and
      associate it with an MRT, whether blue or red.  Additional
      flexibility for future uses would be good.

   o  Proxy-nodes for Multi-homed Prefixes: We need a way to advertise
      common addresses with MRT for multi-homed prefixes' proxy-nodes.
      Currently, those proxy-nodes aren't named or considered.

   As with LFA, it is expected that OSPF Virtual Links will not be
   supported.

4.2.3.  Inter-Area and ABR Forwarding Behavior

   In regular forwarding, packets destined outside the area arrive at
   the ABR and the ABR forwards them into the other area because the
   next-hops from the area with the best route (according to
   tie-breaking rules) are used by the ABR.  The question is then what
   to do with packets marked with an MRT that are received by the ABR.

   The only option that doesn't require forwarding based upon incoming
   interface is to forward an MRT-marked packet in the area with the
   best route along its associated MRT.  If the packet came from that
   area, this correctly avoids the failure.  If the packet came from a
   different area, at least this gets the packet to the destination even
   though it is along an MRT rather than the shortest path.
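
   The ABR rule above can be sketched as a single table lookup.  This is
   a hypothetical illustration, not a specified data structure: the
   nested table indexed by area, MRT color and destination, and the
   choice of which Area 10 neighbor of ABR1 is blue and which is red,
   are assumptions that only borrow node names from Figure 4.

```python
# Hypothetical MRT next-hop tables at ABR1, indexed as
#   [area][mrt_color][destination] -> next-hop.
# Node names follow Figure 4; which Area 10 neighbor is blue and which
# is red is an assumption made only for this illustration.
ABR1_MRT_NEXT_HOPS = {
    "10": {"blue": {"p": "C"}, "red": {"p": "B"}},
}


def abr_mrt_next_hop(dest, color, best_area, mrt_next_hops):
    """Next-hop for an MRT-marked packet received by an ABR.

    The incoming area is intentionally not a parameter: the packet keeps
    its MRT color and follows that MRT in the area of the best route.
    """
    return mrt_next_hops[best_area][color][dest]


# ABR1's best route to p is in Area 10, so a red-marked packet gets the
# same next-hop whether it arrived from Area 10 or from Area 0.
nh = abr_mrt_next_hop("p", "red", "10", ABR1_MRT_NEXT_HOPS)
```

   Leaving the incoming area out of the signature is the point of the
   sketch: the rule needs no interface-specific forwarding state.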

      +----[C]----        --[D]--[E]                    --[D]--[E]
      |           \      /          \                  /          \
  p--[A]   Area 10 [ABR1]   Area 0   [H]--p    +-[ABR1]   Area 0   [H]-+
      |           /      \          /          |       \          /    |
      +----[B]----        --[F]--[G]           |        --[F]--[G]     |
                                               |                       |
                                               |         other         |
                                               +----------[p]----------+
                                                         area

          (a) Example topology                   (b) Proxy node view
                                                   in Area 0 nodes

      +----[C]<---          [D]->[E]
      V           \                 \
   +-[A]   Area 10 [ABR1]   Area 0   [H]-+
   |  ^           /                 /    |
   |  +----[B]<---          [F]->[G]     V
   |                                     |
   +---------------->[p]<----------------+

        (c) rSPT towards destination p

                   ->[D]->[E]                        -<[D]<-[E]
                  /          \                      /          \
            [ABR1]   Area 0   [H]-+         +-[ABR1]            [H]
                             /    |         |       \
                     [F]->[G]     V         V        -<[F]<-[G]
                                  |         |
                                  |         |
                        [p]<------+         +--------->[p]

            (d) Blue MRT in Area 0          (e) Red MRT in Area 0

                Figure 4: ABR Forwarding Behavior and MRTs

   To avoid using an out-of-area MRT, special action can be taken by the
   penultimate router along the in-local-area MRT immediately before the
   ABR is reached.  The penultimate router can determine that the ABR
   will forward the packet out of area and, in that case, the
   penultimate router can remove the MRT marking but still forward the
   packet along the MRT next-hop to reach the ABR.  For instance, in
   Figure 4, if node H fails, node E has to put traffic towards prefix p
   onto the red MRT.  But since node D knows that ABR1 will use a best
   route from another area, it is safe for D to remove the MRT marking
   and just send the packet to ABR1 still on the red MRT but unmarked.
   ABR1 will use the shortest path in Area 10.

   In all cases for ISIS and most cases for OSPF, the penultimate router
   can determine what decision the adjacent ABR will make.  The one case
   where it can't be determined is when two ASBRs are in different
   non-backbone areas attached to the same ABR; then the ASBR's Area ID
   may be needed for tie-breaking (prefer the route with the largest
   OSPF area ID) and the Area ID isn't announced as part of the ASBR
   link-state advertisement (LSA).
   In this one case, suboptimal forwarding along the MRT in the other
   area would happen.  If this is a realistic deployment scenario, OSPF
   extensions could be considered.

4.2.4.  Issues with Area Abstraction

   MRT fast-reroute provides complete coverage in an area that is
   2-connected.  Where a failure would partition the network, of course,
   no alternate can protect against that failure.  Similarly, there are
   ways of connecting multi-homed prefixes that make it impractical to
   protect them without excessive complexity.

        50
      |----[ASBR Y]---[B]---[ABR 2]---[C]    Backbone Area 0:
      |                                |       ABR 1, ABR 2, C, D
      |                                |
      |                                |     Area 20: A, ASBR X
      |                                |
      p ---[ASBR X]---[A]---[ABR 1]---[D]    Area 10: B, ASBR Y
         5                                   p is a Type 1 AS-external

           Figure 5: AS external prefixes in different areas

   Consider the network in Figure 5 and assume there is a richer
   connectivity that isn't shown, where the same prefix is announced by
   ASBR X and ASBR Y, which are in different non-backbone areas.  If the
   link from A to ASBR X fails, then an MRT alternate could forward the
   packet to ABR 1 and ABR 1 could forward it to D, but then D would
   find the shortest route is back via ABR 1 to Area 20.  The only real
   way to get it from A to ASBR Y is to explicitly tunnel it to ASBR Y.

   Tunneling to the backup ASBR is for future consideration.  The
   previously proposed PHP approach needs to have an exception if BGP
   policies (e.g., BGP local preference) determine which ASBR to use.
   Consider the case in Figure 6.  If the link between A and ASBR X (the
   preferred border router) fails, A can put the packets to p onto an
   MRT alternate, even tunnel them towards ASBR Y.  Node B, however, must
   not remove the MRT marking in this case, as nodes in Area 0,
   including ASBR Y itself, would not know that their preferred ASBR is
   down.
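
   The marking decision described so far (the penultimate-router rule of
   Section 4.2.3 together with the BGP-policy exception above) can be
   summarized as a small predicate.  This is a sketch under stated
   assumptions: the AbrPrediction enum, the strip_mrt_marking() function
   and their arguments are hypothetical, and how a router derives the
   prediction or learns that BGP policy fixed the exit is left open, as
   it is in the text.

```python
from enum import Enum


class AbrPrediction(Enum):
    """What a penultimate router expects the next-hop ABR to do."""
    STAY_IN_AREA = 1  # ABR's best route stays in this area
    OTHER_AREA = 2    # ABR will use a best route from another area
    UNKNOWN = 3       # not predictable, e.g. the OSPF case where the
                      # ASBR's Area ID is not advertised


def strip_mrt_marking(next_hop_is_abr, prediction, exit_fixed_by_bgp):
    """Return True if the MRT marking should be removed.

    The packet is still sent to its MRT next-hop either way; only the
    marking changes.  Hypothetical helper with illustrative arguments.
    """
    if not next_hop_is_abr:
        return False  # not the penultimate hop
    if exit_fixed_by_bgp:
        return False  # Figure 6: routers beyond the ABR would still
                      # prefer the failed ASBR, so keep the marking
    return prediction is AbrPrediction.OTHER_AREA


# Node D in Figure 4 (H failed, ABR1 exits via Area 10): strip marking.
assert strip_mrt_marking(True, AbrPrediction.OTHER_AREA, False)
# Node B in Figure 6 (BGP prefers the failed ASBR X): keep marking.
assert not strip_mrt_marking(True, AbrPrediction.OTHER_AREA, True)
```

   Treating UNKNOWN like "keep the marking" matches the text's
   observation that, in that OSPF case, the packet simply continues
   along the MRT in the other area.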

                Area 20                  BB Area 0
   p ---[ASBR X]-X-[A]---[B]---[ABR 1]---[D]---[ASBR Y]--- p

                 BGP prefers ASBR X for prefix p

         Figure 6: Failure of path towards ASBR preferred by BGP

   The fine details of how to solve multi-area external prefix cases,
   or of identifying certain cases as too unlikely and too complex to
   protect, are for further consideration.

4.2.5.  Partial Deployment and Islands of Compatible MRT FRR routers

   A natural concern with new functionality is how to make it useful
   when it is not deployed across an entire IGP area.  In the case of
   MRT FRR, where it provides alternates when appropriate LFAs aren't
   available, there are also deployment scenarios where it may make
   sense to only enable some routers in an area with MRT FRR.  A simple
   example of such a scenario would be a ring of 6 or more routers that
   is connected via two routers to the rest of the area.

   First, a computing router S must determine its local island of
   compatible MRT fast-reroute routers.  A router that has common
   forwarding mechanisms and a common algorithm and is connected either
   to S or to another router already determined to be in S's local
   island can be added to S's local island.

   Destinations inside the local island can obviously use MRT
   alternates.  Destinations outside the local island can be treated
   like a multi-homed prefix with caveats to avoid looping.  For LDP
   labels including both destination and topology, the routers at the
   borders of the local island need to originate labels for the original
   FEC and the associated MRT-specific labels.  Packets sent to an LDP
   label marked as blue or red MRT to a destination outside the local
   island will have the last router in the local island swap the label
   to one for the destination and forward the packet along the outgoing
   interface on the MRT towards a router outside the local island that
   was represented by the proxy-node.
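
   The island rule above is a breadth-first expansion from S that only
   crosses into routers with the same MRT forwarding mechanism and
   algorithm.  The sketch below assumes hypothetical representations:
   an adjacency dictionary for the IGP topology and a per-router
   "profile" tuple (or None for routers without MRT FRR) standing in for
   whatever capability advertisement is eventually specified.

```python
from collections import deque


def local_island(s, neighbors, mrt_profile):
    """Return the set of routers in S's local island.

    neighbors:   router -> list of adjacent routers (IGP topology).
    mrt_profile: router -> hashable description of its MRT forwarding
                 mechanism and algorithm, or None if it lacks MRT FRR.
    Both representations are hypothetical.
    """
    profile = mrt_profile.get(s)
    if profile is None:
        return set()
    island = {s}
    frontier = deque([s])
    while frontier:                      # breadth-first growth from S
        node = frontier.popleft()
        for nbr in neighbors.get(node, []):
            # Add a router only if it matches S's mechanism and algorithm
            # and is adjacent to a router already in the island.
            if nbr not in island and mrt_profile.get(nbr) == profile:
                island.add(nbr)
                frontier.append(nbr)
    return island


# The ring example from the text: six MRT-capable routers R1..R6, joined
# to the rest of the area (X, Y, not MRT-capable) via R1 and R4.
links = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R4", "R5"),
         ("R5", "R6"), ("R6", "R1"), ("R1", "X"), ("R4", "Y"), ("X", "Y")]
neighbors = {}
for a, b in links:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)
capable = ("ldp-mrt-labels", "mrt-algorithm-v1")   # hypothetical profile
mrt_profile = {f"R{i}": capable for i in range(1, 7)}
island = local_island("R3", neighbors, mrt_profile)
```

   In this example the island is exactly the ring R1..R6; X and Y lie
   outside it and would be reached via proxy-nodes attached at R1 and
   R4.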
For IP-in-IP encapsulation, remote destinations may not advertise the additional IP loopback addresses used for the MRTs. In that case, a router attached to a proxy-node, which represents destinations outside the local island, must advertise IP addresses associated with that proxy-node. That router removes the outer IP header from packets sent to an address associated with the proxy-node and forwards them on the outgoing interface along the MRT towards the router outside the local island that was represented by the proxy-node.

4.2.6.  Network Convergence and Preparing for the Next Failure

After a failure, MRT detours ensure that packets reach their intended destination while the IGP has not yet reconverged onto the new topology. As link-state updates reach the routers, the IGP process calculates the new shortest paths. Two things need attention: micro-loop prevention and MRT recalculation.

4.2.6.1.  Micro-forwarding Loop Prevention and MRTs

As is well known [RFC5715], micro-loops can occur during IGP convergence; such loops can be local to the failure or remote from the failure. Managing micro-loops is an issue orthogonal to having alternates for local repair, such as those MRT fast-reroute provides.

Two of the micro-loop prevention mechanisms discussed in [RFC5715] are relevant here. The first is Ordered FIB [I-D.ietf-rtgwg-ordered-fib]. The second is Farside Tunneling, which requires tunnels or an alternate topology to reach routers on the far side of the failure.

Since MRTs provide an alternate topology through which traffic can be sent and which can be manipulated separately from the SPT, MRTs could potentially be used to support Farside Tunneling. Details of how to do so are outside the scope of this document.

4.2.6.2.  MRT Recalculation

When a failure occurs, the PLRs move traffic onto the MRT topologies.
After that, each router recomputes its shortest path tree (SPT) and moves traffic over to it. Only after all the PLRs have switched to using their SPTs and traffic has drained from the MRT topologies should each router install the recomputed MRTs into its FIB.

At each router, therefore, the sequence is as follows:

   1.  Receive failure notification.

   2.  Recompute SPT.

   3.  Install new SPT.

   4.  Recompute MRTs.

   5.  Wait a configured period for all routers to be using their SPTs
       and for traffic to drain from the MRTs.

   6.  Install new MRTs.

While the recomputed MRTs are not installed in the FIB, protection coverage is reduced. Therefore, it is important to recalculate the MRTs and install them as quickly as possible.

It is for further study whether MRT recalculation is possible in an incremental fashion, such that the sections of the MRT in use after a failure are not changed.

5.  Acknowledgements

The authors would like to thank Hannes Gredler, Jeff Tantsura, Ted Qian, Kishore Tiruveedhula, Santosh Esale, Nitin Bahadur, Harish Sitaraman and Raveendra Torvi for their suggestions and review.

6.  IANA Considerations

This document includes no request to IANA.

7.  Security Considerations

This architecture is not currently believed to introduce new security concerns.

8.  References

8.1.  Normative References

   [I-D.enyedi-rtgwg-mrt-frr-algorithm]
              Atlas, A., Enyedi, G., and A. Csaszar, "Algorithms for
              computing Maximally Redundant Trees for IP/LDP
              Fast-Reroute", draft-enyedi-rtgwg-mrt-frr-algorithm-00
              (work in progress), October 2011.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5384]  Boers, A., Wijnands, I., and E. Rosen, "The Protocol
              Independent Multicast (PIM) Join Attribute Format",
              RFC 5384, November 2008.

   [RFC5714]  Shand, M. and S.
              Bryant, "IP Fast Reroute Framework", RFC 5714,
              January 2010.

8.2.  Informative References

   [I-D.ietf-rtgwg-ipfrr-notvia-addresses]
              Bryant, S., Previdi, S., and M. Shand, "IP Fast Reroute
              Using Not-via Addresses",
              draft-ietf-rtgwg-ipfrr-notvia-addresses-08 (work in
              progress), December 2011.

   [I-D.ietf-rtgwg-lfa-applicability]
              Filsfils, C. and P. Francois, "LFA applicability in SP
              networks", draft-ietf-rtgwg-lfa-applicability-06 (work in
              progress), January 2012.

   [I-D.ietf-rtgwg-ordered-fib]
              Shand, M., Bryant, S., Previdi, S., and C. Filsfils,
              "Loop-free convergence using oFIB",
              draft-ietf-rtgwg-ordered-fib-05 (work in progress),
              April 2011.

   [LFARevisited]
              Retvari, G., Tapolcai, J., Enyedi, G., and A. Csaszar, "IP
              Fast ReRoute: Loop Free Alternates Revisited", Proceedings
              of IEEE INFOCOM, 2011.

   [LightweightNotVia]
              Enyedi, G., Retvari, G., Szilagyi, P., and A. Csaszar, "IP
              Fast ReRoute: Lightweight Not-Via without Additional
              Addresses", Proceedings of IEEE INFOCOM, 2009.

   [RFC5715]  Shand, M. and S. Bryant, "A Framework for Loop-Free
              Convergence", RFC 5715, January 2010.

Authors' Addresses

   Alia Atlas (editor)
   Juniper Networks
   10 Technology Park Drive
   Westford, MA  01886
   USA

   Email: akatlas@juniper.net


   Robert Kebler
   Juniper Networks
   10 Technology Park Drive
   Westford, MA  01886
   USA

   Email: rkebler@juniper.net


   Maciek Konstantynowicz
   Juniper Networks

   Email: maciek@juniper.net


   Gabor Sandor Enyedi
   Ericsson
   Konyves Kalman krt 11.
   Budapest  1097
   Hungary

   Email: Gabor.Sandor.Enyedi@ericsson.com


   Andras Csaszar
   Ericsson
   Konyves Kalman krt 11
   Budapest  1097
   Hungary

   Email: Andras.Csaszar@ericsson.com


   Russ White
   Cisco Systems

   Email: russwh@cisco.com


   Mike Shand

   Email: mike@mshand.org.uk