Routing Area Working Group                                 A. Atlas, Ed.
Internet-Draft                                                 R. Kebler
Intended status: Standards Track                        Juniper Networks
Expires: January 13, 2014                                      G. Enyedi
                                                              A. Csaszar
                                                             J. Tantsura
                                                                Ericsson
                                                      M. Konstantynowicz
                                                           Cisco Systems
                                                                R. White
                                                                     VCE
                                                                M. Shand
                                                           July 12, 2013

An Architecture for IP/LDP Fast-Reroute Using Maximally Redundant Trees
               draft-ietf-rtgwg-mrt-frr-architecture-03

Abstract

With increasing deployment of Loop-Free Alternates (LFA) [RFC5286], it is clear that a complete solution for IP and LDP Fast-Reroute is required. This specification provides that solution. IP/LDP Fast-Reroute with Maximally Redundant Trees (MRT-FRR) is a technology that gives link-protection and node-protection with 100% coverage in any network topology that is still connected after the failure. MRT removes all need to engineer for coverage. MRT is also extremely computationally efficient. For any router in the network, the MRT computation is less than the LFA computation for a node with three or more neighbors.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).
Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 13, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1.  Introduction
  1.1.  Importance of 100% Coverage
  1.2.  Partial Deployment and Backwards Compatibility
2.  Requirements Language
3.  Terminology
4.  Maximally Redundant Trees (MRT)
5.  Maximally Redundant Trees (MRT) and Fast-Reroute
6.  Unicast Forwarding with MRT Fast-Reroute
  6.1.  LDP Unicast Forwarding - Avoid Tunneling
  6.2.  IP Unicast Traffic
7.  Protocol Extensions and Considerations: OSPF and ISIS
8.  Protocol Extensions and considerations: LDP
9.  Inter-Area and ABR Forwarding Behavior
10. Prefixes Multiply Attached to the MRT Island
  10.1.  Endpoint Selection
  10.2.  Named Proxy-Nodes
    10.2.1.  Computing if an Island Neighbor (IN) is loop-free
  10.3.  MRT Alternates for Destinations Outside the MRT Island
11. Network Convergence and Preparing for the Next Failure
  11.1.  Micro-forwarding loop prevention and MRTs
  11.2.  MRT Recalculation
12. Acknowledgements
13. IANA Considerations
14. Security Considerations
15. References
  15.1.  Normative References
  15.2.  Informative References
Appendix A.  General Issues with Area Abstraction
Authors' Addresses

1.  Introduction

This document gives a complete solution for IP/LDP fast-reroute [RFC5714]. MRT-FRR creates two alternate trees separate from the primary next-hop forwarding used during stable operation. These two trees are maximally diverse from each other, providing link and node protection for 100% of paths and failures as long as the failure does not cut the network into multiple pieces. This document defines the architecture for IP/LDP fast-reroute with MRT.
The associated protocol extensions are defined in [I-D.atlas-ospf-mrt] and [I-D.atlas-mpls-ldp-mrt]. The exact MRT algorithm is defined in [I-D.enyedi-rtgwg-mrt-frr-algorithm].

IP/LDP Fast-Reroute with MRT (MRT-FRR) uses two maximally diverse forwarding topologies to provide alternates. A primary next-hop should be on only one of the diverse forwarding topologies; thus, the other can be used to provide an alternate. Once traffic has been moved to one of the MRTs, it is not subject to further repair actions. Thus, the traffic will not loop even if a worse failure (e.g. node) occurs when protection was only available for a simpler failure (e.g. link).

In addition to supporting IP and LDP unicast fast-reroute, the diverse forwarding topologies and guarantee of 100% coverage permit fast-reroute technology to be applied to multicast traffic as described in [I-D.atlas-rtgwg-mrt-mc-arch].

Other existing or proposed solutions are partial solutions or have significant issues, as described below.

              Summary Comparison of IP/LDP FRR Methods

+-----------+---------------+---------------+-----------------------+
| Method    | Coverage      | Alternate     | Computation (in SPFs) |
|           |               | Looping?      |                       |
+-----------+---------------+---------------+-----------------------+
| MRT-FRR   | 100%          | None          | less than 3           |
|           | Link/Node     |               |                       |
|           |               |               |                       |
| LFA       | Partial       | Possible      | per neighbor          |
|           | Link/Node     |               |                       |
|           |               |               |                       |
| Remote    | Partial       | Possible      | per neighbor (link)   |
| LFA       | Link/Node     |               | or neighbor's         |
|           |               |               | neighbor (node)       |
|           |               |               |                       |
| Not-Via   | 100%          | None          | per link and node     |
|           | Link/Node     |               |                       |
+-----------+---------------+---------------+-----------------------+

                              Table 1

Loop-Free Alternates (LFA):  LFAs [RFC5286] provide limited topology-dependent coverage for link and node protection. Restrictions on choice of alternates can be relaxed to improve coverage, but this can cause forwarding loops if a worse failure is experienced than protected against. Augmenting a network to provide better coverage is NP-hard [LFARevisited]. [RFC6571] discusses the applicability of LFA to different topologies with a focus on common PoP architectures.

Remote LFA:  Remote LFAs [I-D.ietf-rtgwg-remote-lfa] improve coverage over LFAs for link protection but still cannot guarantee complete coverage. The trade-off of looping traffic to improve coverage is still made. Remote LFAs can provide node-protection [I-D.litkowski-rtgwg-node-protect-remote-lfa], but coverage is not guaranteed and the computation required is quite high (an SPF per neighbor's neighbor). [I-D.bryant-ipfrr-tunnels] describes additional mechanisms to further improve coverage, at the cost of added complexity.

Not-Via:  Not-Via [I-D.ietf-rtgwg-ipfrr-notvia-addresses] is the only other solution that provides 100% coverage for link and node failures and does not have potential looping. However, the computation is very high (an SPF per failure point) and academic implementations [LightweightNotVia] have found the address management complexity to be high.

1.1.  Importance of 100% Coverage

Fast-reroute is based upon the single failure assumption - that the time between single failures is long enough for a network to reconverge and start forwarding on the new shortest paths. That does not imply that the network will only experience one failure or change.

It is straightforward to analyze a particular network topology for coverage. However, a real network does not always have the same topology. For instance, maintenance events will take links or nodes out of use. Simply costing out a link can have a significant effect on what LFAs are available. Similarly, after a single failure has happened, the topology is changed and so is its associated coverage.
Finally, many networks have new routers or links added and removed; each of those changes can have an effect on the coverage for topology-sensitive methods such as LFA and Remote LFA. If fast-reroute is important for the network services provided, then a method that guarantees 100% coverage is important to accommodate natural network topology changes.

Asymmetric link costs are also a common aspect of networks. There are at least three common causes for them. First, any broadcast interface is represented by a pseudo-node and has asymmetric link costs to and from that pseudo-node. Second, when routers come up or a link with LDP comes up, it is recommended in [RFC5443] and [RFC3137] that the link metric be raised to the maximum cost; this may not be symmetric and for [RFC3137] is not expected to be. Third, techniques such as IGP metric tuning for traffic-engineering can result in asymmetric link costs. A fast-reroute solution needs to handle network topologies with asymmetric link costs.

When a network needs to use a micro-loop prevention mechanism [RFC5715] such as Ordered FIB [I-D.ietf-rtgwg-ordered-fib] or Farside Tunneling [RFC5715], then the whole IGP area needs to have alternates available so that the micro-loop prevention mechanism, which requires slower network convergence, can take the necessary time without impacting traffic badly.
Without complete coverage, traffic to the unprotected destinations will be dropped for significantly longer than with current convergence - where routers individually converge as fast as possible.

1.2.  Partial Deployment and Backwards Compatibility

MRT-FRR supports partial deployment. As with many new features, the protocols (OSPF, LDP, ISIS) indicate their capability to support MRT. Inside the MRT-capable connected group of routers (referred to as an MRT Island), the MRTs are computed. Alternates to destinations outside the MRT Island are computed and depend upon the existence of a loop-free neighbor of the MRT Island for that destination.

2.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3.  Terminology

network graph:  A graph that reflects the network topology where all links connect exactly two nodes and broadcast links have been transformed into the standard pseudo-node representation.

Redundant Trees (RT):  A pair of trees where the path from any node X to the root R along the first tree is node-disjoint with the path from the same node X to the root along the second tree. These can be computed in 2-connected graphs.

Maximally Redundant Trees (MRT):  A pair of trees where the path from any node X to the root R along the first tree and the path from the same node X to the root along the second tree share the minimum number of nodes and the minimum number of links. Each such shared node is a cut-vertex. Any shared links are cut-links.
Any RT is an MRT but many MRTs are not RTs.

MRT-Red:  MRT-Red is used to describe one of the two MRTs; it is used to describe the associated forwarding topology and MT-ID. Specifically, MRT-Red is the decreasing MRT where links in the GADAG are taken in the direction from a higher topologically ordered node to a lower one.

MRT-Blue:  MRT-Blue is used to describe one of the two MRTs; it is used to describe the associated forwarding topology and MT-ID. Specifically, MRT-Blue is the increasing MRT where links in the GADAG are taken in the direction from a lower topologically ordered node to a higher one.

Rainbow MRT:  It is useful to have an MT-ID that refers to the multiple MRT topologies and to the default topology. This is referred to as the Rainbow MRT MT-ID and is used by LDP to reduce signaling and permit the same label to always be advertised to all peers for the same (MT-ID, Prefix).

MRT Island:  From the computing router, the set of routers that support a particular MRT profile and are connected.

Island Border Router (IBR):  A router in the MRT Island that is connected to a router not in the MRT Island, where both routers are in a common area or level.

Island Neighbor (IN):  A router that is not in the MRT Island but is adjacent to an IBR and in the same area/level as the IBR.

cut-link:  A link whose removal partitions the network. A cut-link by definition must be connected between two cut-vertices.
If there are multiple parallel links, then they are referred to as cut-links in this document if removing the set of parallel links would partition the network graph.

cut-vertex:  A vertex whose removal partitions the network graph.

2-connected:  A graph that has no cut-vertices. This is a graph that requires two nodes to be removed before the network is partitioned.

2-connected cluster:  A maximal set of nodes that are 2-connected.

2-edge-connected:  A network graph where at least two links must be removed to partition the network.

block:  Either a 2-connected cluster, a cut-edge, or an isolated vertex.

DAG:  Directed Acyclic Graph - a graph where all links are directed and there are no cycles in it.

ADAG:  Almost Directed Acyclic Graph - a graph that, if all links incoming to the root were removed, would be a DAG.

GADAG:  Generalized ADAG - a graph that is the combination of the ADAGs of all blocks.

named proxy-node:  A proxy-node can represent a destination prefix that can be attached to the MRT Island via at least two routers. It is named if there is a way that traffic can be encapsulated to reach specifically that proxy-node; this could be because there is an LDP FEC for the associated prefix or because MRT-Red and MRT-Blue IP addresses are advertised in an undefined fashion for that proxy-node.

4.  Maximally Redundant Trees (MRT)

A pair of Maximally Redundant Trees are directed spanning trees that provide maximally disjoint paths towards their common root. Only links or nodes whose failure would partition the network (i.e.
cut-links and cut-vertices) are shared between the trees.

The algorithm to compute MRTs is given in [I-D.enyedi-rtgwg-mrt-frr-algorithm]. The computation can be done in O(e + n log n), which is less than three SPFs. Modeling results comparing MRT alternates to the optimal are described in [I-D.enyedi-rtgwg-mrt-frr-algorithm]. This document describes how the MRTs can be used and not how to compute them.

MRT provides destination-based trees for each destination. Each router stores its normal primary next-hop(s) as well as MRT-Blue next-hop(s) and MRT-Red next-hop(s) toward each destination. The alternate will be selected between the MRT-Blue and MRT-Red.
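The per-destination state described above can be pictured as one small record per destination. The following Python sketch is illustrative only: the `MrtEntry` type, the `pick_alternate` helper, the assumed primary next-hop, and the selection rule (take the color whose next-hops avoid the failed neighbor) are assumptions made for this example. The actual rule for choosing between MRT-Blue and MRT-Red is specified in [I-D.enyedi-rtgwg-mrt-frr-algorithm] and is not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MrtEntry:
    """Hypothetical per-destination forwarding state held by one router."""
    primary: FrozenSet[str]  # normal shortest-path next-hop(s)
    blue: FrozenSet[str]     # MRT-Blue next-hop(s)
    red: FrozenSet[str]      # MRT-Red next-hop(s)


def pick_alternate(entry: MrtEntry, failed: str) -> Optional[str]:
    """Return "blue" or "red", whichever color avoids the failed neighbor.

    Simplified rule assumed for illustration. Returns None when both
    colors use the failed neighbor (for example, when it is a cut-vertex).
    """
    if failed not in entry.blue:
        return "blue"
    if failed not in entry.red:
        return "red"
    return None


# Router B toward destination R in Figure 1. The MRT next-hops follow the
# paths quoted in the text; the primary next-hop A is an assumption.
entry_b = MrtEntry(primary=frozenset({"A"}),
                   blue=frozenset({"F", "C"}),
                   red=frozenset({"A"}))

print(pick_alternate(entry_b, "A"))  # failure of the assumed primary next-hop
```

The point of the record is that the alternate is pre-computed and stored next to the primary next-hop, so nothing has to be calculated at the moment a failure is detected.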
The most important thing to understand about MRTs is that for each pair of destination-routed MRTs, there is a path from every node X to the destination D on the Blue MRT that is as disjoint as possible from the path on the Red MRT.

For example, in Figure 1, there is a network graph that is 2-connected in (a) and associated MRTs in (b) and (c). One can consider the paths from B to R; on the Blue MRT, the paths are B->F->D->E->R or B->C->D->E->R. On the Red MRT, the path is B->A->R. These are clearly link and node-disjoint. These MRTs are redundant trees because the paths are disjoint.

[E]---[D]---|           [E]<--[D]<--|           [E]-->[D]---|
 |     |    |            |     ^    |                  |    |
 |     |    |            V     |    |                  V    V
[R]   [F]  [C]          [R]   [F]  [C]          [R]   [F]  [C]
 |     |    |                  ^    ^            ^     |    |
 |     |    |                  |    |            |     V    |
[A]---[B]---|           [A]-->[B]---|           [A]<--[B]<--|

      (a)                     (b)                     (c)
a 2-connected graph     Blue MRT towards R      Red MRT towards R

                 Figure 1: A 2-connected Network

By contrast, in Figure 2, the network in (a) is not 2-connected. If F, G or the link F<->G failed, then the network would be partitioned. It is clearly impossible to have two link-disjoint or node-disjoint paths from G, I or J to R. The MRTs given in (b) and (c) offer paths that are as disjoint as possible. For instance, the paths from B to R are the same as in Figure 1 and the path from G to R on the Blue MRT is G->F->D->E->R and on the Red MRT is G->F->B->A->R.
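The two path examples above can be checked mechanically. The sketch below is a minimal illustration, not part of the MRT algorithm: the helper name `shared` is hypothetical, and the paths are copied from the text. It reports which interior nodes and which links two paths between the same endpoints have in common.

```python
def shared(path_a, path_b):
    """Return (interior nodes, links) common to two paths.

    Both paths are lists of node names with the same first and last node.
    The endpoints are excluded from the node comparison; links are
    compared without regard to direction.
    """
    nodes = set(path_a[1:-1]) & set(path_b[1:-1])
    links_a = {frozenset(pair) for pair in zip(path_a, path_a[1:])}
    links_b = {frozenset(pair) for pair in zip(path_b, path_b[1:])}
    return nodes, links_a & links_b


# Figure 1, B to R: the Blue and Red paths share nothing.
print(shared(list("BFDER"), list("BAR")))

# Figure 2, G to R: the paths share only node F and link G-F.
print(shared(list("GFDER"), list("GFBAR")))
```

In the second case the shared node F is a cut-vertex and the shared link G-F is a cut-link, which is exactly the sharing that the definition of Maximally Redundant Trees permits.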
[E]---[D]---|
 |     |    |     |----[I]
 |     |    |     |     |
[R]---[C]  [F]---[G]    |
 |     |    |     |     |
 |     |    |     |----[J]
[A]---[B]---|

            (a)
  a non-2-connected graph

        (b)                      (c)
 Blue MRT towards R       Red MRT towards R

 [Panels (b) and (c): diagrams not recoverable]

     Figure 2: A non-2-connected network

5.  Maximally Redundant Trees (MRT) and Fast-Reroute

In normal IGP routing, each router has its shortest-path-tree to all destinations. From the perspective of a particular destination, D, this looks like a reverse SPT (rSPT). To use maximally redundant trees, in addition, each destination D has two MRTs associated with it; by convention these will be called the MRT-Blue and MRT-Red. MRT-FRR is realized by using multi-topology forwarding. There is an MRT-Blue forwarding topology and an MRT-Red forwarding topology.

Any IP/LDP fast-reroute technique beyond LFA requires an additional dataplane procedure, such as an additional forwarding mechanism. The well-known options are multi-topology forwarding (used by MRT-FRR), tunneling (e.g. [I-D.ietf-rtgwg-ipfrr-notvia-addresses] or [I-D.ietf-rtgwg-remote-lfa]), and per-interface forwarding (e.g. Loop-Free Failure Insensitive Routing in [EnyediThesis]).

When there is a link or node failure affecting, but not partitioning, the network, each node will still have at least one path via one of the MRTs to reach the destination D.
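Whether a given failure partitions the network can be checked by brute force on a small topology. The sketch below is an illustration under stated assumptions: the edge list is transcribed from panel (a) of Figure 2, and the helper `connected` and the variable names are hypothetical. It removes each link and each node in turn and tests whether the remaining graph is still connected.

```python
from collections import deque

# Links of the non-2-connected graph in Figure 2 (a).
EDGES = [("E", "D"), ("E", "R"), ("D", "C"), ("D", "F"), ("R", "C"),
         ("R", "A"), ("C", "B"), ("A", "B"), ("B", "F"), ("F", "G"),
         ("G", "I"), ("G", "J"), ("I", "J")]
NODES = {n for edge in EDGES for n in edge}


def connected(nodes, edges):
    """True if all nodes are mutually reachable using only the given edges."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # ignore links to removed nodes
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == nodes


cut_links = [e for e in EDGES
             if not connected(NODES, [x for x in EDGES if x != e])]
cut_vertices = sorted(n for n in NODES if not connected(NODES - {n}, EDGES))

print(cut_links)      # links whose failure partitions the network
print(cut_vertices)   # nodes whose failure partitions the network
```

For this topology the only partitioning failures are F, G and the link between them, matching the description of Figure 2; every other single failure leaves the network connected, so an MRT path to the destination remains.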
For example, in Figure 2, C would normally forward traffic to R across the C<->R link. If that C<->R link fails, then C could use the Blue MRT path C->D->E->R.

As is always the case with fast-reroute technologies, forwarding does not change until a local failure is detected. Packets are forwarded along the shortest path. The appropriate alternate to use is pre-computed. [I-D.enyedi-rtgwg-mrt-frr-algorithm] describes exactly how to determine whether the MRT-Blue next-hops or the MRT-Red next-hops should be the MRT alternate next-hops for a particular primary next-hop N to a particular destination D.

MRT alternates are always available to use. It is a local decision whether to use an MRT alternate, a Loop-Free Alternate or some other type of alternate.

As described in [RFC5286], when a worse failure than is anticipated happens, using LFAs that are not downstream neighbors can cause micro-looping. Section 1.1 of [RFC5286] gives an example of link-protecting alternates causing a loop on node failure. Even if a worse failure than anticipated happens, the use of MRT alternates will not cause looping. Therefore, while node-protecting LFAs may be preferred, the certainty that no alternate-induced looping will occur is an advantage of using MRT alternates when the available node-protecting LFA is not a downstream path.

6.  Unicast Forwarding with MRT Fast-Reroute

With LFA, there is no need to tunnel unicast traffic, whether IP or LDP.
The traffic is simply sent to an alternate. As mentioned earlier in Section 5, MRT needs multi-topology forwarding. Unfortunately, neither IP nor LDP provides extra bits for a packet to indicate its topology.

Once the MRTs are computed, the two sets of MRTs are seen by the forwarding plane as essentially two additional topologies. The same considerations apply for forwarding along the MRTs as for handling multiple topologies.

6.1.  LDP Unicast Forwarding - Avoid Tunneling

For LDP, it is very desirable to avoid tunneling because, for at least node protection, tunneling requires knowledge of remote LDP label mappings and thus requires targeted LDP sessions and the associated management complexity. There are two different mechanisms that can be used; Option A MUST be supported.

1.  Option A - Encode MT-ID in Labels: In addition to sending a single label for a FEC, a router would provide two additional labels with the MT-IDs associated with the Blue MRT or Red MRT forwarding topologies. This is very simple for hardware support. It does reduce the label space for other uses. It also increases the memory to store the labels and the communication required by LDP.

2.  Option B - Create Topology-Identification Labels: Use the label-stacking ability of MPLS and specify only two additional labels - one for each associated MRT color - by a new FEC type. When sending a packet onto an MRT, first swap the LDP label and then push the topology-identification label for that MRT color. When receiving a packet with a topology-identification label, pop it and use it to guide the next-hop selection in combination with the next label in the stack; then swap the remaining label, if appropriate, and push the topology-identification label for the next-hop. This has minimal usage of additional labels, memory and LDP communication. It does increase the size of packets and the complexity of the required label operations and look-ups.
This can use the same mechanisms as are needed for context-aware label spaces.

Note that with LDP unicast forwarding, regardless of whether topology-identification label or encoding topology in label is used, no additional loopbacks per router are required. This is because LDP labels are used on a hop-by-hop basis to identify MRT-blue and MRT-red forwarding topologies.

For greatest hardware compatibility, routers implementing MRT LDP fast-reroute MUST support Option A of encoding the MT-ID in the labels. The extensions to indicate an MT-ID for a FEC are described in Section 3.2.1 of [I-D.ietf-mpls-ldp-multi-topology].

6.2.  IP Unicast Traffic

For IP, there is no currently practical alternative except tunneling to gain the bits needed to indicate the MRT-Blue or MRT-Red forwarding topology. The choice of tunnel egress MAY be flexible since any router closer to the destination than the next-hop can work. This architecture assumes that the original destination in the area is selected (see Section 10 for handling of multi-homed prefixes); another possible choice is the next-next-hop towards the destination. For LDP traffic, using the original destination simplifies MRT-FRR by avoiding the need for targeted LDP sessions to the next-next-hop. For IP, that consideration doesn't apply but consistency with LDP is RECOMMENDED. If the tunnel egress is the original destination router, then the traffic remains on the redundant tree with sub-optimal routing. If the tunnel egress is the next-next-hop, then protection of multi-homed prefixes and node-failure for ABRs is not available. Selection of the tunnel egress is a router-local decision.

There are three options available for marking an IP packet with the MRT on which it should be forwarded.
For greatest hardware compatibility and ease in removing the MRT-topology marking at area/level boundaries, routers that support MPLS and implement IP MRT fast-reroute MUST support Option A - using an LDP label that indicates the destination and MT-ID.

1.  Tunnel IP packets via an LDP LSP. This has the advantage that more installed routers can do line-rate encapsulation and decapsulation. Also, no additional IP addresses would need to be allocated or signaled.

    a.  Option A - LDP Destination-Topology Label: Use a label that indicates both destination and MRT. This method allows easy tunneling to the next-next-hop as well as to the IGP-area destination. For a proxy-node, the destination to use is the non-proxy-node immediately before the proxy-node on that particular color MRT.

    b.  Option B - LDP Topology Label: Use a Topology-Identifier label on top of the IP packet. This is very simple. If tunneling to a next-next-hop is desired, then a two-deep label stack can be used with [ Topology-ID label, Next-Next-Hop Label ].

2.  Tunnel IP packets in IP. Each router supporting this option would announce two additional loopback addresses and their associated MRT color. Those addresses are used as destination addresses for MRT-blue and MRT-red IP tunnels respectively. They allow the transit nodes to identify the traffic as being forwarded along either MRT-blue or MRT-red tree topology to reach the tunnel destination. Announcement of these two additional loopback addresses per router with their MRT color requires IGP extensions.

7.  Protocol Extensions and Considerations: OSPF and ISIS

For simplicity, the approach of defining a well-known profile is taken in [I-D.atlas-ospf-mrt]. The purpose of communicating support for MRT in the IGP is to indicate that the MRT-Blue and MRT-Red forwarding topologies are created for transit traffic. This section describes the various options to be selected. The default MRT profile is described here and the signaling extensions for OSPF are given in [I-D.atlas-ospf-mrt].

For any MRT profile, the MRT Island is created by starting from the computing router. If the computing router supports the default MRT profile, add it to the MRT Island. Add a router to the MRT Island if the router supports the default MRT profile and is connected to the MRT Island via bidirectional links eligible for MRT.

If a router advertises support for multiple MRT profiles, then it MUST create the transit forwarding topologies for each of those, unless the profile specifies No Forwarding Mechanism (e.g. as might be done for a profile used only for multicast global protection).
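The MRT Island creation rule described in this section amounts to a breadth-first expansion from the computing router. The sketch below is a minimal illustration under stated assumptions: the function name `mrt_island`, the representation of links as directed pairs, and the example topology are all hypothetical, and "eligible for MRT" is modeled simply as membership in the given link set.

```python
from collections import deque


def mrt_island(computing_router, supports_profile, links):
    """Return the set of routers in the computing router's MRT Island.

    supports_profile: set of router IDs that advertise the MRT profile.
    links: set of directed (from, to) pairs that are eligible for MRT.
           A link is usable only if both directions are present.
    """
    if computing_router not in supports_profile:
        return set()
    island = {computing_router}
    queue = deque([computing_router])
    while queue:
        u = queue.popleft()
        for a, b in links:
            if (a == u and (b, a) in links
                    and b in supports_profile and b not in island):
                island.add(b)
                queue.append(b)
    return island


# Hypothetical example: router 4 does not support the profile, and the
# link from 3 to 5 is advertised in one direction only.
supports = {1, 2, 3, 5}
links = {(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3),
         (4, 5), (5, 4), (3, 5)}
print(mrt_island(1, supports, links))
```

Router 5 supports the profile but is left out of router 1's island, because its only bidirectional path runs through router 4, which does not advertise the profile.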
A router MUST NOT advertise multiple MRT profiles that overlap in their MRT-Red MT-ID or MRT-Blue MT-ID.

The MRT Profile also defines different behaviors such as how MRT recomputation is handled and how area/level boundaries are dealt with.

MRT Algorithm: MRT Lowpoint algorithm defined in [I-D.enyedi-rtgwg-mrt-frr-algorithm].

MRT-Red MT-ID: experimental 3997, final value assigned by IANA allocated from the LDP MT-ID space

MRT-Blue MT-ID: experimental 3998, final value assigned by IANA allocated from the LDP MT-ID space

GADAG Root Selection Priority: Among the routers in the MRT Island and with the highest priority advertised, an implementation MUST pick the router with the highest Router ID to be the GADAG root.

Forwarding Mechanisms: LDP

Recalculation: Recalculation of MRTs SHOULD occur as described in Section 11.2. This allows the MRT forwarding topologies to support IP/LDP fast-reroute traffic.

Area/Level Border Behavior: As described in Section 9, ABRs/LBRs SHOULD ensure that traffic leaving the area also exits the MRT-Red or MRT-Blue forwarding topology.

The following describes the aspects to be considered to define a profile to advertise. For some profiles, associated information may need to be distributed, such as GADAG Root Selection Priority, MRT-Red Loopback Address, or MRT-Blue Loopback Address.
MRT Algorithm: This identifies the particular MRT algorithm used by the router for this profile. Algorithm transitions can be managed by advertising multiple MRT profiles.

MRT-Red MT-ID: This specifies the MT-ID to be associated with the MRT-Red forwarding topology. It is needed for use in LDP signaling. All routers in the MRT Island MUST agree on a value.

MRT-Blue MT-ID: This specifies the MT-ID to be associated with the MRT-Blue forwarding topology. It is needed for use in LDP signaling. All routers in the MRT Island MUST agree on a value.

GADAG Root Selection Priority: An MRT profile might specify this to provide the network operator with a knob to force a particular GADAG root selection. If not specified in the MRT profile, the highest Router ID in the profile's MRT Island will be elected the GADAG Root. If a GADAG Root Selection Priority is specified, then the MRT profile must also specify how the GADAG Root is elected.

Forwarding Mechanism: This specifies which forwarding mechanisms the router supports for transit traffic. An MRT island must program appropriate next-hops into the forwarding plane. The known options are IPv4, IPv6, LDP, and None.
If IPv4 is supported, then both MRT-Red and MRT-Blue IPv4 Loopback Addresses SHOULD be specified. If IPv6 is supported, both MRT-Red and MRT-Blue IPv6 Loopback Addresses SHOULD be specified. If LDP is supported, then LDP support and signaling extensions MUST be supported.

MRT-Red Loopback Address: This provides the router's loopback address to reach the router via the MRT-Red forwarding topology. It can, of course, be specified for both IPv4 and IPv6.

MRT-Blue Loopback Address: This provides the router's loopback address to reach the router via the MRT-Blue forwarding topology. It can, of course, be specified for both IPv4 and IPv6.

Recalculation: As part of what process, and with what timing, should the new MRTs be computed on a modified topology? Section 11.2 describes the minimum behavior required to support fast-reroute.

Area/Level Border Behavior: Should inter-area traffic on MRT-Blue or MRT-Red be put back onto the shortest path tree? Should it be swapped from MRT-Blue or MRT-Red in one area/level to MRT-Red or MRT-Blue in the next area/level to avoid the potential failure of an ABR? (See [I-D.atlas-rtgwg-mrt-mc-arch] for use-case details.)
Other Profile-Specific Behavior: Depending upon the use-case for the profile, there may be additional profile-specific behavior.

As with LFA, it is expected that OSPF Virtual Links will not be supported.

8. Protocol Extensions and considerations: LDP

The protocol extensions for LDP are defined in [I-D.atlas-mpls-ldp-mrt]. A router must indicate that it has the ability to support MRT; having this explicit allows the use of MRT-specific processing, such as special handling of FECs sent with the Rainbow MRT MT-ID.

A FEC sent with the Rainbow MRT MT-ID indicates that the FEC applies to all the MRT-Blue and MRT-Red MT-IDs in supported MRT profiles as well as to the default shortest-path based MT-ID 0.
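The effect of a Rainbow MRT MT-ID binding on a receiving router can be pictured as an expansion into one binding per topology. This is only an illustrative sketch: the function name and data shapes are hypothetical, the MT-ID values are the experimental ones proposed in this document (3997/3998 for the default profile, 3999 for Rainbow), and the authoritative encoding and procedures are those of [I-D.atlas-mpls-ldp-mrt].

```python
DEFAULT_MT_ID = 0        # default shortest-path topology
RAINBOW_MT_ID = 3999     # proposed experimental value, not final


def expand_rainbow_binding(mt_id, fec, label, profile_mt_ids):
    """Expand one received (MT-ID, FEC) -> label binding.

    profile_mt_ids: list of (mrt_red_mt_id, mrt_blue_mt_id) pairs for the
                    MRT profiles this router supports (assumed shape).
    Returns a dict mapping (MT-ID, FEC) to the label to use.
    """
    if mt_id != RAINBOW_MT_ID:
        # An ordinary binding applies only to the topology it names.
        return {(mt_id, fec): label}

    # A Rainbow binding applies to MT-ID 0 and to every MRT-Red and
    # MRT-Blue MT-ID of the supported profiles, all with the same label.
    bindings = {(DEFAULT_MT_ID, fec): label}
    for red_mt_id, blue_mt_id in profile_mt_ids:
        bindings[(red_mt_id, fec)] = label
        bindings[(blue_mt_id, fec)] = label
    return bindings


# Example with the default profile's experimental MT-IDs.
example_bindings = expand_rainbow_binding(
    RAINBOW_MT_ID, "192.0.2.1/32", 17, [(3997, 3998)])
```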
The Rainbow MRT MT-ID isavailable via both ABR A and ABR B, if A fails, then the traffic should be redirected to B. This can also be done for backups via MRT. This generalizesdefined toany multi-homed prefix. A multi-homed prefix could be: o An out-of-area prefix announced by more than one ABR, o An AS-External route announced by 2 or more ASBRs, o A prefix with iBGP multipathprovide an easy way todifferent ASBRs, o etc. For each prefix,handle theattached ABRs are selected and a proxy-nodespecial signaling that iscreated connectedneeded at ABRs or LBRs. It avoids the problem of needing tothose ABRs. If there exist multiple multi-homed prefixes that sharesignal different MPLS labels for the sameconnectivity and costs to each of those ABRs, then a single proxy-node can beFEC. Because the Rainbow MRT MT-ID is usedto representonly by ABRs/LBRs or theset. An example of thisLDP egress, it isshown in Figure 3. 2 2 2 2 A----B----C A----B----C 2 | | 2 2 | | 2 | | | | [ABR1] [ABR2] [ABR1] [ABR2] | | | | p,10 p,15 10 |---[P]---| 15 (a) Initial topology (b)with proxy-node A<---B<---C A--->B--->C | ^ ^ | V | | V [ABR1] [ABR2] [ABR1] [ABR2] | | |-->[P] [P]<--| (c) Blue MRT (d) Rednot MRTFigure 3: Prefixes Advertised by Multiple ABRsprofile specific. Theproxy-nodes and associated links are added to the network topology after all real links have been assigned to a directionproposed experimental value is 3999 andbefore the actual MRTs are computed. Proxy-nodes cannot be transited when computing the MRTs. In addition to computingthepair of MRTs associated with each router destination D in the area, a pair of MRTs canfinal value will becomputed for each such proxy-node to fully protect against ABR failure. Each ABR or attaching router must remove the MRT marking[see Section 5]assigned by IANA andthen forward the traffic outside of the area (or island of MRT-fast-reroute-supporting routers). 
The authoritative values are given in [I-D.atlas-mpls-ldp-mrt].

9. Inter-Area and ABR Forwarding Behavior

An ABR/LBR has two forwarding roles. First, it forwards traffic inside its area. Second, it forwards traffic from one area into another. These same two roles apply for MRT transit traffic. Traffic on MRT-Red or MRT-Blue destined inside the area needs to stay on MRT-Red or MRT-Blue in that area. However, it is desirable for traffic leaving the area to also exit MRT-Red or MRT-Blue and return to shortest-path forwarding.

For unicast MRT-FRR, the need to stay on an MRT forwarding topology terminates at the ABR/LBR whose best route is via a different area/level. It is highly desirable to go back to the default forwarding topology when leaving an area/level. There are three basic reasons for this. First, the default topology uses shortest paths; the packet will thus take the shortest possible route to the destination. Second, this allows failures that might appear in multiple areas (e.g. ABR/LBR failures) to be separately identified and repaired around. Third, the packet can be fast-rerouted again, if necessary, due to a failure in a different area.

An ABR/LBR that receives a packet on MRT-Red or MRT-Blue towards a destination in another area/level should forward the packet in the area/level with the best route along MRT-Red or MRT-Blue.
If the packet came from that area/level, this correctly avoids the failure. However, if the traffic came from a different area/level, the packet should be removed from MRT-Red or MRT-Blue and forwarded on the shortest-path default forwarding topology.

To avoid per-interface forwarding state for MRT-Red and MRT-Blue, the ABR/LBR needs to arrange that packets destined to a different area arrive at the ABR/LBR without an MRT-Red or MRT-Blue marking.

For LDP forwarding where the MPLS label specifies (MT-ID, FEC), the ABR/LBR is responsible for advertising the proper label to each neighbor. Assume that an ABR/LBR has allocated three labels for a particular destination; those labels are L_primary, L_blue, and L_red. When the ABR/LBR advertises label bindings to routers in the area with the best route to the destination, the ABR/LBR provides L_primary for the default topology, L_blue for the MRT-Blue MT-ID and L_red for the MRT-Red MT-ID, exactly as expected. However, when the ABR/LBR advertises label bindings to routers in other areas, the ABR/LBR advertises L_primary for the Rainbow MRT MT-ID, which is then used for the default topology, for the MRT-Blue MT-ID and for the MRT-Red MT-ID.

The ABR/LBR installs all next-hops from the best area: primary next-hops for L_primary, MRT-Blue next-hops for L_blue, and MRT-Red next-hops for L_red.
Because the ABR/LBR advertised (Rainbow MRT MT-ID, FEC) with L_primary to neighbors not in the best area, packets from those neighbors will arrive at the ABR/LBR with a label L_primary and will be forwarded into the best area along the default topology. By controlling what labels are advertised, the ABR/LBR can thus enforce that packets exiting the area do so on the shortest-path default topology.

If IP forwarding is used, then the ABR/LBR behavior is dependent upon the outermost IP address. If the outermost IP address is an MRT loopback address of the ABR/LBR, then the packet is decapsulated and forwarded based upon the inner IP address, which should go on the default SPT topology. If the outermost IP address is not an MRT loopback address of the ABR/LBR, then the packet is simply forwarded along the associated forwarding topology. A PLR sending traffic to a destination outside its local area/level will pick the MRT and use the associated MRT loopback address of the selected ABR/LBR connected to the external destination.

Thus, regardless of which of these two forwarding mechanisms is used, there is no need for additional computation or per-area forwarding state.

    +----[C]----     --[D]--[E]                --[D]--[E]
    |           \   /         \               /         \
p--[A] Area 10 [ABR1]  Area 0 [H]--p   +-[ABR1]  Area 0 [H]-+
    |           /   \         /        |      \         /   |
    +----[B]----     --[F]--[G]        |       --[F]--[G]   |
                                       |                    |
                                       | other              |
                                       +----------[p]-------+
                                         area

      (a) Example topology        (b) Proxy node view in Area 0 nodes


                +----[C]<---       [D]->[E]
                V           \             \
             +-[A] Area 10 [ABR1]  Area 0 [H]-+
             |  ^           /             /   |
             |  +----[B]<---       [F]->[G]   V
             |                                |
             +------------->[p]<--------------+

               (c) rSPT towards destination p


          ->[D]->[E]                         -<[D]<-[E]
         /          \                       /         \
    [ABR1]  Area 0 [H]-+             +-[ABR1]         [H]
                   /   |             |      \
            [F]->[G]   V             V       -<[F]<-[G]
                       |             |
                       |             |
             [p]<------+             +--------->[p]

      (d) Blue MRT in Area 0           (e) Red MRT in Area 0

             Figure 3: ABR Forwarding Behavior and MRTs

The other forwarding mechanism described in Section 6 is using Topology-Identification Labels.
This mechanism would require any router whose MRT-Red or MRT-Blue next-hop is an ABR/LBR to determine whether the ABR/LBR would forward the packet out of the area/level. If so, then that router should pop off the Topology-Identification Label before forwarding the packet to the ABR/LBR.

For example, in Figure 3, if node H fails, node E has to put traffic towards prefix p onto MRT-Red. But since node D knows that ABR1 will use a best route from another area, it is safe for D to pop the Topology-Identification Label and just forward the packet to ABR1 along the MRT-Red next-hop. ABR1 will use the shortest path in Area 10.

In all cases for ISIS and most cases for OSPF, the penultimate router can determine what decision the adjacent ABR will make. The one case where it cannot be determined is when two ASBRs are in different non-backbone areas attached to the same ABR; then the ASBR's Area ID may be needed for tie-breaking (prefer the route with the largest OSPF area ID), and the Area ID is not announced as part of the ASBR link-state advertisement (LSA). In this one case, suboptimal forwarding along the MRT in the other area would happen. If that becomes a realistic deployment scenario, OSPF extensions could be considered. This is not covered in [I-D.atlas-ospf-mrt].

10. Prefixes Multiply Attached to the MRT Island

How a computing router S determines its local MRT Island for each supported MRT profile is already discussed in Section 7.
There are two types of prefixes or FECs that may be multiply attached to an MRT Island. The first type consists of multi-homed prefixes that usually connect at a domain or protocol boundary. The second type represents routers that do not support the profile for the MRT Island. The key difference is whether the traffic, once out of the MRT Island, remains in the same area/level and might reenter the MRT Island if a loop-free exit point is not selected.

One property of LFAs that is necessary to preserve is the ability to protect multi-homed prefixes against ABR failure. For instance, if a prefix from the backbone is available via both ABR A and ABR B, if A fails, then the traffic should be redirected to B. This can also be done for backups via MRT.

If ASBR protection is desired, this has additional complexities if the ASBRs are in different areas. Similarly, protecting labeled BGP traffic in the event of an ASBR failure has additional complexities due to the per-ASBR label spaces involved.

As discussed in [RFC5286], a multi-homed prefix could be:

o An out-of-area prefix announced by more than one ABR,

o An AS-External route announced by 2 or more ASBRs,

o A prefix with iBGP multipath to different ASBRs,

o etc.

There are also two different approaches to protection.
The first is to do endpoint selection to pick a router to tunnel to where that router is loop-free with respect to the failure-point. Conceptually, the set of candidate routers to provide LFAs expands to all routers, with an MRT alternate, attached to the prefix.

The second is to use a proxy-node, that can be named via MPLS label or IP address, and pick the appropriate label or IP address to reach it on either MRT-Blue or MRT-Red as appropriate to avoid the failure point. A proxy-node can represent a destination prefix that can be attached to the MRT Island via at least two routers.
It is termed a named proxy-node if there is a way that traffic can be encapsulated to reach specifically that proxy-node; this could be because there is an LDP FEC for the associated prefix or because MRT-Red and MRT-Blue IP addresses are advertised in an as-yet undefined fashion for that proxy-node. Traffic to a named proxy-node may take a different path than traffic to the attaching router; traffic is also explicitly forwarded from the attaching router along a predetermined interface towards the relevant prefixes.

For IP traffic, multi-homed prefixes can use endpoint selection. For IP traffic that is destined to a router outside the MRT Island, if that router is the egress for a FEC advertised into the MRT Island, then the named proxy-node approach can be used.

For LDP traffic, there is always a FEC advertised into the MRT Island. The named proxy-node approach should be used, unless the computing router S knows the label for the FEC at the selected endpoint.

If a FEC is advertised from outside the MRT Island into the MRT Island and the forwarding mechanism specified in the profile includes LDP, then the routers learning that FEC MUST also advertise labels for (MRT-Red, FEC) and (MRT-Blue, FEC) to neighbors inside the MRT Island. If the forwarding mechanism includes LDP, any router receiving a FEC corresponding to a router outside the MRT Island or to a multi-homed prefix MUST compute and install the transit MRT-Blue and MRT-Red next-hops for that FEC; the associated FECs ((MT-ID 0, FEC), (MRT-Red, FEC), and (MRT-Blue, FEC)) MUST also be provided via LDP to neighbors inside the MRT Island.
10.1. Endpoint Selection

Endpoint Selection is a local matter for a router in the MRT Island since it pertains to selecting and using an alternate and does not affect the transit MRT-Red and MRT-Blue forwarding topologies.

Let the computing router be S and the next-hop F be the node whose failure is to be avoided. Let the destination be prefix p. Have A be the router to which the prefix p is attached for S's shortest path to p.

The candidates for endpoint selection are those to which the destination prefix is attached in the area/level. For a particular candidate B, it is necessary to determine if B is loop-free to reach p with respect to S and F for node-protection or at least with respect to S and the link (S, F) for link-protection. If B will always prefer to send traffic to p via a different area/level, then this is definitional. Otherwise, distance-based computations are necessary and an SPF from B's perspective may be necessary. The following equations give the checks needed; the rationale is similar to that given in [RFC5286].

   Loop-Free for S: D_opt(B, p) < D_opt(B, S) + D_opt(S, p)

   Loop-Free for F: D_opt(B, p) < D_opt(B, F) + D_opt(F, p)

Because F is S's primary next-hop towards p, D_opt(S, p) = D_opt(S, F) + D_opt(F, p). The latter check is therefore equivalent to the following, which avoids the need to compute the shortest path from F to p.

   Loop-Free for F: D_opt(B, p) < D_opt(B, F) + D_opt(S, p) - D_opt(S, F)

Finally, the rules for Endpoint selection are given below. The basic idea is to repair to the prefix-advertising router selected for the shortest-path and only to select and tunnel to a different endpoint if necessary (e.g. A=F or F is a cut-vertex or the link (S,F) is a cut-link).
1. Does S have a node-protecting alternate to A? If so, select that. Tunnel the packet to A along that alternate. For example, if LDP is the forwarding mechanism, then push the label (MRT-Red, A) or (MRT-Blue, A) onto the packet.

2. If not, then is there a router B that is loop-free to reach p while avoiding both F and S? If so, select B as the endpoint. Determine the MRT alternate to reach B while avoiding F. Tunnel the packet to B along that alternate. For example, with LDP, push the label (MRT-Red, B) or (MRT-Blue, B) onto the packet.

3. If not, then does S have a link-protecting alternate to A? If so, select that.

4. If not, then is there a router B that is loop-free to reach p while avoiding S and the link from S to F? If so, select B as the endpoint and use the MRT alternate for reaching B from S that avoids the link (S,F).

The endpoint selected will receive a packet destined to itself and, being the egress, will pop that MPLS label (or have signaled Implicit Null) and forward based on what is underneath. This suffices for IP traffic where the MPLS labels understood by the endpoint router are not needed.

10.2. Named Proxy-Nodes

A clear advantage to using a named proxy-node is that it is possible to explicitly forward from the MRT Island along an interface to a loop-free island neighbor (LFIN) when that interface may not be a primary next-hop.
For LDP traffic where the label indicates both the topology and the FEC, it is necessary to either use a named proxy-node or deal with learning remote MPLS labels.

A named proxy-node represents one or more destinations and, for LDP forwarding, has a FEC associated with it that is signaled into the MRT Island. Therefore, it is possible to explicitly label packets to go to (MRT-Red, FEC) or (MRT-Blue, FEC); at the border of the MRT Island, the label is swapped to one meaning (MT-ID 0, FEC). It would be possible to have named proxy-nodes for IP forwarding, but this would require extensions to signal two IP addresses to be associated with MRT-Red and MRT-Blue for the proxy-node. A named proxy-node can be uniquely represented by the two routers in the MRT Island to which it is connected. The extensions to signal such IP addresses are not defined in [I-D.atlas-ospf-mrt]. The details of what label-bindings must be originated are described in [I-D.atlas-mpls-ldp-mrt].

Computing the MRT next-hops to a named proxy-node and the MRT alternate for the computing router S to avoid a particular failure node F is straightforward. The details of the simple constant-time functions, Select_Proxy_Node_NHs() and Select_Alternates_Proxy_Node(), are given in [I-D.enyedi-rtgwg-mrt-frr-algorithm]. A key point is that computing these MRT next-hops and alternates can be done as new named proxy-nodes are added or removed without requiring a new MRT computation or impacting other existing MRT paths. This maps very well to, for example, how OSPFv2 ([RFC2328], Section 16.5) does incremental updates for new summary-LSAs.
The key question is how to attach the named proxy-node to the MRT Island; all the routers in the MRT Island MUST do this consistently. No more than 2 routers in the MRT Island can be selected; one should only be selected if there are no others that meet the necessary criteria. The named proxy-node is logically part of the area/level.

There are two sources for candidate routers in the MRT Island to connect to the named proxy-node. The first set are those routers that are advertising the prefix; the cost assigned to each such router is the announced cost to the prefix. The second set are those routers in the MRT Island that are connected to routers not in the MRT Island but in the same area/level; such routers will be defined as Island Border Routers (IBRs). The routers connected to the IBRs that are not in the MRT Island and are in the same area/level are Island Neighbors (INs).

Since packets sent to the named proxy-node along MRT-Red or MRT-Blue may come from any router inside the MRT Island, it is necessary that whatever router to which an IBR forwards the packet be loop-free with regard to the whole MRT Island for the destination. Thus, an IBR is a candidate router only if it possesses at least one IN whose path to the prefix does not enter the MRT Island. The cost assigned to each (IBR, IN) pair is the D_opt(IN, prefix) plus Cost(IBR, IN). From the set of prefix-advertising routers and the IBRs, the two lowest cost routers are selected and ties are broken based upon the lowest Router ID.
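The selection rule in the preceding paragraph can be sketched as follows. This is an illustration under stated assumptions rather than a normative procedure: the function name and input shapes are hypothetical, Router IDs are modeled as integers, only INs already known to be loop-free are passed in, and the handling of a router that is both a prefix advertiser and an IBR (take its lowest cost) is an assumption that the text above does not address.

```python
def select_attachment_routers(advertisers, ibr_candidates):
    """Pick up to two proxy-node attachment routers.

    advertisers: dict router_id -> announced cost to the prefix.
    ibr_candidates: dict ibr_router_id -> list of
        (d_opt_in_to_prefix, cost_ibr_to_in) tuples, one per loop-free IN.
        An IBR with an empty list has no loop-free IN and is not a candidate.
    Returns a list of (cost, router_id) tuples, lowest cost first.
    """
    best = dict(advertisers)
    for ibr, pairs in ibr_candidates.items():
        for d_opt_in, link_cost in pairs:
            # Cost of an (IBR, IN) pair: D_opt(IN, prefix) + Cost(IBR, IN).
            cost = d_opt_in + link_cost
            # Assumption: keep the cheapest cost seen for each router.
            if ibr not in best or cost < best[ibr]:
                best[ibr] = cost

    # Sorting (cost, router_id) tuples orders by cost first and breaks
    # ties in favor of the lowest Router ID.
    ranked = sorted((cost, rid) for rid, cost in best.items())
    return ranked[:2]


# Example: routers 3 and 7 tie at cost 15 and both beat advertiser 5;
# router 9 is an IBR with no loop-free IN.
example_selected = select_attachment_routers(
    {5: 20}, {3: [(10, 5), (4, 20)], 7: [(10, 5)], 9: []})
```

Because every router in the MRT Island evaluates the same candidates with the same tie-break, they all arrive at the same pair, which is what the consistency requirement calls for.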
For ease of discussion, such selected routers are proxy-node attachment routers and the two selected will be named A and B.

A proxy-node attachment router has a special forwarding role. When a packet is received destined to (MRT-Red, prefix) or (MRT-Blue, prefix), if the proxy-node attachment router is an IBR, it MUST swap to the default topology (e.g. swap to the label for (MT-ID 0, prefix) or remove the outer IP encapsulation) and forward the packet to the IN whose cost was used in the selection. If the proxy-node attachment router is not an IBR, then the packet MUST be removed from the MRT forwarding topology and sent along the interface that caused the router to advertise the prefix; this interface might be out of the area/level/AS.

10.2.1. Computing if an Island Neighbor (IN) is loop-free

As discussed, the Island Neighbor needs to be loop-free with regard to the whole MRT Island for the destination. Conceptually, the cost of transiting the MRT Island should be regarded as 0. This can be done by collapsing the MRT Island into a single node, as seen in Figure 4, and then computing SPFs from each Island Neighbor and from the MRT Island itself.
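One way to picture this computation, without rewriting the topology, is to run an ordinary SPF in which every link between two MRT Island routers costs 0, and then apply the usual loop-free inequality using any island router as the reference point. The sketch below is illustrative only; the function names, the adjacency-dict graph representation, and the example topology are assumptions and do not come from the referenced drafts.

```python
import heapq


def dist_mrt0(graph, island, source):
    """Dijkstra from source; links between two island routers cost 0.

    graph: dict node -> dict neighbor -> metric (assumed representation).
    island: set of routers in the MRT Island.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for peer, metric in graph.get(node, {}).items():
            if node in island and peer in island:
                metric = 0  # transiting the island is free
            nd = d + metric
            if nd < dist.get(peer, float("inf")):
                dist[peer] = nd
                heapq.heappush(heap, (nd, peer))
    return dist


def island_neighbor_is_loop_free(graph, island, island_router, neighbor, dest):
    """True if the IN's zero-transit path to dest cannot re-enter the island."""
    inf = float("inf")
    from_in = dist_mrt0(graph, island, neighbor)
    from_island = dist_mrt0(graph, island, island_router)
    return (from_in.get(dest, inf)
            < from_in.get(island_router, inf) + from_island.get(dest, inf))


# Hypothetical topology: island {R, S}; F and E are Island Neighbors.
example_graph = {
    "R": {"S": 10, "F": 1},
    "S": {"R": 10, "E": 1},
    "E": {"S": 1, "D": 10},
    "F": {"R": 1, "D": 1},
    "D": {"E": 10, "F": 1},
}
f_ok = island_neighbor_is_loop_free(example_graph, {"R", "S"}, "R", "F", "D")
e_ok = island_neighbor_is_loop_free(example_graph, {"R", "S"}, "R", "E", "D")
```

In this example F reaches D directly, while E's cheapest path to D goes back through the island once transit is free, so only F qualifies as a loop-free Island Neighbor.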
        [G]---[E]---(V)---(U)---(T)
         |  \  |     |           |
         |   \ |     |           |
         |    \|     |           |
        [H]---[F]---(R)---(S)----|

      (1) Network Graph with Partial Deployment
          [E],[F],[G],[H] : No support for MRT
          (R),(S),(T),(U),(V): MRT Island - supports MRT

        [G]---[E]----|                |---(V)---(U)---(T)
         |  \  |     |                |    |           |
         |   \ | ( MRT Island )  [ proxy ] |           |
         |    \|     |                |    |           |
        [H]---[F]----|                |---(R)---(S)----|

      (2) Graph for determining       (3) Graph for MRT computation
          loop-free neighbors

   Figure 4: Computing alternates to destinations outside the MRT Island

The simple way to do this without manipulating the topology is to compute the SPFs from each IN and a node in the MRT Island (e.g. the GADAG root), but use a link metric of 0 for all links between routers in the MRT Island. The distances computed via SPF this way will be referred to as Dist_mrt0.

An IN is loop-free with respect to a destination D if:

   Dist_mrt0(IN, D) < Dist_mrt0(IN, MRT Island Router) + Dist_mrt0(MRT Island Router, D)

Any router in the MRT Island can be used since the cost of transiting between MRT Island routers is 0. The GADAG Root is recommended for consistency.

10.3. MRT Alternates for Destinations Outside the MRT Island

A natural concern with new functionality is how to have it be useful when it is not deployed across an entire IGP area. In the case of MRT FRR, where it provides alternates when appropriate LFAs aren't available, there are also deployment scenarios where it may make sense to only enable some routers in an area with MRT FRR. A simple example of such a scenario would be a ring of 6 or more routers that is connected via two routers to the rest of the area.

Destinations inside the local island can obviously use MRT alternates. Destinations outside the local island can be treated like a multi-homed prefix and either Endpoint Selection or Named Proxy-Nodes can be used.
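Because Named Proxy-Nodes depend on the Island Neighbor loop-free test of Section 10.2.1, a small non-normative Python sketch of that test may help. It runs a plain Dijkstra SPF in which every link between two MRT Island routers is given metric 0, then applies the Dist_mrt0 inequality. The function names, the dictionary graph representation, and the link metrics are assumptions for illustration; the topology follows the links drawn in Figure 4, but the figure itself does not give metrics.

```python
import heapq
from typing import Dict, Set

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link metric}


def dist_mrt0(graph: Graph, island: Set[str], source: str) -> Dict[str, float]:
    """SPF from source with metric 0 on links between two island routers."""
    dist: Dict[str, float] = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, metric in graph[u].items():
            weight = 0 if (u in island and v in island) else metric
            if d + weight < dist.get(v, float("inf")):
                dist[v] = d + weight
                heapq.heappush(heap, (d + weight, v))
    return dist


def in_is_loop_free(graph: Graph, island: Set[str], island_router: str,
                    neighbor: str, dest: str) -> bool:
    """Dist_mrt0(IN, D) < Dist_mrt0(IN, island) + Dist_mrt0(island, D)."""
    inf = float("inf")
    from_in = dist_mrt0(graph, island, neighbor)
    from_island = dist_mrt0(graph, island, island_router)
    return from_in.get(dest, inf) < (
        from_in.get(island_router, inf) + from_island.get(dest, inf))


def add_link(graph: Graph, a: str, b: str, metric: int) -> None:
    """Add a bidirectional link with the same metric in both directions."""
    graph.setdefault(a, {})[b] = metric
    graph.setdefault(b, {})[a] = metric


# Links as drawn in Figure 4; all metrics below are assumed values.
g: Graph = {}
for a, b, m in [("G", "E", 10), ("G", "H", 1), ("G", "F", 1), ("E", "F", 10),
                ("H", "F", 1), ("E", "V", 1), ("F", "R", 1), ("V", "U", 1),
                ("U", "T", 1), ("V", "R", 1), ("R", "S", 1), ("S", "T", 1)]:
    add_link(g, a, b, m)
island = {"R", "S", "T", "U", "V"}

print(in_is_loop_free(g, island, "R", "F", "G"))  # IN F, destination G
print(in_is_loop_free(g, island, "R", "E", "G"))  # IN E, destination G
```

With these assumed metrics, F reaches G directly, while the cheapest path from E to G crosses the MRT Island, so only F qualifies as a loop-free Island Neighbor for G. Any island router can stand in for the island in the test because all island-internal links cost 0.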
Named Proxy-Nodes MUST be supported when LDP forwarding is supported and a label-binding for the destination is sent to an IBR.

Naturally, there are more complicated options to improve coverage, such as connecting multiple MRT islands across tunnels, but the need for the additional complexity has not been justified.

11. Network Convergence and Preparing for the Next Failure

After a failure, MRT detours ensure that packets reach their intended destination while the IGP has not reconverged onto the new topology. As link-state updates reach the routers, the IGP process calculates the new shortest paths. Two things need attention: micro-loop prevention and MRT re-calculation.

11.1. Micro-forwarding loop prevention and MRTs

As is well known [RFC5715], micro-loops can occur during IGP convergence; such loops can be local to the failure or remote from the failure. Managing micro-loops is an orthogonal issue to having alternates for local repair, such as MRT fast-reroute provides.

There are two possible micro-loop prevention mechanisms discussed in [RFC5715]. The first is Ordered FIB [I-D.ietf-rtgwg-ordered-fib]. The second is Farside Tunneling, which requires tunnels or an alternate topology to reach routers on the farside of the failure.

Since MRTs provide an alternate topology through which traffic can be sent and which can be manipulated separately from the SPT, it is possible that MRTs could be used to support Farside Tunneling. Details of how to do so are outside the scope of this document.

Micro-loop mitigation mechanisms can also work when combined with MRT.

11.2. MRT Recalculation

When a failure event happens, traffic is put by the PLRs onto the MRT topologies. After that, each router recomputes its shortest path tree (SPT) and moves traffic over to that.
Only after all the PLRs have switched to using their SPTs and traffic has drained from the MRT topologies should each router install the recomputed MRTs into the FIBs.

At each router, therefore, the sequence is as follows:

1. Receive failure notification

2. Recompute SPT

3. Install new SPT

4. If the network was stable before the failure occurred, wait a configured (or advertised) period for all routers to be using their SPTs and traffic to drain from the MRTs.

5. Recompute MRTs

6. Install new MRTs.

While the recomputed MRTs are not installed in the FIB, protection coverage is lowered. Therefore, it is important to recalculate the MRTs and install them quickly.

12. Acknowledgements

The authors would like to thank Mike Shand for his valuable review and contributions.

The authors would like to thank Joel Halpern, Hannes Gredler, Ted Qian, Kishore Tiruveedhula, Shraddha Hegde, Santosh Esale, Nitin Bahadur, Harish Sitaraman, Raveendra Torvi and Chris Bowers for their suggestions and review.

13. IANA Considerations

This document includes no request to IANA.

14. Security Considerations

This architecture is not currently believed to introduce new security concerns.

15. References

15.1. Normative References

[I-D.enyedi-rtgwg-mrt-frr-algorithm] Atlas, A., Enyedi, G., Csaszar, A., Gopalan, A., and C. Bowers, "Algorithms for computing Maximally Redundant Trees for IP/LDP Fast-Reroute", draft-enyedi-rtgwg-mrt-frr-algorithm-03 (work in progress), July 2013.

[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast Reroute: Loop-Free Alternates", RFC 5286, September 2008.

[RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714, January 2010.

15.2. Informative References

[EnyediThesis] Enyedi, G., "Novel Algorithms for IP Fast Reroute", Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics Ph.D. Thesis, February 2011, <http://timon.tmit.bme.hu/theses/thesis_book.pdf>.

[I-D.atlas-mpls-ldp-mrt] Atlas, A., Tiruveedhula, K., Tantsura, J., and IJ. Wijnands, "LDP Extensions to Support Maximally Redundant Trees", draft-atlas-mpls-ldp-mrt-00 (work in progress), July 2013.

[I-D.atlas-ospf-mrt] Atlas, A., Hegde, S., Chris, C., and J. Tantsura, "OSPF Extensions to Support Maximally Redundant Trees", draft-atlas-ospf-mrt-00 (work in progress), July 2013.

[I-D.atlas-rtgwg-mrt-mc-arch] Atlas, A., Kebler, R., Wijnands, I., Csaszar, A., and G. Enyedi, "An Architecture for Multicast Protection Using Maximally Redundant Trees", draft-atlas-rtgwg-mrt-mc-arch-02 (work in progress), July 2013.

[I-D.bryant-ipfrr-tunnels] Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP Fast Reroute using tunnels", draft-bryant-ipfrr-tunnels-03 (work in progress), November 2007.

[I-D.ietf-mpls-ldp-multi-topology] Zhao, Q., Fang, L., Zhou, C., Li, L., and K. Raza, "LDP Extensions for Multi Topology Routing", draft-ietf-mpls-ldp-multi-topology-08 (work in progress), May 2013.

[I-D.ietf-rtgwg-ipfrr-notvia-addresses] Bryant, S., Previdi, S., and M. Shand, "A Framework for IP and MPLS Fast Reroute Using Not-via Addresses", draft-ietf-rtgwg-ipfrr-notvia-addresses-11 (work in progress), May 2013.

[I-D.ietf-rtgwg-ordered-fib] Shand, M., Bryant, S., Previdi, S., Filsfils, C., Francois, P., and O. Bonaventure, "Framework for Loop-free convergence using oFIB", draft-ietf-rtgwg-ordered-fib-12 (work in progress), May 2013.

[I-D.ietf-rtgwg-remote-lfa] Bryant, S., Filsfils, C., Previdi, S., Shand, M., and S. Ning, "Remote LFA FRR", draft-ietf-rtgwg-remote-lfa-02 (work in progress), May 2013.

[I-D.litkowski-rtgwg-node-protect-remote-lfa] Litkowski, S., "Node protecting remote LFA", draft-litkowski-rtgwg-node-protect-remote-lfa-00 (work in progress), April 2013.

[LFARevisited] Retvari, G., Tapolcai, J., Enyedi, G., and A. Csaszar, "IP Fast ReRoute: Loop Free Alternates Revisited", Proceedings of IEEE INFOCOM, 2011, <http://opti.tmit.bme.hu/~tapolcai/papers/retvari2011lfa_infocom.pdf>.

[LightweightNotVia] Enyedi, G., Retvari, G., Szilagyi, P., and A. Csaszar, "IP Fast ReRoute: Lightweight Not-Via without Additional Addresses", Proceedings of IEEE INFOCOM, 2009, <http://mycite.omikk.bme.hu/doc/71691.pdf>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[RFC3137] Retana, A., Nguyen, L., White, R., Zinin, A., and D. McPherson, "OSPF Stub Router Advertisement", RFC 3137, June 2001.

[RFC5443] Jork, M., Atlas, A., and L. Fang, "LDP IGP Synchronization", RFC 5443, March 2009.

[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free Convergence", RFC 5715, January 2010.

[RFC6571] Filsfils, C., Francois, P., Shand, M., Decraene, B., Uttaro, J., Leymann, N., and M. Horneffer, "Loop-Free Alternate (LFA) Applicability in Service Provider (SP) Networks", RFC 6571, June 2012.

Appendix A. General Issues with Area Abstraction

When a multi-homed prefix is connected in two different areas, it may be impractical to protect it without adding the complexity of explicit tunneling.
This is also a problem for LFA and Remote-LFA.

           50
        |----[ASBR Y]---[B]---[ABR 2]---[C]      Backbone Area 0:
        |                                |          ABR 1, ABR 2, C, D
        |                                |
        |                                |       Area 20: A, ASBR X
        |                                |
        p ---[ASBR X]---[A]---[ABR 1]---[D]      Area 10: B, ASBR Y
           5                                     p is a Type 1 AS-external

        Figure 5: AS external prefixes in different areas

Consider the network in Figure 5 and assume there is a richer connective topology that isn't shown, where the same prefix is announced by ASBR X and ASBR Y, which are in different non-backbone areas. If the link from A to ASBR X fails, then an MRT alternate could forward the packet to ABR 1 and ABR 1 could forward it to D, but then D would find the shortest route is back via ABR 1 to Area 20. This problem occurs because the routers, including the ABR, in one area are not yet aware of the failure in a different area.

The only way to get the packet from A to ASBR Y is to explicitly tunnel it to ASBR Y. If the traffic is unlabeled or the appropriate MPLS labels are known, then explicit tunneling MAY be used as long as the shortest-path of the tunnel avoids the failure point. In that case, A must determine that it should use an explicit tunnel instead of an MRT alternate.

Authors' Addresses

Alia Atlas (editor)
Juniper Networks
10 Technology Park Drive
Westford, MA 01886
USA
Email: akatlas@juniper.net

Robert Kebler
Juniper Networks
10 Technology Park Drive
Westford, MA 01886
USA
Email: rkebler@juniper.net

Gabor Sandor Enyedi
Ericsson
Konyves Kalman krt 11.
Budapest 1097
Hungary
Email: Gabor.Sandor.Enyedi@ericsson.com

Andras Csaszar
Ericsson
Konyves Kalman krt 11
Budapest 1097
Hungary
Email: Andras.Csaszar@ericsson.com

Jeff Tantsura
Ericsson
300 Holger Way
San Jose, CA 95134
USA
Email: jeff.tantsura@ericsson.com

Maciek Konstantynowicz
Cisco Systems
Email: maciek@bgp.nu

Russ White
VCE
Email: russw@riw.us