Routing Area Working Group                                 A. Atlas, Ed.
Internet-Draft                                                 R. Kebler
Intended status: Standards Track                        Juniper Networks
Expires: January 13, 2014                                   IJ. Wijnands
                                                     Cisco Systems, Inc.
                                                              A. Csaszar
                                                                      G.
Enyedi 9 Ericsson 10 July 12, 2013 12 An Architecture for Multicast Protection Using Maximally Redundant Trees 13 draft-atlas-rtgwg-mrt-mc-arch-02 15 Abstract 17 Failure protection is desirable for multicast traffic, whether 18 signaled via PIM or mLDP. Different mechanisms are suitable for 19 different use-cases and deployment scenarios. This document 20 describes the architecture for global protection (aka multicast live- 21 live) and for local protection (aka fast-reroute). 23 The general methods for global protection and local protection using 24 alternate-trees are dependent upon the use of Maximally Redundant 25 Trees. Local protection can also tunnel traffic in unicast tunnels 26 to take advantage of the routing and fast-reroute mechanisms 27 available for IP/LDP unicast destinations. 29 The failures protected against are single link or node failures. 30 While the basic architecture might support protection against shared 31 risk group failures, algorithms to dynamically compute MRTs 32 supporting this are for future study. 34 Status of this Memo 36 This Internet-Draft is submitted in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at http://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on January 13, 2014. 50 Copyright Notice 52 Copyright (c) 2013 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 1.1. Maximally Redundant Trees (MRTs) . . . . . . . . . . . . . 4 69 1.2. MRTs and Multicast . . . . . . . . . . . . . . . . . . . . 6 70 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 71 3. Use-Cases and Applicability . . . . . . . . . . . . . . . . . 8 72 4. Global Protection: Multicast Live-Live . . . . . . . . . . . . 9 73 4.1. Creation of MRMTs . . . . . . . . . . . . . . . . . . . . 10 74 4.2. Traffic Self-Identification . . . . . . . . . . . . . . . 11 75 4.2.1. Merging MRMTs for PIM if Traffic Doesn't 76 Self-Identify . . . . . . . . . . . . . . . . . . . . 12 77 4.3. Convergence Behavior . . . . . . . . . . . . . . . . . . . 13 78 4.4. Inter-area/level Behavior . . . . . . . . . . . . . . . . 14 79 4.4.1. Inter-area Node Protection with 2 border routers . . . 15 80 4.4.2. Inter-area Node Protection with > 2 Border Routers . . 16 81 4.5. PIM . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 82 4.5.1. Traffic Handling: RPF Checks . . . . . . . . . . . . . 
17 83 4.6. mLDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 84 5. Local Repair: Fast-Reroute . . . . . . . . . . . . . . . . . . 17 85 5.1. PLR-driven Unicast Tunnels . . . . . . . . . . . . . . . . 18 86 5.1.1. Learning the MPs . . . . . . . . . . . . . . . . . . . 19 87 5.1.2. Using Unicast Tunnels and Indirection . . . . . . . . 19 88 5.1.3. MP Alternate Traffic Handling . . . . . . . . . . . . 20 89 5.1.4. Merge Point Reconvergence . . . . . . . . . . . . . . 21 90 5.1.5. PLR termination of alternate traffic . . . . . . . . . 21 91 5.2. MP-driven Unicast Tunnels . . . . . . . . . . . . . . . . 21 92 5.3. MP-driven Alternate Trees . . . . . . . . . . . . . . . . 22 93 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 94 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 95 8. Security Considerations . . . . . . . . . . . . . . . . . . . 23 96 9. Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . 23 97 9.1. MP-driven Alternate Trees . . . . . . . . . . . . . . . . 23 98 9.1.1. PIM details for Alternate-Trees . . . . . . . . . . . 26 99 9.1.2. mLDP details for Alternate-Trees . . . . . . . . . . . 26 100 9.1.3. Traffic Handling by PLR . . . . . . . . . . . . . . . 26 101 9.2. Methods Compared for PIM . . . . . . . . . . . . . . . . . 27 102 9.3. Methods Compared for mLDP . . . . . . . . . . . . . . . . 27 103 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 104 10.1. Normative References . . . . . . . . . . . . . . . . . . . 27 105 10.2. Informative References . . . . . . . . . . . . . . . . . . 28 106 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 29 108 1. Introduction 110 This document describes how the algorithms in 111 [I-D.enyedi-rtgwg-mrt-frr-algorithm], which are used in 112 [I-D.ietf-rtgwg-mrt-frr-architecture] for unicast IP/LDP fast- 113 reroute, can be used to provide protection for multicast traffic. It 114 specifically applies to multicast state signaled by PIM[RFC4601] or 115 mLDP[RFC6388]. There are additional protocols that depend upon these 116 (e.g. VPLS, mVPN, etc.) and consideration of the applicability to 117 such traffic will be in a future version. 119 In this document, global protection is used to refer to the method of 120 having two maximally disjoint multicast trees where traffic may be 121 sent on both and resolved by the receiver. This is similar to the 122 ability with RSVP-TE LSPs to have a primary and a hot standby, except 123 that it can operate in 1+1 mode. This capability is also referred to 124 as multicast live-live and is a generalized form of that discussed in 125 [I-D.ietf-rtgwg-mofrr]. In this document, local protection refers to 126 the method of having alternate ways of reaching the pre-identified 127 merge points upon detection of a local failure. This capability is 128 also referred to as fast-reroute. 130 This document describes the general architecture, framework, and 131 trade-offs of the different approaches to solving these general 132 problems. It will recommend how to generally provide global 133 protection and local protection for mLDP and PIM traffic. Where 134 protocol extensions are necessary, they will be defined in separate 135 documents as follows. 137 o Global 1+1 Protection Using PIM 139 o Global 1+1 Protection Using mLDP 141 o Local Protection Using mLDP: 142 [I-D.wijnands-mpls-mldp-node-protection]This document describes 143 how to provide node-protection and the necessary extensions using 144 targeted LDP session. 
o  Local Protection Using PIM

1.1.  Maximally Redundant Trees (MRTs)

Maximally Redundant Trees (MRTs) are described in [I-D.enyedi-rtgwg-mrt-frr-algorithm]; here we give only a brief description of the concept.  A pair of MRTs is a pair of directed spanning trees (a red tree and a blue tree) with a common root, directed so that each node can be reached from the root on both trees.  Moreover, these trees are redundant, since they are constructed so that no single link or single node failure can separate any node from the root on both trees, unless that failed link or node splits the network into completely separate components (i.e. the link or node was a cut-edge or cut-vertex).

Although for multicast the arcs (directed links) are directed away from the root instead of towards the root, the same MRT computations are used and apply.  This is similar to how multicast uses unicast routing's next-hops as the upstream-hops.  Thus this definition differs slightly from the one presented in [I-D.enyedi-rtgwg-mrt-frr-algorithm], since the arcs are directed away from and not towards the root.  When we need two paths towards a given destination rather than two paths away from it (e.g. for unicast detours for local repair solutions), we only need to reverse the arcs from how they are used for the unicast routing case; thus constructing MRTs towards or away from the root is the same problem.  A pair of MRTs is depicted in Figure 1.

   [E]---[D]---|    |---[J]
    |     |    |    |    |
    |     |    |    |    |
   [R]   [F]  [C]---[G]  |
    |     |    |    |    |
    |     |    |    |    |
   [A]---[B]---|    |---[H]

           (a) a network

   [E]-->[D]---|    |-->[J]      [E]<--[D]            [J]
    ^     |    |    |    |              ^              ^
    |     V    V    |    |              |              |
   [R]   [F]  [C]-->[G]  |       [R]   [F]  [C]-->[G]  |
          |              |        |     ^    ^    |    |
          V              V        V     |    |    |    |
   [A]<--[B]            [H]      [A]-->[B]---|    |-->[H]

    (b) Blue MRT of root R        (c) Red MRT of root R

          Figure 1: A network and two MRTs found in it

It is important to realize that this redundancy criterion does not imply that, after a failure, either of the MRTs remains intact, since a node failure must affect any spanning tree.  Redundancy here means that there will be one set of nodes that can still be reached along the blue MRT and another set that remains reachable along the red MRT.  As an example, suppose that node F goes down; that would separate B and A on the blue MRT and D and E on the red MRT.  Naturally, it is possible that the intersection of these two sets is not empty; e.g. C, G, H and J will remain reachable on both MRTs.  Additionally, observe that a single link can be used in both of the trees in different directions, so even a link failure can cut both trees.  In this example, the failure of link F<->B leads to the same reachability sets.

Finally, it is critical to recall that a pair of MRTs is always constructed together, and they are not SPTs.  While it would be useful to have an algorithm that could find a redundant pair for a given tree (e.g. for the SPT), that is impossible in general.  Moreover, if there is a failure and at least one of the trees changes, the other tree may need to change as well.  Therefore, even if a node still receives the traffic along the red tree, it cannot keep the old red tree and simply construct a new blue pair for it; there can be reconfiguration in cases where traditional shortest-path-based thinking would not expect it.  To converge to a new pair of disjoint MRTs, it is generally necessary to update both the blue MRT and the red MRT.
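The maximal disjointness illustrated in Figure 1 can be checked mechanically.  The short Python sketch below is purely illustrative and is not part of any MRT algorithm: it simply encodes the Blue and Red MRTs of Figure 1 as parent maps and reports which transit nodes the two root-to-node paths share.  Only the cut-vertices C and G ever appear as shared transit nodes.

   # Illustrative only: the parent maps are read off Figure 1 by hand.
   BLUE = {'E': 'R', 'D': 'E', 'F': 'D', 'C': 'D', 'B': 'F', 'A': 'B',
           'G': 'C', 'J': 'G', 'H': 'J'}
   RED  = {'A': 'R', 'B': 'A', 'F': 'B', 'C': 'B', 'D': 'F', 'E': 'D',
           'G': 'C', 'H': 'G', 'J': 'H'}

   def path_from_root(parent, node):
       """Return the path root -> node implied by a parent map."""
       path = [node]
       while node != 'R':
           node = parent[node]
           path.append(node)
       return list(reversed(path))

   for node in sorted(BLUE):
       blue_path = path_from_root(BLUE, node)
       red_path = path_from_root(RED, node)
       # Shared transit nodes (excluding the root and the node itself)
       # can only be cut-vertices: here C and G for the right-hand block.
       shared = set(blue_path[1:-1]) & set(red_path[1:-1])
       print(node, 'blue:', '->'.join(blue_path),
             'red:', '->'.join(red_path), 'shared:', sorted(shared))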
The two MRTs provide two separate forwarding topologies that can be used in addition to the default shortest-path-tree (SPT) forwarding topology (usually MT-ID 0).  There is a Blue MRT forwarding topology represented by one MT-ID; similarly, there is a Red MRT forwarding topology represented by a different MT-ID.  Naturally, a multicast protocol is required to use the forwarding topology information to build the desired multicast trees.  The multicast protocol can simply request the appropriate upstream interfaces, including the MT-ID when needed.

1.2.  MRTs and Multicast

Maximally Redundant Trees (MRTs) provide two advantages for protecting multicast traffic.  First, for global protection, MRTs are precisely what needs to be computed to have maximally redundant multicast distribution trees.  Second, for local repair, MRTs ensure that there will be protection to the merge points; the certainty of a path from any merge point to the PLR that avoids the failed node allows for the creation of alternate trees.

A known disadvantage of MRTs, and of redundant trees in general, is that the trees do not necessarily provide shortest detour paths.  Modeling is underway to investigate and compare the MRT lengths for the different algorithm options [I-D.enyedi-rtgwg-mrt-frr-algorithm].

2.  Terminology

2-connected:  A graph that has no cut-vertices.  This is a graph that requires two nodes to be removed before the network is partitioned.

2-connected cluster:  A maximal set of nodes that are 2-connected.

2-edge-connected:  A network graph where at least two links must be removed to partition the network.

ADAG:  Almost Directed Acyclic Graph - a graph that, if all links incoming to the root were removed, would be a DAG.

block:  Either a 2-connected cluster, a cut-edge, or an isolated vertex.

cut-link:  A link whose removal partitions the network.  A cut-link by definition must connect two cut-vertices.  If there are multiple parallel links, then they are referred to as cut-links in this document if removing the set of parallel links would partition the network.

cut-vertex:  A vertex whose removal partitions the network.

DAG:  Directed Acyclic Graph - a graph where all links are directed and there are no cycles in it.

GADAG:  Generalized ADAG - a graph that is the combination of the ADAGs of all blocks.

Maximally Redundant Trees (MRT):  A pair of trees where the path from any node X to the root R along the first tree and the path from the same node X to the root along the second tree share the minimum number of nodes and the minimum number of links.  Each such shared node is a cut-vertex.  Any shared links are cut-links.  Any RT is an MRT but many MRTs are not RTs.

Maximally Redundant Multicast Trees (MRMT):  A pair of multicast trees built of the sub-set of MRTs that is needed to reach all interested receivers.

network graph:  A graph that reflects the network topology where all links connect exactly two nodes and broadcast links have been transformed into the standard pseudo-node representation.

Redundant Trees (RT):  A pair of trees where the path from any node X to the root R along the first tree is node-disjoint with the path from the same node X to the root along the second tree.  These can be computed in 2-connected graphs.
Merge Point (MP):  For local repair, a router at which the alternate traffic rejoins the primary multicast tree.  For global protection, a router that receives traffic on multiple trees and must decide which stream to forward on.

Point of Local Repair (PLR):  The router that detects a local failure and decides whether and when to forward traffic on appropriate alternates.

MT-ID:  Multi-topology identifier.  The default shortest-path-tree topology is MT-ID 0.

MultiCast Ingress (MCI):  Multicast Ingress, the node where the multicast stream enters the current transport technology (MPLS-mLDP or IP-PIM) domain.  This may be the router attached to the multicast source, the PIM Rendezvous Point (RP), or the mLDP Root node.

Upstream Multicast Hop (UMH):  Upstream Multicast Hop, a candidate next-hop that can be used to reach the MCI of the tree.

Stream Selection:  The process by which a router determines which of the multiple primary multicast streams to accept and forward.  The router can decide on a packet-by-packet basis or simply per-stream.  This is done for global protection 1+1 and described in [I-D.ietf-rtgwg-mofrr].

MultiCast Egress (MCE):  Multicast Egress, a node where the multicast stream exits the current transport technology (MPLS-mLDP or IP-PIM) domain.  This is usually a receiving router that may forward the multicast traffic on towards receivers based upon IGMP or other technology.

3.  Use-Cases and Applicability

Protection of multicast streams has gained importance with the use of multicast to distribute video, including live video such as IP-TV.  There are a number of different scenarios and uses of multicast that require protection.  A few preliminary examples are described below.

o  When video is distributed via IP or MPLS for a cable application, it is desirable to have global protection 1+1 so that the customer-perceived impact is limited.  A QAM can join two multicast groups and determine which stream to use based upon the stream quality.  A network implementing this may be custom-engineered for this particular purpose.

o  In financial markets, stock ticker data is distributed via multicast.  The loss of data can have a significant financial impact.  Depending on the network, either global protection 1+1 or local protection can minimize the impact.

o  Several solutions for updating the software or firmware of a large number of end-user or operator-owned networking devices are based on IP multicast.  Since IP multicast is based on datagram transport, recovering lost data is cumbersome and decreases the advantages offered by multicast.  Some solutions rely on sending the updates several times; in a properly protected network, fewer repetitions are required.  Other solutions rely on the recipient asking for lost data segments explicitly on-demand.  A network failure could cause data loss for a significant number of receivers, which in turn would start requesting the lost data in a burst that could overload the server.  Properly engineered multicast fast-reroute would minimize such impacts.

o  Some providers offer multicast VPN services to their customers.  SLAs between the customer and provider may set low packet loss requirements.
In such cases, interruptions longer than the outage timescales targeted by FRR could cause direct financial losses for the provider.

Global protection 1+1 uses maximally redundant multicast trees (MRMTs) to simultaneously distribute a multicast stream on both MRMTs.  The disadvantage is the extra state and bandwidth requirements of always sending the traffic twice.  The advantage is that the latency of each MRMT can be known and the receiver can select the best stream.

Local protection provides a patch around the fault while the multicast tree reconverges.  When PLR replication is used, there is no extra multicast state in the network, but the bandwidth requirements vary based upon how many potential merge points must be provided for.  When alternate-trees are used, there is extra multicast state, but the bandwidth requirements on a link can be minimized to no more than once for the primary multicast tree traffic and once for the alternate-tree traffic.

4.  Global Protection: Multicast Live-Live

In MoFRR [I-D.ietf-rtgwg-mofrr], the idea of joining both a primary and a secondary tree is introduced with the requirement that the primary and secondary trees be link and node disjoint.  This works well for networks where there are dual planes, as explained in [I-D.ietf-rtgwg-mofrr].  For other networks, it is still desirable to have two disjoint multicast trees and allow a receiver to join both and make its own decision about which traffic to accept.

Using MRTs gives the ability to guarantee that the two trees are as disjoint as possible and are dynamically recomputed whenever the topology changes.  The MRTs used are rooted at the MultiCast Ingress (MCI).  One multicast tree is created using the Blue MRT forwarding topology.  The second multicast tree is created using the Red MRT forwarding topology.  This can be accomplished by specifying the appropriate MT-ID associated with each forwarding topology.

There are four different aspects of using MRTs for 1+1 Global Protection that must be considered.  They are as follows.

1.  Creation of the maximally redundant multicast trees (MRMTs) based upon the forwarding topologies.

2.  Traffic Identification: How to handle traffic when the two MRMTs overlap due to a cut-vertex or cut-link.

3.  Convergence: How to converge after a network change and get back to a protected state.

4.  Inter-area/inter-level Behavior: How to compute and use MRMTs when the multicast source is outside the area/level and how to provide border-router protection.

4.1.  Creation of MRMTs

The creation of the two maximally redundant multicast trees occurs as described below; an illustrative sketch of these steps follows the list.  This assumes that the next-hops to the MCI associated with the Blue and Red forwarding topologies have already been computed and stored.

1.  A receiving router determines that it wants to join both the Blue tree and the Red tree.  The details of how it makes this decision are not covered in this document and could be based on configuration, additional protocols, etc.

2.  The router selects among the Blue next-hops an Upstream Multicast Hop (UMH) to reach the MCI node.  The router joins the tree towards the selected UMH, including a multi-topology id (MT-ID) identifying the Blue MRT.

3.  The router selects among the Red next-hops an Upstream Multicast Hop (UMH) to reach the MCI node.  The router joins the tree towards the selected UMH, including a multi-topology id (MT-ID) identifying the Red MRT.

4.  When a router receives a tree setup request specifying a particular MT-ID (e.g. Color), then the router selects among the Color next-hops to the MCI a UMH node, creates the necessary multicast state, and joins the tree towards the UMH node.
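The following Python sketch illustrates the steps above.  It is not a protocol specification: the per-color next-hop table, the MT-ID strings, and the send_join()/on_join_received() helpers are hypothetical stand-ins for the PIM or mLDP machinery.

   # Illustrative sketch of the Section 4.1 steps; all names are invented.
   MTID = {'blue': 'MRT-Blue', 'red': 'MRT-Red'}

   # Pre-computed next-hops towards the MCI for each MRT forwarding
   # topology (assumed to be already computed and stored).
   next_hops_to_mci = {'blue': ['nbr1', 'nbr2'], 'red': ['nbr3']}

   def send_join(umh, mt_id, group):
       """Stand-in for sending a PIM Join / mLDP label mapping upstream."""
       print('join %s towards %s with MT-ID %s' % (group, umh, mt_id))

   def join_both_mrmts(group):
       # Steps 2 and 3: pick a UMH among the Blue (then Red) next-hops
       # and join the tree towards it, carrying the corresponding MT-ID.
       for color in ('blue', 'red'):
           umh = next_hops_to_mci[color][0]   # any selection policy works
           send_join(umh, MTID[color], group)

   def on_join_received(mt_id, group):
       # Step 4: create local multicast state (not shown) and propagate
       # the tree setup request upstream on the same forwarding topology.
       color = 'blue' if mt_id == MTID['blue'] else 'red'
       send_join(next_hops_to_mci[color][0], mt_id, group)

   join_both_mrmts('(S,G)')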
4.2.  Traffic Self-Identification

Two maximally redundant trees will share any cut-vertices and cut-links in the network.  In the multicast global protection 1+1 case, this means that the potential single failures of the other nodes and links in the network are still protected against.  If a cut-vertex cannot associate traffic with a particular MRMT, then the traffic would be incorrectly replicated onto both MRMTs, resulting in complete duplication of traffic.  An example of such MRTs is given earlier in Figure 1 and repeated below in Figure 2, where there are two cut-vertices C and G and a cut-link C<->G.

   [E]---[D]---|    |---[J]
    |     |    |    |    |
    |     |    |    |    |
   [R]   [F]  [C]---[G]  |
    |     |    |    |    |
    |     |    |    |    |
   [A]---[B]---|    |---[H]

           (a) a network

   [E]-->[D]---|    |-->[J]      [E]<--[D]            [J]
    ^     |    |    |    |              ^              ^
    |     V    V    |    |              |              |
   [R]   [F]  [C]-->[G]  |       [R]   [F]  [C]-->[G]  |
          |              |        |     ^    ^    |    |
          V              V        V     |    |    |    |
   [A]<--[B]            [H]      [A]-->[B]---|    |-->[H]

    (b) Blue MRT of root R        (c) Red MRT of root R

          Figure 2: A network and two MRTs found in it

In this example, traffic from the multicast source R to a receiver G, J, or H will cross link C<->G on both the Blue and Red MRMTs.  When this occurs, there are several different possibilities depending upon the protocol.

mLDP:  Different label bindings will be created for the Blue and Red MRMTs.  As specified in [I-D.iwijnand-mpls-mldp-multi-topology], the P2MP FEC Element will use the MT IP Address Family to encode the Root node address and MRT MT-ID.  Each MRMT will therefore have a different P2MP FEC Element and be assigned an independent label.

PIM:  There are three different ways to handle IP traffic forwarded based upon PIM when that traffic will overlap on a link.

A.  Different Groups:  If different multicast groups are used for each MRMT, then the traffic clearly indicates which MRMT it belongs to.  In this case, traffic on the Blue MRMT would use multicast group G-blue and traffic on the Red MRMT would use multicast group G-red.

B.  Different Source Loopbacks:  Another option is to use different IP addresses for the source S, so S might announce S-red and S-blue.  In this case, traffic on the Blue MRMT would have an IP source of S-blue and traffic on the Red MRMT would have an IP source of S-red.

C.  Stream Selection and Merging:  The third option, described in Section 4.2.1, is to have a router that gets (S,G) Joins for both the Blue MT-ID and the Red MT-ID merge those into a single tree.  The router may need to select which upstream stream to use, just as if it were a receiving router.

There are three options presented for PIM.  The most appropriate will depend upon the deployment scenario as well as router capabilities.
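As an illustration of options A and B, the sketch below shows one way a cut-vertex such as C could classify received traffic to a color, falling back to the stream selection of option C when the traffic does not self-identify.  The addresses, interface names, and helper functions are invented for the example and are not specified by this document.

   # Illustrative classification sketch; the group/source values follow
   # the G-blue/G-red and S-blue/S-red naming of options A and B.
   BLUE, RED, UNKNOWN = 'blue', 'red', 'unknown'

   GROUP_COLOR = {'232.1.1.1': BLUE, '232.1.1.2': RED}    # option A
   SOURCE_COLOR = {'192.0.2.1': BLUE, '192.0.2.2': RED}   # option B

   def classify(src, group):
       """Map a received (S,G) to an MRMT color if it self-identifies."""
       return GROUP_COLOR.get(group) or SOURCE_COLOR.get(src) or UNKNOWN

   # Option C: no self-identification, so the router merges the Blue and
   # Red joins and performs stream selection (Section 4.2.1), e.g. by
   # preferring whichever upstream interface is currently delivering.
   def select_stream(active_interfaces, blue_iface, red_iface):
       if blue_iface in active_interfaces:
           return blue_iface
       if red_iface in active_interfaces:
           return red_iface
       return None

   print(classify('192.0.2.1', '239.0.0.1'))     # 'blue' via source address
   print(classify('198.51.100.7', '232.1.1.2'))  # 'red' via group address
   print(select_stream({'if-blue'}, 'if-blue', 'if-red'))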
4.2.1.  Merging MRMTs for PIM if Traffic Doesn't Self-Identify

When traffic doesn't self-identify, the cut-vertices must follow specific rules to avoid traffic duplication.  This section describes that behavior, which allows the same (S,G) to be used for both the Blue MT-ID and Red MT-ID (i.e. when the traffic doesn't self-identify as to its MT-ID).

The behavior described in this section differs from the conflict resolution described in [RFC6420] because these rules apply to the Global Protection 1+1 case.  Specifically, it is not sufficient for an upstream router to pick only one of the two MT-IDs to join, because that does not maximize the protection provided.

As described in [RFC6420], a router that receives (S,G) Joins for both the Blue MT-ID and the Red MT-ID can merge the set of downstream interfaces in its forwarding entry.  Unlike the procedures defined in [RFC6420], the router must send a Join upstream for each MT-ID.  If a router has different upstream interfaces for these MRMTs, then the router will need to do stream selection and forward the selected stream to its outgoing interfaces, just as if it were an MCE.  The stream selection methods for detecting failures and handling traffic discarding are described in [I-D.ietf-rtgwg-mofrr].

This method does not work if the MRMTs merge on a common LAN with different upstream routers.  In this case, the traffic cannot be distinguished on the LAN, resulting in duplication on the LAN.  The normal PIM Assert procedure would stop one of the upstream routers from transmitting duplicates onto the LAN once it is detected.  This, in turn, may cause the duplicate stream to be pruned back to the source.  Thus, end-to-end protection in this case of the MRMTs converging on a single LAN with different upstream interfaces can only be accomplished by the methods of traffic self-identification.

4.3.  Convergence Behavior

It is necessary to handle topology changes and get back to having two MRMTs that provide global protection.  To understand the requirements and what can be computed, recall the following facts.

a.  It is not generally possible to compute a single tree that is maximally redundant to an existing tree.

b.  The pair of MRTs must be computed simultaneously.

c.  After a single link or node failure, there is one set of nodes that can be reached from the root on the Blue MRMT and a second set of nodes that can be reached from the root on the Red MRMT.  If the failure wasn't a cut-vertex or cut-edge, all nodes will be in at least one of these two sets.

To gracefully converge, it is necessary to never have a router where both its red MRMT and blue MRMT are broken.  There are three different ways in which this could be done.  These options are being more fully explored to see which is most practical and provides the best set of trade-offs.

Ordered Convergence  When a single failure occurs, each receiver determines whether it was affected or unaffected.  First, the affected receivers identify the broken MRMT color (e.g. blue) and join the MRMT via their new UMH for that MRT color.  Once the affected receivers receive confirmation that the new MRMT has been successfully created back to the MCI, then the affected receivers switch to using that MRMT.  The affected receivers tear down the old broken MRMT state and join the MRMT via their new UMH for the other MRT color (e.g. red).  Finally, once the affected receivers receive confirmation that the new MRMT has been successfully created back to the MCI, the affected receivers can tear down the old working MRMT state.
Once the affected receivers have updated 584 their state, the unaffected receivers need to also do the same 585 staging - first joining the MRMT via their new UMH for the Blue 586 MRT, waiting for confirmation, switching to using traffic from the 587 Blue MRMT, tearing down the old Blue MRMT state, joining the MRMT 588 via their new UMH for the Red MRT, waiting for confirmation, and 589 tearing down the old Red MRMT state. There are complexities 590 remaining, such as determining how an Unaffected Receiver decides 591 that the Affected Receivers are done. When the topology change 592 isn't a failure, all receivers are unaffected and the same process 593 can apply. 595 Protocol Make-Before-Break In the control plane, a router joins the 596 tree on the new Blue topology but does not stop receiving traffic 597 on the old Blue topology. Once traffic is observed from the new 598 Blue UMH, then the router accepts traffic on the new Blue UMH and 599 removes the old Blue UMH. This behavior can happen simultaneously 600 with both Blue and Red forwarding topologies. An advantage is 601 that it works regardless of the type of topology change and 602 existing traffic streams aren't broken. Another advantage is that 603 the complexity is limited and this method is well understood. The 604 disadvantage is that the number of traffic-affecting events 605 depends upon the number of hops to the MCI. 607 Multicast Source Make-Before-Break On a topology change, routers 608 would create new MRMTs using new MRT forwarding state and leaving 609 the old MRMTs as they are. After the new MRMTs are complete, the 610 multicast source could switch from sending on the old MRMTs to 611 sending on the new MRMTs. After a time, the old MRMTs could be 612 torn down. There are a number of details to still investigate. 614 4.4. Inter-area/level Behavior 616 A source outside of the IGP area/level can be treated as a proxy 617 node. When the join request reaches a border router (whether ABR for 618 OSPF or LBR for ISIS), that border router needs to determine whether 619 to use the Blue or Red forwarding topology in the next selected area/ 620 level. 622 |-------------------| 623 | | 624 |---[S]---| [BR1]-----[ X ] | 625 | | | | | 626 [ A ]-----[ B ] | | | 627 | | [ Y ]-----[BR2]--(proxy for S) 628 | | 629 [BR1]-----[BR2] (b) Area 10 630 Y's Red next-hop: BR1 631 (a) Area 0 Y's Blue next-hop: BR2 632 Red Next-Hops to S 633 BR1's is BR2 634 BR2's is B 635 B's is S 637 Blue Next-Hops to S 638 BR1's is A 639 BR2's is BR1 640 A's is S 642 Figure 3: Inter-area Selection - next-hops towards S 644 Achieving maximally node-disjoint trees across multiple areas is hard 645 due to the information-hiding and abstraction. If there is only one 646 border router, it is trivial but protection of the border router is 647 not possible. With exactly 2 border routers, inter-area/level node 648 protection is reasonably straightforward but can require that the BR 649 rewrite the (S,G) for PIM. With more than 2 border routers, inter- 650 area node protection is possible at the cost of additional bandwidth 651 and border router complexity. These two solutions are described in 652 the following sub-sections. 654 4.4.1. Inter-area Node Protection with 2 border routers 656 If there are exactly two border routers between the areas, then the 657 solution and necessary computation is straightforward. In that 658 specific case, each BR knows that only the other BR must specifically 659 be avoided in the second area when a forwarding topology is selected. 
660 As described in [I-D.enyedi-rtgwg-mrt-frr-algorithm], it is possible 661 for a node X to determine whether the Red or Blue forwarding topology 662 should be used to reach a node D while avoiding another node Y. 664 The results of this computation and the resulting changes in MT-ID 665 from Red to Blue or Blue to Red are illustrated in Figure 3. It 666 shows an example where BR1 must modify joins received from Area 10 667 for the Red MT-ID to use the Blue MT-ID in Area 0. Similarly, BR2 668 must modify joins received from Area 10 for the Blue MT-ID to use the 669 Red MT-ID in Area 0. 671 For mLDP, modifying the MT-ID in the control-plane is all that is 672 needed. For PIM, if the same (S,G) is used for both the Blue MT-ID 673 and the Red MT-ID, then only control-plane changes are needed. 674 However, for PIM, if different group IDs (e.g. G-red and G-blue) or 675 different source loopback addresses (S-red and S-blue) are used, it 676 is necessary to modify the traffic to reflect the MT-ID included in 677 the join message received on that interface. An alternative could be 678 to use an MPLS label that indicates the MT-ID instead of different 679 group IDs or source loopback addresses. 681 To summarize the necessary logic, when a BR1 receives a join from a 682 neighbor in area N to a destination D in area M on the Color MT-ID, 683 the BR1: 685 a. Identifies the BR2 at the other end of the proxy node in area N. 687 b. Determines which forwarding topology may avoid BR2 to reach D in 688 area M. Refer to that as Color-2 MT-ID. 690 c. Uses Color-2 MT-ID to determine the next-hops to S. When a join 691 is sent upstream, the MT-ID used is that for Color-2. 693 4.4.2. Inter-area Node Protection with > 2 Border Routers 695 If there are more than two BRs between areas, then the problem of 696 ensuring inter-area node-disjointness is not solved. Instead, once a 697 request to join the multicast tree has been received by a BR from an 698 area that isn't closest to the multicast source, the BR must join 699 both the Red MT-ID and the Blue MT-ID in the area closest to the 700 multicast source. Regardless of what single link or node failure 701 happens, each BR will receive the multicast stream. Then, the BR can 702 use the stream-selection techniques specified in 703 [I-D.ietf-rtgwg-mofrr] to pick either the Blue or Red stream and 704 forward it to downstream routers in the other area. Each of the BRs 705 for the other area should be attached to a proxy-node representing 706 the other area. 708 This approach ensures that a BR will receive the multicast stream in 709 the closest area as long as the single link or node failure isn't a 710 single point of failure. Thus, each area or level is independently 711 protected. The BR is required to be able to select among the 712 multicast streams and, if necessary for PIM, translate the traffic to 713 contain the correct (S,G) for forwarding. 715 4.5. PIM 717 Capabilities need to be exchanged to determine that a neighbor 718 supports using MRT forwarding topologies with PIM. Additional 719 signaling extensions are not necessary to PIM to support Global 720 Protection. [RFC6420] already defines how to specify an MT-ID as a 721 Join Attribute. 723 4.5.1. Traffic Handling: RPF Checks 725 For PIM, RPF checks would still be enabled by the control plane. The 726 control plane can program different forwarding entries on the G-blue 727 incoming interface and on the G-red incoming interface. The other 728 interfaces would still discard both G-blue and G-red traffic. 
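The per-color RPF programming described above amounts to a per-(interface, group) accept filter.  The following minimal sketch, with invented interface names, is meant only to illustrate that control-plane programming, not an actual forwarding-plane implementation.

   # Illustrative sketch only: real implementations program this state
   # into the multicast FIB.
   # (incoming interface, group) -> accept?  Only the interface on which
   # a given colored stream is expected passes the RPF check.
   rpf_accept = {
       ('if-blue', 'G-blue'): True,
       ('if-red', 'G-red'): True,
   }

   def rpf_check(iif, group):
       """Accept a packet only on the interface programmed for its MRMT."""
       return rpf_accept.get((iif, group), False)

   assert rpf_check('if-blue', 'G-blue')
   assert not rpf_check('if-red', 'G-blue')    # wrong incoming interface
   assert not rpf_check('if-other', 'G-red')   # other interfaces discard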
730 The receiver would still need to detect failures and handle traffic 731 discarding as is specified in [I-D.ietf-rtgwg-mofrr]. 733 4.6. mLDP 735 Capabilities need to be exchanged to determine that a neighbor 736 supports using MRT forwarding topologies with mLDP. The basic 737 mechansims for mLDP to support multi-topology are already described 738 in [I-D.iwijnand-mpls-mldp-multi-topology]. It may be desirable to 739 extend the capability defined in this draft to indicate that MRT is 740 or is not supported. 742 5. Local Repair: Fast-Reroute 744 Local repair for multicast traffic is different from unicast in 745 several important ways. 747 o There is more than a single final destination. The full set of 748 receiving routers may not be known by the PLR and may be extremely 749 large. Therefore, it makes sense to repair to the immediate next- 750 hops for link-repair and the next-next-hops for node-repair. 751 These are the potential merge points (MPs). 753 o If a failure cannot be positively identified as a node-failure, 754 then it is important to repair to the immediate next-hops since 755 they may have receivers attached. 757 o If a failure cannot be positively identified as a link-failure and 758 node protection is desired, then it is important to repair to the 759 next-next-hops since they may not receive traffic from the 760 immediate next-hops. 762 o Updating multicast forwarding state may take significantly longer 763 than updating unicast state, since the multicast state is updated 764 tree by tree based on control-plane signaling. 766 o For tunnel-based IP/LDP approaches, neither the PLR nor the MP may 767 be able to specify which interface the alternate traffic will 768 arrive at the MP on. The simplest reason is the unicast 769 forwarding includes the use of ECMP and the path selection is 770 based upon internal router behavior for all paths between the PLR 771 and the MP. 773 For multicast fast-reroute, there are three different mechanisms that 774 can be used. As long as the necessary signaling is available, these 775 methods can be combined in the same network and even for the same PLR 776 and failure point. 778 PLR-driven Unicast Tunnels: The PLR learns the set of MPs that need 779 protection. On a failure, the PLR replicates the traffic and 780 tunnels it to each MP using the unicast route. If desired, an 781 RSVP-TE tunnel could be used instead of relying upon unicast 782 routing. 784 MP-driven Unicast Tunnels: Each MP learns the identity of the PLR. 785 Before failure, each MP independently signals to the PLR the 786 desire for protection and other information to use. On a failure, 787 the PLR replicates the traffic and tunnels it to each MP using the 788 unicast route. If desired, an RSVP-TE tunnel could be used 789 instead of relying upon unicast routing. 791 MP-driven Alternate Trees: Each MP learns the identity of the PLR 792 and the failure point (node and interface) to be protected 793 against. Each MP selects an upstream interface and forwarding 794 topology where the path will avoid the failure point; each MP 795 signals a join towards that upstream interface to create that 796 state. 798 Each of these options is described in more detail in their respective 799 sections. Then the methods are compared and contrasted for PIM and 800 for mLDP. 802 5.1. 
PLR-driven Unicast Tunnels 804 With PLR-driven unicast tunnels, the PLR learns the set of merge 805 points (MPs) and, on a locally detected failure, uses the existing 806 unicast routing to tunnel the multicast traffic to those merge 807 points. The failure being protected against may be link or node 808 failure. If unicast forwarding can provide an SRLG-protecting 809 alternate, then SRLG-protection is also possible. 811 There are five aspects to making this work. 813 1. PLR needs to learn the MPs and their associated MPLS labels to 814 create protection state. 816 2. Unicast routing has to offer alternates or have dedicated tunnels 817 to reach the MPs. The PLR encapsulates the multicast traffic and 818 directs it to be forwarded via unicast routing. 820 3. The MP must identify alternate traffic and decide when to accept 821 and forward it or drop it. 823 4. When the MP reconverges, it must move to its new UMH using make- 824 before-break so that traffic loss is minimized. 826 5. The PLR must know when to stop sending traffic on the alternates. 828 5.1.1. Learning the MPs 830 If link-protection is all that is desired, then the PLR already knows 831 the identities of the MPs. For node-protection, this is not 832 sufficient. In the PLR-driven case, there is no direct communication 833 possible between the PLR and the next-next-hops on the multicast 834 tree. (For mLDP, when targeted LDP sessions are used, this is 835 considered to be MP-driven and is covered in Section 5.2.) 837 In addition to learning the identities of the MPs, the PLR must also 838 learn the MPLS label, if any, associated with each MP. For mLDP, a 839 different label should be supplied for the alternate traffic; this 840 allows the MP to distinguish between the primary and alternate 841 traffic. For PIM, an MPLS label is used to identify that traffic is 842 the alternate. The unicast tunnel used to send traffic to the MP may 843 have penultimate-hop-popping done; thus without an explicit MPLS 844 label, there is no certainty that a packet could be conclusively 845 identified as primary traffic or as alternate traffic. 847 A router must tell its UMH the identity of all downstream multicast 848 routers, and their associated alternate labels, on the particular 849 multicast tree. This clearly requires protocol extensions. The 850 extensions for PIM are given in [I-D.kebler-pim-mrt-protection]. 852 5.1.2. Using Unicast Tunnels and Indirection 854 The PLR must encapsulate the multicast traffic and tunnel it towards 855 each MP. The key point is how that traffic then reaches the MP. 856 There are basically two possibilities. It is possible that a 857 dedicated RSVP-TE tunnel exists and can be used to reach the MP for 858 just this traffic; such an RSVP-TE tunnel would be explicitly routed 859 to avoid the failure point. The second possibility is that the 860 packet is tunneled via LDP and uses unicast routing. The second case 861 is explored here. 863 It is necessary to assume that unicast LDP fast-reroute 864 [I-D.ietf-rtgwg-mrt-frr-architecture][RFC5714][RFC5286] is supported 865 by the PLR. Since multicast convergence takes longer than unicast 866 convergence, the PLR may have two different routes to the MP over 867 time. When the failure happens, the PLR will have an alternate, 868 whether LFA or MRT, to reach the MP. Then the unicast routing 869 converges and the PLR will have a new primary route to the MP. 
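Because the PLR's unicast route to the MP changes over time, the protection entry is naturally expressed through a level of indirection, as the following sketch illustrates (all names are invented for the example); the next paragraphs explain why this indirection matters.

   # Illustrative sketch: the PLR's protection entry references a shared
   # unicast next-hop object for the MP instead of a fixed interface, so
   # when unicast routing re-converges the tunneled traffic follows.
   unicast_nh = {'MP1': {'out_if': 'ge-0/0/1', 'label': 100}}  # LDP route to MP1

   protection_entry = {
       'alt_label_for_mp': 2001,   # label the MP supplied for alternate traffic
       'via': 'MP1',               # indirection: resolve through unicast_nh
   }

   def encapsulate(packet, entry):
       nh = unicast_nh[entry['via']]        # resolved at forwarding time
       return {'payload': packet,
               'inner_label': entry['alt_label_for_mp'],
               'outer_label': nh['label'],
               'out_if': nh['out_if']}

   print(encapsulate('mcast-pkt', protection_entry))
   # When unicast routing converges, only unicast_nh changes; the
   # multicast protection entry does not need to be rewritten.
   unicast_nh['MP1'] = {'out_if': 'ge-0/0/7', 'label': 250}
   print(encapsulate('mcast-pkt', protection_entry))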
Once 870 the routing has converged, it is important that alternate traffic is 871 no longer carried on the MRT forwarding topologies. This rule allows 872 the MRT forwarding topologies to reconverge and be available for the 873 next failure. Therefore, it is also necessary for the tunneled 874 multicast traffic to move from the alternate route to the new primary 875 route when the PLR reconverges. Therefore, the tunneled multicast 876 traffic should use indirection to obtain the unicast routing's 877 current next-hops to the MP. If physical indirection is not 878 feasible, then when the unicast LIB is updated, the associated 879 multicast alternate tunnel state should be as well. 881 When the PLR detects a local failure, the PLR replicates each 882 multicast packet, swaps or adds the alternate MPLS label needed by 883 the MP, and finally pushes the appropriate label for the MP based 884 upon the outgoing interface selected by the unicast routing. 886 For PIM, if no alternate labels are supplied by the MPs, then the 887 multicast traffic could be tunneled in IP. This would require 888 unicast IP fast-reroute. 890 5.1.3. MP Alternate Traffic Handling 892 A potential Merge Point must determine when and if to accept 893 alternate traffic. There are two critical components to this 894 decision. First, the MP must know the state of all links to its UMH. 895 This allows the MP to determine whether the multicast stream could be 896 received from the UMH. Second, the MP must be able to distinguish 897 between a normal multicast tree packet and an alternate packet. 899 The logic is similar for PIM and mLDP, but in PIM there is only one 900 RPF-interface or interface of interest to the UMH. In mLDP, all the 901 directly connected interfaces to the UMH are of interest. When the 902 MP detects a local failure, if that interface was the last connected 903 to the UMH and used for the multicast group, then the MP must rapidly 904 switch from accepting the normal multicast tree traffic to accepting 905 the alternate traffic. This rapid change must happen within the same 906 approximately 50 milliseconds that the PLR switching to send traffic 907 on the alternate takes and for the same reasons. It does no good for 908 the PLR to send alternate traffic if the MP doesn't accept it when it 909 is needed. 911 The MP can identify alternate traffic based upon the MPLS label. 912 This will be the alternate label that the MP supplied to its UMH for 913 this purpose. 915 5.1.4. Merge Point Reconvergence 917 After a failure, the MP will want to join the multicast tree 918 according to the new topology. It is critical that the MP does this 919 in a way that minimizes the traffic disruption. Whenever paths 920 change, there is also the possibility for a traffic-affecting event 921 due to different latencies. However, traffic impact above that 922 should be avoided. 924 The MP must do make-before-break. Until the MP knows that its new 925 UMH is fully connected to the MCI, the MP should continue to accept 926 its old alternate traffic. The MP could learn that the new UMH is 927 sufficient either via control-plane mechanisms or data-driven. In 928 the latter case, the reception of traffic from the new UMH can 929 trigger the change-over. If the data-driven approach is used, a 930 time-out to force the switch should apply to handle multicast trees 931 that have long quiet periods. 933 5.1.5. PLR termination of alternate traffic 935 The PLR sends traffic on the alternates for a configurable time-out. 
There is no clean way for the next-hop routers and/or next-next-hop routers to indicate that the traffic is no longer needed.

If better control were desired, each MP could tell its UMH what the desired time-out is.  The UMH could forward this to the PLR as well.  Then the PLR could send alternate traffic to different MPs based upon each MP's individual timer.  This would only be an advantage if some of the MPs were expected to have a longer multicast reconvergence time than others - either due to load or router capabilities.

5.2.  MP-driven Unicast Tunnels

MP-driven unicast tunnels are only relevant for mLDP, where targeted LDP sessions are feasible.  For PIM, there is no mechanism to communicate beyond a router's immediate neighbors; these techniques could work for link-protection, but even then there would not be a way of requesting that the PLR stop sending traffic.

There are three differences between MP-driven unicast tunnels and PLR-driven unicast tunnels.

1.  The MPs learn the identity of the PLR from their UMH.  The PLR does not learn the identities of the MPs.

2.  The MPs create direct connections to the PLR and communicate their alternate labels.

3.  When the MPs have converged, each explicitly tells the PLR to stop sending alternate traffic.

The first means that a router communicates its UMH to all its downstream multicast hops.  Then each MP communicates to the PLR(s) (one for link-protection and one for node-protection) and indicates the multicast tree that protection is desired for and the associated alternate label.

When the PLR learns about a new MP, it adds that MP and the associated information to the set of MPs to be protected.  On a failure, the PLR behaves the same as for the PLR-driven unicast tunnels.

After the failure, the MP reconverges using make-before-break.  Then the MP explicitly communicates to the PLR(s) that alternate traffic is no longer needed for that multicast tree.  When the node-protecting PLR hasn't changed for an MP, it may be necessary to withdraw the old alternate label, which tells the PLR to stop transmitting alternate traffic, and then provide a new alternate label.
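The following sketch illustrates the MP-driven interaction described above.  The classes, label values, and method names are invented for the example; the actual signaling would run over targeted LDP sessions.

   # Illustrative sketch only; not an LDP message definition.
   class PLR:
       def __init__(self):
           self.protected = {}          # (mp, tree) -> alternate label

       def request_protection(self, mp, tree, alt_label):
           self.protected[(mp, tree)] = alt_label     # difference 2

       def withdraw(self, mp, tree):
           self.protected.pop((mp, tree), None)       # difference 3

       def on_failure(self):
           # Same data-plane behavior as PLR-driven tunnels: replicate and
           # tunnel a copy to each MP with its alternate label.
           for (mp, tree), label in self.protected.items():
               print('tunnel %s traffic to %s with label %d'
                     % (tree, mp, label))

   plr = PLR()
   plr.request_protection('MP1', 'mldp-tree-1', alt_label=3001)
   plr.request_protection('MP2', 'mldp-tree-1', alt_label=3002)
   plr.on_failure()
   # After MP1 reconverges (make-before-break), it tells the PLR to stop.
   plr.withdraw('MP1', 'mldp-tree-1')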
5.3.  MP-driven Alternate Trees

In this document we have defined different solutions to achieve fast convergence for multicast link and node protection based on MRTs.  At a high level, these solutions can be separated into local and global protection.  Alternate Trees, which are a local protection scheme, initially looked like an attractive solution for multicast node protection, since they avoid having the PLR replicate the packet to each of the receivers of the protected node and wasting bandwidth.  However, this comes at the expense of extra multicast state and complexity.  To mitigate the extra multicast state, it is possible to aggregate the Alternate Trees by creating one Alternate Tree per protected node and reusing it for all the multicast trees going through this node.  This further complicates the procedures, and upstream-assigned labels are required to de-aggregate the trees.  With aggregation we are also introducing an unwanted side effect: the receiver population of the aggregated trees will very likely not be the same.  That means multicast packets will be forwarded on the Alternate Tree to nodes that may not have receivers for the protected tree.  The more protected trees are aggregated, the higher the risk of forwarding unwanted multicast packets, which again wastes bandwidth.

Considering the complexity of this solution and the unwanted side-effects, the authors of this document believe it is better to solve multicast node protection using a global protection scheme, as documented in Section 4.  The solution previously defined in this section has been moved to Appendix A (Section 9).

6.  Acknowledgements

The authors would like to thank Kishore Tiruveedhula, Santosh Esale, and Maciek Konstantynowicz for their suggestions and review.

7.  IANA Considerations

This document includes no request to IANA.

8.  Security Considerations

This architecture is not currently believed to introduce new security concerns.

9.  Appendix A

9.1.  MP-driven Alternate Trees

For some networks, it is highly desirable not to have the PLR perform replication to each MP.  PLR replication can cause substantial congestion on links used by alternates to different MPs.  At the same time, it is also desirable to have minimal extra state created in the network.  This can be resolved by creating alternate-trees that can protect multiple multicast groups as a bypass-alternate-tree.  An alternate-tree can also be created per multicast group, PLR and failure point.

It is not possible to merge alternate-trees for different PLRs or for different neighbors.  This is shown in Figure 4, where G can't select an acceptable upstream node on the alternate tree that doesn't violate either the need to avoid C (for PLR A) or D (for PLR B).

       |--------[S]--------|       Alternate from A must avoid C
       V                   V       Alternate from B must avoid D
      [A]-------[E]-------[B]
       |         |         |
       V         |         V
    |-[C]-------[F]-------[D]-|
    |  |                   |  |
    |  |--------[G]--------|  |
    |            |            |
    |            |            |
    |->[R1]-----[H]-----[R2]<-|

         (a) Multicast tree from S
         S->A->C->R1 and S->B->D->R2

      Figure 4: Alternate Trees from PLR A and B can't be merged

An MP that joins an alternate-tree for a particular multicast stream should not expect or request PLR-replicated tunneled alternate traffic for that same multicast stream.

Each alternate-tree is identified by the PLR that sources the traffic and the failure point (node and link) (FP) to be avoided.  Different multicast groups with the same PLR and FP may have different sets of MPs - but they are all at most going to include the FP (for link protection) and the neighbors of FP except for the PLR.  For a bypass-alternate-tree to work, it must be acceptable to temporarily send a multicast group's traffic to FP's neighbors that do not need it.  This is the trade-off required to reduce alternate-tree state and use bypass-alternate-trees.  As discussed in Section 5.1.3, a potential MP can determine whether to accept alternate traffic based upon the state of its normal upstream links.  Alternate traffic for a group the MP hasn't joined can just be discarded.

      [S]......[PLR]--[ A ]
                 | |     |
                1| |2    |
                [ FP]--[MP3]
                 |  \    |
                 |   \   |
                [MP1]--[MP2]

            Figure 5: Alternate Tree Scenario

For any router, knowing the PLR and the FP to avoid will force selection of either the Blue MRT or the Red MRT.
It is possible that the FP doesn't actually appear in either MRT path, but the FP will always be in either the set of nodes that might be used for the Blue MRT path or the set of nodes that might be used for the Red MRT path.  The FP's membership in one of the sets is a function of the partial ordering and topological ordering created by the MRT algorithm and is consistent between routers in the network graph.

To create an alternate-tree, the following must happen:

1.  For node-protection, the MP learns from its upstream (the FP) the node-id of its upstream (the PLR) and, optionally, a link identifier for the link used to the PLR.  The link-id is only needed for traffic handling in PIM, since mLDP can have targeted sessions between the MP and the PLR.

2.  For link-protection, the MP needs to know the node-id of its upstream (the PLR) and, optionally, its identifier for the link used to the PLR.

3.  The MP determines whether to use the Blue or Red forwarding topology to reach the PLR while avoiding the FP and the associated interface.  This gives the MP its alternate-tree upstream interface.

4.  The MP signals a backup-join to its alternate-tree upstream interface.  The backup-join specifies the PLR, the FP and, for PIM, the FP-PLR link identifier.  If the alternate-tree is not to be used as a bypass-alternate-tree, then the multicast group (e.g. (S,G) or Opaque-Value) must be specified.

5.  A router that receives a backup-join and is not the PLR needs to create multicast state and send a backup-join towards the PLR on the appropriate Blue or Red forwarding topology, as is locally determined to avoid the FP and the FP-PLR link.

6.  Backup-joins for the same (PLR, FP, PLR-FP link-id) that reference the same multicast group can be merged into a single alternate-tree.  Similarly, backup-joins for the same (PLR, FP, PLR-FP link-id) that reference no multicast group can be merged into a single alternate-tree.

7.  When the PLR receives the backup-join, it associates either the specified multicast group with that alternate-tree, if such is given, or all multicast groups that go to the FP via the specified FP-PLR link with the alternate-tree.

For an example, look at Figure 5.  FP would send a backup-join to MP3 indicating (PLR, FP, PLR-FP link-1).  MP3 sends a backup-join to A.  MP1 sends a backup-join to MP2 and MP2 sends a backup-join to MP3.  A short illustrative sketch of this exchange appears at the end of this section.

It is necessary that traffic on each alternate-tree self-identify as to which alternate-tree it is part of.  This is because an alternate-tree for a multicast group and a particular (PLR, FP, PLR-FP link-id) can easily overlap with an alternate-tree for the same multicast group and a different (PLR, FP, PLR-FP link-id).  The best way of doing this depends upon whether PIM or mLDP is being used.
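The sketch below walks the backup-joins of the Figure 5 example through steps 4-7.  The upstream table stands in for the "pick the Blue or Red topology that avoids the FP" computation, and send_backup_join() is a hypothetical helper rather than a defined protocol message.

   # Illustrative sketch of backup-join propagation for Figure 5.
   # Upstream neighbor towards the PLR on the MRT topology that avoids
   # the FP and the FP-PLR link-1 (derived here by hand from Figure 5).
   ALT_UPSTREAM = {'MP1': 'MP2', 'MP2': 'MP3', 'MP3': 'A', 'FP': 'MP3',
                   'A': 'PLR'}

   def send_backup_join(router, plr, fp, link_id, group=None):
       """Create alternate-tree state at 'router' and signal upstream."""
       upstream = ALT_UPSTREAM[router]
       print('%s -> %s: backup-join(PLR=%s, FP=%s, link=%s, group=%s)'
             % (router, upstream, plr, fp, link_id, group))
       return upstream

   # MP1 joins a bypass alternate-tree for (PLR, FP, link-1); every
   # router that is not the PLR creates state and forwards the
   # backup-join towards the PLR (step 5).
   hop = 'MP1'
   while hop != 'PLR':
       hop = send_backup_join(hop, 'PLR', 'FP', 'link-1')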
   It is necessary that traffic on each alternate-tree self-identify
   the alternate-tree to which it belongs.  This is because an
   alternate-tree for a multicast group and a particular (PLR, FP,
   PLR-FP link-id) can easily overlap with an alternate-tree for the
   same multicast group and a different (PLR, FP, PLR-FP link-id).  The
   best way of doing this depends upon whether PIM or mLDP is being
   used.

9.1.1. PIM details for Alternate-Trees

   For PIM, the (S,G) of the IP packet is a globally unique identifier
   and is understood.  To identify the alternate-tree, the most
   straightforward way is to use MPLS labels distributed in the PIM
   backup-join messages.  An MP can use the incoming label to indicate
   the set of RPF-interfaces for which the traffic may be an alternate.
   If the alternate-tree isn't a bypass-alternate-tree, then only one
   RPF interface is referenced.  If the alternate-tree is a bypass-
   alternate-tree, then multiple RPF-interfaces (parallel links to the
   FP) might be intended.  Alternate-tree traffic may cross an
   interface multiple times, either because the interface is a
   broadcast interface and different downstream-assigned labels are
   provided and/or because an MP may provide different labels.

9.1.2. mLDP details for Alternate-Trees

   For mLDP, if bypass-alternate-trees are used, then the PLR must
   provide upstream-assigned labels for each multicast stream.  The MP
   provides the label for the alternate-tree; if the alternate-tree is
   not a bypass-alternate-tree, this label also identifies the
   multicast stream.  If the alternate-tree is a bypass-alternate-tree,
   then this label provides the context for the PLR-assigned labels for
   each multicast stream.  If there are targeted LDP sessions between
   the PLR and the MPs, then the PLR could provide the necessary
   upstream-assigned labels.
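   To make the label relationships in this section concrete, here is a
   small Python sketch of the two-level lookup implied by a bypass-
   alternate-tree: the MP-assigned alternate-tree label selects a
   context, and the PLR's upstream-assigned label is then resolved
   within that context to a multicast stream.  All label values, table
   names and FEC strings are invented for illustration and carry no
   protocol meaning.

      # Illustrative only: two-level label lookup at an MP for a
      # bypass-alternate-tree.

      # Outer label, assigned by the MP for the alternate-tree itself;
      # it identifies a context-specific label space.
      alt_tree_context = {
          100: "bypass-tree(PLR, FP, link-1)",
      }

      # Inner labels, upstream-assigned by the PLR, one per multicast
      # stream carried over that bypass-alternate-tree.
      upstream_labels = {
          "bypass-tree(PLR, FP, link-1)": {
              200: "mLDP FEC <root=S, opaque-value=G1>",
              201: "mLDP FEC <root=S, opaque-value=G2>",
          },
      }

      def resolve(outer_label, inner_label):
          """Map (alternate-tree label, PLR-assigned label) to the
          multicast stream the alternate traffic belongs to."""
          context = alt_tree_context.get(outer_label)
          if context is None:
              return None               # not alternate-tree traffic
          return upstream_labels[context].get(inner_label)

      # A packet arriving with label stack (100, 200) is alternate
      # traffic for the stream with opaque value G1.
      stream = resolve(100, 200)

   For a non-bypass alternate-tree, the outer label alone identifies
   the stream and no upstream-assigned inner label is needed.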
9.1.3. Traffic Handling by PLR

   One issue is how long the PLR should continue to send alternate
   traffic.  With an alternate-tree, the PLR can know to stop
   forwarding alternate traffic on the alternate-tree when that
   alternate-tree's state is torn down.  This provides a clear signal
   that alternate traffic is no longer needed.

9.2. Methods Compared for PIM

   The two approaches that are feasible for PIM are PLR-driven Unicast
   Tunnels and MP-driven Alternate-Trees.

   +-------------------------+-------------------+---------------------+
   | Aspect                  | PLR-driven        | MP-driven           |
   |                         | Unicast Tunnels   | Alternate-Trees     |
   +-------------------------+-------------------+---------------------+
   | Worst-case Traffic      | 1 + number of MPs | 2                   |
   | Replication Per Link    |                   |                     |
   | PLR alternate-traffic   | timer-based       | control-plane       |
   |                         |                   | terminated          |
   | Extra multicast state   | none              | per (PLR,FP,S) for  |
   |                         |                   | bypass mode         |
   +-------------------------+-------------------+---------------------+

   Which approach is preferred may be network-dependent.  It should
   also be possible to use both in the same network.

9.3. Methods Compared for mLDP

   All three approaches are feasible for mLDP.  Below is a brief
   comparison of various aspects of each.

   +-------------------+---------------+-------------+-----------------+
   | Aspect            | MP-driven     | PLR-driven  | MP-driven       |
   |                   | Unicast       | Unicast     | Alternate-Trees |
   |                   | Tunnels       | Tunnels     |                 |
   +-------------------+---------------+-------------+-----------------+
   | Worst-case        | 1 + number of | 1 + number  | 2               |
   | Traffic           | MPs           | of MPs      |                 |
   | Replication Per   |               |             |                 |
   | Link              |               |             |                 |
   | PLR               | control-plane | timer-based | control-plane   |
   | alternate-traffic | terminated    |             | terminated      |
   | Extra multicast   | none          | none        | per (PLR,FP,S)  |
   | state             |               |             | for bypass mode |
   +-------------------+---------------+-------------+-----------------+

10. References

10.1. Normative References

   [I-D.enyedi-rtgwg-mrt-frr-algorithm]
              Atlas, A., Enyedi, G., Csaszar, A., and A. Gopalan,
              "Algorithms for computing Maximally Redundant Trees for
              IP/LDP Fast-Reroute",
              draft-enyedi-rtgwg-mrt-frr-algorithm-03 (work in
              progress), July 2013.

   [I-D.ietf-rtgwg-mrt-frr-architecture]
              Atlas, A., Kebler, R., Enyedi, G., Csaszar, A., Tantsura,
              J., Konstantynowicz, M., White, R., and M. Shand, "An
              Architecture for IP/LDP Fast-Reroute Using Maximally
              Redundant Trees",
              draft-ietf-rtgwg-mrt-frr-architecture-03 (work in
              progress), July 2013.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6388]  Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
              "Label Distribution Protocol Extensions for Point-to-
              Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, November 2011.

   [RFC6420]  Cai, Y. and H. Ou, "PIM Multi-Topology ID (MT-ID) Join
              Attribute", RFC 6420, November 2011.

10.2. Informative References

   [I-D.ietf-rtgwg-mofrr]
              Karan, A., Filsfils, C., Farinacci, D., Wijnands, I.,
              Decraene, B., Joorde, U., and W. Henderickx, "Multicast
              only Fast Re-Route", draft-ietf-rtgwg-mofrr-02 (work in
              progress), June 2013.

   [I-D.iwijnand-mpls-mldp-multi-topology]
              Wijnands, I. and K. Raza, "mLDP Extensions for Multi
              Topology Routing",
              draft-iwijnand-mpls-mldp-multi-topology-03 (work in
              progress), June 2013.

   [I-D.kebler-pim-mrt-protection]
              Kebler, R., Atlas, A., Wijnands, IJ., and G. Enyedi, "PIM
              Extensions for Protection Using Maximally Redundant
              Trees", draft-kebler-pim-mrt-protection-00 (work in
              progress), March 2012.

   [I-D.wijnands-mpls-mldp-node-protection]
              Wijnands, I., Rosen, E., Raza, K., Tantsura, J., Atlas,
              A., and Q. Zhao, "mLDP Node Protection",
              draft-wijnands-mpls-mldp-node-protection-04 (work in
              progress), June 2013.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, January 2010.

Authors' Addresses

   Alia Atlas (editor)
   Juniper Networks
   10 Technology Park Drive
   Westford, MA  01886
   USA

   Email: akatlas@juniper.net

   Robert Kebler
   Juniper Networks
   10 Technology Park Drive
   Westford, MA  01886
   USA

   Email: rkebler@juniper.net

   IJsbrand Wijnands
   Cisco Systems, Inc.

   Email: ice@cisco.com

   Andras Csaszar
   Ericsson
   Konyves Kalman krt 11
   Budapest  1097
   Hungary

   Email: Andras.Csaszar@ericsson.com

   Gabor Sandor Enyedi
   Ericsson
   Konyves Kalman krt 11.
   Budapest  1097
   Hungary

   Email: Gabor.Sandor.Enyedi@ericsson.com