INTERNET-DRAFT                                                Mingui Zhang
Intended Status: Proposed Standard                                  Huawei
Updates: 6325                                           Tissa Senevirathne
                                                                     Cisco
                                                      Janardhanan Pathangi
                                                                      DELL
                                                             Ayan Banerjee
                                                                     Cisco
                                                            Anoop Ghanwani
                                                                      DELL
Expires: January 3, 2016                                      July 2, 2015

                    TRILL Resilient Distribution Trees
                  draft-ietf-trill-resilient-trees-03.txt

Abstract

   The TRILL protocol provides multicast data forwarding based on
   IS-IS link state routing.  Distribution trees are computed from the
   link state information through Shortest Path First calculation.
   When a link on a distribution tree fails, a campus-wide
   reconvergence of this distribution tree takes place, which can be
   time consuming and may cause considerable disruption to the ongoing
   multicast service.

   This document specifies how to build backup distribution trees to
   protect links on the primary distribution tree.  Since the backup
   distribution tree is built up ahead of the link failure, when a
   link on the primary distribution tree fails, the pre-installed
   backup forwarding table is used to deliver multicast packets
   without waiting for the campus-wide reconvergence, which minimizes
   the service disruption.  This document updates RFC 6325.
Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Usage of Affinity Sub-TLV
      2.1. Allocating Affinity Links
      2.2. Distribution Tree Calculation with Affinity Links
   3. Resilient Distribution Trees Calculation
      3.1. Designating Roots for Backup Trees
           3.1.1. Conjugate Trees
           3.1.2. Explicitly Advertising Tree Roots
      3.2. Backup DT Calculation
           3.2.1. Backup DT Calculation with Affinity Links
                  3.2.1.1. Algorithm for Choosing Affinity Links
                  3.2.1.2. Affinity Links Advertisement
           3.2.2. Backup DT Calculation without Affinity Links
   4. Resilient Distribution Trees Installation
      4.1. Pruning the Backup Distribution Tree
      4.2. RPF Filters Preparation
   5. Protection Mechanisms with Resilient Distribution Trees
      5.1. Global 1:1 Protection
      5.2. Global 1+1 Protection
           5.2.1. Failure Detection
           5.2.2. Traffic Forking and Merging
      5.3. Local Protection
           5.3.1. Start Using the Backup Distribution Tree
           5.3.2. Duplication Suppression
           5.3.3. An Example to Walk Through
      5.4. Switching Back to the Primary Distribution Tree
   6. Security Considerations
   7. IANA Considerations
   Acknowledgements
   8. References
      8.1. Normative References
      8.2. Informative References
   Authors' Addresses

1. Introduction

   Much multicast traffic is generated by applications that are
   sensitive to interruption and latency, e.g., video distribution,
   including IPTV and video conferencing.  Normally, a network fault
   is recovered through a network-wide reconvergence of the forwarding
   states, but this process is too slow to meet tight Service Level
   Agreement (SLA) requirements on the duration of service disruption.
   What is worse, updating multicast forwarding states may take
   significantly longer than unicast convergence, since multicast
   states are updated based on control-plane signaling [mMRT].

   Protection mechanisms are commonly used to reduce the service
   disruption caused by network faults.  With backup forwarding states
   installed in advance, a protection mechanism can restore an
   interrupted multicast stream in tens of milliseconds, which meets
   stringent SLAs on service disruption.  Several protection
   mechanisms for multicast traffic have been developed for IP/MPLS
   networks [mMRT] [MoFRR].  However, the way that TRILL constructs
   distribution trees (DTs) is different from the way that multicast
   trees are computed under IP/MPLS; therefore, a multicast protection
   mechanism suitable for TRILL is required.

   This document proposes "Resilient Distribution Trees" (RDT), in
   which backup trees are installed in advance for the purpose of fast
   failure repair.  Three types of protection mechanisms are proposed.

   o  Global 1:1 protection refers to the mechanism where the
      multicast source RBridge normally injects one multicast stream
      onto the primary DT.  When an interruption of this stream is
      detected, the source RBridge switches to the backup DT to inject
      subsequent multicast traffic until the primary DT is recovered.

   o  Global 1+1 protection refers to the mechanism where the
      multicast source RBridge always injects two copies of each
      multicast stream, one onto the primary DT and one onto the
      backup DT.  In the normal case, each multicast receiver picks
      the stream sent along the primary DT and egresses it to its
      local link.  When a link failure interrupts the primary stream,
      the backup stream is picked until the primary DT is recovered.

   o  Local protection refers to the mechanism where the RBridge
      attached to the failed link locally repairs the failure.

   RDT may greatly reduce the service disruption caused by link
   failures.  In global 1:1 protection, the time spent on DT
   recalculation and installation is saved.  Global 1+1 protection and
   local protection further save the time spent on failure
   propagation.  A failed link can be repaired in tens of
   milliseconds.  Although it is possible to use RDT to achieve load
   balancing of multicast traffic, this document leaves that for
   future study.

   [RFC7176] specifies the Affinity Sub-TLV.  An "Affinity Link" can
   be explicitly assigned to a distribution tree or trees.  This
   offers a way to manipulate the calculation of distribution trees.
   With intentional assignment of Affinity Links, a backup
   distribution tree can be set up to protect links on a primary
   distribution tree.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

1.2. Terminology

   DT: Distribution Tree

   IS-IS: Intermediate System to Intermediate System

   PLR: Point of Local Repair.  In this document, the PLR is the
   multicast upstream RBridge attached to the failed link.  It is
   meaningful only for local protection.

   RDT: Resilient Distribution Tree

   RPF: Reverse Path Forwarding

   SLA: Service Level Agreement

   TRILL: TRansparent Interconnection of Lots of Links

2. Usage of Affinity Sub-TLV

   This document uses the Affinity Sub-TLV [RFC7176] to assign a
   parent to an RBridge in a tree, as discussed below.

2.1. Allocating Affinity Links

   The Affinity Sub-TLV explicitly assigns parents for RBridges on
   distribution trees.  It can be recognized by each RBridge in the
   campus.  The originating RBridge becomes the parent, and the
   nickname contained in the Affinity Record identifies the child.
   This explicitly provides an "Affinity Link" on a distribution tree
   or trees.  The "Tree-num of roots" field of the Affinity Record
   identifies the distribution trees that adopt this Affinity Link
   [RFC7176].

   Affinity Links may be configured or automatically determined using
   an algorithm [CMT].  Suppose link RB2-RB3 is chosen as an Affinity
   Link on the distribution tree rooted at RB1.  RB2 should send out
   the Affinity Sub-TLV with an Affinity Record like {Nickname=RB3,
   Num of Trees=1, Tree-num of roots=RB1}.  In this document, RB3 does
   not have to be a leaf node on a distribution tree; therefore, an
   Affinity Link can be used to identify any link on a distribution
   tree.  This kind of assignment offers RBridges flexibility in
   distribution tree calculation: they are allowed to choose a child
   even when they do not lie on the shortest path from the root to
   that child.  This document uses this flexibility to increase the
   reliability of distribution trees.

   Note that an Affinity Link MUST NOT be misused to connect two
   RBridges that are not adjacent.  If it is, the Affinity Link is
   ignored and has no effect on tree building.

2.2. Distribution Tree Calculation with Affinity Links

   When RBridges receive an Affinity Sub-TLV with an Affinity Link
   whose child is RB2, RB2's incoming links other than the Affinity
   Link are removed from the full graph of the campus to obtain a
   sub-graph.  RBridges perform the Shortest Path First calculation on
   this sub-graph to compute the distribution tree.  In this way, the
   Affinity Link is guaranteed to appear on the distribution tree.
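   The following Python sketch illustrates this calculation (Figure
   2.1 below then gives a concrete topology).  It is illustrative only
   and not part of the protocol specification: it assumes a simple
   adjacency-map representation of the campus graph and omits TRILL's
   deterministic tie-breaking; all names are hypothetical.

      import heapq

      def spf_with_affinity(graph, root, affinity_links):
          """Compute the tree rooted at 'root', forced to contain
          every (parent, child) pair in 'affinity_links'.
          graph: {node: {neighbor: metric}}, both directions stored.
          Returns {child: parent} for the resulting tree."""
          child_of = {c: p for (p, c) in affinity_links}
          # Sub-graph: for each Affinity child, remove all of its
          # incoming links except the Affinity Link itself.
          sub = {u: {v: m for v, m in nbrs.items()
                     if v not in child_of or child_of[v] == u}
                 for u, nbrs in graph.items()}
          # Ordinary Shortest Path First (Dijkstra) on the sub-graph;
          # the Affinity Link is now the only way to reach its child.
          dist, parent = {root: 0}, {}
          heap = [(0, root)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue
              for v, m in sub[u].items():
                  if d + m < dist.get(v, float("inf")):
                      dist[v], parent[v] = d + m, u
                      heapq.heappush(heap, (d + m, v))
          return parent

   With the topology of Figure 2.1, unit metrics, and
   affinity_links = {("RB4", "RB5")}, RB4-RB5 becomes the only way to
   reach RB5 in the sub-graph, so it necessarily appears on the
   computed tree.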
        Root                               Root
        +---+ -> +---+ -> +---+           +---+ -> +---+ -> +---+
        |RB1|    |RB2|    |RB3|           |RB1|    |RB2|    |RB3|
        +---+ <- +---+ <- +---+           +---+ <- +---+ <- +---+
         ^ |      ^ |      ^ |             ^ |      ^        ^ |
         | v      | v      | v             | v      |        | v
        +---+ -> +---+ -> +---+           +---+ -> +---+ -> +---+
        |RB4|    |RB5|    |RB6|           |RB4|    |RB5|    |RB6|
        +---+ <- +---+ <- +---+           +---+ <- +---+    +---+

               Full Graph                        Sub Graph

          Root 1                            Root 1
              / \                               / \
             /   \                             /   \
            4     2                           4     2
                 / \                          |     |
                /   \                         |     |
               5     3                        5     3
               |                              |
               |                              |
               6                              6

    Shortest Path Tree of Full Graph  Shortest Path Tree of Sub Graph

        Figure 2.1: DT Calculation with the Affinity Link RB4-RB5

   Take Figure 2.1 as an example.  Suppose RB1 is the root and link
   RB4-RB5 is the Affinity Link.  RB5's other incoming links, RB2-RB5
   and RB6-RB5, are removed from the Full Graph to obtain the Sub
   Graph.  Since RB4-RB5 is then the unique link through which RB5 can
   be reached, the Shortest Path Tree inevitably contains this link.

3. Resilient Distribution Trees Calculation

   RBridges use IS-IS to detect and advertise network faults.  A node
   or link failure triggers a campus-wide reconvergence of
   distribution trees.  The reconvergence generally includes the
   following procedures:

   1. The failure is detected through the exchange of IS-IS control
      messages (HELLOs) or some other method such as BFD [RFC7175]
      [RBmBFD];

   2. IS-IS link state flooding takes place, so that each RBridge
      learns about the failure;

   3. Each RBridge recalculates the affected distribution trees
      independently;

   4. RPF filters are updated according to the new distribution trees.
      The recomputed distribution trees are pruned and installed into
      the multicast forwarding tables.

   The reconvergence can be slow, which disrupts ongoing multicast
   traffic.  In protection mechanisms, alternative paths prepared
   ahead of potential node or link failures are used to detour around
   a failure as soon as it is detected; therefore, service disruption
   can be minimized.

   This document focuses only on link protection.  The construction of
   backup DTs for the purpose of node protection is out of the scope
   of this document.  In order to protect a node on the primary tree,
   a backup tree would have to be set up without this node, so that
   when this node fails, the backup tree could safely be used to
   forward multicast traffic around it.  However, TRILL distribution
   trees are shared among all VLANs and Fine-Grained Labels [RFC7172],
   and they have to cover all RBridge nodes in the campus [RFC6325].
   A DT that does not span all RBridges in the campus may not cover
   all receivers of many multicast groups.  (This is different from
   the construction of multicast trees signaled by PIM [RFC4601] or
   mLDP [RFC6388].)

3.1. Designating Roots for Backup Trees

   Operators MAY manually configure the roots for the backup DTs.
   Nevertheless, this document aims to provide a mechanism with
   minimum configuration.  Two options are offered as follows.

3.1.1. Conjugate Trees

   [RFC6325] and [RFC7180] specify how distribution tree roots are
   selected.  When a backup DT is computed for a primary DT, its root
   is set to be the root of this primary DT.  In order to distinguish
   the primary DT from the backup DT, the root RBridge MUST own
   multiple nicknames.

3.1.2. Explicitly Advertising Tree Roots

   The RBridge RB1 holding the highest-priority root nickname might
   explicitly advertise a list of nicknames identifying the roots of
   the primary and backup trees (see Section 4.5 of [RFC6325]).
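   The sketch below shows one plausible shape of such a root list, in
   which each primary tree root is paired with its backup tree root.
   It is illustrative only: the function and its arguments are
   hypothetical and do not reflect the on-the-wire encoding of
   [RFC7176].

      def build_root_list(primary_roots, backup_root_of):
          """primary_roots: ordered list of primary DT root nicknames.
          backup_root_of: maps each primary root nickname to the
          nickname of its backup DT root (a second nickname of the
          same RBridge under the conjugate-tree option of Section
          3.1.1, or possibly a different RBridge's nickname)."""
          roots = []
          for r in primary_roots:
              roots.append(r)                  # primary DT root
              roots.append(backup_root_of[r])  # its backup DT root
          return roots

      # e.g., a primary DT rooted at nickname 0x1111 protected by a
      # backup DT rooted at 0x2222:
      print(build_root_list([0x1111], {0x1111: 0x2222}))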
3.2. Backup DT Calculation

3.2.1. Backup DT Calculation with Affinity Links

            2                                    1
           /                                      \
     Root 1___                                 ___2 Root
         /|\  \                               /  /|\
        / | \  \                             /  / | \
       3  4  5  6                           3  4  5  6
       |  |  |  |                            \/    \/
       |  |  |  |                            /\    /\
       7  8  9  10                          7  8  9  10

         Primary DT                           Backup DT

      Figure 3.1: An Example of a Primary DT and its Backup DT

   TRILL supports the computation of multiple distribution trees by
   RBridges.  With the intentional assignment of Affinity Links in DT
   calculation, this document proposes a method to construct RDTs.
   For example, in Figure 3.1, the backup DT is set up to be maximally
   disjoint from the primary DT.  (The full topology is the
   combination of these two DTs; it is not shown in the figure.)
   Except for the link between RB1 and RB2, no link on the primary DT
   overlaps with a link on the backup DT.  This means that every link
   on the primary DT, except link RB1-RB2, can be protected by the
   backup DT.

3.2.1.1. Algorithm for Choosing Affinity Links

   Operators MAY configure Affinity Links to intentionally protect a
   specific link, such as the link connected to a gateway.  But it is
   desirable that every RBridge independently computes the Affinity
   Links for a backup DT across the whole campus.  This enables a
   distributed deployment and also minimizes configuration.

   Algorithms for Maximally Redundant Trees [MRT] may be used to
   figure out the Affinity Links of a backup DT that is maximally
   disjoint from the primary DT, but they provide only a subset of all
   possible solutions, i.e., the conjugate trees described in Section
   3.1.1.  In TRILL, RDT does not restrict the root of the backup DT
   to be the same as that of the primary DT.  Two disjoint (or
   maximally disjoint) trees may be rooted at different nodes, which
   significantly enlarges the solution space.

   This document RECOMMENDS realizing this independent computation
   through a slight change to the conventional DT calculation process
   of TRILL.  Basically, after the primary DT is calculated, the
   RBridge knows which links are used by that tree.  When the backup
   DT is calculated, each RBridge increases the metric of these links
   by a proper value (for safety, it is recommended to use the sum of
   all original link metrics in the campus, but not more than 2**23),
   which lowers the priority of these links being chosen for the
   backup DT by the Shortest Path First calculation.  All links on
   this backup DT could be assigned as Affinity Links, but this is
   unnecessary.  In order to reduce the number of Affinity Sub-TLVs
   flooded across the campus, only those links NOT picked by the
   conventional DT calculation process ought to be advertised as
   Affinity Links.

3.2.1.2. Affinity Links Advertisement

   Similar to [CMT], the parent RBridge of an Affinity Link takes
   charge of announcing this link in an Affinity Sub-TLV.  When an
   RBridge plays the role of parent for several Affinity Links, it is
   natural to advertise them together in the same Affinity Sub-TLV,
   with each Affinity Link structured as one Affinity Record.

   Affinity Links are announced in the Affinity Sub-TLV, which is
   recognized by every RBridge.  Since each RBridge computes
   distribution trees as the Affinity Sub-TLV requires, the backup DT
   will be built up consistently.
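   The Python sketch below puts Sections 3.2.1.1 and 3.2.1.2 together:
   it computes the primary DT, penalizes the links that tree uses,
   computes the backup DT on the penalized metrics, and returns the
   links that need to be advertised as Affinity Links (those the
   conventional calculation from the backup root would not pick).  It
   is illustrative only and, like the sketch in Section 2.2, omits
   TRILL's tie-breaking rules.

      import heapq

      def spf(graph, root):
          """Plain Dijkstra; returns {child: parent}."""
          dist, parent = {root: 0}, {}
          heap = [(0, root)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue
              for v, m in graph[u].items():
                  if d + m < dist.get(v, float("inf")):
                      dist[v], parent[v] = d + m, u
                      heapq.heappush(heap, (d + m, v))
          return parent

      def backup_dt_affinity_links(graph, primary_root, backup_root):
          primary = spf(graph, primary_root)
          on_primary = {frozenset((c, p)) for c, p in primary.items()}
          # Penalty: the sum of all original link metrics in the
          # campus, capped at 2**23 (Section 3.2.1.1).  Each
          # undirected link is stored twice, hence the division by 2.
          penalty = min(sum(m for u in graph
                            for m in graph[u].values()) // 2, 2**23)
          penalized = {u: {v: m + (penalty if frozenset((u, v))
                                   in on_primary else 0)
                           for v, m in graph[u].items()}
                       for u in graph}
          backup = spf(penalized, backup_root)
          # Advertise only the links that the conventional calculation
          # from the backup root would NOT choose (Section 3.2.1.1).
          conventional = spf(graph, backup_root)
          return {(p, c) for c, p in backup.items()
                  if conventional.get(c) != p}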
3.2.2. Backup DT Calculation without Affinity Links

   This section provides an alternative method to set up a disjoint
   backup DT.

   After the primary DT is calculated, each RBridge increases the cost
   of the links that are already in the primary DT by a multiplier
   (for safety, 64x is RECOMMENDED).  This ensures that a link appears
   in both trees if and only if there is no other way to reach the
   node (i.e., the graph would become disconnected if it were pruned
   of the links in the first tree).  In other words, the two trees
   will be maximally disjoint.

   The above algorithm is similar to that defined in Section 3.2.1.1.
   All RBridges MUST agree on the same algorithm; then the backup DT
   can be calculated by each RBridge consistently, and configuration
   is unnecessary.

4. Resilient Distribution Trees Installation

   As specified in Section 4.5.2 of [RFC6325], an ingress RBridge MUST
   announce the distribution trees that it may choose to ingress
   multicast frames.  Thus, other RBridges in the campus can limit the
   amount of state that is necessary for the RPF check.  Also,
   [RFC6325] recommends that an ingress RBridge by default choose the
   DT or DTs whose root or roots are least cost from the ingress
   RBridge.  To sum up, RBridges do pre-compute all the trees that
   might be used, so that they can properly forward multi-destination
   packets, but they install RPF state only for some combinations of
   ingress and tree.

   This document states that the backup DT MUST be contained in an
   ingress RBridge's DT announcement list and included in this ingress
   RBridge's LSP.  In order to reduce the service disruption time,
   RBridges SHOULD install backup DTs in advance, which also includes
   setting up the RPF filters needed for the RPF check.

   Since the backup DT is intentionally built to be maximally disjoint
   from the primary DT, when a link fails and interrupts the ongoing
   multicast traffic sent along the primary DT, it is probable that
   the backup DT is not affected.  Therefore, the backup DT installed
   in advance can be used to deliver multicast packets immediately.

4.1. Pruning the Backup Distribution Tree

   The way that a backup DT is pruned is different from the way that
   the primary DT is pruned.  Even though a branch contains no
   downstream receivers, it probably should not be pruned, for the
   purpose of protection.  The rule for backup DT pruning is that the
   backup DT should be pruned, eliminating branches that have no
   potential downstream RBridges appearing on the pruned primary DT.

   It is probable that the primary DT is not optimally pruned in
   practice.  In this case, the backup DT SHOULD be pruned presuming
   that the primary DT is optimally pruned.  Those redundant links
   that ought to have been pruned will not be protected.
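   A minimal sketch of this pruning rule follows (Figure 4.1 below
   then works through an example).  It is illustrative only and
   assumes the trees are represented as child maps; the names are
   hypothetical.

      def prune_backup_dt(backup_children, backup_root, primary_nodes):
          """Keep a backup DT branch only if its subtree contains at
          least one RBridge on the (optimally) pruned primary DT.
          backup_children: {node: [child, ...]} of the unpruned
          backup DT; primary_nodes: set of RBridges on the pruned
          primary DT.  Returns the pruned {node: [child, ...]}."""
          pruned = {}

          def keep(node):
              kept = [c for c in backup_children.get(node, [])
                      if keep(c)]
              if kept:
                  pruned[node] = kept
              return bool(kept) or node in primary_nodes

          keep(backup_root)
          return pruned

   Applied to the trees of Figure 3.1 with primary_nodes = {RB1, RB3,
   RB5, RB6, RB7, RB9, RB10}, this keeps branches RB1-RB2 and RB3-RB2
   but drops RB8-RB3, matching the result shown in Figure 4.1.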
                                           1
                                            \
     Root 1___                           ___2 Root
         / \  \                         /  /|\
        /   \  \                       /  / | \
       3     5  6                     3  4  5  6
       |     |  |                       /    \/
       |     |  |                      /     /\
       7     9  10                    7     9  10

      Pruned Primary DT                Pruned Backup DT

   Figure 4.1: The Backup DT is Pruned Based on the Pruned Primary DT

   Suppose RB7, RB9, and RB10 constitute a multicast group MGx.  The
   pruned primary DT and backup DT are shown in Figure 4.1.  Referring
   back to Figure 3.1, branches RB2-RB1 and RB4-RB1 on the primary DT
   are pruned for the distribution of MGx traffic, since there are no
   potential receivers on these two branches.  Although branches
   RB1-RB2 and RB3-RB2 on the backup DT have no potential multicast
   receivers either, their RBridges appear on the pruned primary DT
   and may be used to repair link failures of the primary DT.
   Therefore, they are not pruned from the backup DT.  Branch RB8-RB3
   can be safely pruned because it does not appear on the pruned
   primary DT.

4.2. RPF Filters Preparation

   RB2 includes in its LSP the information indicating which trees RB2
   might choose to ingress multicast frames [RFC6325].  When RB2
   specifies the trees that it might choose to ingress multicast
   traffic, it SHOULD include the backup DT.  Other RBridges then
   prepare the RPF check states for both the primary DT and the backup
   DT.  When a multicast packet is sent along either the primary DT or
   the backup DT, it will pass the RPF check.  This works when global
   1:1 protection is used.  However, when global 1+1 protection or
   local protection is applied, traffic duplication will happen if
   multicast receivers accept the copies of the multicast packets
   matching both RPF filters.  In order to avoid such duplication,
   egress RBridge multicast receivers MUST act as merge points,
   activating a single RPF filter and discarding the duplicate packets
   matching the other RPF filter.  In the normal case, the RPF state
   is set up according to the primary DT.  When a link fails, the RPF
   filter based on the backup DT is activated.
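   The following sketch shows this merge-point behavior.  It is
   illustrative only: the class and its fields are hypothetical, and a
   real implementation would keep this state in the forwarding plane.

      class MergePoint:
          """RPF state for both trees is installed in advance, but
          only one filter is active, so only one copy of each
          multicast packet is egressed (Section 4.2)."""
          def __init__(self, primary_filter, backup_filter):
              # Each filter maps (tree_root, ingress_nickname) to the
              # set of links on which packets are expected to arrive.
              self.filters = {"primary": primary_filter,
                              "backup": backup_filter}
              self.active = "primary"  # the normal case

          def on_link_failure(self):
              # Activate the backup RPF filter upon failure.
              self.active = "backup"

          def accept(self, tree_root, ingress, arrival_link):
              expected = self.filters[self.active].get(
                  (tree_root, ingress), set())
              # A duplicate arriving on the other tree fails the
              # active filter and is discarded.
              return arrival_link in expected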
5. Protection Mechanisms with Resilient Distribution Trees

   Protection mechanisms can be developed to make use of the backup DT
   installed in advance.  But the protection mechanisms already
   developed using PIM or mLDP for multicast in IP/MPLS networks are
   not applicable to TRILL, due to the following fundamental
   differences in their distribution tree calculation.

   o  A link on a TRILL distribution tree is bidirectional, while a
      link on a distribution tree in IP/MPLS networks is
      unidirectional.

   o  In TRILL, a multicast source node does not have to be the root
      of the distribution tree.  It is just the opposite in IP/MPLS
      networks.

   o  In IP/MPLS networks, distribution trees, as well as their backup
      distribution trees, are constructed for each multicast source
      node.  In TRILL, a small number of core distribution trees are
      shared among multicast groups.  A backup DT does not have to
      share the same root as the primary DT.

   Therefore, a TRILL-specific multicast protection mechanism is
   needed.

   Global 1:1 protection, global 1+1 protection, and local protection
   are developed in this section.  In Figure 4.1, assume RB7 is the
   ingress RBridge of the multicast stream, while RB9 and RB10 are the
   multicast receivers.  Suppose link RB1-RB5 fails during the
   multicast forwarding.  The backup DT rooted at RB2 does not include
   link RB1-RB5; therefore, it can be used to protect this link.  In
   global 1:1 protection, RB7 switches the subsequent multicast
   traffic to this backup DT when it is notified about the link
   failure.  In global 1+1 protection, RB7 injects two copies of the
   multicast stream and lets the multicast receivers RB9 and RB10
   merge them.  In local protection, when link RB1-RB5 fails, RB1
   locally replicates the multicast traffic and sends it on the backup
   DT.

5.1. Global 1:1 Protection

   In global 1:1 protection, the ingress RBridge of the multicast
   traffic is responsible for switching the failure-affected traffic
   from the primary DT over to the backup DT.  Since the backup DT has
   been installed in advance, this global protection need not wait for
   the DT recalculation and installation.  When the ingress RBridge is
   notified about the failure, it immediately makes this switch-over.

   This type of protection is simple and duplication safe.  However,
   depending on the topology of the RBridge campus, the time spent on
   failure detection and propagation through the IS-IS control plane
   may still cause a considerable service disruption.

   The BFD (Bidirectional Forwarding Detection) protocol can be used
   to reduce the failure detection time.  Link failures can be rapidly
   detected with one-hop BFD [RFC7175].  [RBmBFD] introduces fast
   failure detection for multicast paths.  It can be used to reduce
   both the failure detection time and the propagation time in global
   protection.  In [RBmBFD], the ingress RBridge needs to send BFD
   control packets to poll each receiver, and receivers return BFD
   control packets to the ingress as responses.  If no response is
   received from a specific receiver within a detection time, the
   ingress judges that the connectivity to this receiver is broken.
   [RBmBFD] is therefore used to detect the connectivity of a path
   rather than of a link.  The ingress RBridge determines a minimum
   failed branch that contains this receiver and switches the ongoing
   multicast traffic based on this judgment.  For example, in Figure
   4.1, if RB9 does not respond while RB10 still responds, RB7 will
   presume that links RB1-RB5 and RB5-RB9 have failed.  Multicast
   traffic will be switched to a backup DT that can protect these two
   links.  Accurate link failure detection might help ingress RBridges
   make smarter decisions, but it is out of the scope of this
   document.

5.2. Global 1+1 Protection

   In global 1+1 protection, the multicast source RBridge always
   replicates the multicast packets and sends them onto both the
   primary and the backup DT.  This may sacrifice capacity efficiency,
   but given the ample connection redundancy and inexpensive bandwidth
   in Data Center Networks, this kind of protection can be popular
   [MoFRR].

5.2.1. Failure Detection

   Egress RBridges (merge points) SHOULD learn of a link failure as
   early as possible, so that the affected egress RBridges can update
   their RPF filters quickly to minimize the traffic disruption.
   Three options are provided as follows; a sketch of the first option
   is given after this list.

   1. Egress RBridges assume a minimum known packet rate for a given
      data stream [MoFRR].  A failure detection timer Td is set to the
      interval between two continuous packets.  Td is reinitialized
      each time a packet is received.  If Td expires while packets are
      arriving at the egress RBridge on the backup DT (within the time
      frame Td), the egress RBridge updates its RPF filters and starts
      to receive the packets forwarded on the backup DT.

   2. With [RBmBFD], when a link failure happens, the affected egress
      RBridges detect a lack of connectivity from the ingress.
      Therefore, these egress RBridges are able to update their RPF
      filters promptly.

   3. Egress RBridges can always rely on the IS-IS control plane to
      learn of the failure and determine whether their RPF filters
      should be updated.
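   Below is a minimal sketch of option 1, assuming the MergePoint
   object sketched in Section 4.2; the timer granularity and all names
   are illustrative.

      import threading

      class StreamWatchdog:
          """Td is derived from the stream's minimum known packet
          rate and restarted on every packet received on the primary
          DT; if it expires while the backup DT is still delivering
          packets, the RPF filter is switched."""
          def __init__(self, td_seconds, merge_point, backup_alive):
              self.td = td_seconds
              self.merge_point = merge_point
              self.backup_alive = backup_alive  # callable: packets
              self.timer = None                 # seen on backup DT?

          def packet_from_primary(self):
              # Reinitialize Td each time a packet is received.
              if self.timer is not None:
                  self.timer.cancel()
              self.timer = threading.Timer(self.td, self._expired)
              self.timer.start()

          def _expired(self):
              # Switch only if the backup DT is still delivering.
              if self.backup_alive():
                  self.merge_point.on_link_failure()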
5.2.2. Traffic Forking and Merging

   For the sake of protection, transit RBridges SHOULD activate both
   the primary and the backup RPF filters; therefore, both copies of
   the multicast packets will pass through transit RBridges.

   Multicast receivers (egress RBridges) MUST act as "merge points"
   and egress only one copy of each multicast packet.  This is
   achieved by activating only a single RPF filter.  In the normal
   case, egress RBridges activate the primary RPF filter.  When a link
   on the pruned primary DT fails, the ingress RBridge cannot reach
   some of the receivers.  When these unreachable receivers realize
   it, they SHOULD update their RPF filters so as to receive the
   packets sent on the backup DT.

5.3. Local Protection

   In local protection, the Point of Local Repair (PLR) is the
   upstream RBridge attached to the failed link.  It is this RBridge
   that makes the decision to replicate the multicast traffic in order
   to repair the link failure.  Local protection further saves the
   time otherwise spent on failure notification through the flooding
   of LSPs across the campus.  In addition, failure detection can be
   sped up using [RFC7175]; therefore, local protection can keep the
   service disruption within 50 milliseconds.

   Since the ingress RBridge is not necessarily the root of the
   distribution tree in TRILL, a multicast downstream point may not be
   a descendant of the ingress point on the distribution tree.
   Moreover, distribution trees in TRILL are bidirectional and need
   not share the same root.  There are fundamental differences between
   the distribution tree calculation of TRILL and those used in PIM
   and mLDP; therefore, the local protection mechanisms used for PIM
   and mLDP, such as [mMRT] and [MoFRR], are not applicable here.

5.3.1. Start Using the Backup Distribution Tree

   The egress nickname field in the TRILL header of the replicated
   multicast TRILL data packets specifies the tree on which they are
   being distributed.  This field is rewritten to the backup DT's root
   nickname by the PLR, but the ingress nickname of the multicast
   frame MUST remain unchanged.  This is a halfway change of the DT
   for multicast packets.  Afterwards, the PLR forwards the multicast
   traffic along the backup DT.  This updates [RFC6325], which
   specifies that the egress nickname in the TRILL header of a
   multi-destination TRILL data packet must not be changed by transit
   RBridges.

   In the above example, the PLR RB1 locally determines to send the
   replicated multicast packets according to the backup DT.  It sends
   them to the next hop, RB2.
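   The sketch below shows this halfway change, assuming a minimal
   model of the relevant TRILL header fields (the field names are
   illustrative, not the RFC 6325 encoding).

      from dataclasses import dataclass, replace

      @dataclass(frozen=True)
      class TrillHeader:
          egress_nickname: int   # tree root for multi-destination
          ingress_nickname: int  # ingress RBridge; MUST stay unchanged

      def plr_repair(header, backup_root_nickname):
          # Halfway change of DT: only the tree identifier (egress
          # nickname) is rewritten; the ingress nickname is kept.
          return replace(header, egress_nickname=backup_root_nickname)

      # e.g., a packet on the primary DT rooted at RB1 (nickname
      # 0x1111), ingressed by RB7 (0x7777), moved to the backup DT
      # rooted at RB2 (0x2222):
      hdr = TrillHeader(egress_nickname=0x1111,
                        ingress_nickname=0x7777)
      print(plr_repair(hdr, backup_root_nickname=0x2222))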
5.3.2. Duplication Suppression

   When a PLR starts to send replicated multicast packets on the
   backup DT, some multicast packets may still be traveling along the
   primary DT, so some egress RBridges might receive duplicate
   multicast packets.  The traffic forking and merging method of
   global 1+1 protection can be adopted to suppress this duplication.

5.3.3. An Example to Walk Through

   The pieces of the local protection example used above are put
   together into a complete walk-through below.

   In the normal case, multicast frames ingressed by RB7, with pruned
   distribution on the primary DT rooted at RB1, are received by RB9
   and RB10.  When the link RB1-RB5 fails, the PLR RB1 begins to
   replicate and forward subsequent multicast packets using the pruned
   backup DT rooted at RB2.  When RB2 gets the multicast packets from
   the link RB1-RB2, it accepts them, since the RPF filter {DT=RB2,
   ingress=RB7, receiving links=RB1-RB2, RB3-RB2, RB4-RB2, RB5-RB2,
   and RB6-RB2} is installed on RB2.  RB2 forwards the replicated
   multicast packets to its neighbors except RB1.  The multicast
   packets reach RB6, where both RPF filters {DT=RB1, ingress=RB7,
   receiving link=RB1-RB6} and {DT=RB2, ingress=RB7, receiving
   links=RB2-RB6 and RB9-RB6} are active.  RB6 lets both multicast
   streams through.  The multicast packets finally reach RB9, where
   the RPF filter is updated from {DT=RB1, ingress=RB7, receiving
   link=RB5-RB9} to {DT=RB2, ingress=RB7, receiving link=RB6-RB9}.
   RB9 egresses the multicast packets onto its local link.

5.4. Switching Back to the Primary Distribution Tree

   Assume an RBridge receives the LSP that indicates a link failure.
   This RBridge starts to calculate the new primary DT based on the
   new topology, without the failed link.  Suppose the new primary DT
   is installed at time t1.

   The propagation of LSPs around the campus takes some time.  For
   safety, we assume that all RBridges in the campus will have
   converged to the new primary DT at t1+Ts.  By default, Ts (the
   "settling time") is set to 30 seconds, but it is configurable.  At
   t1+Ts, the ingress RBridge switches the traffic from the backup DT
   back to the new primary DT.

   After another Ts (at t1+2*Ts), no multicast packets are being
   forwarded along the old primary DT.  The backup DT should then be
   updated (recalculated and reinstalled) according to the new primary
   DT.  The process of this update under the different protection
   types is discussed as follows; a sketch of the resulting timeline
   is given after this list.

   a) For global 1:1 protection, the backup DT is simply updated at
      t1+2*Ts.

   b) For global 1+1 protection, the ingress RBridge stops replicating
      the multicast packets onto the old backup DT at t1+Ts.  The
      backup DT is updated at t1+2*Ts.  The ingress RBridge MUST then
      wait for another Ts, during which time period all RBridges
      converge to the new backup DT.  At t1+3*Ts, it is safe for the
      ingress RBridge to start replicating multicast packets onto the
      new backup DT.

   c) For local protection, the PLR stops replicating and sending
      packets on the old backup DT at t1+Ts.  It is safe for RBridges
      to start updating the backup DT at t1+2*Ts.
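   The following sketch condenses the timeline above.  It is
   illustrative only; the event names are hypothetical and times are
   in seconds.

      def switchback_schedule(t1, ts=30.0, protection="1+1"):
          """Return {event: time} for the Section 5.4 timeline, with
          Ts defaulting to the 30-second settling time."""
          events = {
              "new primary DT installed": t1,
              "ingress switches back to new primary DT": t1 + ts,
          }
          if protection == "1:1":
              events["backup DT updated"] = t1 + 2 * ts
          elif protection == "1+1":
              events["stop replicating onto old backup DT"] = t1 + ts
              events["backup DT updated"] = t1 + 2 * ts
              events["start replicating onto new backup DT"] = \
                  t1 + 3 * ts
          elif protection == "local":
              events["PLR stops sending on old backup DT"] = t1 + ts
              events["backup DT update may start"] = t1 + 2 * ts
          return events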
6. Security Considerations

   This document raises no new security issues for TRILL.

   For general TRILL security considerations, see [RFC6325].

7. IANA Considerations

   No new registry or registry entries are requested to be assigned by
   IANA.  The Affinity Sub-TLV has already been defined in [RFC7176];
   this document does not change its definition.  RFC Editor: please
   remove this section before publication.

Acknowledgements

   The careful review from Gayle Noble is gratefully acknowledged.
   The authors would like to thank the comments and suggestions from
   Donald Eastlake, Erik Nordmark, Fangwei Hu, Hongjun Zhai, and
   Xudong Zhang.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
             "Protocol Independent Multicast - Sparse Mode (PIM-SM):
             Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC6388] Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
             "Label Distribution Protocol Extensions for Point-to-
             Multipoint and Multipoint-to-Multipoint Label Switched
             Paths", RFC 6388, November 2011.

   [RFC7175] Manral, V., Eastlake 3rd, D., Ward, D., and A. Banerjee,
             "Transparent Interconnection of Lots of Links (TRILL):
             Bidirectional Forwarding Detection (BFD) Support",
             RFC 7175, May 2014.

   [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt,
             D., and A. Banerjee, "Transparent Interconnection of Lots
             of Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

   [RFC7180] Eastlake 3rd, D., Zhang, M., Ghanwani, A., Manral, V.,
             and A. Banerjee, "Transparent Interconnection of Lots of
             Links (TRILL): Clarifications, Corrections, and Updates",
             RFC 7180, May 2014.

   [CMT]     Senevirathne, T., Pathangi, J., et al., "Coordinated
             Multicast Trees (CMT) for TRILL", draft-ietf-trill-cmt,
             work in progress.

   [RBmBFD]  Zhang, M., Pallagatti, S., and V. Govindan, "TRILL
             Support of Point to Multipoint BFD",
             draft-ietf-trill-p2mp-bfd, work in progress.

8.2. Informative References

   [mMRT]    Atlas, A., Kebler, R., et al., "An Architecture for
             Multicast Protection Using Maximally Redundant Trees",
             draft-atlas-rtgwg-mrt-mc-arch, work in progress.

   [MRT]     Atlas, A., Ed., Kebler, R., et al., "An Architecture for
             IP/LDP Fast-Reroute Using Maximally Redundant Trees",
             draft-ietf-rtgwg-mrt-frr-architecture, work in progress.

   [MoFRR]   Karan, A., Filsfils, C., et al., "Multicast only Fast
             Re-Route", draft-ietf-rtgwg-mofrr, work in progress.

   [mBFD]    Katz, D. and D. Ward, "BFD for Multipoint Networks",
             draft-ietf-bfd-multipoint, work in progress.

   [RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R.,
             and D. Dutt, "Transparent Interconnection of Lots of
             Links (TRILL): Fine-Grained Labeling", RFC 7172, May
             2014.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies Co., Ltd
   Huawei Building, No. 156 Beiqing Rd.
   Beijing 100095
   P.R. China

   Email: zhangmingui@huawei.com

   Tissa Senevirathne
   Cisco Systems
   375 East Tasman Drive
   San Jose, CA 95134
   USA

   Phone: +1-408-853-2291
   Email: tsenevir@cisco.com

   Janardhanan Pathangi
   Dell/Force10 Networks
   Olympia Technology Park
   Guindy, Chennai 600 032
   India

   Phone: +91 44 4220 8400
   Email: Pathangi_Janardhanan@Dell.com

   Ayan Banerjee
   Cisco

   Email: ayabaner@cisco.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134
   USA

   Phone: +1-408-571-3500
   Email: Anoop@alumni.duke.edu