INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                                Huawei
Updates: 6325                                         Tissa Senevirathne
                                                              Consultant
                                                    Janardhanan Pathangi
                                                                    DELL
                                                           Ayan Banerjee
                                                                   Cisco
                                                          Anoop Ghanwani
                                                                    DELL
Expires: June 17, 2017                                 December 14, 2016

                  TRILL: Resilient Distribution Trees
                 draft-ietf-trill-resilient-trees-06.txt

Abstract

   The TRILL (Transparent Interconnection of Lots of Links) protocol
   provides multicast data forwarding based on IS-IS link state
   routing.  Distribution trees are computed based on the link state
   information through Shortest Path First calculation.  When a link on
   a distribution tree fails, a campus-wide reconvergence of that
   distribution tree will take place, which can be time consuming and
   may cause considerable disruption to ongoing multicast service.

   This document specifies how to build backup distribution trees to
   protect links on the primary distribution tree.  Since a backup
   distribution tree is built up ahead of any link failure, when a link
   on the primary distribution tree fails, the pre-installed backup
   forwarding table is used to deliver multicast packets without
   waiting for the campus-wide reconvergence, which minimizes the
   service disruption.  This document updates RFC 6325.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Usage of the Affinity Sub-TLV
      2.1. Allocating Affinity Links
      2.2. Distribution Tree Calculation with Affinity Links
   3. Resilient Distribution Trees Calculation
      3.1. Designating Roots for Backup Trees
      3.2. Backup DT Calculation
           3.2.1. Backup DT Calculation with Affinity Links
                  3.2.1.1. Algorithm for Choosing Affinity Links
                  3.2.1.2. Affinity Links Advertisement
           3.2.2. Backup DT Calculation without Affinity Links
   4. Resilient Distribution Trees Installation
      4.1. Pruning the Backup Distribution Tree
      4.2. RPF Filters Preparation
   5. Protection Mechanisms with Resilient Distribution Trees
      5.1. Global 1:1 Protection
      5.2. Global 1+1 Protection
           5.2.1. Failure Detection
           5.2.2. Traffic Forking and Merging
      5.3. Local Protection
           5.3.1. Starting to Use the Backup Distribution Tree
           5.3.2. Duplication Suppression
           5.3.3. An Example to Walk Through
      5.4. Updating the Primary and the Backup Trees
   6. TRILL IS-IS Extensions
      6.1. Resilient Trees Extended Capability Bit
      6.2. Backup Tree Root APPsub-TLV
   7. Security Considerations
   8. IANA Considerations
      8.1. Resilient Trees Extended Capability Bit
      8.2. Backup Tree Root APPsub-TLV
   Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   A great deal of multicast traffic is generated by applications that
   are sensitive to interruption and latency, e.g., video distribution,
   including IP-TV, video conferencing, and so on.  Normally, a network
   fault is recovered through a network-wide reconvergence of the
   forwarding states, but this process is too slow to meet tight
   Service Level Agreement (SLA) requirements on the duration of
   service disruption.  Worse yet, updating multicast forwarding states
   may take significantly longer than unicast convergence, since
   multicast states are updated based on control-plane signaling
   [mMRT].

   Protection mechanisms are commonly used to reduce the service
   disruption caused by network faults.  With backup forwarding states
   installed in advance, a protection mechanism can restore an
   interrupted multicast stream in tens of milliseconds, which meets
   stringent SLAs on service disruption.  Several protection mechanisms
   for multicast traffic have been developed for IP/MPLS networks
   [mMRT] [RFC7431].  However, the way TRILL constructs distribution
   trees (DTs) differs from the way multicast trees are computed in
   IP/MPLS networks, so a multicast protection mechanism suitable for
   TRILL is required.

   This document proposes "Resilient Distribution Trees", in which
   backup trees are installed in advance for the purpose of fast
   failure repair.  Three types of protection mechanisms are proposed.

   o  Global 1:1 protection refers to the mechanism where the multicast
      source RBridge normally injects one multicast stream onto the
      primary DT.  When an interruption of this stream is detected, the
      source RBridge switches to the backup DT to inject subsequent
      multicast traffic until the primary DT is recovered.

   o  Global 1+1 protection refers to the mechanism where the multicast
      source RBridge always injects two copies of each multicast
      stream, one onto the primary DT and one onto the backup DT.  In
      the normal case, each multicast receiver picks the stream sent
      along the primary DT and egresses it to its local link.  When a
      link failure interrupts the primary stream, the backup stream is
      picked until the primary DT is recovered.

   o  Local protection refers to the mechanism where the RBridge
      attached to the failed link locally repairs the failure.

   Resilient Distribution Trees can greatly reduce the service
   disruption caused by link failures.  With global 1:1 protection, the
   time spent on DT recalculation and installation is saved.  Global
   1+1 protection and local protection further save the time spent on
   the propagation of the failure indication.  Routing can be repaired
   for a failed link in tens of milliseconds.  Although it is possible
   to use Resilient Distribution Trees to achieve load balancing of
   multicast traffic, this document leaves that for future study.
   [RFC7176] specifies the Affinity Sub-TLV.  An "Affinity Link" can be
   explicitly assigned to a distribution tree or trees as discussed in
   Section 2.1.  This offers a way to manipulate the calculation of
   distribution trees.  With intentional assignment of Affinity Links,
   a backup distribution tree can be set up to protect links on a
   primary distribution tree.

   This document updates [RFC6325] as specified in Section 5.3.1.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Terminology

   BFD: Bidirectional Forwarding Detection

   CMT: Coordinated Multicast Trees [RFC7783]

   DT: Distribution Tree [RFC6325]

   IS-IS: Intermediate System to Intermediate System [RFC7176]

   LSP: IS-IS Link State Packet

   mLDP: Multipoint Label Distribution Protocol [RFC6388]

   MPLS: Multi-Protocol Label Switching

   PIM: Protocol Independent Multicast [RFC4601]

   PLR: Point of Local Repair.  In this document, the PLR is the
   multicast upstream RBridge connected to the failed link.  It is
   relevant only for local protection (Section 5.3).

   RBridge: A device implementing the TRILL protocol [RFC6325]
   [RFC7780]

   RPF: Reverse Path Forwarding

   SLA: Service Level Agreement

   Td: failure detection timer

   TRILL: TRansparent Interconnection of Lots of Links or Tunneled
   Routing in the Link Layer [RFC6325] [RFC7780]

2. Usage of the Affinity Sub-TLV

   This document uses the Affinity Sub-TLV [RFC7176] to assign a parent
   to an RBridge in a tree as discussed below.  Support of the Affinity
   Sub-TLV by an RBridge is indicated by a capability bit in the
   TRILL-VER Sub-TLV [RFC7783].

2.1. Allocating Affinity Links

   The Affinity Sub-TLV explicitly assigns parents for RBridges on
   distribution trees.  It is distributed in an LSP and can be
   recognized by each RBridge in the campus.  The originating RBridge
   becomes the parent, and the nickname contained in the Affinity
   Record identifies the child.  This explicitly provides an "Affinity
   Link" on a distribution tree or trees.  The "Tree-num of roots"
   values in the Affinity Record(s) in the Affinity Sub-TLV identify
   the distribution trees that adopt this Affinity Link [RFC7176].

   Affinity Links may be configured or automatically determined using
   an algorithm [RFC7783].  Suppose link RB2-RB3 is chosen as an
   Affinity Link on the distribution tree rooted at RB1.  RB2 should
   send out the Affinity Sub-TLV with an Affinity Record that says
   {Nickname=RB3, Num of Trees=1, Tree-num of roots=RB1}.  In this
   document, RB3 does not have to be a leaf node on a distribution
   tree; therefore, an Affinity Link can be used to identify any link
   on a distribution tree.  This kind of assignment offers RBridges
   flexibility of control in distribution tree calculation: they are
   allowed to choose a child to which they are not on the shortest path
   from the root.  This flexibility is used in this document to
   increase the reliability of distribution trees.

   Note that an Affinity Link SHOULD NOT be misused to declare a
   connection between two RBridges that are not adjacent.  If it is,
   the Affinity Link is ignored and has no effect on tree building.
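   As a non-normative illustration, the Python sketch below packs the
   Affinity Record from the example above, assuming the Affinity Record
   layout of [RFC7176]: a 2-byte child nickname, a 1-byte count of
   trees, and one 2-byte tree root nickname per tree.  The nickname
   values are hypothetical.

      import struct

      def affinity_record(child_nickname, tree_root_nicknames):
          # Affinity Record: Nickname (2 bytes), Num of Trees (1 byte),
          # then one "Tree-num of roots" nickname (2 bytes) per tree.
          rec = struct.pack("!HB", child_nickname,
                            len(tree_root_nicknames))
          for root in tree_root_nicknames:
              rec += struct.pack("!H", root)
          return rec

      # RB2 announces the Affinity Link RB2-RB3 on the tree rooted at
      # RB1: {Nickname=RB3, Num of Trees=1, Tree-num of roots=RB1}.
      RB1_NICK, RB3_NICK = 0x0101, 0x0103   # hypothetical nicknames
      record = affinity_record(RB3_NICK, [RB1_NICK])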
2.2. Distribution Tree Calculation with Affinity Links

   When RBridges receive an Affinity Sub-TLV declaring an Affinity Link
   that is an incoming link of RB2 (i.e., RB2 is the child on this
   Affinity Link), RB2's incoming links/adjacencies other than the
   Affinity Link are removed from the full graph of the campus to get a
   sub graph.  RBridges perform the Shortest Path First calculation to
   compute the distribution tree based on the resulting sub graph.  In
   this way, the Affinity Link will surely appear on the distribution
   tree.  Outgoing links/adjacencies are not affected.  (When two
   RBridges, say RB1 and RB2, are adjacent, the adjacency/link from RB1
   to RB2 and the adjacency/link from RB2 to RB1 are separate and, for
   example, might have different costs.)

       Root                                Root
      +---+ -> +---+ -> +---+     +---+ -> +---+ -> +---+
      |RB1|    |RB2|    |RB3|     |RB1|    |RB2|    |RB3|
      +---+ <- +---+ <- +---+     +---+ <- +---+ <- +---+
       ^ |      ^ |      ^ |       ^ |      ^        ^ |
       | v      | v      | v       | v      |        | v
      +---+ -> +---+ -> +---+     +---+ -> +---+ -> +---+
      |RB4|    |RB5|    |RB6|     |RB4|    |RB5|    |RB6|
      +---+ <- +---+ <- +---+     +---+ <- +---+    +---+

            Full Graph                   Sub Graph

      Root 1                            Root 1
          / \                               / \
         /   \                             /   \
        4     2                           4     2
       / \                                |     |
      /   \                               |     |
     5     3                              5     3
     |                                    |
     |                                    |
     6                                    6

   Shortest Path Tree of Full Graph  Shortest Path Tree of Sub Graph

      Figure 2.1: DT Calculation with the Affinity Link RB4-RB5

   Take Figure 2.1 as an example.  Suppose RB1 is the root and link
   RB4-RB5 is the Affinity Link.  RB5's other incoming links, RB2-RB5
   and RB6-RB5, are removed from the Full Graph to get the Sub Graph.
   Since RB4-RB5 is then the unique link through which RB5 can be
   reached, the Shortest Path Tree inevitably contains this link.
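   The following non-normative Python sketch illustrates the sub-graph
   construction described above: for each Affinity Link (parent,
   child), all of the child's other incoming links are removed before
   the Shortest Path First calculation.  TRILL's deterministic
   tie-breaking among equal-cost paths [RFC6325] is omitted here, so
   ties may be resolved differently than in Figure 2.1.

      from heapq import heappush, heappop

      def spf_tree(adj, root):
          # adj: {node: [(neighbor, cost), ...]} over directed links.
          # Returns {node: parent} describing the shortest path tree.
          dist, parent, heap = {root: 0}, {root: None}, [(0, root)]
          while heap:
              d, u = heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue              # stale heap entry
              for v, cost in adj.get(u, []):
                  if d + cost < dist.get(v, float("inf")):
                      dist[v], parent[v] = d + cost, u
                      heappush(heap, (d + cost, v))
          return parent

      def subgraph_for_affinity(adj, affinity_links):
          # Drop every incoming link of an Affinity child except the
          # Affinity Link itself; outgoing links are unaffected.
          children = {child for _, child in affinity_links}
          return {u: [(v, c) for v, c in nbrs
                      if v not in children or (u, v) in affinity_links]
                  for u, nbrs in adj.items()}

      # Figure 2.1 Full Graph with unit metrics in both directions.
      links = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 4), (2, 5), (3, 6)]
      adj = {n: [] for n in range(1, 7)}
      for u, v in links:
          adj[u].append((v, 1))
          adj[v].append((u, 1))
      tree = spf_tree(subgraph_for_affinity(adj, {(4, 5)}), root=1)
      assert tree[5] == 4   # the Affinity Link RB4-RB5 is on the tree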
3. Resilient Distribution Trees Calculation

   RBridges use IS-IS to advertise network faults.  A node or link
   failure triggers a campus-wide reconvergence of distribution trees.
   The reconvergence generally includes the following procedures:

   1. Failure (loss of adjacency) detected through IS-IS control
      messages (HELLO) not getting through or some other link test such
      as BFD [RFC7175] [RBmBFD];

   2. IS-IS state flooding so each RBridge learns about the failure;

   3. Each RBridge recalculates affected distribution trees
      independently;

   4. RPF filters are updated according to the new distribution trees.
      The recomputed distribution trees are pruned and installed into
      the multicast forwarding tables.

   Until reconvergence completes, ongoing multicast traffic is
   disrupted.  In protection mechanisms, alternative paths prepared
   ahead of potential node or link failures are used to detour around a
   failure as soon as it is detected; thus, service disruption can be
   minimized.

   This document focuses only on link failure protection.  The
   construction of backup DTs for the purpose of node protection is
   outside the scope of this document.  (The usual way to protect
   against the failure of a node on the primary tree is to set up a
   backup tree that avoids this node.  When this node fails, the backup
   tree can safely be used to forward multicast traffic around it.
   However, TRILL distribution trees are shared among all VLANs and
   Fine-Grained Labels [RFC7172], and they have to cover all RBridge
   nodes in the campus [RFC6325].  A DT that does not span all RBridges
   in the campus might not cover all receivers of many multicast
   groups.  This is different from the construction of multicast trees
   signaled by PIM [RFC4601] or mLDP [RFC6388].)  An RBridge that
   supports the resilient distribution trees specified in this document
   advertises this capability through the Resilient Trees Extended
   Capability Bit (see Section 6.1).

3.1. Designating Roots for Backup Trees

   The RBridge RB1 holding the highest root priority nickname may
   explicitly advertise a list of nickname pairs identifying the roots
   of primary and backup DTs, using the Backup Tree Root APPsub-TLV as
   specified in Section 6.2 (see also Section 4.5 of [RFC6325]).  It is
   possible for the backup DT and the primary DT to have the same root
   RBridge.  In order to distinguish the primary DT from the backup DT
   in this case, the root RBridge MUST own multiple nicknames.

3.2. Backup DT Calculation

3.2.1. Backup DT Calculation with Affinity Links

            2                              1
           /                                \
     Root 1___                            ___2 Root
         /|\  \                          /  /|\
        / | \  \                        /  / | \
       3  4  5  6                      3  4  5  6
       |  |  |  |                       \/    \/
       |  |  |  |                       /\    /\
       7  8  9 10                      7  8  9 10

        Primary DT                      Backup DT

      Figure 3.1: An Example of a Primary DT and its Backup DT

   TRILL supports the computation of multiple distribution trees by
   RBridges.  With the intentional assignment of Affinity Links in DT
   calculation, this document proposes a method to construct Resilient
   Distribution Trees.  For example, in Figure 3.1, the backup DT is
   set up maximally disjoint from the primary DT.  (The full topology
   is the combination of these two DTs, which is not shown in the
   figure.)  Except for the link between RB1 and RB2, no link on the
   primary DT overlaps with a link on the backup DT.  This means that
   every link on the primary DT, except link RB1-RB2, can be protected
   by the backup DT.

3.2.1.1. Algorithm for Choosing Affinity Links

   Operators MAY configure Affinity Links to intentionally protect a
   specific link, such as the link connected to a gateway.  But it is
   desirable that every RBridge independently compute the Affinity
   Links for a backup DT across the whole campus.  This enables a
   distributed deployment and also minimizes configuration.

   Algorithms for Maximally Redundant Trees [MRT] may be used to figure
   out Affinity Links on a backup DT that is maximally disjoint from
   the primary DT, but they provide only a subset of all possible
   solutions, i.e., the conjugate trees described in [MRT].  In TRILL,
   Resilient Distribution Trees do not restrict the root of the backup
   DT to be the same as that of the primary DT.  Two disjoint (or
   maximally disjoint) trees may have different root nodes, which
   significantly enlarges the solution space.

   This document RECOMMENDS realizing this independent computation
   through a slight change to the conventional DT calculation process
   of TRILL.  Basically, after the primary DT is calculated, the
   RBridge knows which links that tree uses.  When the backup DT is
   calculated, each RBridge increases the metric of these links by a
   suitably large value (for safety, the sum of all original link
   metrics in the campus, but not more than 2**23, is recommended),
   which gives these links a lower priority of being chosen for the
   backup DT by the Shortest Path First calculation.  All links on this
   backup DT could be assigned as Affinity Links, but this is
   unnecessary.  In order to reduce the number of Affinity Sub-TLVs
   flooded across the campus, only those links NOT picked by the
   conventional DT calculation process SHOULD be announced as Affinity
   Links.
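   The following non-normative sketch of this computation reuses the
   spf_tree helper from the sketch in Section 2.2.  The penalty follows
   the recommendation above: the sum of all original link metrics in
   the campus, capped at 2**23.

      def backup_tree(adj, backup_root, primary_links):
          # primary_links: the set of frozenset({u, v}) link pairs on
          # the primary DT (TRILL tree links are bidirectional).
          penalty = min(sum(c for nbrs in adj.values()
                            for _, c in nbrs), 2**23)
          adj2 = {u: [(v, c + penalty
                       if frozenset((u, v)) in primary_links else c)
                      for v, c in nbrs]
                  for u, nbrs in adj.items()}
          return spf_tree(adj2, backup_root)

      def affinity_links_to_announce(backup, conventional):
          # Advertise only the backup-tree links that the conventional
          # SPF calculation (unmodified metrics) would NOT have chosen;
          # 'conventional' is spf_tree run from the backup root on the
          # original graph.
          return {(p, c) for c, p in backup.items()
                  if p is not None and conventional.get(c) != p}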
3.2.1.2. Affinity Links Advertisement

   Similar to [RFC7783], the parent RBridge of an Affinity Link takes
   charge of announcing this link in an Affinity Sub-TLV.  When an
   RBridge is the parent RBridge of several Affinity Links, it is
   natural to advertise them together in the same Affinity Sub-TLV,
   with each Affinity Link structured as one Affinity Record [RFC7176].

   Affinity Links are announced in the Affinity Sub-TLV, which is
   recognized by every RBridge.  Since each RBridge computes
   distribution trees as the Affinity Sub-TLV requires, the backup DT
   will be built up consistently.

3.2.2. Backup DT Calculation without Affinity Links

   This section provides an alternative method to set up a disjoint
   backup DT.

   After the primary DT is calculated, each RBridge increases the cost
   of those links that are already on the primary DT by a multiplier
   (for safety, 64 is RECOMMENDED).  This ensures that a link appears
   on both trees if and only if there is no other way to reach the node
   (i.e., the graph would become disconnected if it were pruned of the
   links on the first tree).  In other words, the two trees will be
   maximally disjoint.

   The above algorithm is similar to that defined in Section 3.2.1.1.
   All RBridges MUST agree on the same algorithm; then the backup DT
   can be calculated by each RBridge consistently, and configuration is
   unnecessary.

4. Resilient Distribution Trees Installation

   As specified in Section 4.5.2 of [RFC6325], an ingress RBridge MUST
   announce the distribution trees it may choose to ingress multicast
   frames onto.  Thus, other RBridges in the campus can limit the
   amount of state necessary for RPF checks.  Also, [RFC6325]
   recommends that an ingress RBridge by default choose the DT or DTs
   whose root or roots are least cost from the ingress RBridge.  To sum
   up, RBridges do pre-compute all the trees that might be used so that
   they can properly forward multi-destination packets, but they
   install RPF state only for some combinations of ingress and tree.

   This document specifies that the backup DT MUST be contained in an
   ingress RBridge's DT announcement list and included in this ingress
   RBridge's LSP.  In order to reduce the service disruption time,
   RBridges SHOULD install backup DTs in advance, which also includes
   the RPF filters that need to be set up for RPF checks.

   Since the backup DT is intentionally built maximally disjoint from
   the primary DT, when a link fails and interrupts the ongoing
   multicast traffic sent along the primary DT, it is probable that the
   backup DT is not affected.  Therefore, the backup DT installed in
   advance can be used to deliver multicast packets immediately.

4.1. Pruning the Backup Distribution Tree

   The way a backup DT is pruned differs from the way the primary DT is
   pruned.  Even though a branch contains no downstream receivers, it
   probably should not be pruned, for the purpose of protection.  The
   rule for backup DT pruning is to eliminate only those branches that
   contain no RBridges appearing on the pruned primary DT.

   It is probable that the primary DT is not optimally pruned in
   practice.  In this case, the backup DT SHOULD be pruned presuming
   that the primary DT is optimally pruned.  Those redundant links that
   ought to have been pruned from the primary DT will not be protected.

                                          1
                                           \
     Root 1___                            ___2 Root
         / \  \                          /  /|\
        /   \  \                        /  / | \
       3     5  6                      3  4  5  6
       |     |  |                        /   \/
       |     |  |                       /    /\
       7     9 10                      7    9 10

      Pruned Primary DT               Pruned Backup DT

   Figure 4.1: The Backup DT is Pruned Based on the Pruned Primary DT

   Suppose RB7, RB9, and RB10 constitute a multicast group MGx.  The
   pruned primary DT and backup DT are shown in Figure 4.1.  Referring
   back to Figure 3.1, branches RB2-RB1 and RB4-RB1 on the primary DT
   are pruned for the distribution of MGx traffic, since there are no
   potential receivers on these two branches.  Although branches
   RB1-RB2 and RB3-RB2 on the backup DT have no potential multicast
   receivers, they appear on the pruned primary DT and may be used to
   repair link failures of the primary DT.  Therefore, they are not
   pruned from the backup DT.  Branch RB8-RB3 can be safely pruned
   because it does not appear on the pruned primary DT.
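   The pruning rule above can be sketched as follows (non-normative).
   Parent maps are in the {child: parent} form returned by the spf_tree
   sketch of Section 2.2; a backup DT branch is kept whenever its
   subtree contains an RBridge that remains on the optimally pruned
   primary DT.

      def prune_backup(backup_parent, on_pruned_primary):
          # on_pruned_primary: RBridges on the optimally pruned primary
          # DT, e.g., {1, 3, 5, 6, 7, 9, 10} for MGx in Figure 4.1.
          keep = set()
          for node in backup_parent:
              if node in on_pruned_primary:
                  # Keep the path from this node up to the backup root.
                  while node is not None and node not in keep:
                      keep.add(node)
                      node = backup_parent[node]
          return {n: p for n, p in backup_parent.items() if n in keep}

      # Backup DT of Figure 3.1 as a {child: parent} map; RB8 is the
      # only node pruned, reproducing the Pruned Backup DT above.
      backup = {2: None, 1: 2, 3: 2, 4: 2, 5: 2, 6: 2,
                8: 3, 7: 4, 10: 5, 9: 6}
      pruned = prune_backup(backup, {1, 3, 5, 6, 7, 9, 10})
      assert 8 not in pruned and 4 in pruned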
4.2. RPF Filters Preparation

   RB2 includes in its LSP the information indicating which trees RB2
   might choose when it ingresses multicast frames [RFC6325].  When RB2
   specifies the trees it might choose to ingress multicast traffic, it
   SHOULD include the backup DT.  Other RBridges will prepare the RPF
   check state for both the primary DT and the backup DT.  When a
   multicast packet is sent along either the primary DT or the backup
   DT, it will pass the RPF check.  This works when global 1:1
   protection is used.  However, when global 1+1 protection or local
   protection is applied, traffic duplication will happen if multicast
   receivers accept both copies of the multicast packets through the
   two RPF filters.  In order to avoid such duplication, egress RBridge
   multicast receivers MUST act as merge points, activating a single
   RPF filter and discarding the duplicate packets matching the other
   RPF filter.  In the normal case, the RPF state is set up according
   to the primary DT.  When a link fails, the RPF filter based on the
   backup DT should be activated.
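   A non-normative sketch of the per-RBridge RPF state described above:
   transit RBridges keep both filters active, while an egress RBridge
   (merge point) activates exactly one filter at a time.  The filter
   contents mirror those used in the walk-through of Section 5.3.3.

      class RpfState:
          # One filter per tree for a given ingress: the set of links
          # on which packets from that ingress may arrive on that tree.
          def __init__(self, primary_links, backup_links, is_egress):
              self.filters = {"primary": primary_links,
                              "backup": backup_links}
              self.active = {"primary"} if is_egress \
                            else {"primary", "backup"}

          def accept(self, tree, in_link):
              # Accept iff the packet's tree is active here and the
              # packet arrived on an expected link for that tree.
              return (tree in self.active and
                      in_link in self.filters[tree])

          def fail_over(self):
              # Egress only: switch the single active RPF filter from
              # the primary DT to the backup DT.
              self.active = {"backup"}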
5. Protection Mechanisms with Resilient Distribution Trees

   Protection mechanisms can be developed to make use of the backup DT
   installed in advance.  However, protection mechanisms already
   developed using PIM or mLDP for multicast in IP/MPLS networks are
   not applicable to TRILL, due to the following fundamental
   differences in distribution tree calculation.

   o  A link on a TRILL distribution tree is bidirectional, while a
      link on a distribution tree in IP/MPLS networks is
      unidirectional.

   o  In TRILL, a multicast source node does not have to be the root of
      the distribution tree.  It is just the opposite in IP/MPLS
      networks.

   o  In IP/MPLS networks, distribution trees, as well as their backup
      distribution trees, are constructed for each multicast source
      node.  In TRILL, a small number of core distribution trees are
      shared among multicast groups.  A backup DT does not have to
      share the same root as the primary DT.

   Therefore, a TRILL-specific multicast protection mechanism is
   needed.

   Global 1:1 protection, global 1+1 protection, and local protection
   are developed in this section.  In Figure 4.1, assume RB7 is the
   ingress RBridge of the multicast stream while RB9 and RB10 are the
   multicast receivers.  Suppose link RB1-RB5 fails during the
   multicast forwarding.  The backup DT rooted at RB2 does not include
   link RB1-RB5; therefore, it can be used to protect this link.  In
   global 1:1 protection, RB7 switches the subsequent multicast traffic
   to this backup DT when it is notified of the link failure.  In
   global 1+1 protection, RB7 injects two copies of the multicast
   stream and lets the multicast receivers RB9 and RB10 merge them.  In
   local protection, when link RB1-RB5 fails, RB1 locally replicates
   the multicast traffic and sends it on the backup DT.

5.1. Global 1:1 Protection

   In global 1:1 protection, the ingress RBridge of the multicast
   traffic is responsible for switching the failure-affected traffic
   from the primary DT over to the backup DT.  Since the backup DT has
   been installed in advance, the global protection need not wait for
   DT recalculation and installation.  When the ingress RBridge is
   notified of the failure, it immediately makes this switch-over.

   This type of protection is simple and duplication safe.  However,
   depending on the topology of the RBridge campus, the time spent on
   failure detection and propagation through the IS-IS control plane
   may still cause a considerable service disruption.

   The BFD (Bidirectional Forwarding Detection) protocol can be used to
   reduce the failure detection time.  Link failures can be rapidly
   detected with one-hop BFD [RFC7175].  [RBmBFD] introduces fast
   failure detection of multicast paths.  It can be used to reduce both
   the failure detection time and the propagation time in global
   protection.  In [RBmBFD], the ingress RBridge sends BFD control
   packets to poll each receiver, and receivers return BFD control
   packets to the ingress as the response.  If no response is received
   from a specific receiver for a detection time, the ingress judges
   that the connectivity to this receiver is broken.  [RBmBFD] is
   therefore used to detect the connectivity of a path rather than of a
   link.  The ingress RBridge determines a minimal failed branch that
   contains this receiver and switches ongoing multicast traffic based
   on this judgment.  For example, in Figure 4.1, if RB9 does not
   respond while RB10 still responds, RB7 will presume that links
   RB1-RB5 and RB5-RB9 have failed.  Multicast traffic will be switched
   to a backup DT that can protect these two links.  More accurate link
   failure detection might help ingress RBridges make smarter
   decisions, but that is outside the scope of this document.

5.2. Global 1+1 Protection

   In global 1+1 protection, the multicast source RBridge always
   replicates the multicast packets and sends them onto both the
   primary and the backup DT.  This sacrifices some capacity
   efficiency, but given the substantial connection redundancy and
   inexpensive bandwidth in data center networks, this kind of
   protection can be popular [RFC7431].

5.2.1. Failure Detection

   Egress RBridges (merge points) SHOULD learn of a link failure as
   early as possible, so that the affected egress RBridges can update
   their RPF filters quickly to minimize the traffic disruption.  Three
   options are provided, as follows.

   1. For a steady data stream, egress RBridges can assume a minimum
      known packet rate for that data stream [RFC7431].  A failure
      detection timer (say, Td) is set to the interval between two
      consecutive packets.  Td is reinitialized each time a packet is
      received.  If Td expires while packets are arriving at the egress
      RBridge on the backup DT (within the time frame Td), the egress
      RBridge updates its RPF filters and starts to receive packets
      forwarded on the backup DT.  (A sketch of this option is given
      after this list.)

   2. With [RBmBFD], when a link failure happens, the affected egress
      RBridges can detect the lack of connectivity from the ingress.
      Therefore, these egress RBridges are able to update their RPF
      filters promptly.

   3. Egress RBridges can always rely on the IS-IS control plane to
      learn of the failure and determine whether their RPF filters
      should be updated.
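   The following non-normative sketch illustrates option 1, reusing the
   RpfState sketch of Section 4.2.  It assumes a stream with a known
   minimum packet rate; the timer granularity and the exact choice of
   Td are deployment specific.

      import time

      class EgressFailureDetector:
          # Option 1: Td is the maximum expected interval between two
          # consecutive packets of the stream.
          def __init__(self, td_seconds, rpf_state):
              self.td = td_seconds
              self.rpf = rpf_state
              self.last_primary = time.monotonic()
              self.last_backup = None

          def on_packet(self, tree):
              now = time.monotonic()
              if tree == "primary":
                  self.last_primary = now      # reinitialize Td
              else:
                  self.last_backup = now

          def poll(self):
              # Called periodically; switch the RPF filter when the
              # primary stream stalls but the backup stream is alive.
              now = time.monotonic()
              primary_stalled = now - self.last_primary > self.td
              backup_alive = (self.last_backup is not None and
                              now - self.last_backup <= self.td)
              if primary_stalled and backup_alive:
                  self.rpf.fail_over()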
5.2.2. Traffic Forking and Merging

   For the sake of protection, transit RBridges SHOULD activate both
   the primary and the backup RPF filters, so that both copies of the
   multicast packets will pass through transit RBridges.

   Multicast receivers (egress RBridges) MUST act as "merge points" and
   egress only one copy of each multicast packet.  This is achieved by
   activating only a single RPF filter.  In the normal case, egress
   RBridges activate the primary RPF filter.  When a link on the pruned
   primary DT fails, the ingress RBridge cannot reach some of the
   receivers.  When these unreachable receivers learn that the link has
   failed, they SHOULD update their RPF filters to receive the packets
   sent on the backup DT.

5.3. Local Protection

   In local protection, the Point of Local Repair (PLR) is the upstream
   RBridge attached to the failed link.  It is this RBridge that makes
   the decision to replicate the multicast traffic in order to repair
   the link failure.  Local protection further saves the time that
   would be spent on failure notification through the flooding of LSPs
   across the TRILL campus.  In addition, failure detection can be sped
   up using BFD [RFC7175]; therefore, local protection can minimize the
   service disruption, typically reducing it to less than 50
   milliseconds.

   Since the ingress RBridge is not necessarily the root of the
   distribution tree in TRILL, a multicast downstream node may not be a
   descendant of the ingress node on the distribution tree.

5.3.1. Starting to Use the Backup Distribution Tree

   The egress nickname field in the TRILL header of a multicast TRILL
   Data packet specifies the tree on which the packet is being
   distributed.  For replicated multicast TRILL Data packets, this
   field is rewritten by the PLR to the backup DT root's nickname, but
   the ingress nickname field of the multicast TRILL Data packet MUST
   remain unchanged.  The PLR forwards all multicast traffic carrying
   the backup DT egress nickname along the backup DT.  This updates
   [RFC6325], which specifies that the egress nickname in the TRILL
   header of a multi-destination TRILL Data packet must not be changed
   by transit RBridges.

   In the above example, the PLR RB1 locally decides to send the
   replicated multicast packets according to the backup DT.  It sends
   them to the next hop, RB2.
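   A non-normative sketch of the PLR behavior described in Section
   5.3.1: on a link failure, the PLR rewrites the egress nickname of
   the replicated multi-destination packets to the backup DT root and
   forwards them along the backup DT, leaving the ingress nickname
   untouched.  The packet representation and the send callback are
   hypothetical.

      from dataclasses import dataclass, replace

      @dataclass(frozen=True)
      class TrillHeader:
          ingress_nickname: int   # MUST remain unchanged
          egress_nickname: int    # identifies the distribution tree

      def plr_repair(header, payload, backup_root_nickname,
                     backup_next_hops, send):
          # Replicate onto the backup DT, rewriting only the egress
          # nickname (the update to [RFC6325] made by this document).
          repaired = replace(header,
                             egress_nickname=backup_root_nickname)
          for port in backup_next_hops:
              send(port, repaired, payload)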
5.3.2. Duplication Suppression

   When a PLR starts to send replicated multicast packets on the backup
   DT, some multicast packets are still being sent along the primary
   DT, so some egress RBridges might receive duplicate multicast
   packets.  The traffic forking and merging method of global 1+1
   protection can be adopted to suppress the duplication.

5.3.3. An Example to Walk Through

   The example used above for local protection is put together into a
   complete walk-through below.

   In the normal case, multicast frames ingressed by RB7 in Figure 4.1,
   with pruned distribution on the primary DT rooted at RB1, are
   received by RB9 and RB10.  When link RB1-RB5 fails, the PLR RB1
   begins to replicate and forward subsequent multicast packets using
   the pruned backup DT rooted at RB2.  When RB2 gets the multicast
   packets from link RB1-RB2, it accepts them, since the RPF filter
   {DT=RB2, ingress=RB7, receiving links=RB1-RB2, RB3-RB2, RB4-RB2,
   RB5-RB2, and RB6-RB2} is installed on RB2.  RB2 forwards the
   replicated multicast packets to its neighbors except RB1.  The
   multicast packets reach RB6, where both RPF filters {DT=RB1,
   ingress=RB7, receiving link=RB1-RB6} and {DT=RB2, ingress=RB7,
   receiving links=RB2-RB6 and RB9-RB6} are active.  RB6 will let both
   multicast streams through.  The multicast packets finally reach RB9,
   where the RPF filter is updated from {DT=RB1, ingress=RB7, receiving
   link=RB5-RB9} to {DT=RB2, ingress=RB7, receiving link=RB6-RB9}.  RB9
   egresses the multicast packets arriving on the backup distribution
   tree onto the local link and drops those arriving on the primary
   distribution tree, based on the Reverse Path Forwarding filter.

5.4. Updating the Primary and the Backup Trees

   Assume an RBridge receives an LSP that indicates a link failure.
   This RBridge starts to calculate the new primary DT based on the new
   topology, with the failed link excluded.  Suppose the new primary DT
   is installed at time t1.

   The propagation of LSPs around the campus takes some time.  For
   safety, we assume that all RBridges in the campus will have
   converged to the new primary DT at t1+Ts.  By default, Ts (the
   "settling time") is set to 30 seconds, but it is configurable in
   seconds from 1 to 100.  At t1+Ts, the ingress RBridge switches the
   traffic from the backup DT back to the new primary DT.

   After another Ts (at t1+2*Ts), no multicast packets are being
   forwarded along the old primary DT.  The backup DT should be updated
   (recalculated and reinstalled) after the new primary DT.  The
   process of this update under the different protection types is
   discussed as follows.

   a) For global 1:1 protection, the backup DT is simply updated at
      t1+2*Ts.

   b) For global 1+1 protection, the ingress RBridge stops replicating
      the multicast packets onto the old backup DT at t1+Ts.  The
      backup DT is updated at t1+2*Ts.  The ingress RBridge MUST wait
      for another Ts, during which time period all RBridges converge to
      the new backup DT.  At t1+3*Ts, it is safe for the ingress
      RBridge to start replicating multicast packets onto the new
      backup DT.

   c) For local protection, the PLR stops replicating and sending
      packets on the old backup DT at t1+Ts.  It is safe for RBridges
      to start updating the backup DT at t1+2*Ts.
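   The update sequence above can be summarized with the following
   non-normative sketch, where Ts is the configurable settling time (30
   seconds by default) and t1 is the time at which the new primary DT
   is installed.

      TS = 30   # settling time in seconds, configurable from 1 to 100

      def update_schedule(t1, protection):
          # Returns (time, action) pairs; a tree is assumed campus-wide
          # converged one Ts after it is (re)computed.
          events = [(t1 + TS, "switch traffic back to new primary DT")]
          if protection == "global_1:1":
              events.append((t1 + 2 * TS, "update backup DT"))
          elif protection == "global_1+1":
              events += [(t1 + TS,
                          "stop replicating onto old backup DT"),
                         (t1 + 2 * TS, "update backup DT"),
                         (t1 + 3 * TS,
                          "start replicating onto new backup DT")]
          elif protection == "local":
              events += [(t1 + TS,
                          "PLR stops sending on old backup DT"),
                         (t1 + 2 * TS, "update backup DT")]
          return sorted(events)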
6. TRILL IS-IS Extensions

   This section lists the extensions to TRILL IS-IS needed to support
   resilient trees.

6.1. Resilient Trees Extended Capability Bit

   A bit, tbd1, is provided in the Extended RBridge Capabilities
   APPsub-TLV [RFC7782] to indicate that the advertising RBridge
   supports the facilities specified in this document.

6.2. Backup Tree Root APPsub-TLV

   The structure of the Backup Tree Root APPsub-TLV is shown below.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type = tbd2                  |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Length                       |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Primary Tree Root Nickname   |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Backup Tree Root Nickname    |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Type = Backup Tree Root APPsub-TLV type, set to tbd2

   o  Length = 4

   o  Primary Tree Root Nickname = the nickname of the root RBridge of
      the primary tree for which a resilient backup tree is being
      created

   o  Backup Tree Root Nickname = the nickname of the root RBridge of
      the backup tree

   If either nickname is not the nickname of a tree whose calculation
   is being directed by the highest priority tree root RBridge, the
   APPsub-TLV is ignored.
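   A non-normative encoding sketch of this APPsub-TLV follows, with
   tbd2 represented by a placeholder value until the type is assigned
   by IANA (Section 8.2).

      import struct

      TBD2 = 0x00FE   # placeholder; to be assigned by IANA from < 255

      def backup_tree_root_appsub_tlv(primary_root, backup_root):
          # Type (2 bytes), Length = 4 (2 bytes), then the primary and
          # backup tree root nicknames (2 bytes each), per Section 6.2.
          return struct.pack("!HHHH", TBD2, 4,
                             primary_root, backup_root)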
7. Security Considerations

   This document raises no new security issues for TRILL.

   For general TRILL Security Considerations, see [RFC6325].

8. IANA Considerations

   The Affinity Sub-TLV has already been defined in [RFC7176].  This
   document does not change its definition.  See below for IANA
   Actions.

8.1. Resilient Trees Extended Capability Bit

   IANA will assign a bit (Section 6.1) in the Extended RBridge
   Capabilities subregistry on the TRILL Parameters page, adding the
   following to the registry:

      Bit   Mnemonic  Description             Reference
      ----  --------  -----------             ---------
      tbd1  RT        Resilient Tree Support  [this document]

8.2. Backup Tree Root APPsub-TLV

   IANA will assign an APPsub-TLV type under IS-IS TLV 251 Application
   Identifier 1 on the TRILL Parameters page, from the range below 255,
   for the Backup Tree Root APPsub-TLV (Section 6.2) as follows:

      Type  Name              Reference
      ----  ----------------  ---------------
      tbd2  Backup Tree Root  [this document]

Acknowledgements

   The careful review from Gayle Noble is gratefully acknowledged.  The
   authors would like to thank Donald Eastlake, Erik Nordmark, Fangwei
   Hu, Gayle Noble, Hongjun Zhai, and Xudong Zhang for their comments
   and suggestions.

9. References

9.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997,
             <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
             "Protocol Independent Multicast - Sparse Mode (PIM-SM):
             Protocol Specification (Revised)", RFC 4601, DOI
             10.17487/RFC4601, August 2006,
             <https://www.rfc-editor.org/info/rfc4601>.

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011,
             <https://www.rfc-editor.org/info/rfc6325>.

   [RFC6388] Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
             Thomas, "Label Distribution Protocol Extensions for
             Point-to-Multipoint and Multipoint-to-Multipoint Label
             Switched Paths", RFC 6388, DOI 10.17487/RFC6388, November
             2011, <https://www.rfc-editor.org/info/rfc6388>.

   [RFC7175] Manral, V., Eastlake 3rd, D., Ward, D., and A. Banerjee,
             "Transparent Interconnection of Lots of Links (TRILL):
             Bidirectional Forwarding Detection (BFD) Support", RFC
             7175, DOI 10.17487/RFC7175, May 2014,
             <https://www.rfc-editor.org/info/rfc7175>.

   [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt,
             D., and A. Banerjee, "Transparent Interconnection of Lots
             of Links (TRILL) Use of IS-IS", RFC 7176, DOI
             10.17487/RFC7176, May 2014,
             <https://www.rfc-editor.org/info/rfc7176>.

   [RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
             Ghanwani, A., and S. Gupta, "Transparent Interconnection
             of Lots of Links (TRILL): Clarifications, Corrections, and
             Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016,
             <https://www.rfc-editor.org/info/rfc7780>.

   [RFC7782] Zhang, M., Perlman, R., Zhai, H., Durrani, M., and S.
             Gupta, "Transparent Interconnection of Lots of Links
             (TRILL) Active-Active Edge Using Multiple MAC
             Attachments", RFC 7782, DOI 10.17487/RFC7782, February
             2016, <https://www.rfc-editor.org/info/rfc7782>.

   [RFC7783] Senevirathne, T., Pathangi, J., and J. Hudson,
             "Coordinated Multicast Trees (CMT) for Transparent
             Interconnection of Lots of Links (TRILL)", RFC 7783, DOI
             10.17487/RFC7783, February 2016,
             <https://www.rfc-editor.org/info/rfc7783>.

   [RBmBFD]  Zhang, M., Pallagatti, S., and V. Govindan, "TRILL Support
             of Point to Multipoint BFD", draft-ietf-trill-p2mp-bfd,
             work in progress.

9.2. Informative References

   [mMRT]    Atlas, A., Kebler, R., et al., "An Architecture for
             Multicast Protection Using Maximally Redundant Trees",
             draft-atlas-rtgwg-mrt-mc-arch, work in progress.

   [MRT]     Atlas, A., Ed., Kebler, R., et al., "An Architecture for
             IP/LDP Fast-Reroute Using Maximally Redundant Trees",
             draft-ietf-rtgwg-mrt-frr-architecture, work in progress.

   [mBFD]    Katz, D. and D. Ward, "BFD for Multipoint Networks",
             draft-ietf-bfd-multipoint, work in progress.

   [RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and
             D. Dutt, "Transparent Interconnection of Lots of Links
             (TRILL): Fine-Grained Labeling", RFC 7172, DOI
             10.17487/RFC7172, May 2014,
             <https://www.rfc-editor.org/info/rfc7172>.

   [RFC7431] Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
             Decraene, "Multicast-Only Fast Reroute", RFC 7431, DOI
             10.17487/RFC7431, August 2015,
             <https://www.rfc-editor.org/info/rfc7431>.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies Co., Ltd
   Huawei Building, No. 156 Beiqing Rd.
   Beijing 100095
   P.R. China

   Email: zhangmingui@huawei.com

   Tissa Senevirathne
   Consultant

   Email: tsenevir@gmail.com

   Janardhanan Pathangi
   Dell/Force10 Networks
   Olympia Technology Park
   Guindy, Chennai 600 032
   India

   Phone: +91 44 4220 8400
   Email: Pathangi_Janardhanan@Dell.com

   Ayan Banerjee
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: ayabaner@cisco.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134
   USA

   Phone: +1-408-571-3500
   Email: Anoop@alumni.duke.edu