INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                                Huawei
Updates: 6325                                         Tissa Senevirathne
                                                              Consultant
                                                    Janardhanan Pathangi
                                                                 Gigamon
                                                           Ayan Banerjee
                                                                   Cisco
                                                          Anoop Ghanwani
                                                                    DELL
Expires: December 11, 2017                                  June 9, 2017

                  TRILL: Resilient Distribution Trees
                 draft-ietf-trill-resilient-trees-08.txt

Abstract

The TRILL (Transparent Interconnection of Lots of Links) protocol provides multicast data forwarding based on IS-IS link state routing. Distribution trees are computed from the link state information through Shortest Path First calculation. When a link on a distribution tree fails, a campus-wide re-convergence of that distribution tree takes place, which can be time-consuming and may cause considerable disruption to the ongoing multicast service.

This document specifies how to build backup distribution trees to protect links on the primary distribution tree. Since the backup distribution tree is built up ahead of any link failure, when a link on the primary distribution tree fails, the pre-installed backup forwarding table is utilized to deliver multicast packets without waiting for the campus-wide re-convergence. This minimizes the service disruption. This document updates RFC 6325.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Usage of the Affinity Sub-TLV
      2.1. Allocating Affinity Links
      2.2. Distribution Tree Calculation with Affinity Links
   3. Resilient Distribution Trees Calculation
      3.1. Designating Roots for Backup Distribution Trees
      3.2. Backup DT Calculation
           3.2.1. Backup DT Calculation with Affinity Links
                  3.2.1.1. Algorithm for Choosing Affinity Links
                  3.2.1.2. Affinity Links Advertisement
   4. Resilient Distribution Trees Installation
      4.1. Pruning the Backup Distribution Tree
      4.2. RPF Filters Preparation
   5. Protection Mechanisms with Resilient Distribution Trees
      5.1. Global 1:1 Protection
      5.2. Global 1+1 Protection
           5.2.1. Failure Detection
           5.2.2. Traffic Forking and Merging
      5.3. Local Protection
           5.3.1. Starting to Use the Backup Distribution Tree
           5.3.2. Duplication Suppression
           5.3.3. An Example to Walk Through
      5.4. Updating the Primary and the Backup Trees
   6. TRILL IS-IS Extensions
      6.1. Resilient Trees Extended Capability Bit
      6.2. Backup Tree Root APPsub-TLV
   7. Security Considerations
   8. IANA Considerations
      8.1. Resilient Tree Extended Capability Bit
      8.2. Backup Tree Root APPsub-TLV
   Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

Much multicast traffic is generated by latency-sensitive applications, e.g., video distribution such as IPTV and video conferencing. Normally, a network fault is recovered through a network-wide re-convergence of the forwarding states, but this process is too slow to meet tight Service Level Agreement (SLA) requirements on the duration of service disruption.

Protection mechanisms are commonly used to reduce the service disruption caused by network faults. With backup forwarding states installed in advance, a protection mechanism can restore an interrupted multicast stream in a much shorter time than the normal network-wide re-convergence, which can meet stringent SLAs on service disruption. A protection mechanism for multicast traffic has been developed for IP/MPLS networks [RFC7431]. However, the way that TRILL constructs distribution trees (DTs) is different from the way that multicast trees are computed under IP/MPLS; therefore, a multicast protection mechanism suitable for TRILL is developed in this document.

This document specifies "Resilient Distribution Trees", in which backup trees are installed in advance for the purpose of fast failure repair. Three types of protection mechanisms are proposed.

o Global 1:1 protection refers to the mechanism where the multicast source RBridge normally injects one multicast stream onto the primary DT. When an interruption of this stream is detected, the source RBridge switches to the backup DT to inject subsequent multicast streams until the primary DT is recovered.

o Global 1+1 protection refers to the mechanism where the multicast source RBridge always injects two copies of each multicast stream, one onto the primary DT and one onto the backup DT. In the normal case, each multicast receiver picks the stream sent along the primary DT and egresses it onto its local link. When a link failure interrupts the primary stream, the backup stream is picked until the primary DT is recovered.

o Local protection refers to the mechanism where the RBridge attached to the failed link locally repairs the failure.

Resilient Distribution Trees can greatly reduce the service disruption caused by link failures. In global 1:1 protection, the time spent on DT recalculation and installation is saved. Global 1+1 protection and local protection further save the time spent on the propagation of the failure indication. Routing can be repaired for a failed link in tens of milliseconds. Protection mechanisms to handle node failures are out of the scope of this document. Although it is possible to use Resilient Distribution Trees to achieve load balancing of multicast traffic, this document leaves that for future study.

[RFC7176] specifies the Affinity Sub-TLV. An "Affinity Link" can be explicitly assigned to a distribution tree or trees as discussed in Section 2.1. This offers a way to manipulate the calculation of distribution trees. With the intentional assignment of Affinity Links, a backup distribution tree can be set up to protect links on a primary distribution tree.
This document updates [RFC6325] as specified in Section 5.3.1.

1.1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Terminology

BFD: Bidirectional Forwarding Detection [RFC7175] [RBmBFD]

CMT: Coordinated Multicast Trees [RFC7783]

Child: A directly connected node further from the Root.

DT: Distribution Tree [RFC6325]

IS-IS: Intermediate System to Intermediate System [RFC7176]

LSP: IS-IS Link State PDU

mLDP: Multipoint Label Distribution Protocol [RFC6388]

MPLS: Multi-Protocol Label Switching

Parent: A directly connected node closer to the Root.

PDU: Protocol Data Unit

Root: The top node in a tree.

PIM: Protocol Independent Multicast [RFC7761]

PLR: Point of Local Repair. In this document, the PLR is the multicast upstream RBridge connecting to the failed link. It is valid only for local protection (Section 5.3).

RBridge: A device implementing the TRILL protocol [RFC6325] [RFC7780]

RPF: Reverse Path Forwarding

SLA: Service Level Agreement

Td: Failure detection timer

TRILL: TRansparent Interconnection of Lots of Links or Tunneled Routing in the Link Layer [RFC6325] [RFC7780]

2. Usage of the Affinity Sub-TLV

This document uses the Affinity Sub-TLV [RFC7176] to assign a parent to an RBridge in a tree, as discussed below. Support of the Affinity Sub-TLV by an RBridge is indicated by a capability bit in the TRILL-VER Sub-TLV [RFC7783].

2.1. Allocating Affinity Links

The Affinity Sub-TLV explicitly assigns parents for RBridges on distribution trees. It is distributed in an LSP and can be recognized by each RBridge in the campus. The originating RBridge becomes the parent, and the nickname contained in the Affinity Record identifies the child. This explicitly provides an "Affinity Link" on a distribution tree or trees. The "Tree-num of roots" in the Affinity Record(s) in the Affinity Sub-TLV identify the distribution trees that adopt this Affinity Link [RFC7176].

Suppose the link between RBridge RB2 and RBridge RB3 is chosen as an Affinity Link on the distribution tree rooted at RB1. RB2 should send out the Affinity Sub-TLV with an Affinity Record that says {Nickname=RB3, Num of Trees=1, Tree-num of roots=RB1}. Unlike in [RFC7783], RB3 does not have to be a leaf node on a distribution tree; therefore, an Affinity Link can be used to identify any link on a distribution tree. This kind of assignment offers RBridges flexibility of control in distribution tree calculation: they are allowed to choose a child for which they are not on the shortest paths from the root. This flexibility is used in this document to increase the reliability of distribution trees. Affinity Links may be configured or automatically determined according to an algorithm as described in this document.

Note that an Affinity Link SHOULD NOT be misused to declare a connection between two RBridges that are not adjacent. If it is, the Affinity Link is ignored and has no effect on tree building.
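As an illustration of the example above, the following Python sketch serializes an Affinity Record. The layout assumed here (a 2-byte child nickname, a 1-byte "Num of Trees" count, then one 2-byte tree-root nickname per tree) follows the Affinity Record of [RFC7176]; the 16-bit nickname values are invented for the example, and Python is used for illustration only.

   import struct

   def build_affinity_record(child_nickname, tree_root_nicknames):
       # Affinity Record (layout per [RFC7176]): child nickname (2 bytes),
       # Num of Trees (1 byte), then one tree-root nickname (2 bytes) per
       # distribution tree that adopts this Affinity Link.
       record = struct.pack("!HB", child_nickname, len(tree_root_nicknames))
       for root in tree_root_nicknames:
           record += struct.pack("!H", root)
       return record

   # RB2, the parent, announces the Affinity Link RB2-RB3 on the tree
   # rooted at RB1: {Nickname=RB3, Num of Trees=1, Tree-num of roots=RB1}.
   # The nickname values below are hypothetical.
   record = build_affinity_record(0x00C3, [0x00C1])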
2.2. Distribution Tree Calculation with Affinity Links

   Root                              Root
   +---+ -> +---+ -> +---+           +---+ -> +---+ -> +---+
   |RB1|    |RB2|    |RB3|           |RB1|    |RB2|    |RB3|
   +---+ <- +---+ <- +---+           +---+ <- +---+ <- +---+
    ^ |      ^ |      ^ |             ^ |      ^        ^ |
    | v      | v      | v             | v      |        | v
   +---+ -> +---+ -> +---+           +---+ -> +---+ -> +---+
   |RB4|    |RB5|    |RB6|           |RB4|    |RB5|    |RB6|
   +---+ <- +---+ <- +---+           +---+ <- +---+    +---+

         Full Graph                         Sub Graph

        Root 1                            Root 1
         / \                               / \
        /   \                             /   \
       4     2                           4     2
            / \                          |     |
           /   \                         |     |
          5     3                        5     3
                |                              |
                |                              |
                6                              6

   Shortest Path Tree of Full Graph  Shortest Path Tree of Sub Graph

   Figure 2.1: DT Calculation with the Affinity Link RB4-RB5

When RBridges receive an Affinity Sub-TLV declaring an Affinity Link that is an incoming link of an RBridge (i.e., that RBridge is the child on this Affinity Link), that RBridge's incoming links/adjacencies other than the Affinity Link are removed from the full graph of the campus to get a sub graph. RBridges perform the Shortest Path First calculation to compute the distribution tree based on the resulting sub graph. This ensures that the Affinity Link appears on the distribution tree.

Take Figure 2.1 as an example. Suppose RB1 is the root and link RB4-RB5 is the Affinity Link. RB5's other incoming links, RB2-RB5 and RB6-RB5, are removed from the Full Graph to get the Sub Graph. Since RB4-RB5 is then the unique link to reach RB5, the Shortest Path Tree inevitably contains this link.

Note that outgoing links/adjacencies are not affected by the Affinity Link. When two RBridges, say RB4 and RB5, are adjacent, the adjacency/link from RB4 to RB5 and the adjacency/link from RB5 to RB4 are separate and, for example, might have different costs.
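The sub-graph construction and the subsequent Shortest Path First run can be sketched as follows. This is a minimal illustration in Python, assuming a directed graph kept as a {(parent, child): cost} map; the tie-breaking rules and multi-tree details of [RFC6325] are deliberately omitted.

   import heapq

   def prune_incoming(links, affinity_links):
       # Remove every incoming link of an Affinity Link's child except
       # the Affinity Link itself, yielding the sub graph of Figure 2.1.
       children = {child for (_, child) in affinity_links}
       return {(u, v): c for (u, v), c in links.items()
               if v not in children or (u, v) in affinity_links}

   def shortest_path_tree(links, root):
       # Plain Dijkstra; returns a {child: parent} map (the tree).
       dist, parent = {root: 0}, {}
       heap = [(0, root)]
       while heap:
           d, u = heapq.heappop(heap)
           if d > dist.get(u, float("inf")):
               continue
           for (x, v), cost in links.items():
               if x == u and d + cost < dist.get(v, float("inf")):
                   dist[v], parent[v] = d + cost, u
                   heapq.heappush(heap, (d + cost, v))
       return parent

   # tree = shortest_path_tree(prune_incoming(links, {("RB4", "RB5")}), "RB1")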
(This is different from 340 the multicast trees construction signaled by PIM [RFC7761] or mLDP 341 [RFC6388].)) 343 3.1. Designating Roots for Backup Distribution Trees 345 RBridge RB1 having the highest root priority nickname might 346 explicitly advertise a list of nicknames to identify the roots of 347 primary and backup DTs using the Backup Tree APPsub-TLV as specified 348 in Section 6.2 (See also Section 4.5 of [RFC6325]). It's possible 349 that the backup DT and the primary DT have the common root RBridge. 350 In that case, to distinguish the primary DT and the backup DT for 351 this case, the root RBridge MUST own at least two nicknames so a 352 different nickname can be used to name each tree. 354 3.2. Backup DT Calculation 356 3.2.1. Backup DT Calculation with Affinity Links 358 2 1 359 / \ 360 Root 1___ ___2 Root 361 /|\ \ / /|\ 362 / | \ \ / / | \ 363 3 4 5 6 3 4 5 6 364 | | | | \/ \/ 365 | | | | /\ /\ 366 7 8 9 10 7 8 9 10 368 Primary DT Backup DT 370 Figure 3.1: An Example of a Primary DT and its Backup DT 372 TRILL supports the computation of multiple distribution trees by 373 RBridges. With the intentional assignment of Affinity Links in DT 374 calculation, this document specifies a method to construct Resilient 375 Distribution Trees. For example, in Figure 3.1, the backup DT is set 376 up to be maximally disjoint to the primary DT. (The full topology is 377 a combination of these two DTs, which is not shown in the figure.) 378 Except for the link between RB1 and RB2, all other links on the 379 primary DT do not overlap with links on the backup DT. It means that 380 every link on the primary DT, except link RB1-RB2, can be protected 381 by the backup DT. 383 3.2.1.1. Algorithm for Choosing Affinity Links 385 Operators MAY configure Affinity Links to intentionally protect a 386 specific link, such as the link connected to a gateway. But it is 387 desirable that every RBridge independently computes Affinity Links 388 for a backup DT across the whole campus. This enables a distributed 389 deployment and also minimizes configuration. 391 The algorithms for Maximally Redundant Trees in [RFC7811] may be used 392 to figure out Affinity Links on a backup DT which is maximally 393 disjointed to the primary DT but those algorithms only provides a 394 subset of all possible solutions. In TRILL, Resilient Distribution 395 Tree does not restrict the root of the backup DT to be the same as 396 that of the primary DT. Two disjoint (or maximally disjoint) trees 397 may have different root nodes, which significantly augments the 398 solution space. 400 This document RECOMMENDS achieving the independent method through a 401 slight change to the conventional DT calculation process of TRILL. 402 Basically, after the primary DT is calculated, the RBridge will be 403 aware of which links are used in that primary tree. When the backup 404 DT is calculated, each RBridge increases the metric of these links by 405 a proper value (for safety, it's recommended to use the summation of 406 all original link metrics in the campus but not more than 2**23), 407 which gives these links a lower priority of being chosen for the 408 backup DT by the Shortest Path First calculation. All links on this 409 backup DT can be assigned as Affinity Links but this is unnecessary. 410 In order to reduce the amount of Affinity Sub-TLVs flooded across the 411 campus, only those NOT picked by the conventional DT calculation 412 process SHOULD be announced as Affinity Links. 414 3.2.1.2. 
3.2.1.2. Affinity Links Advertisement

Similar to [RFC7783], the parent RBridge of each Affinity Link takes charge of announcing that link in an Affinity Sub-TLV. When an RBridge plays the role of parent RBridge for several Affinity Links, it is natural to have them advertised together in the same Affinity Sub-TLV, with each Affinity Link structured as one Affinity Record [RFC7176].

Affinity Links are announced in the Affinity Sub-TLV, which is recognized by every RBridge. Since each RBridge computes distribution trees as the Affinity Sub-TLV requires, the backup DT will be built up consistently.

4. Resilient Distribution Trees Installation

As specified in Section 4.5.2 of [RFC6325], an ingress RBridge MUST announce the distribution trees it may choose to ingress multicast frames onto. Thus, other RBridges in the campus can limit the amount of state necessary for the RPF check. Also, [RFC6325] recommends that an ingress RBridge by default choose the DT or DTs whose root or roots are least cost from the ingress RBridge. To sum up, RBridges do pre-compute all the trees that might be used so they can properly forward multi-destination packets, but they install RPF state only for some combinations of ingress and tree.

This document specifies that the backup DT MUST be contained in an ingress RBridge's DT announcement list and included in this ingress RBridge's LSP. In order to reduce the service disruption time, RBridges SHOULD install backup DTs in advance, which also includes the RPF filters that need to be set up for the RPF check.

Since the backup DT is intentionally built maximally disjoint from the primary DT, when a link fails and interrupts the ongoing multicast traffic sent along the primary DT, it is probable that the backup DT is not affected. Therefore, the backup DT installed in advance can be used to deliver multicast packets immediately.

4.1. Pruning the Backup Distribution Tree

The way that a backup DT is pruned is different from the way that the primary DT is pruned. To enable protection, it is possible that a branch should not be pruned even though it has no downstream receivers. The rule for backup DT pruning is that the backup DT should be pruned, eliminating only those branches that have no potential downstream RBridges appearing on the pruned primary DT.

Even though the primary DT may not be optimally pruned in practice, the backup DT SHOULD always be pruned as if the primary DT were optimally pruned. Those redundant links that ought to have been pruned from the primary DT will not be protected.

                                          1
                                           \
   Root 1___                                ___2 Root
       / \  \                              /  /|\
      /   \  \                            /  / | \
     3     5  6                          3  4  5  6
     |     |  |                            /    \/
     |     |  |                           /     /\
     7     9  10                         7     9  10

    Pruned Primary DT                    Pruned Backup DT

   Figure 4.1: The Backup DT is Pruned Based on the Pruned Primary DT.

Suppose RB7, RB9 and RB10 constitute a multicast group MGx. The pruned primary DT and backup DT are shown in Figure 4.1. Referring back to Figure 3.1, branches RB2-RB1 and RB4-RB1 on the primary DT are pruned for the distribution of MGx traffic, since there are no potential receivers on these two branches. Although branches RB1-RB2 and RB3-RB2 on the backup DT have no potential multicast receivers, they appear on the pruned primary DT and may be used to repair link failures of the primary DT; therefore, they are not pruned from the backup DT. Branch RB8-RB3 can be safely pruned because it does not appear on the pruned primary DT.
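The pruning rule can be sketched as follows, again in Python and reusing the {child: parent} tree representation of the earlier sketches. A backup branch is kept if it leads to any RBridge that appears on the optimally pruned primary DT; multicast group membership details are abstracted away, so this is an illustration rather than a forwarding-plane algorithm.

   def prune_backup(backup_parent, backup_root, primary_nodes):
       # backup_parent: {child: parent} map of the unpruned backup DT.
       # primary_nodes: the RBridges on the optimally pruned primary DT.
       children = {}
       for c, p in backup_parent.items():
           children.setdefault(p, []).append(c)

       kept = set()
       def walk(node):
           # Keep this branch if the node itself, or any node below it,
           # appears on the pruned primary DT.
           needed = node in primary_nodes
           for child in children.get(node, []):
               needed = walk(child) or needed
           if needed:
               kept.add(node)
           return needed

       walk(backup_root)
       return {c: p for c, p in backup_parent.items() if c in kept}

   # Figure 4.1: with primary_nodes = {"RB1", "RB3", "RB5", "RB6", "RB7",
   # "RB9", "RB10"}, only branch RB8-RB3 is pruned from the backup DT of
   # Figure 3.1, matching the text above.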
4.2. RPF Filters Preparation

RB2 includes in its LSP the information to indicate which trees RB2 might choose when RB2 ingresses a multicast packet [RFC6325]. When RB2 specifies such trees, it SHOULD include the backup DT. Other RBridges will prepare the RPF check states for both the primary DT and the backup DT. When a multicast packet is sent along either the primary DT or the backup DT, it will be subject to the RPF check. This works when global 1:1 protection is used. However, when global 1+1 protection or local protection is applied, traffic duplication will happen if multicast receivers accept both copies of the multicast packets through two RPF filters. In order to avoid such duplication, egress RBridge multicast receivers MUST act as merge points, activating a single RPF filter and discarding the duplicate packets matched by the other RPF filter. In the normal case, the RPF state is set up according to the primary DT. When a link failure on the primary DT is detected, the egress node's RPF filter based on the backup DT should be activated.
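The merge-point behavior might be modeled as below. This is a schematic Python sketch, not fast-path logic: the two RPF filters are treated as opaque predicates over the tree (egress nickname), ingress nickname, and receiving link carried by a packet, and only the active one decides whether a copy is egressed.

   class MergePoint:
       # Egress RBridge sketch: RPF state is installed for both trees,
       # but only one filter is active, so exactly one copy is egressed.
       def __init__(self, primary_rpf, backup_rpf):
           self.rpf = {"primary": primary_rpf, "backup": backup_rpf}
           self.active = "primary"       # normal case

       def primary_failed(self):
           # Called once the egress learns of a failure on the primary DT.
           self.active = "backup"

       def should_egress(self, packet):
           return self.rpf[self.active](packet)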
5. Protection Mechanisms with Resilient Distribution Trees

Protection mechanisms can be developed to make use of the backup DT installed in advance. Protection mechanisms developed using PIM or mLDP for multicast in IP/MPLS networks are not applicable to TRILL, due to the following fundamental differences in their distribution tree calculation.

o A link on a TRILL distribution tree is always bidirectional, while a link on a distribution tree in IP/MPLS networks may be unidirectional.

o In TRILL, a multicast source node does not have to be the root of the distribution tree. It is just the opposite in IP/MPLS networks.

o In IP/MPLS networks, distribution trees, as well as their backup distribution trees, are constructed for each multicast source node. In TRILL, a small number of core distribution trees are shared among multicast groups, and a backup DT does not have to share the same root as the primary DT.

Therefore, a TRILL-specific multicast protection mechanism is needed.

Global 1:1 protection, global 1+1 protection and local protection are described in this section. In Figure 4.1, assume RB7 is the ingress RBridge of the multicast stream while RB9 and RB10 are the multicast receivers. Suppose link RB1-RB5 fails during the multicast forwarding. The backup DT rooted at RB2 does not include link RB1-RB5; therefore, it can be used to protect this link. In global 1:1 protection, RB7 will switch the subsequent multicast traffic to this backup DT when it is notified of the link failure. In global 1+1 protection, RB7 will inject two copies of the multicast stream and let the multicast receivers RB9 and RB10 choose which copy to deliver. In local protection, when link RB1-RB5 fails, RB1 will locally replicate the multicast traffic and send it on the backup DT.

5.1. Global 1:1 Protection

In global 1:1 protection, the ingress RBridge of the multicast traffic is responsible for switching the failure-affected traffic from the primary DT over to the backup DT. Since the backup DT has been installed in advance, global protection need not wait for DT recalculation and installation. When the ingress RBridge is notified of the failure, it immediately makes this switchover.

This type of protection is simple and duplication safe. However, depending on the topology of the RBridge campus, the time spent on failure detection and propagation through the IS-IS control plane may still cause considerable service disruption.

The BFD (Bidirectional Forwarding Detection) protocol can be used to reduce the failure detection time. Link failures can be rapidly detected with one-hop BFD [RFC7175]. [RBmBFD] introduces fast failure detection for multicast paths; it can be used to reduce both the failure detection and the propagation time in global protection. In [RBmBFD], the ingress RBridge sends BFD control packets to poll each receiver, and the receivers return BFD control packets to the ingress as responses. If no response is received from a specific receiver within a detection time, the ingress concludes that the connectivity to this receiver is broken. [RBmBFD] is therefore used to detect the connectivity of a path rather than of a link. The ingress RBridge will determine a minimal failed branch that contains this receiver and will switch the ongoing multicast traffic based on this judgment. For example, in Figure 4.1, if RB9 does not respond while RB10 still responds, RB7 will presume that links RB1-RB5 and RB5-RB9 have failed. Multicast traffic will be switched to a backup DT that protects these two links. More accurate link failure detection might help ingress RBridges make smarter decisions, but it is out of the scope of this document.

5.2. Global 1+1 Protection

In global 1+1 protection, the multicast source RBridge always replicates the multicast packets and sends them onto both the primary and the backup DT. This may sacrifice capacity efficiency, but given the great connection redundancy and inexpensive bandwidth in data center networks, this kind of protection can be popular [RFC7431].

5.2.1. Failure Detection

Egress RBridges (merge points) SHOULD learn of a link failure as early as possible, so that the affected egress RBridges may update their RPF filters quickly to minimize the traffic disruption. Three options are provided, as follows.

1. For a very reliable and steady data stream, egress RBridges can assume a minimum known packet rate for that data stream [RFC7431]. A failure detection timer (say Td) is set to the interval between two consecutive packets and is reinitialized each time a packet is received. If Td expires and packets are arriving at the egress RBridge on the backup DT (within the time frame Td), the egress RBridge updates its RPF filters and starts to receive the packets forwarded on the backup DT (see the sketch after this list). This method requires configuration at the egress RBridge of Td and of some method (filter) to determine whether a packet is part of the reliable data stream. Since the filtering capabilities of various fast-path logic differ greatly, the specifics of such configuration are outside the scope of this document.

2. With multipoint BFD [RBmBFD], when a link failure happens, the affected egress RBridges can detect the lack of connectivity from the ingress. Therefore, these egress RBridges are able to update their RPF filters promptly.

3. Egress RBridges can always rely on the IS-IS control plane to learn of the failure and determine whether their RPF filters should be updated.
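Option 1 is essentially a dead-man timer. A minimal sketch, assuming Python's standard threading.Timer and the hypothetical MergePoint object from the sketch in Section 4.2:

   import threading

   class TdDetector:
       # Td is the assumed maximum inter-packet gap of the reliable
       # stream; on expiry, the merge point flips to the backup RPF filter.
       def __init__(self, td_seconds, merge_point):
           self.td = td_seconds
           self.merge_point = merge_point
           self.timer = None
           self.packet_received()          # arm the timer

       def packet_received(self):
           # Called for each stream packet seen on the primary DT;
           # reinitializes Td as described in option 1.
           if self.timer is not None:
               self.timer.cancel()
           self.timer = threading.Timer(self.td,
                                        self.merge_point.primary_failed)
           self.timer.start()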
5.2.2. Traffic Forking and Merging

For the sake of protection, transit RBridges SHOULD activate both the primary and the backup RPF filters; therefore, both copies of the multicast packets will pass through transit RBridges.

Multicast receivers (egress RBridges) MUST act as "merge points" to egress only one copy of each multicast packet. This is achieved by the activation of only a single RPF filter. In the normal case, egress RBridges activate the primary RPF filter. When a link on the pruned primary DT fails, the ingress RBridge cannot reach some of the receivers. When these unreachable receivers realize the link has failed, they SHOULD update their RPF filters to receive the packets sent on the backup DT.

Note that the egress RBridge need not be a literal merge point, that is, one receiving the primary and backup DT versions over different links. Even if the egress RBridge receives both copies over the same link, because disjoint links are not available, it can still filter out one copy, because the RPF filtering logic is designed to test which tree a packet is on, as indicated by a field in the TRILL Header [RFC6325].

5.3. Local Protection

In local protection, the Point of Local Repair (PLR) is the upstream RBridge connected to the failed link. It is this RBridge that makes the decision to replicate the multicast traffic to recover from the link failure. Local protection further saves the time spent on failure notification through the flooding of LSPs across the TRILL campus. In addition, the failure detection can be sped up using BFD [RFC7175]; therefore, local protection can minimize the service disruption, typically reducing it to less than 50 milliseconds.

Since the ingress RBridge is not necessarily the root of the distribution tree in TRILL, a multicast downstream point may not be a descendant of the ingress point on the distribution tree.

5.3.1. Starting to Use the Backup Distribution Tree

The egress nickname field in the TRILL Header of the replicated multicast TRILL Data packets specifies the tree on which they are being distributed. This field is rewritten by the PLR to the backup DT root's nickname, but the ingress nickname field of the multicast TRILL Data packets MUST remain unchanged. The PLR forwards all multicast traffic carrying the backup DT egress nickname along the backup DT. This updates [RFC6325], which specifies that the egress nickname in the TRILL Header of a multi-destination TRILL Data packet must not be changed by transit RBridges.

In the above example, the PLR RB1 locally determines to send replicated multicast packets according to the backup DT. It will send them to the next hop RB2.

5.3.2. Duplication Suppression

When a PLR starts to send replicated multicast packets on the backup DT, some multicast packets are still being sent along the primary DT, so some egress RBridges might receive duplicated multicast packets. The traffic forking and merging method of global 1+1 protection can be adopted to suppress the duplication. A sketch of the PLR's repair action follows.
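This is a schematic Python sketch of Sections 5.3.1 and 5.3.2 from the PLR's point of view. The packet model (a dict of TRILL Header fields) and the next-hop list are illustrative assumptions, not a real forwarding plane.

   def plr_repair(packet, backup_root_nickname, backup_next_hops):
       # The PLR replicates the packet onto the backup DT, rewriting only
       # the egress nickname to the backup DT root (Section 5.3.1); the
       # ingress nickname MUST stay unchanged so that the RPF filters and
       # the merge points of Section 5.2.2 keep working.
       repaired = dict(packet)
       repaired["egress_nickname"] = backup_root_nickname
       return [(hop, repaired) for hop in backup_next_hops]

   # Walk-through example: PLR RB1 repairs the failure of link RB1-RB5 by
   # sending the repaired copy toward RB2 on the backup DT rooted at RB2.
   # copies = plr_repair(pkt, backup_root_nickname="RB2",
   #                     backup_next_hops=["RB2"])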
5.3.3. An Example to Walk Through

The examples used to illustrate local protection above are put together into a whole walk-through below.

In the normal case, multicast frames ingressed by RB7 in Figure 4.1, with pruned distribution on the primary DT rooted at RB1, are received by RB9 and RB10. When the link RB1-RB5 fails, the PLR RB1 begins to replicate and forward subsequent multicast packets using the pruned backup DT rooted at RB2. When RB2 gets the multicast packets from the link RB1-RB2, it accepts them, since the RPF filter {DT=RB2, ingress=RB7, receiving links=RB1-RB2, RB3-RB2, RB4-RB2, RB5-RB2 and RB6-RB2} is installed on RB2. RB2 forwards the replicated multicast packets to its neighbors except RB1. The multicast packets reach RB6, where both RPF filters {DT=RB1, ingress=RB7, receiving link=RB1-RB6} and {DT=RB2, ingress=RB7, receiving links=RB2-RB6 and RB9-RB6} are active. RB6 will let both multicast streams through. The multicast packets finally reach RB9, where the RPF filter is updated from {DT=RB1, ingress=RB7, receiving link=RB5-RB9} to {DT=RB2, ingress=RB7, receiving link=RB6-RB9}. RB9 will egress the multicast packets from the backup distribution tree onto the local link and drop those from the primary distribution tree, based on the Reverse Path Forwarding filter.

5.4. Updating the Primary and the Backup Trees

Assume an RBridge receives an LSP that indicates a link failure. This RBridge starts to calculate the new primary DT based on the new topology, with the failed link excluded. Suppose the new primary DT is installed at time t1.

The propagation of LSPs around the campus will take some time. For safety, we assume that all RBridges in the campus will have converged to the new primary DT by t1+Ts. By default, Ts (the "settling time") is set to 30 seconds, but it is configurable in seconds from 1 to 100. At t1+Ts, the ingress RBridge switches the traffic from the backup DT back to the new primary DT.

After another Ts (at t1+2*Ts), no multicast packets are being forwarded along the old primary DT. The backup DT should be updated (recalculated and reinstalled) after the new primary DT. The process of this update under the different protection types is discussed as follows.

a) For global 1:1 protection, the backup DT is simply updated at t1+2*Ts.

b) For global 1+1 protection, the ingress RBridge stops replicating the multicast packets onto the old backup DT at t1+Ts. The backup DT is updated at t1+2*Ts. The ingress RBridge MUST wait for another Ts, during which time period all RBridges converge to the new backup DT. At t1+3*Ts, it is safe for the ingress RBridge to start to replicate multicast packets onto the new backup DT.

c) For local protection, the PLR stops replicating and sending packets on the old backup DT at t1+Ts. It is safe for RBridges to start updating the backup DT at t1+2*Ts.

6. TRILL IS-IS Extensions

This section lists the extensions to TRILL IS-IS that support resilient trees.

6.1. Resilient Trees Extended Capability Bit

An RBridge that supports the facilities specified in this document MUST announce the Extended RBridge Capabilities APPsub-TLV [RFC7782] with bit tbd1 set to one. If there are RBridges that do not announce bit tbd1 set to one, all RBridges of the campus MUST disable the Resilient Distribution Tree mechanism as defined in this document and fall back to the distribution tree calculation algorithm as specified in [RFC6325].

6.2. Backup Tree Root APPsub-TLV

The structure of the Backup Tree Root APPsub-TLV is shown below.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = tbd2          |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Primary Tree Root Nickname   |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Backup Tree Root Nickname   |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

o Type = Backup Tree Root APPsub-TLV type, set to tbd2.

o Length = 4. If the length is any other value, the APPsub-TLV is corrupt and MUST be ignored.

o Primary Tree Root Nickname = the nickname of the root RBridge of the primary tree for which a resilient backup tree is being created.

o Backup Tree Root Nickname = the nickname of the root RBridge of the backup tree.

If either nickname is not the nickname of a tree whose calculation is being directed by the highest priority tree root RBridge, the APPsub-TLV is ignored. This APPsub-TLV must be advertised by the RBridge having the highest priority to be a tree root; Backup Tree Root APPsub-TLVs advertised by other RBridges are ignored. If there are two or more Backup Tree Root APPsub-TLVs for the same primary tree specifying different backup trees, then the one specifying the lowest magnitude backup tree root nickname is used, treating nicknames as unsigned 16-bit quantities.
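A receiver-side sketch of handling this APPsub-TLV's value, again in Python for illustration; the 2-byte Type and Length fields are assumed to have been consumed already by generic TLV parsing.

   import struct

   def parse_backup_tree_root(value):
       # Value layout per Section 6.2: two unsigned 16-bit nicknames.
       # Any length other than 4 means the APPsub-TLV is corrupt and
       # MUST be ignored (signaled here by returning None).
       if len(value) != 4:
           return None
       primary_root, backup_root = struct.unpack("!HH", value)
       return primary_root, backup_root

   def choose_backup_root(candidate_roots):
       # Tie-breaking rule from Section 6.2: when several APPsub-TLVs name
       # different backup trees for the same primary tree, use the lowest
       # backup tree root nickname, treating nicknames as unsigned 16-bit
       # quantities (struct "!H" already yields unsigned values).
       return min(candidate_roots)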
7. Security Considerations

This document raises no new security issues for TRILL. The IS-IS PDUs used to transmit the information specified in Section 6 can be secured with IS-IS security [RFC5310].

For general TRILL Security Considerations, see [RFC6325].

8. IANA Considerations

The Affinity Sub-TLV has already been defined in [RFC7176]. This document does not change its definition. See below for the IANA actions.

8.1. Resilient Tree Extended Capability Bit

IANA will assign a bit (Section 6.1) in the Extended RBridge Capabilities subregistry on the TRILL Parameters page, adding the following to the registry:

   Bit   Mnemonic  Description             Reference
   ----  --------  ----------------------  ---------------
   tbd1  RT        Resilient Tree Support  [this document]

8.2. Backup Tree Root APPsub-TLV

IANA will assign an APPsub-TLV type under IS-IS TLV 251, Application Identifier 1, on the TRILL Parameters page, from the range below 255, for the Backup Tree Root APPsub-TLV (Section 6.2), as follows:

   Type  Name              Reference
   ----  ----------------  ---------------
   tbd2  Backup Tree Root  [this document]

Acknowledgements

The careful review from Gayle Noble is gratefully acknowledged. The authors would like to thank Donald Eastlake, Erik Nordmark, Fangwei Hu, Gayle Noble, Hongjun Zhai, and Xudong Zhang for their comments and suggestions.
9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D., and A. Banerjee, "Transparent Interconnection of Lots of Links (TRILL) Use of IS-IS", RFC 7176, DOI 10.17487/RFC7176, May 2014, <https://www.rfc-editor.org/info/rfc7176>.

[RFC7783] Senevirathne, T., Pathangi, J., and J. Hudson, "Coordinated Multicast Trees (CMT) for Transparent Interconnection of Lots of Links (TRILL)", RFC 7783, DOI 10.17487/RFC7783, February 2016, <https://www.rfc-editor.org/info/rfc7783>.

[RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011, <https://www.rfc-editor.org/info/rfc6325>.

[RFC7761] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March 2016, <https://www.rfc-editor.org/info/rfc7761>.

[RFC6388] Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B. Thomas, "Label Distribution Protocol Extensions for Point-to-Multipoint and Multipoint-to-Multipoint Label Switched Paths", RFC 6388, DOI 10.17487/RFC6388, November 2011, <https://www.rfc-editor.org/info/rfc6388>.

[RBmBFD] Zhang, M., Pallagatti, S., and V. Govindan, "TRILL Support of Point to Multipoint BFD", draft-ietf-trill-p2mp-bfd, work in progress.

[RFC7175] Manral, V., Eastlake 3rd, D., Ward, D., and A. Banerjee, "Transparent Interconnection of Lots of Links (TRILL): Bidirectional Forwarding Detection (BFD) Support", RFC 7175, DOI 10.17487/RFC7175, May 2014, <https://www.rfc-editor.org/info/rfc7175>.

[RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A., Ghanwani, A., and S. Gupta, "Transparent Interconnection of Lots of Links (TRILL): Clarifications, Corrections, and Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016, <https://www.rfc-editor.org/info/rfc7780>.

[RFC7782] Zhang, M., Perlman, R., Zhai, H., Durrani, M., and S. Gupta, "Transparent Interconnection of Lots of Links (TRILL) Active-Active Edge Using Multiple MAC Attachments", RFC 7782, DOI 10.17487/RFC7782, February 2016, <https://www.rfc-editor.org/info/rfc7782>.

[RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R., and M. Fanto, "IS-IS Generic Cryptographic Authentication", RFC 5310, DOI 10.17487/RFC5310, February 2009, <https://www.rfc-editor.org/info/rfc5310>.

9.2. Informative References

[RFC7811] Enyedi, G., Csaszar, A., Atlas, A., Bowers, C., and A. Gopalan, "An Algorithm for Computing IP/LDP Fast Reroute Using Maximally Redundant Trees (MRT-FRR)", RFC 7811, DOI 10.17487/RFC7811, June 2016, <https://www.rfc-editor.org/info/rfc7811>.

[RFC7431] Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B. Decraene, "Multicast-Only Fast Reroute", RFC 7431, DOI 10.17487/RFC7431, August 2015, <https://www.rfc-editor.org/info/rfc7431>.

[mBFD] Katz, D. and D. Ward, "BFD for Multipoint Networks", draft-ietf-bfd-multipoint, work in progress.

[RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and D. Dutt, "Transparent Interconnection of Lots of Links (TRILL): Fine-Grained Labeling", RFC 7172, DOI 10.17487/RFC7172, May 2014, <https://www.rfc-editor.org/info/rfc7172>.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies Co., Ltd
   Huawei Building, No. 156 Beiqing Rd.
   Beijing 100095
   P.R. China

   Email: zhangmingui@huawei.com

   Tissa Senevirathne
   Consultant

   Email: tsenevir@gmail.com

   Janardhanan Pathangi
   Gigamon

   Email: path.jana@gmail.com

   Ayan Banerjee
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134 USA

   Email: ayabaner@cisco.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134

   Phone: +1-408-571-3500
   Email: Anoop@alumni.duke.edu