INTERNET-DRAFT                                             Mingui Zhang
Intended Status: Proposed Standard                               Huawei
Updates: 6325                                        Tissa Senevirathne
                                                             Consultant
                                                   Janardhanan Pathangi
                                                                Gigamon
                                                          Ayan Banerjee
                                                                  Cisco
                                                         Anoop Ghanwani
                                                                   DELL
Expires: July 23, 2018                                 January 19, 2018

                  TRILL: Resilient Distribution Trees
                draft-ietf-trill-resilient-trees-09.txt

Abstract

   The TRILL (Transparent Interconnection of Lots of Links) protocol
   provides multicast data forwarding based on IS-IS link state
   routing. Distribution trees are computed from the link state
   information through a Shortest Path First calculation. When a link
   on a distribution tree fails, a campus-wide re-convergence of that
   distribution tree takes place, which can be time-consuming and may
   cause considerable disruption to ongoing multicast service.

   This document specifies how to build backup distribution trees that
   protect links on the primary distribution tree. Because the backup
   distribution tree is built before any link failure occurs, when a
   link on the primary distribution tree fails, the pre-installed
   backup forwarding table is used to deliver multicast packets
   without waiting for the campus-wide re-convergence. This minimizes
   the service disruption. This document updates RFC 6325.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Usage of the Affinity Sub-TLV
      2.1. Indicating Affinity Links
      2.2. Distribution Tree Calculation with Affinity Links
   3. Distribution Tree Calculation
      3.1. Designating Roots for Backup Distribution Trees
      3.2. Backup DT Calculation with Affinity Links
           3.2.1. The Algorithm for Choosing Affinity Links
           3.2.2. Affinity Links Advertisement
   4. Resilient Distribution Trees Installation
      4.1. Pruning the Backup Distribution Tree
      4.2. RPF Filters Preparation
   5. Protection Mechanisms with Resilient Distribution Trees
      5.1. Global 1:1 Protection
      5.2. Global 1+1 Protection
           5.2.1. Failure Detection
           5.2.2. Traffic Forking and Merging
      5.3. Local Protection
           5.3.1. Starting to Use the Backup Distribution Tree
           5.3.2. Duplication Suppression
           5.3.3. An Example to Walk Through
      5.4. Protection Mode Signaling
      5.5. Updating the Primary and the Backup Distribution Trees
   6. TRILL IS-IS Extensions
      6.1. Resilient Trees Extended Capability Field
      6.2. Backup Tree Root APPsub-TLV
   7. Security Considerations
   8. IANA Considerations
      8.1. Resilient Tree Extended Capability Field
      8.2. Backup Tree Root APPsub-TLV
   Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   Much multicast traffic is generated by latency-sensitive
   applications, e.g., video distribution, including IPTV and video
   conferencing. Normally, a network fault is recovered from through a
   network-wide re-convergence of the forwarding states, but this
   process is too slow to meet tight Service Level Agreement (SLA)
   requirements on the duration of service disruption.
   Protection mechanisms are commonly used to reduce the service
   disruption caused by network faults. With backup forwarding states
   installed in advance, a protection mechanism can restore an
   interrupted multicast stream much faster than a normal network-wide
   re-convergence can, which makes it possible to meet stringent SLAs
   on service disruption. A protection mechanism for multicast traffic
   has been developed for IP/MPLS networks [RFC7431]. However, TRILL
   constructs distribution trees (DTs) differently from IP/MPLS;
   therefore, a multicast protection mechanism suitable for TRILL is
   developed in this document.

   This document specifies "Resilient Distribution Trees", in which
   backup trees are installed in advance for the purpose of fast
   failure repair. Three types of protection mechanisms are specified:

   o  Global 1:1 protection refers to the mechanism where the
      multicast source RBridge normally injects one multicast stream
      onto the primary DT. When an interruption of this stream is
      detected, the source RBridge switches to the backup DT for
      subsequent multicast traffic until the primary DT is recovered.

   o  Global 1+1 protection refers to the mechanism where the
      multicast source RBridge always injects two copies of each
      multicast stream, one onto the primary DT and one onto the
      backup DT. In the normal case, each multicast receiver picks the
      stream sent along the primary DT and egresses it to its local
      link. When a link failure interrupts the primary stream, the
      backup stream is picked until the primary DT is recovered.

   o  Local protection refers to the mechanism where the RBridge
      attached to the failed link locally repairs the failure.

   Resilient Distribution Trees can greatly reduce the service
   disruption caused by link failures. In global 1:1 protection, the
   time spent on DT recalculation and installation is saved.
   Global 1+1 protection and local protection further save the time
   spent on the propagation of the failure indication. Routing around
   a failed link can be repaired in tens of milliseconds.

   Protection mechanisms that handle node failures are out of the
   scope of this document. Although it is possible to use Resilient
   Distribution Trees to achieve load balancing of multicast traffic,
   this document leaves that for future study.

   [RFC7176] specifies the Affinity Sub-TLV. An "Affinity Link" can be
   explicitly assigned to a distribution tree or trees as discussed in
   Section 2.1. This offers a way to manipulate the calculation of
   distribution trees. With intentional assignment of Affinity Links,
   a backup distribution tree can be set up to protect links on a
   primary distribution tree.

   This document updates [RFC6325] as specified in Section 5.3.1.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

1.2. Terminology

   BFD: Bidirectional Forwarding Detection [RFC7175] [RBmBFD]

   CMT: Coordinated Multicast Trees [RFC7783]

   Child: A directly connected node farther from the Root.

   DT: Distribution Tree [RFC6325]

   IS-IS: Intermediate System to Intermediate System [RFC7176]

   LSP: IS-IS Link State PDU

   mLDP: Multipoint Label Distribution Protocol [RFC6388]

   MPLS: Multi-Protocol Label Switching

   Parent: A directly connected node closer to the Root.

   PDU: Protocol Data Unit

   Root: The top node in a tree.

   PIM: Protocol Independent Multicast [RFC7761]

   PLR: Point of Local Repair. In this document, the PLR is the
   multicast upstream RBridge connected to the failed link. It is
   relevant only to local protection (Section 5.3).
   RBridge: A device implementing the TRILL protocol [RFC6325]
   [RFC7780]

   RPF: Reverse Path Forwarding

   SLA: Service Level Agreement

   Td: failure detection timer

   TRILL: TRansparent Interconnection of Lots of Links or Tunneled
   Routing in the Link Layer [RFC6325] [RFC7780]

2. Usage of the Affinity Sub-TLV

   This document uses the existing Affinity Sub-TLV [RFC7176] to
   assign a parent to an RBridge in a tree, as discussed below.
   Support of the Affinity Sub-TLV by an RBridge is indicated by a
   capability bit in the TRILL-VER Sub-TLV [RFC7783].

2.1. Indicating Affinity Links

   The Affinity Sub-TLV explicitly assigns parents to RBridges on
   distribution trees. It is distributed in an LSP and can be
   recognized by each RBridge in the campus. The originating RBridge
   becomes the parent, and the nickname contained in the Affinity
   Record identifies the child. This explicitly establishes an
   "Affinity Link" on a distribution tree or trees. The "Tree-num of
   roots" in the Affinity Record(s) in the Affinity Sub-TLV identifies
   the distribution trees that adopt this Affinity Link [RFC7176].

   Suppose the link between RBridge RB2 and RBridge RB3 is chosen as
   an Affinity Link on the distribution tree rooted at RB1 in Figure
   2.1. RB2 sends out the Affinity Sub-TLV with an Affinity Record
   that says {Nickname=RB3, Num of Trees=1, Tree-num of roots=RB1}.
   Unlike the Affinity Link usage in [RFC7783], RB3 does not have to
   be a leaf node on a distribution tree. Therefore, an Affinity Link
   can be used to identify any link on a distribution tree. This kind
   of assignment gives RBridges flexibility of control in distribution
   tree calculation: they can be directed to choose a child for which
   they are not on the shortest path from the root. This flexibility
   is used to construct backup trees that increase the reliability of
   distribution trees.
   Affinity Links may be configured or automatically determined
   according to an algorithm, as described in this document.

   An Affinity Link SHOULD NOT be misused to declare a connection
   between two RBridges that are not adjacent. If it is, the Affinity
   Link is ignored and has no effect on tree building.

2.2. Distribution Tree Calculation with Affinity Links

     Root                              Root
    +---+ -> +---+ -> +---+     +---+ -> +---+ -> +---+
    |RB1|    |RB2|    |RB3|     |RB1|    |RB2|    |RB3|
    +---+ <- +---+ <- +---+     +---+ <- +---+ <- +---+
     ^ |      ^ |      ^ |       ^ |      ^        ^ |
     | v      | v      | v       | v      |        | v
    +---+ -> +---+ -> +---+     +---+ -> +---+ -> +---+
    |RB4|    |RB5|    |RB6|     |RB4|    |RB5|    |RB6|
    +---+ <- +---+ <- +---+     +---+ <- +---+    +---+

           Full Graph                    Sub Graph

         Root 1                        Root 1
             / \                           / \
            /   \                         /   \
           4     2                       4     2
                / \                      |     |
               /   \                     |     |
              5     3                    5     3
              |                          |
              |                          |
              6                          6

   Shortest Path Tree of Full Graph  Shortest Path Tree of Sub Graph

       Figure 2.1: DT Calculation with the Affinity Link RB4-RB5

   When RBridges receive an Affinity Sub-TLV declaring an Affinity
   Link that is an incoming link of an RBridge (i.e., that RBridge is
   the child on the Affinity Link) for a particular distribution tree,
   that RBridge's incoming links/adjacencies other than the Affinity
   Link are removed from the full graph of the campus to obtain a sub
   graph on which to compute that tree. RBridges then perform the
   Shortest Path First calculation on the resulting sub graph. This
   ensures that the Affinity Link appears in the distribution tree
   being calculated.

   Take Figure 2.1 as an example. Suppose RB1 is the root and link
   RB4-RB5 is the Affinity Link. RB5's other incoming links, RB2-RB5
   and RB6-RB5, are removed from the Full Graph to obtain the Sub
   Graph. Since RB4-RB5 is then the unique link by which RB5 can be
   reached, the Shortest Path Tree inevitably contains this link.

   Note that outgoing links/adjacencies are not affected by the
   Affinity Link.
   When two RBridges, say RB4 and RB5, are adjacent, the
   adjacency/link from RB4 to RB5 and the adjacency/link from RB5 to
   RB4 are separate and, for example, might have different costs.

3. Distribution Tree Calculation

   RBridges use IS-IS to advertise adjacencies and thus advertise
   network faults through the withdrawal of such adjacencies. A node
   or link failure triggers a campus-wide re-convergence of all TRILL
   distribution trees. The re-convergence generally includes the
   following sequence of steps:

   1. The failure (loss of adjacency) is detected through IS-IS
      control messages (HELLOs) not getting through or through some
      other link test such as BFD [RFC7175] [RBmBFD];

   2. The IS-IS link state is flooded so that each RBridge learns
      about the failure;

   3. Each RBridge recalculates the affected distribution trees
      independently;

   4. RPF filters are updated according to the new distribution trees.
      The recomputed distribution trees are pruned and installed into
      the multicast forwarding tables.

   The re-convergence time needed to go through these four steps
   disrupts ongoing multicast traffic. In protection mechanisms,
   alternative paths prepared ahead of potential node or link failures
   are available to detour around the failures as soon as a failure is
   detected; thus, service disruption can be minimized.

   This document addresses only link failure protection. The
   construction of backup DTs (distribution trees) for the purpose of
   node protection is out of scope. (The usual way to protect against
   the failure of a node on the primary tree is to set up a backup
   tree that excludes this node. When this node fails, the backup tree
   can safely be used to forward multicast traffic around it. However,
   TRILL distribution trees are shared among all VLANs and Fine
   Grained Labels [RFC7172], and they have to cover all RBridge nodes
   in the campus [RFC6325].
   A DT that does not span all RBridges in the campus may not cover
   all the receivers of many multicast groups. This is different from
   the construction of multicast trees as signaled by PIM (Protocol
   Independent Multicast [RFC7761]) or mLDP (Multipoint Label
   Distribution Protocol [RFC6388]).)

3.1. Designating Roots for Backup Distribution Trees

   The RBridge having the highest root priority nickname, say RB1,
   controls the creation of backup DTs and specifies their roots. It
   explicitly advertises a list of nicknames identifying the roots of
   the primary DTs and their backup DTs using the Backup Tree Root
   APPsub-TLV specified in Section 6.2 (see also Section 4.5 of
   [RFC6325]). A backup DT and its primary DT may have the same root
   RBridge, although this is not required. In that common-root case,
   to distinguish the primary DT from the backup DT, the root RBridge
   MUST own at least two nicknames so that a different nickname can be
   used to name each tree.

   The method by which the highest priority root RBridge determines
   which primary distribution trees to protect with a backup, and what
   the root of each such backup will be, is out of scope for this
   document.

3.2. Backup DT Calculation with Affinity Links

              2                         1
             /                           \
     Root 1___                        ___2 Root
         /|\ \                       /  /|\
        / | \ \                     /  / | \
       3  4  5  6                  3  4  5  6
       |  |  |  |                   \/    \/
       |  |  |  |                   /\    /\
       7  8  9  10                 7  8  9  10

         Primary DT                   Backup DT

      Figure 3.1: An Example of a Primary DT and its Backup DT

   TRILL supports the computation of multiple distribution trees by
   RBridges. With the intentional assignment of Affinity Links in DT
   calculation, this document specifies a method to construct
   Resilient Distribution Trees. For example, in Figure 3.1, the
   backup DT is set up to be maximally disjoint from the primary DT.
   (The full topology is the combination of these two DTs, which is
   not shown in the figure.)
   Except for the link between RB1 and RB2, no link on the primary DT
   overlaps with any link on the backup DT. Thus, every link on the
   primary DT except link RB1-RB2 is protected by the backup DT.

3.2.1. The Algorithm for Choosing Affinity Links

   Operators MAY configure Affinity Links, for example, to
   intentionally protect a specific link such as the link connected to
   a gateway. But it is desirable that every RBridge independently
   compute the Affinity Links for a backup DT across the whole campus.
   This enables a distributed deployment and also minimizes
   configuration.

   Compared to the algorithms for Maximally Redundant Trees in
   [RFC7811], TRILL has both an advantage and a disadvantage. An
   advantage of TRILL is that Resilient Distribution Trees do not
   restrict the root of the backup DT to be the same as that of the
   primary DT. Two disjoint (or maximally disjoint) trees may have
   different root nodes, which significantly enlarges the solution
   space.

   A disadvantage of TRILL, when the algorithm specified below in this
   section is used, is that the backup DT is computed with reference
   to the primary tree, and there may be a pair of trees that is more
   disjoint than any backup tree can be with respect to the particular
   primary tree.

   This document RECOMMENDS achieving independent backup tree
   determination through a change to the conventional DT calculation
   process of TRILL. After the primary DT is calculated, every RBridge
   is aware of which links are used in that primary tree. When the
   backup DT is calculated, each RBridge increases the metric of these
   links by the sum of all original link metrics in the campus, but
   not by more than 2**23, which gives these links a lower priority of
   being chosen for the backup DT by the Shortest Path First
   calculation. All links on this backup DT could be assigned as
   Affinity Links, but this is not necessary.
   To reduce the number of Affinity Sub-TLVs flooded across the
   campus, only those links NOT picked by the conventional DT
   calculation process SHOULD be announced as Affinity Links.

3.2.2. Affinity Links Advertisement

   Similar to [RFC7783], the parent RBridge of each Affinity Link
   takes charge of announcing that link in an Affinity Sub-TLV. When
   an RBridge is the parent RBridge of several Affinity Links, it is
   natural to advertise them together in the same Affinity Sub-TLV,
   with each Affinity Link structured as one Affinity Record
   [RFC7176].

   Affinity Links are announced in the Affinity Sub-TLV, which is
   recognized by every RBridge. Since each RBridge computes
   distribution trees as the Affinity Sub-TLV requires, the backup DT
   is built consistently by all RBridges in the campus.

4. Resilient Distribution Trees Installation

   As specified in Section 4.5.2 of [RFC6325], an ingress RBridge MUST
   announce the distribution trees it may choose for ingressing
   multicast frames. Thus, other RBridges in the campus can limit the
   amount of state necessary for RPF checks. Also, [RFC6325]
   recommends that an ingress RBridge by default choose the DT or DTs
   whose root or roots are least cost from the ingress RBridge. To sum
   up, RBridges do pre-compute all the trees that might be used so
   that they can properly forward multi-destination packets, but they
   install RPF state only for some combinations of ingress and tree.

   This document specifies that the backup DT MUST be included in an
   ingress RBridge's DT announcement list in that RBridge's LSP if the
   corresponding primary tree is included. In order to reduce the
   service disruption time, RBridges SHOULD install backup DTs in
   advance, including the RPF filters that need to be set up for RPF
   checks.
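   The backup-DT computation recommended in Section 3.2.1 (inflate the
   metric of every link used by the primary DT by the sum of all
   original link metrics, capped at 2**23, then rerun Shortest Path
   First) can be sketched as follows. This is an illustrative sketch
   only: the graph representation, function names, and example
   topology are assumptions of this sketch, not part of the protocol.

```python
# Sketch of Section 3.2.1's metric-inflation rule. Links are a dict
# {(u, v): cost} with both directions present for bidirectional links.
import heapq

def spf_tree(links, root):
    """Dijkstra; returns {node: parent} describing the SPF tree."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, []).append((v, cost))
    dist, parent = {root: 0}, {root: None}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v], parent[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    return parent

def backup_tree(links, primary_root, backup_root):
    """Compute a backup DT that avoids primary-DT links when it can."""
    primary = spf_tree(links, primary_root)
    primary_links = {(p, c) for c, p in primary.items() if p is not None}
    # Inflation value: sum of all original metrics, capped at 2**23.
    inflation = min(sum(links.values()), 2**23)
    inflated = {
        (u, v): cost + (inflation if (u, v) in primary_links
                        or (v, u) in primary_links else 0)
        for (u, v), cost in links.items()
    }
    return spf_tree(inflated, backup_root)
```

   On a small square topology with a diagonal, the inflated metrics
   steer the backup tree away from primary-DT links wherever an
   alternative path exists, mirroring the "maximally disjoint" goal of
   Figure 3.1.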
   Since the backup DT is intentionally built to be highly disjoint
   from the primary DT, when a link fails and interrupts ongoing
   multicast traffic sent along the primary DT, it is probable that
   the backup DT is not affected. The backup DT installed in advance
   can therefore be used to deliver multicast packets immediately.

4.1. Pruning the Backup Distribution Tree

   A backup DT is pruned differently from the way the primary DT is
   pruned. To enable protection, it is possible that a branch should
   not be pruned (see Section 4.5.3 of [RFC6325]) even though it has
   no downstream receivers for a particular data label. The rule for
   backup DT pruning is that the backup DT should be pruned by
   eliminating branches that have no potential downstream RBridges
   appearing on the pruned primary DT.

   Even though the primary DT may not be optimally pruned in practice,
   the backup DT SHOULD always be pruned as if the primary DT were
   optimally pruned. Redundant links that ought to have been pruned
   from the primary DT will not be protected.

                                        1
                                         \
     Root 1                           ___2 Root
         /|\                         /  /|\
        / | \                       /  / | \
       3  5  6                     3  4  5  6
       |  |  |                        /    \/
       |  |  |                       /     /\
       7  9  10                     7    9  10

     Pruned Primary DT             Pruned Backup DT

   Figure 4.1: The Backup DT is Pruned Based on the Pruned Primary DT

   Suppose RB7, RB9, and RB10 constitute a multicast group MGx. The
   pruned primary DT and backup DT are shown in Figure 4.1. Referring
   back to Figure 3.1, branches RB2-RB1 and RB4-RB1 on the primary DT
   are pruned for the distribution of MGx traffic, since there are no
   potential receivers on those two branches. Although branches
   RB1-RB2 and RB3-RB2 on the backup DT have no potential multicast
   receivers either, they appear on the pruned primary DT and may be
   used to repair link failures of the primary DT. Therefore, they are
   not pruned from the backup DT.
   Branch RB8-RB3 can be safely pruned because it does not appear on
   the pruned primary DT.

4.2. RPF Filters Preparation

   An ingress RBridge, say RB2, announces in its LSP the trees it
   might choose when it ingresses a multicast packet [RFC6325]. When
   RB2 specifies such trees, it SHOULD include the backup DT. Other
   RBridges then prepare RPF check state for both the primary DT and
   the backup DT. When a multicast packet is sent along either the
   primary DT or the backup DT, it is subject to the RPF check. This
   works when global 1:1 protection is used. However, when global 1+1
   protection or local protection is applied, traffic duplication
   would happen if multicast receivers accepted the copies of the
   multicast packets matching either RPF filter. In order to avoid
   such duplication, egress RBridge multicast receivers MUST act as
   merge points, activating a single RPF filter and discarding the
   duplicate packets matching the other RPF filter. In the normal
   case, the RPF state is set up according to the primary DT. When a
   link failure on the primary DT is detected, the egress node's RPF
   filter based on the backup DT should be activated.

5. Protection Mechanisms with Resilient Distribution Trees

   Protection mechanisms make use of the backup DT installed in
   advance. Protection mechanisms developed using PIM or mLDP for
   multicast in IP/MPLS networks are not applicable to TRILL, due to
   the following fundamental differences in distribution tree
   calculation:

   o  A link on a TRILL distribution tree is always bidirectional,
      while a link on a distribution tree in IP/MPLS networks may be
      unidirectional.

   o  In TRILL, a multicast source node does not have to be the root
      of the distribution tree. It is just the opposite in IP/MPLS
      networks.

   o  In IP/MPLS networks, distribution trees, as well as their backup
      distribution trees, are constructed for each multicast source
      node.
      In TRILL, a small number of core distribution trees are shared
      among multicast groups, and a backup DT does not have to share
      its root with the primary DT.

   Therefore, a TRILL-specific multicast protection mechanism is
   needed.

   Global 1:1 protection, global 1+1 protection, and local protection
   are described in this section. In Figure 4.1, assume RB7 is the
   ingress RBridge of the multicast stream while RB9 and RB10 are the
   multicast receivers, and suppose link RB1-RB5 fails during
   multicast forwarding. The backup DT rooted at RB2 does not include
   link RB1-RB5; therefore, it can be used to protect this link. In
   global 1:1 protection, RB7 switches subsequent multicast traffic to
   this backup DT when it is notified of the link failure. In global
   1+1 protection, RB7 injects two copies of the multicast stream and
   lets the multicast receivers RB9 and RB10 choose which copy to
   deliver. In local protection, when link RB1-RB5 fails, RB1 locally
   replicates the multicast traffic and sends it on the backup DT.

   The type of protection in use at an RBridge is indicated by a
   two-bit field in that RBridge's Extended Capability TLV, as
   discussed in Section 5.4.

5.1. Global 1:1 Protection

   In global 1:1 protection, the ingress RBridge of the multicast
   traffic is responsible for switching the traffic affected by the
   failure from the primary DT over to the backup DT. Since the backup
   DT has been installed in advance, global protection need not wait
   for DT recalculation and installation: when the ingress RBridge is
   notified of the failure, it immediately makes this switch-over.

   This type of protection is simple and duplication-safe. However,
   depending on the topology of the RBridge campus, the time spent on
   failure detection and propagation through the IS-IS control plane
   may still cause a considerable service disruption.
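   The ingress behavior for global 1:1 protection can be condensed
   into a small sketch: the ingress tags injected frames with the
   primary tree root's nickname and switches to the backup tree's
   nickname as soon as it learns of a failure on the pruned primary
   DT. The class, its method names, and the link representation below
   are hypothetical illustrations, not names from the TRILL
   specification.

```python
# Hypothetical sketch of a global 1:1 ingress switch-over.
class Ingress1to1:
    def __init__(self, primary_root, backup_root, primary_links):
        self.primary_root = primary_root      # primary DT root nickname
        self.backup_root = backup_root        # backup DT root nickname
        self.primary_links = set(primary_links)  # links of pruned primary DT
        self.on_backup = False

    def egress_nickname(self):
        """Tree nickname placed in the TRILL Header of injected frames."""
        return self.backup_root if self.on_backup else self.primary_root

    def link_failed(self, link):
        # Switch over only if the failed link is on the pruned primary
        # DT; the backup DT was installed in advance, so no wait.
        if link in self.primary_links:
            self.on_backup = True

    def link_restored(self, link):
        # Return to the primary DT once it has recovered.
        if link in self.primary_links:
            self.on_backup = False
```

   With the Figure 4.1 example, a failure report for link RB1-RB5
   moves subsequent traffic onto the tree rooted at RB2, while an
   unrelated failure leaves the primary tree in use.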
   The BFD (Bidirectional Forwarding Detection) protocol can be used
   to reduce the failure detection time. Link failures can be rapidly
   detected with one-hop BFD [RFC7175]. [RBmBFD] introduces fast
   failure detection for multicast paths and can be used to reduce
   both the failure detection time and the propagation time for global
   protection. In [RBmBFD], the ingress RBridge sends BFD control
   packets to poll each receiver, and the receivers return BFD control
   packets to the ingress as responses. If no response is received
   from a specific receiver within a detection time, the ingress
   concludes that connectivity to this receiver is broken; [RBmBFD] is
   thus used to detect the connectivity of a path rather than of a
   single link. The ingress RBridge determines a minimal failed branch
   that contains this receiver and switches the ongoing multicast
   traffic based on this judgment. For example, in Figure 4.1, if RB9
   does not respond while RB10 still does, RB7 will presume that links
   RB1-RB5 and RB5-RB9 have failed. Multicast traffic will be switched
   to a backup DT that protects these two links. More accurate link
   failure detection might help ingress RBridges make smarter
   decisions, but it is out of the scope of this document.

5.2. Global 1+1 Protection

   In global 1+1 protection, the multicast source RBridge always
   replicates the multicast packets and sends one copy onto the
   primary DT and one onto the backup DT. This sacrifices some
   capacity efficiency, but given the high connection redundancy and
   inexpensive bandwidth in Data Center Networks, this kind of
   protection can be popular [RFC7431].

5.2.1. Failure Detection

   Egress RBridges (merge points) SHOULD detect a link failure as
   early as practical and update their RPF filters quickly to minimize
   the traffic disruption. Three options are provided as follows.

   1.
       For a very reliable and steady data stream, egress RBridges
       can assume a minimum known packet rate for that data stream
       [RFC7431]. A failure detection timer (say, Td) is set to the
       interval between two consecutive packets and is reinitialized
       each time a packet is received. If Td expires and packets are
       arriving at the egress RBridge on the backup DT (within the
       time frame Td), the egress RBridge updates its RPF filters and
       starts to receive the packets forwarded on the backup DT. This
       method requires configuration at the egress RBridge of Td and
       of some method (filter) to determine whether a packet is part
       of the reliable data stream. Since the filtering capabilities
       of various fast path logic differ greatly, the specifics of
       such configuration are outside the scope of this document.

   2.  With multipoint BFD [RBmBFD], when a link failure happens, the
       affected egress RBridges can detect the lack of connectivity
       from the ingress. Therefore, these egress RBridges are able to
       update their RPF filters promptly.

   3.  Egress RBridges can always rely on the IS-IS control plane to
       learn of the failure and determine whether their RPF filters
       should be updated.

5.2.2. Traffic Forking and Merging

   For the sake of protection, transit RBridges SHOULD activate both
   the primary and the backup RPF filters; therefore, both copies of
   the multicast packets pass through transit RBridges.

   Multicast receivers (egress RBridges) MUST act as "merge points"
   that egress only one copy of each multicast packet. This is
   achieved by activating only a single RPF filter. In the normal
   case, egress RBridges activate the primary RPF filter. When a link
   on the pruned primary DT fails, the ingress RBridge cannot reach
   some of the receivers. When these unreachable receivers realize
   that the link has failed, they SHOULD update their RPF filters to
   receive the packets sent on the backup DT.
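   The merge-point behavior can be sketched as follows: RPF state
   exists for both trees, but only one filter is active at a time, so
   exactly one of the two 1+1 copies is egressed. The class and the
   specific filter contents for receiver RB9 below are illustrative
   assumptions based on my reading of Figure 4.1, not normative
   state definitions.

```python
# Hypothetical sketch of an egress "merge point" (Section 5.2.2).
class MergePoint:
    def __init__(self, primary_filter, backup_filter):
        # Each filter maps (tree_root, ingress) -> set of allowed
        # receiving links; only the active filter is consulted.
        self.filters = {"primary": primary_filter, "backup": backup_filter}
        self.active = "primary"

    def accept(self, tree_root, ingress, link):
        """True if the frame passes the currently active RPF filter."""
        allowed = self.filters[self.active].get((tree_root, ingress), set())
        return link in allowed

    def primary_failed(self):
        # Failure learned via Td expiry, multipoint BFD, or IS-IS
        # (the three options of Section 5.2.1).
        self.active = "backup"
```

   Before the failure, only the primary-tree copy is accepted; after
   `primary_failed()`, only the backup-tree copy is, so no duplicates
   are ever egressed.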
647 Note that the egress RBridge need not be a literal merge point, that
648 is, receiving the primary and backup DT versions over different links.
649 Even if the egress RBridge receives both copies over the same link,
650 because disjoint links are not available, it can still filter out one
651 copy because the RPF filtering logic is designed to test which tree
652 the packet is on, as indicated by a field in the TRILL Header
653 [RFC6325].

655 5.3. Local Protection

657 In local protection, the Point of Local Repair (PLR) is the
658 upstream RBridge connected to the failed link. It is this RBridge
659 that makes the decision to replicate the multicast traffic to recover
660 from the link failure. Local protection further saves the time
661 spent on failure notification through the flooding of LSPs across the
662 TRILL campus. In addition, the failure detection can be sped up using
663 BFD [RFC7175]; therefore, local protection can minimize the service
664 disruption, typically reducing it to less than 50 milliseconds.

666 Since the ingress RBridge is not necessarily the root of the
667 distribution tree in TRILL, a multicast downstream point may not be
668 a descendant of the ingress point on the distribution tree.

670 Due to the multi-destination RPF check in TRILL, local protection can
671 only be used at a fork point where the primary and backup trees
672 diverge and the set of nodes downstream is identical for both paths.
673 If these conditions do not apply, local protection MUST NOT be used.

675 5.3.1. Starting to Use the Backup Distribution Tree

677 The egress nickname TRILL Header field of the replicated multicast
678 TRILL data packets specifies the tree on which they are being
679 distributed. This field will be rewritten to the backup DT's root
680 nickname by the PLR, but the ingress nickname field of the multicast
681 TRILL data packet MUST remain unchanged.
The PLR forwards all
682 multicast traffic with the backup DT egress nickname along the backup
683 DT. This updates [RFC6325], which specifies that the egress nickname
684 in the TRILL Header of a multi-destination TRILL data packet must not
685 be changed by transit RBridges.

687 In the above example, the PLR RB1 locally decides to send replicated
688 multicast packets according to the backup DT. It will send them to
689 the next hop RB2.

691 5.3.2. Duplication Suppression

693 When a PLR starts to send replicated multicast packets on the backup
694 DT, some multicast packets are still being sent along the primary DT.
695 Some egress RBridges might receive duplicated multicast packets. The
696 traffic forking and merging method of the global 1+1 protection can
697 be adopted to suppress the duplication.

699 5.3.3. An Example to Walk Through

701 The example used to illustrate the above local protection is
702 assembled into a complete walk-through below.

704 In the normal case, multicast frames ingressed by RB7 in Figure 4.1
705 with pruned distribution on the primary DT rooted at RB1 are being
706 received by RB9 and RB10. When the link RB1-RB5 fails, the PLR RB1
707 begins to replicate and forward subsequent multicast packets using
708 the pruned backup DT rooted at RB2. When RB2 gets the multicast
709 packets from the link RB1-RB2, it accepts them since the RPF filter
710 {DT=RB2, ingress=RB7, receiving links=RB1-RB2, RB3-RB2, RB4-RB2, RB5-
711 RB2 and RB6-RB2} is installed on RB2. RB2 forwards the replicated
712 multicast packets to its neighbors except RB1. The multicast packets
713 reach RB6, where both RPF filters {DT=RB1, ingress=RB7, receiving
714 link=RB1-RB6} and {DT=RB2, ingress=RB7, receiving links=RB2-RB6 and
715 RB9-RB6} are active. RB6 will let both multicast streams through.
716 Multicast packets will finally reach RB9, where the RPF filter is
717 updated from {DT=RB1, ingress=RB7, receiving link=RB5-RB9} to
718 {DT=RB2, ingress=RB7, receiving link=RB6-RB9}. RB9 will egress the
719 multicast packets from the backup distribution tree onto the local
720 link and drop those from the primary distribution tree based on the
721 reverse path forwarding filter.

723 5.4. Protection Mode Signaling

725 The desired mode of resilient tree operation for each RBridge is
726 chosen by the network operator and configured on that RBridge. This
727 mode is announced by each RBridge in a two-bit Resilient Tree Mode
728 field in its Extended Capabilities TLV (see Sections 6.1, 8.1). The
729 values of this field have the following meanings:

731 Value Short Name  Effect
732 ----- ----------  ------
733 00    No support  If any RBridge does not support Resilient
734                   Trees, then the Resilient Tree mechanism is
735                   disabled in all RBridges. This also applies if
736                   any RBridge does not announce an Extended
737                   Capabilities TLV.
738 01    Global 1:1  An RBridge advertising this value will, when it
739                   ingresses multi-destination frames, send them
740                   on only one of the primary and backup DTs. All
741                   other RBridges set their RPF filters to accept
742                   traffic on both trees from this ingress.
743 10    Global 1+1  An RBridge advertising this value will, when it
744                   ingresses multi-destination frames, send them
745                   on both the primary and backup DTs. All other
746                   RBridges MUST set their RPF filters to accept
747                   traffic only on the primary or the backup DT.
748 11    1+1 & Local An RBridge advertising this value acts as
749                   described for the value 01 above when it is the
750                   ingress RBridge.
In addition, if it is a transit
751                   RBridge at a fork point between the primary and
752                   backup trees and detects that an adjacency has
753                   failed, it diverts multi-destination TRILL data
754                   packets on the primary tree to the backup tree,
755                   changing the tree identifier in the packet to
756                   that of the backup tree.

758 5.5. Updating the Primary and the Backup Distribution Trees

760 Assume an RBridge receives the LSP that indicates a link failure.
761 This RBridge starts to calculate the new primary DT based on the new
762 topology with the failed link excluded. Suppose the new primary DT is
763 installed at t1.

765 The propagation of LSPs around the campus will take some time. For
766 safety, we assume all RBridges in the campus will have converged to
767 the new primary DT by t1+Ts. By default, Ts (the "settling time") is
768 set to 30 seconds, but it is configurable in seconds from 1 to 100. At
769 t1+Ts, the ingress RBridge switches the traffic from the backup DT
770 back to the new primary DT.

772 After another Ts (at t1+2*Ts), no multicast packets are being
773 forwarded along the old primary DT. The backup DT should be updated
774 (recalculated and reinstalled) after the new primary DT. The process
775 of this update under the different protection types is discussed as
776 follows.

778 a) For global 1:1 protection, the backup DT is simply updated at
779 t1+2*Ts.

781 b) For global 1+1 protection, the ingress RBridge stops
782 replicating the multicast packets onto the old backup DT at t1+Ts.

784 The backup DT is updated at t1+2*Ts. The ingress RBridge MUST wait
785 for another Ts, during which time all RBridges converge to
786 the new backup DT. At t1+3*Ts, it is safe for the ingress RBridge
787 to start to replicate multicast packets onto the new backup DT.

789 c) For local protection, the PLR stops replicating and sending
790 packets on the old backup DT at t1+Ts. It is safe for RBridges to
791 start updating the backup DT at t1+2*Ts.

793 6.
TRILL IS-IS Extensions

795 This section lists extensions to TRILL IS-IS to support resilient
796 trees.

798 6.1. Resilient Trees Extended Capability Field

800 An RBridge that supports the facilities specified in this document
801 MUST announce the Extended RBridge Capabilities APPsub-TLV [RFC7782]
802 with a non-zero value in the Resilient Trees field. If there are
803 RBridges that do not announce this field set to a non-zero value, all
804 RBridges of the campus MUST disable the Resilient Distribution Tree
805 mechanism as defined in this document and fall back to the
806 distribution tree calculation algorithm as specified in [RFC6325].

808 6.2. Backup Tree Root APPsub-TLV

810 The structure of the Backup Tree Root APPsub-TLV is shown below.

812 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
813 | Type = tbd2                   | (2 bytes)
814 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
815 | Length                        | (2 bytes)
816 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
817 | Primary Tree Root Nickname    | (2 bytes)
818 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
819 | Backup Tree Root Nickname     | (2 bytes)
820 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

822 o Type = Backup Tree Root APPsub-TLV type, set to tbd2

824 o Length = 4. If the length is any other value, the APPsub-TLV is
825 corrupt and MUST be ignored.

827 o Primary Tree Root Nickname = the nickname of the root RBridge
828 of the primary tree for which a resilient backup tree is being
829 created

831 o Backup Tree Root Nickname = the nickname of the root RBridge of
832 the backup tree

834 If either nickname is not the nickname of a tree whose calculation is
835 being directed by the highest priority tree root RBridge, the APPsub-
836 TLV is ignored. This APPsub-TLV MUST be advertised by the highest
837 priority RBridge to be a tree root. Backup Tree Root APPsub-TLVs
838 advertised by other RBridges are ignored.
If there are two or more
839 Backup Tree Root APPsub-TLVs for the same primary tree specifying
840 different backup trees, then the one specifying the lowest magnitude
841 backup tree root nickname is used, treating nicknames as unsigned 16-
842 bit quantities.

844 7. Security Considerations

846 This document raises no new security issues for TRILL. The IS-IS PDUs
847 used to transmit the information specified in Section 6 can be
848 secured with IS-IS security [RFC5310].

850 For general TRILL security considerations, see [RFC6325].

852 8. IANA Considerations

854 The Affinity Sub-TLV has already been defined in [RFC7176]. This
855 document does not change its definition. See below for IANA actions.

857 8.1. Resilient Tree Extended Capability Field

859 IANA will assign two adjacent bits (Sections 5.4, 6.1) in the
860 Extended RBridge Capabilities subregistry on the TRILL Parameters
861 page to form the Resilient Tree Extended Capability field and change
862 the heading of the "Bit" column to "Bit(s)", adding the following
863 to the registry [for example, tbd1 could be "2-3"]:

865 Bit  Mnemonic Description            Reference
866 ---- -------- -----------            ---------
867 tbd1 RT       Resilient Tree Support [this document]

869 8.2. Backup Tree Root APPsub-TLV

871 IANA will assign an APPsub-TLV type under IS-IS TLV 251 Application
872 Identifier 1 on the TRILL Parameters page from the range below 255
873 for the Backup Tree Root APPsub-TLV (Section 6.2) as follows:

875 Type Name             Reference
876 ---- ---------------- ---------------
877 tbd2 Backup Tree Root [this document]

879 Acknowledgements

880 The careful review from Gayle Noble is gratefully acknowledged. The
881 authors would like to thank Donald Eastlake, Erik Nordmark, Fangwei
882 Hu, Gayle Noble, Hongjun Zhai, and Xudong Zhang for their comments
883 and suggestions.

885 9. References

887 9.1.
Normative References

889 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
890 Requirement Levels", BCP 14, RFC 2119, DOI
891 10.17487/RFC2119, March 1997,
892 <https://www.rfc-editor.org/info/rfc2119>.

894 [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D.,
895 and A. Banerjee, "Transparent Interconnection of Lots of
896 Links (TRILL) Use of IS-IS", RFC 7176, DOI
897 10.17487/RFC7176, May 2014,
898 <https://www.rfc-editor.org/info/rfc7176>.

900 [RFC7783] Senevirathne, T., Pathangi, J., and J. Hudson, "Coordinated
901 Multicast Trees (CMT) for Transparent Interconnection of
902 Lots of Links (TRILL)", RFC 7783, DOI 10.17487/RFC7783,
903 February 2016, <https://www.rfc-editor.org/info/rfc7783>.

905 [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
906 Ghanwani, "Routing Bridges (RBridges): Base Protocol
907 Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011,
908 <https://www.rfc-editor.org/info/rfc6325>.

910 [RFC7761] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
911 Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
912 Multicast - Sparse Mode (PIM-SM): Protocol Specification
913 (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
914 2016, <https://www.rfc-editor.org/info/rfc7761>.

916 [RFC6388] Wijnands, IJ., Ed., Minei, I., Ed., Kompella, K., and B.
917 Thomas, "Label Distribution Protocol Extensions for Point-
918 to-Multipoint and Multipoint-to-Multipoint Label Switched
919 Paths", RFC 6388, DOI 10.17487/RFC6388, November 2011,
920 <https://www.rfc-editor.org/info/rfc6388>.

922 [RBmBFD] Zhang, M., Pallagatti, S., and V. Govindan, "TRILL Support
923 of Point to Multipoint BFD", draft-ietf-trill-p2mp-bfd,
924 work in progress.

926 [RFC7175] Manral, V., Eastlake 3rd, D., Ward, D., and A. Banerjee,
927 "Transparent Interconnection of Lots of Links (TRILL):
928 Bidirectional Forwarding Detection (BFD) Support", RFC
929 7175, DOI 10.17487/RFC7175, May 2014,
930 <https://www.rfc-editor.org/info/rfc7175>.

932 [RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
933 Ghanwani, A., and S. Gupta, "Transparent Interconnection of
934 Lots of Links (TRILL): Clarifications, Corrections, and
935 Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016,
936 <https://www.rfc-editor.org/info/rfc7780>.
938 [RFC7782] Zhang, M., Perlman, R., Zhai, H., Durrani, M., and S.
939 Gupta, "Transparent Interconnection of Lots of Links
940 (TRILL) Active-Active Edge Using Multiple MAC Attachments",
941 RFC 7782, DOI 10.17487/RFC7782, February 2016,
942 <https://www.rfc-editor.org/info/rfc7782>.

944 [RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
945 and M. Fanto, "IS-IS Generic Cryptographic Authentication",
946 RFC 5310, DOI 10.17487/RFC5310, February 2009,
947 <https://www.rfc-editor.org/info/rfc5310>.

949 9.2. Informative References

951 [RFC7811] Enyedi, G., Csaszar, A., Atlas, A., Bowers, C., and A.
952 Gopalan, "An Algorithm for Computing IP/LDP Fast Reroute
953 Using Maximally Redundant Trees (MRT-FRR)", RFC 7811, DOI
954 10.17487/RFC7811, June 2016,
955 <https://www.rfc-editor.org/info/rfc7811>.

957 [RFC7431] Karan, A., Filsfils, C., Wijnands, IJ., Ed., and B.
958 Decraene, "Multicast-Only Fast Reroute", RFC 7431, DOI
959 10.17487/RFC7431, August 2015,
960 <https://www.rfc-editor.org/info/rfc7431>.

962 [mBFD] Katz, D. and D. Ward, "BFD for Multipoint Networks", draft-
963 ietf-bfd-multipoint, work in progress.

965 [RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and
966 D. Dutt, "Transparent Interconnection of Lots of Links
967 (TRILL): Fine-Grained Labeling", RFC 7172, DOI
968 10.17487/RFC7172, May 2014,
969 <https://www.rfc-editor.org/info/rfc7172>.

971 Authors' Addresses

973 Mingui Zhang
974 Huawei Technologies Co., Ltd
975 Huawei Building, No. 156 Beiqing Rd.
976 Beijing 100095 P.R. China

978 Email: zhangmingui@huawei.com

980 Tissa Senevirathne
981 Consultant

983 Email: tsenevir@gmail.com

985 Janardhanan Pathangi
986 Gigamon

988 Email: path.jana@gmail.com

990 Ayan Banerjee
991 Cisco
992 170 West Tasman Drive
993 San Jose, CA 95134 USA

995 Email: ayabaner@cisco.com

997 Anoop Ghanwani
998 Dell
999 350 Holger Way
1000 San Jose, CA 95134

1002 Phone: +1-408-571-3500
1003 Email: Anoop@alumni.duke.edu