INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                                Huawei
Expires: April 24, 2014                               Tissa Senevirathne
Updates: 6325 (if approved)                                        CISCO
                                                    Janardhanan Pathangi
                                                                    DELL
                                                           Ayan Banerjee
                                                        Insieme Networks
                                                          Anoop Ghanwani
                                                                    DELL
                                                         Donald Eastlake
                                                                  Huawei
                                                        October 21, 2013

                   TRILL Resilient Distribution Trees
                draft-zhang-trill-resilient-trees-04.txt

Abstract

   The TRILL protocol provides Layer 2 multicast data forwarding using
   IS-IS link state routing.  Distribution trees are computed from the
   link state information through a Shortest Path First calculation.
   When a link on a distribution tree fails, a campus-wide
   reconvergence of that distribution tree takes place, which can be
   time consuming and may cause considerable disruption to the ongoing
   multicast service.
   This document proposes building a backup distribution tree to
   protect links on the primary distribution tree.  Since the backup
   distribution tree is built before a link failure occurs, when a
   link on the primary distribution tree fails, the pre-installed
   backup forwarding table is used to deliver multicast packets
   without waiting for the campus-wide reconvergence, which minimizes
   the service disruption.  This document updates RFC 6325.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Usage of Affinity Sub-TLV
      2.1. Allocating Affinity Links
      2.2. Distribution Tree Calculation with Affinity Links
   3. Resilient Distribution Trees Calculation
      3.1. Designating Roots for Backup Trees
         3.1.1. Conjugate Trees
         3.1.2. Explicitly Advertising Tree Roots
      3.2. Backup DT Calculation
         3.2.1. Backup DT Calculation with Affinity Links
            3.2.1.1. Algorithm for Choosing Affinity Links
            3.2.1.2. Affinity Links Advertisement
         3.2.2. Backup DT Calculation without Affinity Links
   4. Resilient Distribution Trees Installation
      4.1. Pruning the Backup Distribution Tree
      4.2. RPF Filters Preparation
   5. Protection Mechanisms with Resilient Distribution Trees
      5.1. Global 1:1 Protection
      5.2. Global 1+1 Protection
         5.2.1. Failure Detection
         5.2.2. Traffic Forking and Merging
      5.3. Local Protection
         5.3.1. Start Using the Backup Distribution Tree
         5.3.2. Duplication Suppression
         5.3.3. An Example to Walk Through
      5.4. Switching Back to the Primary Distribution Tree
   6. Security Considerations
   7. IANA Considerations
   Acknowledgements
   8. References
      8.1. Normative References
      8.2. Informative References
   Authors' Addresses

1. Introduction

   A great deal of multicast traffic is generated by latency- and
   disruption-sensitive applications, e.g., video distribution,
   including IP-TV, video conferencing and so on.  Normally, a network
   fault is recovered through a network-wide reconvergence of the
   forwarding states, but this process is too slow to meet the tight
   Service Level Agreement (SLA) requirements on the duration of
   service disruption.  Worse still, updating multicast forwarding
   states may take significantly longer than unicast convergence,
   since multicast states are updated based on control-plane signaling
   [mMRT].

   Protection mechanisms are commonly used to reduce the service
   disruption caused by network faults.  With backup forwarding states
   installed in advance, a protection mechanism can restore an
   interrupted multicast stream in tens of milliseconds, which meets
   stringent SLAs on service disruption.  Several protection
   mechanisms for multicast traffic have been developed for IP/MPLS
   networks [mMRT] [MoFRR].  However, the way TRILL constructs
   distribution trees (DTs) differs from the way multicast trees are
   computed under IP/MPLS; therefore a multicast protection mechanism
   suitable for TRILL is required.

   This document proposes "Resilient Distribution Trees" (RDT), in
   which backup trees are installed in advance for the purpose of fast
   failure repair.  Three types of protection mechanisms are proposed.

   o  Global 1:1 protection refers to the mechanism in which the
      multicast source RBridge normally injects a single multicast
      stream onto the primary DT.  When an interruption of this stream
      is detected, the source RBridge switches to the backup DT for
      subsequent multicast traffic until the primary DT has recovered.

   o  Global 1+1 protection refers to the mechanism in which the
      multicast source RBridge always injects two copies of the
      multicast stream, onto the primary DT and the backup DT
      respectively.  In the normal case, multicast receivers pick the
      stream sent along the primary DT and egress it onto their local
      links.  When a link failure interrupts the primary stream, the
      backup stream is picked until the primary DT has recovered.

   o  Local protection refers to the mechanism in which the RBridge
      attached to the failed link locally repairs the failure.

   RDT may greatly reduce the service disruption caused by link
   failures.  In global 1:1 protection, the time spent on DT
   recalculation and installation is saved.
   Global 1+1 protection and local protection further save the time
   spent on failure propagation.  A failed link can be repaired in
   tens of milliseconds.  Although it is possible to make use of RDT
   to achieve load balancing of multicast traffic, this document
   leaves that for future study.

   [6326bis] defines the Affinity Sub-TLV.  An "Affinity Link" can be
   explicitly assigned to a distribution tree or trees.  This offers a
   way to manipulate the calculation of distribution trees.  With the
   intentional assignment of Affinity Links, a backup distribution
   tree can be set up to protect links on a primary distribution tree.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

1.2. Terminology

   IS-IS: Intermediate System to Intermediate System
   TRILL: TRansparent Interconnection of Lots of Links
   DT: Distribution Tree
   RPF: Reverse Path Forwarding
   RDT: Resilient Distribution Tree
   SLA: Service Level Agreement
   PLR: Point of Local Repair.  In this document, it is the multicast
        upstream RBridge connected to the failed link.  It is relevant
        only for local protection.

2. Usage of Affinity Sub-TLV

   This document uses the Affinity Sub-TLV [6326bis] to assign a
   parent to an RBridge in a tree, as discussed below.

2.1. Allocating Affinity Links

   The Affinity Sub-TLV explicitly assigns parents to RBridges on
   distribution trees.  Such assignments are advertised in the
   Affinity Sub-TLV and recognized by each RBridge in the campus.  The
   originating RBridge becomes the parent, and the nickname contained
   in the Affinity Record identifies the child.  This explicitly
   provides an "Affinity Link" on a distribution tree or trees.  The
   "Tree-num of roots" field of the Affinity Record identifies the
   distribution trees that adopt this Affinity Link [6326bis].

   Affinity Links may be configured or automatically determined using
   a certain algorithm [CMT].  Suppose link RB2-RB3 is chosen as an
   Affinity Link on the distribution tree rooted at RB1.  RB2 should
   send out the Affinity Sub-TLV with an Affinity Record like
   {Nickname=RB3, Num of Trees=1, Tree-num of roots=RB1}.  In this
   document, RB3 does not have to be a leaf node on a distribution
   tree; therefore an Affinity Link can be used to identify any link
   on a distribution tree.  This kind of assignment offers RBridges
   flexibility in distribution tree calculation: they are allowed to
   choose a child even when they are not on the shortest path from the
   root to that child.  This flexibility is leveraged in this document
   to increase the reliability of distribution trees.

   An Affinity Sub-TLV that tries to connect two RBridges that are not
   adjacent MUST be ignored.
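   As a non-normative illustration, the following Python sketch models
   the Affinity Record of the example above.  The nickname values and
   the byte layout (16-bit nicknames and an 8-bit "Num of Trees"
   count) are assumptions made for this sketch; [6326bis] is
   authoritative for the on-the-wire encoding.

      from dataclasses import dataclass, field
      from typing import List
      import struct

      # Hypothetical nickname values, for illustration only.
      RB1_NICK, RB3_NICK = 0x0001, 0x0003

      @dataclass
      class AffinityRecord:
          # Child RBridge of the Affinity Link; the announcing
          # RBridge is the parent.
          nickname: int
          # "Tree-num of roots": the trees adopting this link.
          tree_roots: List[int] = field(default_factory=list)

          def encode(self) -> bytes:
              # ASSUMED layout: 16-bit nickname, 8-bit "Num of
              # Trees", then one 16-bit tree root nickname each.
              out = struct.pack("!HB", self.nickname,
                                len(self.tree_roots))
              for root in self.tree_roots:
                  out += struct.pack("!H", root)
              return out

      # RB2 (the parent) announces {Nickname=RB3, Num of Trees=1,
      # Tree-num of roots=RB1} for Affinity Link RB2-RB3.
      record = AffinityRecord(nickname=RB3_NICK, tree_roots=[RB1_NICK])
      assert record.encode() == b"\x00\x03\x01\x00\x01"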
2.2. Distribution Tree Calculation with Affinity Links

   When RBridges receive an Affinity Sub-TLV whose Affinity Link has
   RB2 as the child (i.e., RB2 is the child on this Affinity Link),
   RB2's incoming links other than the Affinity Link are removed from
   the full graph of the campus to obtain a sub graph.  RBridges
   perform the Shortest Path First calculation on this sub graph to
   compute the distribution tree.  In this way, the Affinity Link is
   guaranteed to appear on the distribution tree.

     Root                                  Root
    +---+  ->  +---+  ->  +---+    +---+  ->  +---+  ->  +---+
    |RB1|      |RB2|      |RB3|    |RB1|      |RB2|      |RB3|
    +---+  <-  +---+  <-  +---+    +---+  <-  +---+  <-  +---+
     ^ |        ^ |        ^ |      ^ |        ^          ^ |
     | v        | v        | v      | v        |          | v
    +---+  ->  +---+  ->  +---+    +---+  ->  +---+  ->  +---+
    |RB4|      |RB5|      |RB6|    |RB4|      |RB5|      |RB6|
    +---+  <-  +---+  <-  +---+    +---+  <-  +---+      +---+

           Full Graph                       Sub Graph

      Root 1                          Root 1
          / \                             / \
         /   \                           /   \
        4     2                         4     2
             / \                        |     |
            /   \                       |     |
           5     3                      5     3
           |                            |
           |                            |
           6                            6

     Shortest Path Tree             Shortest Path Tree
        of Full Graph                  of Sub Graph

       Figure 2.1: DT Calculation with the Affinity Link RB4-RB5

   Take Figure 2.1 as an example.  Suppose RB1 is the root and link
   RB4-RB5 is the Affinity Link.  RB5's other incoming links, RB2-RB5
   and RB6-RB5, are removed from the Full Graph to obtain the Sub
   Graph.  Since RB4-RB5 is then the only link by which RB5 can be
   reached, the Shortest Path Tree inevitably contains this link.
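   As a non-normative illustration, the following Python sketch
   computes the sub graph and runs SPF on it for the topology of
   Figure 2.1.  The unit metrics are illustrative, and the
   deterministic tie-breaking that [RFC6325] requires for distribution
   tree computation is omitted for brevity.

      import heapq

      def spf_tree(graph, root):
          # Dijkstra Shortest Path First; returns {node: parent}
          # describing the distribution tree.
          dist, parent = {root: 0}, {root: None}
          heap = [(0, root)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist[u]:
                  continue
              for v, cost in graph[u].items():
                  if d + cost < dist.get(v, float("inf")):
                      dist[v], parent[v] = d + cost, u
                      heapq.heappush(heap, (d + cost, v))
          return parent

      def apply_affinity(graph, parent_rb, child_rb):
          # Remove the child's incoming links other than the
          # Affinity Link, yielding the sub graph of Section 2.2.
          return {u: {v: c for v, c in nbrs.items()
                      if v != child_rb or u == parent_rb}
                  for u, nbrs in graph.items()}

      # Full Graph of Figure 2.1 with unit metrics.
      full = {1: {2: 1, 4: 1}, 2: {1: 1, 3: 1, 5: 1},
              3: {2: 1, 6: 1}, 4: {1: 1, 5: 1},
              5: {2: 1, 4: 1, 6: 1}, 6: {3: 1, 5: 1}}

      sub = apply_affinity(full, parent_rb=4, child_rb=5)
      tree = spf_tree(sub, root=1)
      assert tree[5] == 4   # Affinity Link RB4-RB5 is on the tree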
3. Resilient Distribution Trees Calculation

   RBridges leverage IS-IS to detect and advertise network faults.  A
   node or link failure triggers a campus-wide reconvergence of
   distribution trees.  The reconvergence generally includes the
   following procedures:

   1. The failure is detected through the exchange of IS-IS control
      messages (HELLOs) or some other method such as BFD [rbBFD];

   2. IS-IS link state flooding takes place so that each RBridge
      learns about the failure;

   3. Each RBridge recalculates the affected distribution trees
      independently;

   4. RPF filters are updated according to the new distribution trees,
      and the recomputed distribution trees are pruned per VLAN and
      installed into the multicast forwarding tables.

   This slow reconvergence can take as long as tens of seconds, which
   causes disruption to the ongoing multicast traffic.  In protection
   mechanisms, alternative paths prepared ahead of potential node or
   link failures are used to route around a failure as soon as it is
   detected, so the service disruption can be minimized.

   This document focuses only on link protection.  The construction of
   a backup DT for the purpose of node protection is out of the scope
   of this document.  In order to protect a node on the primary tree,
   a backup tree can be set up that does not include this node [mMRT];
   when this node fails, the backup tree can safely be used to forward
   multicast traffic around the failure.  However, TRILL distribution
   trees are shared among all VLANs and Fine Grained Labels [FGL], and
   they have to cover all RBridge nodes in the campus [RFC6325].  A DT
   that does not span all RBridges in the campus might fail to cover
   all the receivers of many multicast groups.  (This is different
   from the construction of multicast trees signaled by PIM [RFC4601]
   or mLDP [RFC6388].)

3.1. Designating Roots for Backup Trees

   Operators MAY manually configure the roots for the backup DTs.
   Nevertheless, this document aims to provide a mechanism with
   minimal configuration.  Two options are offered as follows.

3.1.1. Conjugate Trees

   [RFC6325] and [ClearC] have defined how distribution tree roots are
   selected.  When a backup DT is computed for a primary DT, its root
   is set to be the root of this primary DT.  In order to distinguish
   the primary DT from the backup DT, the root RBridge MUST own
   multiple nicknames.

3.1.2. Explicitly Advertising Tree Roots

   The RBridge RB1 having the highest-priority tree root nickname
   might explicitly advertise a list of nicknames identifying the
   roots of the primary and backup trees (see Section 4.5 of
   [RFC6325]).

3.2. Backup DT Calculation

3.2.1. Backup DT Calculation with Affinity Links

              2                            1
             /                              \
        Root 1___                       ___2 Root
            /|\  \                     /  /|\
           / | \  \                   /  / | \
          3  4  5  6                 3  4  5  6
          |  |  |  |                  \/    \/
          |  |  |  |                  /\    /\
          7  8  9  10                7  8  9  10

           Primary DT                  Backup DT

       Figure 3.1: An Example of a Primary DT and its Backup DT

   TRILL allows RBridges to compute multiple distribution trees.  With
   the intentional assignment of Affinity Links in DT calculation,
   this document proposes a method to construct Resilient Distribution
   Trees (RDT).  For example, in Figure 3.1, the backup DT is set up
   maximally disjoint from the primary DT.  (The full topology is the
   union of these two DTs; it is not shown in the figure.)  Except for
   the link between RB1 and RB2, no link on the primary DT overlaps
   with a link on the backup DT.  This means that every link on the
   primary DT, except link RB1-RB2, can be protected by the backup DT.

3.2.1.1. Algorithm for Choosing Affinity Links

   Operators MAY configure Affinity Links to intentionally protect a
   specific link, such as the link connected to a gateway.  But it is
   desirable that every RBridge independently computes the Affinity
   Links for a backup DT across the whole campus.  This enables a
   distributed deployment and also minimizes configuration.

   Algorithms for Maximally Redundant Trees [mMRT] may be used to
   figure out the Affinity Links of a backup DT that is maximally
   disjoint from the primary DT, but they provide only a subset of all
   possible solutions, i.e., the conjugate trees described in Section
   3.1.1.  In TRILL, RDT does not restrict the root of the backup DT
   to be the same as that of the primary DT.  Two disjoint (or
   maximally disjoint) trees may be rooted at different nodes, which
   significantly enlarges the solution space.

   This document RECOMMENDS realizing such an independent computation
   through a slight change to the conventional DT calculation process
   of TRILL.  Basically, after the primary DT is calculated, the
   RBridge knows which links this DT uses.  When the backup DT is
   calculated, each RBridge increases the metric of these links by a
   properly chosen value (for safety, the sum of all original link
   metrics in the campus is RECOMMENDED, but not more than 2**23).
   This gives these links a lower priority of being chosen for the
   backup DT in the Shortest Path First calculation.  All links on
   this backup DT could be assigned as Affinity Links, but that is
   unnecessary: in order to reduce the number of Affinity Sub-TLVs
   flooded across the campus, only those links that would not be
   picked by the conventional DT calculation process ought to be
   advertised as Affinity Links.
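   As a non-normative illustration, the following Python sketch
   implements this selection.  It reuses the spf_tree() function from
   the sketch in Section 2.2; the link-keyed bookkeeping is an
   assumption of the sketch, not part of the protocol.

      def choose_affinity_links(graph, primary_root, backup_root):
          # Run SPF for the primary DT, then inflate the metric of
          # every link it uses and run SPF again for the backup DT.
          primary = spf_tree(graph, primary_root)
          used = {frozenset((u, v)) for v, u in primary.items()
                  if u is not None}
          # Recommended penalty: the sum of all original link
          # metrics in the campus, capped at 2**23 (Section 3.2.1.1).
          penalty = min(sum(c for nbrs in graph.values()
                            for c in nbrs.values()), 2**23)
          inflated = {u: {v: c + penalty
                          if frozenset((u, v)) in used else c
                          for v, c in nbrs.items()}
                      for u, nbrs in graph.items()}
          backup = spf_tree(inflated, backup_root)
          # Only links that a conventional SPF run from the backup
          # root would NOT pick need to be advertised as Affinity
          # Links, reducing the number of flooded Affinity Sub-TLVs.
          conventional = spf_tree(graph, backup_root)
          return [(u, v) for v, u in backup.items()
                  if u is not None and conventional.get(v) != u]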
3.2.1.2. Affinity Links Advertisement

   Similar to [CMT], the parent RBridge of an Affinity Link takes
   charge of announcing this link in an Affinity Sub-TLV.  When an
   RBridge plays the role of parent for several Affinity Links, it is
   natural to advertise them together in the same Affinity Sub-TLV,
   with each Affinity Link structured as one Affinity Record.

   Affinity Links are announced in the Affinity Sub-TLV, which is
   recognized by every RBridge.  Since each RBridge computes
   distribution trees as the Affinity Sub-TLV requires, the backup DT
   will be built up consistently.

3.2.2. Backup DT Calculation without Affinity Links

   This section provides an alternative method to set up the disjoint
   backup DT.

   After the primary DT is calculated, each RBridge increases the cost
   of the links that are already in the primary DT by a multiplier
   (for safety, 64x is RECOMMENDED).  This ensures that a link appears
   in both trees if and only if there is no other way to reach the
   node (i.e., the graph would become disconnected if it were pruned
   of the links in the first tree).  In other words, the two trees
   will be maximally disjoint.

   The above algorithm is similar to that defined in Section 3.2.1.1.
   All RBridges MUST agree on the same algorithm; then the backup DT
   can be calculated by each RBridge consistently, and no
   configuration is necessary.

4. Resilient Distribution Trees Installation

   As specified in Section 4.5.2 of [RFC6325], an ingress RBridge MUST
   announce the distribution trees it may choose to ingress multicast
   frames.  Thus other RBridges in the campus can limit the amount of
   state necessary for the RPF check.  Also, [RFC6325] recommends that
   an ingress RBridge by default choose the DT or DTs whose root or
   roots are least cost from the ingress RBridge.  To sum up, RBridges
   do pre-compute all the trees that might be used, so that they can
   properly forward multi-destination packets, but they install RPF
   state only for some combinations of ingress and tree.

   This document states that the backup DT MUST be contained in an
   ingress RBridge's DT announcement list and included in this ingress
   RBridge's LSP.  In order to reduce the service disruption time,
   RBridges SHOULD install backup DTs in advance, which also includes
   setting up the RPF filters needed for the RPF check.

   Since the backup DT is intentionally built maximally disjoint from
   the primary DT, when a link fails and interrupts the ongoing
   multicast traffic sent along the primary DT, it is probable that
   the backup DT is not affected.  In that case, the backup DT
   installed in advance can be used to deliver multicast packets
   immediately.

4.1. Pruning the Backup Distribution Tree

   The backup DT SHOULD be pruned per VLAN, but the way a backup DT is
   pruned differs from the way the primary DT is pruned.  Even though
   a branch contains no downstream receivers, it possibly should not
   be pruned, for the purpose of protection.  The rule for backup DT
   pruning is that the backup DT should be pruned per VLAN,
   eliminating branches that have no potential downstream RBridges
   appearing on the pruned primary DT.

   In practice, the primary DT may not be optimally pruned.  In this
   case, the backup DT SHOULD be pruned presuming that the primary DT
   is optimally pruned.  Redundant links that ought to have been
   pruned from the primary DT will then not be protected.

                                           1
                                            \
           Root 1___                     ___2 Root
               / \  \                   /  /|\
              /   \  \                 /  / | \
             3     5  6               3  4  5  6
             |     |  |                 /   \/
             |     |  |                /    /\
             7     9  10              7    9  10

          Pruned Primary DT          Pruned Backup DT

    Figure 4.1: The Backup DT is Pruned Based on the Pruned Primary DT

   Suppose RB7, RB9 and RB10 constitute a multicast group MGx.  The
   pruned primary DT and backup DT are shown in Figure 4.1.  Referring
   back to Figure 3.1, branches RB2-RB1 and RB4-RB1 on the primary DT
   are pruned for the distribution of MGx traffic, since there are no
   potential receivers on these two branches.  Although branches
   RB1-RB2 and RB3-RB2 on the backup DT have no potential multicast
   receivers either, their downstream RBridges appear on the pruned
   primary DT, and these branches may be used to repair link failures
   of the primary DT; therefore they are not pruned from the backup
   DT.  Branch RB8-RB3 can be safely pruned because RB8 does not
   appear on the pruned primary DT.
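   As a non-normative illustration, the following Python sketch
   applies this pruning rule to the trees of Figure 3.1.  The parent
   maps are written out by hand for the sketch.

      def prune(parent, needed):
          # Keep a node iff it is in `needed` or lies on the path
          # from some needed node up to the root of the tree.
          keep = set()
          for n in needed:
              while n is not None and n not in keep:
                  keep.add(n)
                  n = parent.get(n)
          return {n: p for n, p in parent.items() if n in keep}

      # Parent maps for the trees of Figure 3.1 (node -> parent,
      # the root has parent None); MGx spans RB7, RB9 and RB10.
      primary = {1: None, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1,
                 7: 3, 8: 4, 9: 5, 10: 6}
      backup  = {2: None, 1: 2, 3: 2, 4: 2, 5: 2, 6: 2,
                 7: 4, 8: 3, 9: 6, 10: 5}

      pruned_primary = prune(primary, {7, 9, 10})
      # Section 4.1 rule: the backup DT keeps every branch leading
      # to an RBridge that appears on the pruned primary DT, not
      # just to the group members themselves.
      pruned_backup = prune(backup, set(pruned_primary))

      assert 8 not in pruned_backup   # branch RB8-RB3 is pruned
      assert 3 in pruned_backup       # RB3-RB2 kept for protection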
4.2. RPF Filters Preparation

   RB2 includes in its LSP the information indicating which trees RB2
   might choose to ingress multicast frames [RFC6325].  When RB2
   specifies the trees it might choose to ingress multicast traffic,
   it SHOULD include the backup DT.  Other RBridges then prepare RPF
   check state for both the primary DT and the backup DT, so that a
   multicast packet sent along either the primary DT or the backup DT
   passes the RPF check.  This works when global 1:1 protection is
   used.  However, when global 1+1 protection or local protection is
   applied, traffic duplication will happen if multicast receivers
   accept the copies of the multicast packets matching both RPF
   filters.  In order to avoid such duplication, egress RBridge
   multicast receivers MUST act as merge points that activate a single
   RPF filter and discard the duplicate packets matching the other RPF
   filter.  In the normal case, the RPF state is set up according to
   the primary DT.  When a link fails, the RPF filter based on the
   backup DT is activated.
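   As a non-normative illustration, the following Python sketch models
   this behavior with RPF filters keyed by (distribution tree,
   ingress).  The data model and names are assumptions of the sketch.

      class RpfFilters:
          # A transit RBridge keeps filters for both the primary and
          # the backup DT; an egress RBridge additionally marks
          # exactly one of them active and decapsulates only traffic
          # matching the active filter (the "merge point" rule).
          def __init__(self):
              self.expected = {}  # (tree, ingress) -> allowed links
              self.active = None  # (tree, ingress) an egress uses

          def install(self, tree, ingress, links):
              self.expected[(tree, ingress)] = set(links)

          def forward_ok(self, tree, ingress, link):
              return link in self.expected.get((tree, ingress), set())

          def egress_ok(self, tree, ingress, link):
              return (self.forward_ok(tree, ingress, link)
                      and self.active == (tree, ingress))

      # RB9 as an egress: the primary filter is active in the normal
      # case; on failure, RB9 switches `active` to the backup filter.
      rb9 = RpfFilters()
      rb9.install("RB1", "RB7", {"RB5-RB9"})
      rb9.install("RB2", "RB7", {"RB6-RB9"})
      rb9.active = ("RB1", "RB7")
      assert rb9.egress_ok("RB1", "RB7", "RB5-RB9")
      assert not rb9.egress_ok("RB2", "RB7", "RB6-RB9")  # duplicate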
5. Protection Mechanisms with Resilient Distribution Trees

   Protection mechanisms can be developed to make use of the backup DT
   installed in advance.  However, the protection mechanisms already
   developed using PIM or mLDP for multicast in IP/MPLS networks are
   not applicable to TRILL, due to the following fundamental
   differences in distribution tree calculation.

   o  A link on a TRILL distribution tree is bidirectional, while a
      link on a distribution tree in IP/MPLS networks is
      unidirectional.

   o  In TRILL, a multicast source node does not have to be the root
      of the distribution tree.  It is just the opposite in IP/MPLS
      networks.

   o  In IP/MPLS networks, distribution trees, as well as their backup
      distribution trees, are constructed for each multicast source
      node.  In TRILL, a small number of core distribution trees are
      shared among multicast groups, and a backup DT does not have to
      share the same root as the primary DT.

   Therefore a TRILL-specific multicast protection mechanism is
   needed.

   Global 1:1 protection, global 1+1 protection and local protection
   are developed in this section.  In Figure 4.1, assume RB7 is the
   ingress RBridge of the multicast stream while RB9 and RB10 are the
   multicast receivers.  Suppose link RB1-RB5 fails during the
   multicast forwarding.  The backup DT rooted at RB2 does not include
   link RB1-RB5; therefore it can be used to protect this link.  In
   global 1:1 protection, RB7 switches the subsequent multicast
   traffic to this backup DT when it is notified of the link failure.
   In global 1+1 protection, RB7 injects two copies of the multicast
   stream and lets the multicast receivers RB9 and RB10 merge them.
   In local protection, when link RB1-RB5 fails, RB1 locally
   replicates the multicast traffic and sends it on the backup DT.

5.1. Global 1:1 Protection

   In global 1:1 protection, the ingress RBridge of the multicast
   traffic is responsible for switching the failure-affected traffic
   from the primary DT over to the backup DT.  Since the backup DT has
   been installed in advance, global protection need not wait for DT
   recalculation and installation: when the ingress RBridge is
   notified of the failure, it immediately makes this switch-over.

   This type of protection is simple and duplication-safe.  However,
   depending on the topology of the RBridge campus, the time spent on
   failure detection and propagation through the IS-IS control plane
   may still cause considerable service disruption.

   The BFD (Bidirectional Forwarding Detection) protocol can be used
   to reduce the failure detection time [rbBFD]: link failures can be
   rapidly detected with one-hop BFD.  Multi-destination BFD extends
   the BFD mechanism to the fast failure detection of multicast paths
   [mBFD]; it can be used to reduce both the failure detection and the
   propagation time in global protection.  In multi-destination BFD,
   the ingress RBridge sends BFD control packets to poll each
   receiver, and the receivers return BFD control packets to the
   ingress in response.  If no response is received from a specific
   receiver within a detection time, the ingress judges that the
   connectivity to this receiver is broken.  In this way,
   multi-destination BFD detects the connectivity of a path rather
   than that of a single link.  The ingress RBridge then determines a
   minimal failed branch containing this receiver and switches the
   ongoing multicast traffic based on this judgment.  For example, in
   Figure 4.1, if RB9 does not respond while RB10 still responds, RB7
   presumes that links RB1-RB5 and RB5-RB9 have failed, and the
   multicast traffic is switched to a backup DT that can protect these
   two links.  Accurate link failure detection might help ingress
   RBridges make smarter decisions, but it is out of the scope of this
   document.

5.2. Global 1+1 Protection

   In global 1+1 protection, the multicast source RBridge always
   replicates the multicast packets and sends them onto both the
   primary and the backup DT.  This sacrifices capacity efficiency,
   but given the high connection redundancy and inexpensive bandwidth
   in Data Center Networks, this kind of protection can be popular
   [MoFRR].

5.2.1. Failure Detection

   Egress RBridges (merge points) SHOULD learn of a link failure as
   early as possible, so that the failure-affected egress RBridges can
   update their RPF filters quickly to minimize the traffic
   disruption.  Three options are provided as follows.

   1. Egress RBridges assume a minimum known packet rate for a given
      data stream [MoFRR].  A failure detection timer Td is set to the
      interval between two consecutive packets, and Td is
      reinitialized each time a packet is received.  If Td expires
      while packets are arriving at the egress RBridge on the backup
      DT (within the time frame Td), the egress RBridge updates its
      RPF filters and starts to receive the packets forwarded on the
      backup DT (see the sketch after this list).

   2. With multi-destination BFD, when a link failure happens, the
      affected egress RBridges can detect the lack of connectivity
      from the ingress [mBFD].  These egress RBridges are therefore
      able to update their RPF filters promptly.

   3. Egress RBridges can always rely on the IS-IS control plane to
      learn of the failure and determine whether their RPF filters
      should be updated.
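   As a non-normative illustration of option 1, the following Python
   sketch keeps such a timer; the class and method names are
   inventions of the sketch.

      import time

      class MergePointWatchdog:
          # Failure-detection option 1: a timer Td derived from the
          # minimum known packet rate of the stream.
          def __init__(self, td):
              self.td = td                     # seconds per packet
              self.deadline = time.monotonic() + td
              self.use_backup = False          # active RPF filter

          def primary_packet(self):
              # Reinitialize Td on each packet from the primary DT.
              self.deadline = time.monotonic() + self.td

          def check(self, backup_arriving):
              # If Td expires while packets still arrive on the
              # backup DT, activate the backup RPF filter
              # (Section 4.2).
              if not self.use_backup and \
                      time.monotonic() > self.deadline:
                  self.use_backup = backup_arriving
              return self.use_backup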
5.2.2. Traffic Forking and Merging

   For the sake of protection, transit RBridges SHOULD activate both
   the primary and the backup RPF filters, so that both copies of the
   multicast packets pass through transit RBridges.

   Multicast receivers (egress RBridges) MUST act as "merge points"
   and egress only one copy of these multicast packets.  This is
   achieved by activating only a single RPF filter.  In the normal
   case, egress RBridges activate the primary RPF filter.  When a link
   on the pruned primary DT fails, the ingress RBridge cannot reach
   some of the receivers.  When these unreachable receivers realize
   this, they SHOULD update their RPF filters to receive the packets
   sent on the backup DT.

5.3. Local Protection

   In local protection, the Point of Local Repair (PLR) is the
   upstream RBridge connected to the failed link.  It is this RBridge
   that makes the decision to replicate the multicast traffic in order
   to recover from the link failure.  Local protection further saves
   the time spent on failure notification through the flooding of LSPs
   across the campus.  In addition, the failure detection can be sped
   up using [rbBFD]; therefore local protection can keep the service
   disruption within 50 milliseconds.

   Since the ingress RBridge is not necessarily the root of the
   distribution tree in TRILL, a multicast downstream point may not be
   a descendant of the ingress point on the distribution tree.
   Moreover, distribution trees in TRILL are bidirectional, and the
   primary and backup trees do not have to share the same root.  These
   are fundamental differences from the distribution tree calculation
   used in PIM and mLDP; therefore the local protection mechanisms
   used for PIM and mLDP, such as [mMRT] and [MoFRR], are not
   applicable here.

5.3.1. Start Using the Backup Distribution Tree

   The egress nickname field in the TRILL header of the replicated
   multicast TRILL data packets specifies the tree on which they are
   being distributed.  This field is rewritten by the PLR to the
   backup DT's root nickname, but the ingress nickname of the
   multicast frame MUST remain unchanged.  This is a halfway change of
   the DT for multicast packets.  Afterwards, the PLR begins to
   forward the multicast traffic along the backup DT.  This is a
   change from [RFC6325], which specifies that the egress nickname in
   the TRILL header of a multi-destination TRILL data packet must not
   be changed by transit RBridges.

   In the above example, if the PLR RB1 decides to send replicated
   multicast packets according to the backup DT, it sends them to the
   next hop RB2.
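   As a non-normative illustration, the following Python sketch shows
   this halfway change on a schematic TRILL header; only the two
   nickname fields are modeled, and the string nicknames are
   illustrative.

      from dataclasses import dataclass

      @dataclass
      class TrillHeader:
          # Just the two fields relevant here; a real TRILL header
          # also carries version, hop count, flags, etc.
          egress_nickname: str   # identifies the distribution tree
          ingress_nickname: str  # ingress RBridge; MUST not change

      def plr_repair(pkt: TrillHeader, backup_root: str):
          # Section 5.3.1: the PLR moves the replicated packet onto
          # the backup DT by rewriting only the egress nickname.
          return TrillHeader(egress_nickname=backup_root,
                             ingress_nickname=pkt.ingress_nickname)

      # RB1, the PLR for the failed link RB1-RB5, re-roots the copy
      # onto the backup DT rooted at RB2 and forwards it to RB2.
      repaired = plr_repair(TrillHeader("RB1", "RB7"),
                            backup_root="RB2")
      assert repaired == TrillHeader("RB2", "RB7")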
5.3.2. Duplication Suppression

   When a PLR starts to send replicated multicast packets on the
   backup DT, some multicast packets may still be traveling along the
   primary DT, and some egress RBridges might receive duplicate
   multicast packets.  The traffic forking and merging method of
   global 1+1 protection can be adopted to suppress this duplication.

5.3.3. An Example to Walk Through

   The examples used above for local protection are put together into
   a whole walk-through below.

   In the normal case, multicast frames ingressed by RB7, with pruned
   distribution on the primary DT rooted at RB1, are received by RB9
   and RB10.  When the link RB1-RB5 fails, the PLR RB1 begins to
   replicate and forward subsequent multicast packets using the pruned
   backup DT rooted at RB2.  When RB2 gets the multicast packets from
   the link RB1-RB2, it accepts them, since the RPF filter {DT=RB2,
   ingress=RB7, receiving links=RB1-RB2, RB3-RB2, RB4-RB2, RB5-RB2 and
   RB6-RB2} is installed on RB2.  RB2 forwards the replicated
   multicast packets to its neighbors except RB1.  The multicast
   packets then reach RB6, where both RPF filters {DT=RB1,
   ingress=RB7, receiving link=RB1-RB6} and {DT=RB2, ingress=RB7,
   receiving links=RB2-RB6 and RB9-RB6} are active, so RB6 lets both
   multicast streams through.  The multicast packets finally reach
   RB9, where the RPF filter is updated from {DT=RB1, ingress=RB7,
   receiving link=RB5-RB9} to {DT=RB2, ingress=RB7, receiving
   link=RB6-RB9}.  RB9 egresses the multicast packets onto the local
   link.

5.4. Switching Back to the Primary Distribution Tree

   Assume an RBridge receives the LSP that indicates a link failure.
   This RBridge starts to calculate the new primary DT based on the
   topology without the failed link.  Suppose the new primary DT is
   installed at time t1.

   The propagation of LSPs around the campus takes time.  For safety,
   we assume that all RBridges in the campus have converged to the new
   primary DT by t1+Ts.  By default, Ts (the "settling time") is set
   to 30 seconds, but it is configurable.  At t1+Ts, the ingress
   RBridge switches the traffic from the backup DT back to the new
   primary DT.

   After another Ts (at t1+2*Ts), no multicast packets are being
   forwarded along the old primary DT, and the backup DT should be
   updated according to the new primary DT.  The process of this
   update under the different protection types is discussed as follows
   (a worked timeline follows the list).

   a) For global 1:1 protection, the backup DT is simply updated at
      t1+2*Ts.

   b) For global 1+1 protection, the ingress RBridge stops replicating
      the multicast packets onto the old backup DT at t1+Ts.  The
      backup DT is updated at t1+2*Ts.  The ingress MUST then wait for
      another Ts, during which time period all RBridges converge to
      the new backup DT.  At t1+3*Ts, the ingress RBridge MAY start to
      replicate the multicast packets onto the new backup DT.

   c) For local protection, the PLR stops replicating and sending
      packets on the old backup DT at t1+Ts.  It is safe for RBridges
      to start updating the backup DT at t1+2*Ts.
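   As a non-normative illustration, the following Python sketch works
   through this timeline; the event labels are inventions of the
   sketch.

      def switch_back_events(t1, ts=30.0):
          # Ts is the configurable "settling time", 30 s by default.
          return {
              "ingress switches back to new primary DT": t1 + ts,
              "old primary DT drained; backup DT updated":
                  t1 + 2 * ts,
              "1+1 only: resume replication on new backup DT":
                  t1 + 3 * ts,
          }

      # With the new primary DT installed at t1 = 0, the defaults
      # give the 30 s / 60 s / 90 s sequence described above.
      events = switch_back_events(0.0)
      assert events["old primary DT drained; backup DT updated"] == 60.0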
6. Security Considerations

   This document raises no new security issues for TRILL.

   For general TRILL Security Considerations, see [RFC6325].

7. IANA Considerations

   No new registry or registry entries are requested to be assigned by
   IANA.  The Affinity Sub-TLV has already been defined in [6326bis];
   this document does not change its definition.  RFC Editor: please
   remove this section before publication.

Acknowledgements

   The careful review from Gayle Noble is gratefully acknowledged.
   The authors would like to thank Erik Nordmark, Donald Eastlake,
   Fangwei Hu, Hongjun Zhai and Xudong Zhang for their comments and
   suggestions.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [6326bis] Eastlake, D., Senevirathne, T., et al., "Transparent
             Interconnection of Lots of Links (TRILL) Use of IS-IS",
             draft-ietf-isis-rfc6326bis-01.txt, work in progress.

   [CMT]     Senevirathne, T., Pathangi, J., et al., "Coordinated
             Multicast Trees (CMT) for TRILL",
             draft-ietf-trill-cmt-02.txt, work in progress.

   [RFC6325] Perlman, R., Eastlake, D., et al., "RBridges: Base
             Protocol Specification", RFC 6325, July 2011.

   [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
             "Protocol Independent Multicast - Sparse Mode (PIM-SM):
             Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC6388] Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
             "Label Distribution Protocol Extensions for Point-to-
             Multipoint and Multipoint-to-Multipoint Label Switched
             Paths", RFC 6388, November 2011.

   [rbBFD]   Manral, V., Eastlake, D., et al., "TRILL (Transparent
             Interconnection of Lots of Links): Bidirectional
             Forwarding Detection (BFD) Support",
             draft-ietf-trill-rbridge-bfd-07.txt, work in progress.

   [ClearC]  Eastlake, D., Zhang, M., Ghanwani, A., Manral, V., and A.
             Banerjee, "TRILL: Clarifications, Corrections, and
             Updates", draft-ietf-trill-clear-correct, in RFC Editor's
             queue.

8.2. Informative References

   [mMRT]    Atlas, A., Kebler, R., et al., "An Architecture for
             Multicast Protection Using Maximally Redundant Trees",
             draft-atlas-rtgwg-mrt-mc-arch-02.txt, work in progress.

   [MoFRR]   Karan, A., Filsfils, C., et al., "Multicast only Fast
             Re-Route", draft-ietf-rtgwg-mofrr-02.txt, work in
             progress.

   [mBFD]    Katz, D. and D. Ward, "BFD for Multipoint Networks",
             draft-ietf-bfd-multipoint-02.txt, work in progress.

   [FGL]     Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D.
             Dutt, "TRILL (Transparent Interconnection of Lots of
             Links): Fine-Grained Labeling",
             draft-ietf-trill-fine-labeling, in RFC Editor's queue.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies Co., Ltd
   Huawei Building, No. 156 Beiqing Rd.
   Beijing 100095
   P.R. China

   Email: zhangmingui@huawei.com

   Tissa Senevirathne
   Cisco Systems
   375 East Tasman Drive
   San Jose, CA 95134

   Phone: +1-408-853-2291
   Email: tsenevir@cisco.com

   Janardhanan Pathangi
   Dell/Force10 Networks
   Olympia Technology Park
   Guindy, Chennai 600 032

   Phone: +91 44 4220 8400
   Email: Pathangi_Janardhanan@Dell.com

   Ayan Banerjee
   Insieme Networks
   210 W Tasman Dr.
   San Jose, CA 95134

   Email: ayabaner@gmail.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134

   Phone: +1-408-571-3500
   Email: Anoop@alumni.duke.edu

   Donald E. Eastlake, 3rd
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com