TRILL Working Group                                            Yizhou Li
INTERNET-DRAFT                                           Donald Eastlake
Intended Status: Standards Track                              Weiguo Hao
                                                     Huawei Technologies
                                                           Radia Perlman
                                                                     EMC
                                                            Naveen Nimmu
                                                                Broadcom
                                                           S. Chatterjee
                                                                   Cisco
                                                       Sunny Rajagopalan
                                                                     IBM
Expires: June 25, 2015                                 December 22, 2014

    TRILL: Data Label based Tree Selection for Multi-destination Data
                  draft-yizhou-trill-tree-selection-03

Abstract

   TRILL uses distribution trees to deliver multi-destination frames.
   Multiple trees can be used by an ingress RBridge for flows regardless
   of the VLAN, Fine Grained Label (FGL), and/or multicast group of the
   flow. Different ingress RBridges may choose different distribution
   trees for TRILL Data packets in the same VLAN, FGL, and/or multicast
   group. To avoid unnecessary link utilization, distribution trees
   should be pruned based on VLAN and/or FGL and/or multicast
   destination address. If any VLAN, FGL, or multicast group can be sent
   on any tree, for typical fast path hardware, the amount of pruning
   information is multiplied by the number of trees; however, there is a
   limited capacity for such pruning information.

   This document specifies an optional feature to restrict the TRILL
   Data packets sent on particular distribution trees by VLAN, FGL,
   and/or multicast group, thus reducing the total amount of pruning
   information so that it can more easily be accommodated by fast path
   hardware.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Background Description
      1.2. Motivations
   2. Terminology Used in This Document
   3. Data Label based Tree Selection
      3.1 Overview
      3.2. Sub-TLVs for the Router Capability TLV
         3.2.1. The Tree and VLANs APPsub-TLV
         3.2.2. The Tree and VLANs Used APPsub-TLV
         3.2.3. The Tree and FGLs APPsub-TLV
         3.2.4. The Tree and FGLs Used APPsub-TLV
      3.3. Detailed Processing
      3.4. Failure Handling
      3.5. Multicast Extensions
   4. Backward Compatibility
   5. Security Considerations
   6. IANA Considerations
   7. References
      7.1 Normative References
      7.2 Informative References
   8. Acknowledgments
   Authors' Addresses

1. Introduction

1.1. Background Description

   One or more distribution trees, identified by their root nickname,
   are used to distribute multi-destination data in a TRILL campus
   [RFC6325]. The RBridge having the highest tree root priority
   announces the total number of trees that should be computed for the
   campus. It may also specify the ordered list of trees that RBridges
   need to compute using the Tree Identifiers (TREE-RT-IDs) sub-TLV
   [RFC7176]. Every RBridge can specify the trees it will use in the
   Trees Used Identifiers (TREE-USE-IDs) sub-TLV and the VLANs or fine
   grained labels (FGLs [RFC7172]) it is interested in using the
   Interested VLANs and/or Interested Labels sub-TLVs [RFC7176]. It is
   suggested that, by default, the ingress RBridge use the distribution
   tree whose root is the closest [RFC6325]. Trees Used Identifiers
   sub-TLVs are used to build the RPF Check table that is used for the
   reverse path forwarding check. Interested VLANs and Interested Labels
   sub-TLVs are used for distribution tree pruning, and the
   multi-destination forwarding table with pruning information is built
   from them. Each distribution tree SHOULD be pruned per VLAN/FGL,
   eliminating branches that have no potential receivers downstream
   [RFC6325]. Further pruning based on L2/L3 multicast address is also
   possible.

   Defaults are provided but it is implementation dependent how many
   trees to calculate, where the tree roots are located, and which
   tree(s) are to be used by an ingress RBridge. With the increasing
   demand to use TRILL in data center networks, there are some features
   we can explore for multi-destination frames in the data center use
   case. In order to achieve non-blocking data forwarding, a fat tree
   structure is often used. Figure 1 shows a typical fat tree structure
   based data center network. RB1 and RB2 are aggregation switches and
   RB11 to RB14 are access switches. It is a common practice to
   configure the tree roots to be at the aggregation switches for more
   efficient traffic transportation. All the ingress RBridges that are
   access switches have the same distance to all the tree roots.

            +-----+     +-----+
            | RB1 |     | RB2 |
            +-----+     +-----+
             / | \\     / /|\
            /  |  \ \  / / | \
           /   |   \  X /  |  \-----+
          /    |    \/ /\  |        |
         /     |    /\/   \|        |
        /  /---+---/ /\    |\       |
       /  /    |    /  \   | \      |
      /  /     |   /    \  |  \     |
     /  /      |  /      \ |   \    |
   +-----+   +-----+   +-----+ +-----+
   | RB11|   | RB12|   | RB13| | RB14|
   +-----+   +-----+   +-----+ +-----+

          Figure 1. Fat Tree Structure based TRILL network

1.2. Motivations

   In the structure of Figure 1, if we choose to put the tree roots at
   RB1 and RB2, the ingress RBridge (e.g. RB11) would find more than one
   closest tree root (i.e. RB1 and RB2). An ingress RBridge then has two
   options to select the tree root for multi-destination frames: choose
   one and only one as the distribution tree root, or use an ECMP-like
   algorithm to balance the traffic among the multiple trees whose roots
   are at the same distance.

   - For the former, using a single tree per ingress RBridge has the
   obvious problem of inefficient link usage.
   For example, if RB11 chooses tree1, which is rooted at RB1, as its
   distribution tree, the link between RB11 and RB2 will never be used
   for multi-destination frames ingressed by RB11.

   - For the latter, ECMP based tree selection results in a linear
   increase in multicast forwarding table size with the number of trees
   as explained in the next paragraph.

   A multicast forwarding table at an RBridge is normally used to map
   the key of (tree nickname + VLAN) to an index to a list of ports for
   multicast packet replication. The key used for mapping is simply the
   tree nickname when the RBridge does not prune the tree, and the key
   could be (tree nickname + VLAN + L2/L3 multicast address) when the
   RBridge is programmed by the control plane with L2/L3 multicast
   pruning information.

   For any RBridge RBn, for each VLAN x, if RBn is in a distribution
   tree t for VLAN x, there will be an entry of (t, x, port list) in the
   multicast forwarding table on RBn. Each entry contains a distinct
   combination of (tree nickname, VLAN) as the lookup key. If there are
   n such trees and m such VLANs, the multicast forwarding table size on
   RBn is n*m entries. If fine-grained labels are used [RFC7172] and/or
   finer pruning is used (for example, pruning on VLAN + multicast group
   address), the value of m increases. In larger scale data centers,
   more trees are needed for better load balancing, which increases the
   value of n. In either case, the number of table entries n*m increases
   dramatically.

   The left table in Figure 2 shows an example of the multicast
   forwarding table on RB11 in the Figure 1 topology with 2 distribution
   trees in a campus. The number of entries is approximately 2 * 4K in
   this case. If 4 distribution trees are used in a TRILL campus and RBn
   has 4K VLANs with downstream receivers, it consumes 16K table
   entries.
   TRILL multicast forwarding tables have a limited size in hardware
   implementations, so table entries are a precious resource. In some
   implementations, the table is shared with L3 IP multicast, for a
   total of 16K or 8K table entries. Therefore we want to reduce the
   table space consumed as much as possible while still maintaining
   load balancing among trees.

   In cases where blocks of consecutive VLANs or FGLs can be assigned to
   a tree, it would be very helpful in compressing the multicast
   forwarding table if entries could have a Data Label value and mask
   and the fast path hardware could do longest prefix matching. But few
   if any fast path implementations provide such logic.

   A straightforward way to alleviate the limited table entries problem
   is not to prune the distribution tree. However, this can only be used
   in restricted scenarios for the following reasons:

   - Wasted bandwidth for multi-destination packets. There is broadcast
   traffic in each VLAN, like ARP and unknown unicast. In addition, if
   there is a lot of L3 multicast traffic in some VLAN, not pruning may
   cause L3 user data to be flooded unnecessarily over the campus. The
   volume could be huge if certain applications like IPTV are
   supported. Finer pruning, such as pruning based on multicast group,
   may be desirable in this case.

   - Only useful at pure transit nodes. Edge nodes always need to
   maintain the multicast forwarding table with the key of (tree
   nickname + VLAN) since the edge node needs to decide whether and how
   to replicate the frame to local access ports based on VLAN. It is
   very likely that edge nodes are relatively low scale switches with
   a smaller shared table size, say 4K, available.

   - Security concerns. VLAN based traffic isolation is a basic
   requirement in some scenarios. Not pruning may result in
   unnecessary leakage of traffic.
   Misbehaving RBridges may take advantage of this.

   In addition to the multicast table size concern, some silicon does
   not currently support hash based tree nickname selection at the
   ingress RBridge. VLAN based tree selection is used instead. The
   control plane of the ingress RBridge maps the incoming VLAN x to a
   tree nickname t. The data plane then always uses tree t for VLAN x
   multi-destination frames. Though an ingress RBridge may choose
   multiple trees to be used for load sharing, it can use one and only
   one tree for each VLAN. If we make sure all ingress RBridges
   campus-wide send VLAN x multi-destination packets only using tree t,
   then there is no need to store a multicast table entry with the key
   of (tree-other-than-t, x) on any RBridge.

   This document describes the control plane support for a VLAN based
   tree selection mechanism to reduce the multicast forwarding table
   size. It is compatible with the silicon implementation mentioned in
   the previous paragraph. Here VLAN based tree selection is a general
   term which also includes finer granularity cases such as VLAN + L2/L3
   multicast group or FGL based selection.

2. Terminology Used in This Document

   This document uses the terminology from [RFC6325] and [RFC7172], some
   of which is repeated below for convenience, along with some
   additional terms:

   campus: Name for a TRILL network, like "bridged LAN" is a name for a
   bridged network. It does not have any academic implication.

   Data Label: VLAN or FGL.

   ECMP: Equal Cost Multi-Path [RFC6325].

   FGL: Fine-Grained Label [RFC7172].

   RBridge: An alternative name for a TRILL switch.

   TRILL: Transparent Interconnection of Lots of Links (or Tunneled
   Routing in the Link Layer).

   TRILL switch: A device implementing the TRILL protocol. Sometimes
   called an RBridge.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Data Label based Tree Selection

   Data Label based tree selection can be used as a complementary
   distribution tree selection mechanism, especially when the multicast
   forwarding table size is a concern.

3.1 Overview

   The tree root with the highest priority announces the tree nicknames
   and the Data Labels allowed on each tree. Such tree to Data Label
   correspondence announcements can be based on static configuration or
   some predefined algorithm beyond the scope of this document. An
   ingress RBridge selects the tree-VLAN correspondence it wishes to use
   from the list announced by the highest priority tree root. It SHOULD
   NOT transmit a VLAN x frame on tree y if the highest priority tree
   root does not say VLAN x is allowed on tree y.

   If we make sure each VLAN is allowed on one and only one tree, we can
   keep the number of multicast forwarding table entries on any RBridge
   fixed at 4K maximum (or up to 16M in the case of fine grained
   labels). Taking Figure 1 as an example, assume two trees rooted at
   RB1 and RB2 respectively. The highest priority tree root assigns
   tree1 to carry VLANs 1-2000 and tree2 to carry VLANs 2001-4095. With
   such an announcement by the highest priority tree root, every RBridge
   which understands the announcement will not send VLANs 2001-4095 on
   tree1 or VLANs 1-2000 on tree2. No RBridge then needs to store the
   entries for tree1/VLAN2001-4095 or tree2/VLAN1-2000. Figure 2 shows
   the multicast forwarding table on an RBridge before and after we
   perform the VLAN based tree selection. The number of entries is
   reduced by a factor f, f being the number of trees used in the
   campus. In this example, it is reduced from 2*4095 to 4095.
   This affects both transit nodes and edge nodes. Data plane encoding
   does not change.

   +--------------+-----+---------+   +--------------+-----+---------+
   |tree nickname |VLAN |port list|   |tree nickname |VLAN |port list|
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     |  1  |         |   |   tree 1     |  1  |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     |  2  |         |   |   tree 1     |  2  |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     | ... |         |   |   tree 1     | ... |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     | ... |         |   |   tree 1     | 1999|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     | ... |         |   |   tree 1     | 2000|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     | 4094|         |   |   tree 2     | 2001|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 1     | 4095|         |   |   tree 2     | 2002|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 2     |  1  |         |   |   tree 2     | ... |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 2     |  2  |         |   |   tree 2     | 4094|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 2     | ... |         |   |   tree 2     | 4095|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |   tree 2     | ... |         |
   +--------------+-----+---------+
   |   tree 2     | ... |         |
   +--------------+-----+---------+
   |   tree 2     | ... |         |
   +--------------+-----+---------+
   |   tree 2     | 4094|         |
   +--------------+-----+---------+
   |   tree 2     | 4095|         |
   +--------------+-----+---------+

   Figure 2. Multicast forwarding table before (left) & after (right)

3.2. Sub-TLVs for the Router Capability TLV

   Four new APPsub-TLVs that can be carried in E-L1FS FS-LSPs
   [rfc7180bis] are defined below.
   They can be considered analogous to finer granularity versions of
   the Tree Identifiers Sub-TLV and the Trees Used Identifiers Sub-TLV
   in [RFC7176].

3.2.1. The Tree and VLANs APPsub-TLV

   The Tree and VLANs (TREE-VLANs) APPsub-TLV is used to announce the
   VLANs allowed on each tree by the RBridge that has the highest
   priority to be a tree root. Multiple instances of this APPsub-TLV may
   be carried. The same tree nickname may occur in multiple Tree-VLAN
   RECORDs within the same APPsub-TLV or across multiple APPsub-TLVs.
   The format is as follows:

                           1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Type = tbd1           |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
      |           Tree-VLAN RECORD (1)            |  (6 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
      |             .................             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
      |           Tree-VLAN RECORD (N)            |  (6 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

   where each Tree-VLAN RECORD is of the form:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               Nickname                |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | RESV  |          Start.VLAN           |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | RESV  |           End.VLAN            |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o Type: TRILL GENINFO APPsub-TLV type, set to tbd1 (TREE-VLANs).

   o Length: 6*n bytes, where there are n Tree-VLAN RECORDs. Thus the
   value of Length can be used to determine n. If Length is not a
   multiple of 6, the APPsub-TLV is corrupt and MUST be ignored.

   o Nickname: The nickname identifying the distribution tree by its
   root.

   o RESV: 4 bits that MUST be sent as zero and ignored on receipt.

   o Start.VLAN, End.VLAN: These fields are the VLAN IDs of the allowed
   VLAN range on the tree, inclusive.
   To specify a single VLAN, the VLAN's ID appears as both the start
   and end VLAN. If End.VLAN is less than Start.VLAN, the Tree-VLAN
   RECORD MUST be ignored.

3.2.2. The Tree and VLANs Used APPsub-TLV

   This APPsub-TLV has the same structure as the Tree and VLANs
   APPsub-TLV (TREE-VLANs) specified in Section 3.2.1. The only
   difference is that its APPsub-TLV type is set to tbd2
   (TREE-VLAN-USE), and the Tree-VLAN RECORDs listed are those the
   originating RBridge allows.

3.2.3. The Tree and FGLs APPsub-TLV

   The Tree and FGLs (TREE-FGLs) APPsub-TLV is used to announce the FGLs
   allowed on each tree by the RBridge that has the highest priority to
   be a tree root. Multiple instances of this APPsub-TLV may be carried.
   The same tree nickname may occur in multiple Tree-FGL RECORDs within
   the same APPsub-TLV or across multiple APPsub-TLVs. Its format is as
   follows:

                           1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Type = tbd3           |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
      |            Tree-FGL RECORD (1)            |  (8 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
      |             .................             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
      |            Tree-FGL RECORD (N)            |  (8 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

   where each Tree-FGL RECORD is of the form:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               Nickname                |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+
      |                   Start.FGL                   |  (3 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+
      |                    End.FGL                    |  (3 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+

   o Type: TRILL GENINFO APPsub-TLV type, set to tbd3 (TREE-FGLs).

   o Length: 8*n bytes, where there are n Tree-FGL RECORDs. Thus the
   value of Length can be used to determine n. If Length is not a
   multiple of 8, the APPsub-TLV is corrupt and MUST be ignored.

   o Nickname: The nickname identifying the distribution tree by its
   root.

   o RESV: 4 bits that MUST be sent as zero and ignored on receipt.

   o Start.FGL, End.FGL: These fields are the FGL IDs of the allowed
   FGL range on the tree, inclusive. To specify a single FGL, the FGL's
   ID appears as both the start and end FGL. If End.FGL is less than
   Start.FGL, the Tree-FGL RECORD MUST be ignored.

3.2.4. The Tree and FGLs Used APPsub-TLV

   This APPsub-TLV has the same structure as the Tree and FGLs
   APPsub-TLV (TREE-FGLs) specified in Section 3.2.3. The only
   difference is that its APPsub-TLV type is set to tbd4
   (TREE-FGL-USE), and the Tree-FGL RECORDs listed are those the
   originating RBridge allows.

3.3. Detailed Processing

   The highest priority tree root RBridge MUST include all the necessary
   tree related sub-TLVs defined in [RFC7176] as usual in its LSP and
   MAY include the Tree and VLANs APPsub-TLV (TREE-VLANs) and/or the
   Tree and FGLs APPsub-TLV (TREE-FGLs) in its E-L1FS FS-LSP
   [rfc7180bis]. In this way it MAY indicate that each VLAN and/or FGL
   is only allowed on one tree, or on some other number of trees less
   than the number of trees being calculated in the campus, in order to
   save table space in the fast path forwarding hardware.

   An ingress RBridge that understands the TREE-VLANs APPsub-TLV SHOULD
   select the tree-VLAN correspondences it wishes to use and put them in
   TREE-VLAN-USE APPsub-TLVs. If multiple tree nicknames are announced
   in the TREE-VLANs APPsub-TLV for a VLAN x, the ingress RBridge must
   choose one of them if it supports this feature. For example, the
   ingress RBridge may choose the minimum distance root among them. How
   to make such a choice is out of the scope of this document. It may be
   desirable to have some fixed algorithm to make sure all ingress
   RBridges choose the same tree for VLAN x in this case.
   Any single Data Label that the ingress RBridge is interested in
   should be related to one and only one tree ID in TREE-VLAN-USE to
   minimize the multicast forwarding table size on other RBridges.
   However, as long as the Data Label is related to fewer than all the
   trees being calculated, the burden on the forwarding table is
   reduced.

   When an ingress RBridge encapsulates a multi-destination frame for
   Data Label x, it SHOULD use the tree nickname that it previously
   selected in TREE-VLAN-USE or TREE-FGL-USE for Data Label x.

   If RBridge RBn does not perform pruning, it builds the multicast
   forwarding table exactly as specified in [RFC6325].

   If RBn prunes the distribution tree based on VLANs, RBn uses the
   information received in TREE-VLAN-USE APPsub-TLVs to mark the set of
   VLANs reachable downstream for each adjacency and for each related
   tree. If RBn prunes the distribution tree based on FGLs, RBn uses the
   information received in TREE-FGL-USE APPsub-TLVs to mark the set of
   FGLs reachable downstream for each adjacency and for each related
   tree.

   Logically, an ingress RBridge that does not support VLAN based tree
   selection is equivalent to one that supports it and announces every
   combination of tree-id-used and interested-vlan as TREE-VLAN-USE,
   and correspondingly for FGLs.

3.4. Failure Handling

   Failure of a tree root that is not the highest priority: It is the
   responsibility of the highest priority tree root to inform other
   RBridges of any change in the allowed tree-VLAN correspondence. When
   the highest priority tree root learns that the root of tree t has
   failed, it should re-assign the VLANs allowed on tree t to other
   trees or to a tree replacing the failed one.
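
   As an illustration of the re-assignment step just described, the
   following minimal sketch moves the VLAN ranges of a failed tree onto
   the surviving tree that currently carries the fewest VLANs. This
   document leaves the assignment algorithm out of scope, so the
   least-loaded policy, the function names vlan_count and
   reassign_failed_tree, and the dictionary layout are all assumptions
   made for this example only.

```python
# Hypothetical sketch: this policy is NOT specified by the draft.
# `allowed` maps a tree root nickname to a list of inclusive
# (start_vlan, end_vlan) ranges, mirroring Tree-VLAN RECORDs.


def vlan_count(ranges):
    """Number of VLAN IDs covered by a list of inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


def reassign_failed_tree(allowed, failed_root):
    """Return a new mapping with the failed tree's ranges moved to survivors.

    Each range goes to whichever surviving tree carries the fewest VLANs
    at that moment; ties are broken by the lower nickname (both choices
    are assumptions for illustration).
    """
    survivors = {root: list(ranges)
                 for root, ranges in allowed.items()
                 if root != failed_root}
    if not survivors:
        raise ValueError("no surviving tree to take over the VLANs")
    for vlan_range in allowed.get(failed_root, []):
        target = min(survivors,
                     key=lambda root: (vlan_count(survivors[root]), root))
        survivors[target].append(vlan_range)
    return survivors


# Example: the tree rooted at nickname 0x0002 fails.
before = {
    0x0001: [(1, 1000)],
    0x0002: [(1001, 2000), (2001, 2500)],
    0x0003: [(2501, 4094)],
}
after = reassign_failed_tree(before, 0x0002)
# after == {0x0001: [(1, 1000), (1001, 2000)],
#           0x0003: [(2501, 4094), (2001, 2500)]}
```

   Each VLAN still maps to exactly one tree after the move, so the
   table-size benefit described in Section 3.1 is preserved. A real
   deployment might instead use static configuration, as Section 3.1
   allows.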

   Failure of the highest priority tree root: It is recommended that the
   second highest priority tree root be pre-configured with the proper
   knowledge of the tree-VLAN correspondence allowed when the highest
   priority tree root fails. The information announced by the second
   priority tree root would be stored by all RBridges but would not take
   effect unless the RBridge noticed the failure of the highest priority
   tree root. When the highest priority tree root fails, the former
   second priority tree root becomes the highest priority tree root of
   the campus. When an RBridge notices the failure of the original
   highest priority tree root, it can immediately use the stored
   information announced by the original second priority tree root. It
   is recommended that the tree-VLAN correspondence information
   pre-configured on the second highest priority tree root be the same
   as that on the highest priority tree root for the trees other than
   the highest priority tree itself. This minimizes the change to the
   multicast forwarding table when the highest priority tree root
   fails.

   In some transient conditions or in case of misbehavior by the highest
   priority tree root, an ingress RBridge may encounter the following
   scenarios:

   - No tree has been announced to allow VLAN x frames.

   - An ingress RBridge is supposed to transmit VLAN x frames on tree t,
   but the root of tree t is no longer reachable.

   For the second case, an ingress RBridge may choose another reachable
   tree root which allows VLAN x according to the highest priority tree
   root announcement. If there is no such tree available, the situation
   is the same as the first case above. In that case the ingress RBridge
   should be 'downgraded' to behave as a conventional RBridge as
   specified in [RFC6325]. A timer should be set to allow the temporary
   transient stage to complete before the change of tree or the
   'downgrade' takes effect.
   The timer value should be at least the LSP flooding time of the
   campus.

3.5. Multicast Extensions

   Data Label based tree selection can be easily extended to (Data Label
   + L2/L3 multicast group) based tree selection. For example, we can
   assign multicast group 1 in VLAN 10 to tree1 and group 2 in VLAN 10
   to tree2 for better load sharing. Additional APPsub-TLVs similar to
   the one below could be specified for this purpose.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Tree Nickname         |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RESV  |        VLAN ID        |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Num Group Recs |                  (1 byte)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       GROUP RECORDS (1)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       GROUP RECORDS (2)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       .................                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       GROUP RECORDS (N)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where each group record is of the following form, with k=4 for a
   group IPv4 address, k=6 for a group MAC address, and k=16 for a group
   IPv6 address:

   +-+-+-+-+-+-+-+-+
   | Num of Sources|  (1 byte)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Group Address (k bytes)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Source 1 Address (k bytes)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Source 2 Address (k bytes)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             .....                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Source M Address (k bytes)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4. Backward Compatibility

   RBridges MUST include the TREE-USE-IDs and INT-VLAN sub-TLVs in their
   LSPs when required by [RFC6325] whether or not they support the new
   TREE-VLAN-USE or TREE-FGL-USE APPsub-TLVs specified by this document.

   An RBridge that understands the new TREE-VLAN-USE APPsub-TLV sent
   from another RBridge RBn should use it to build the multicast
   forwarding table and ignore the TREE-USE-IDs and INT-VLAN sub-TLVs
   sent from the same RBridge for that purpose. TREE-USE-IDs and
   INT-VLAN sub-TLVs are still useful for purposes other than building
   the multicast forwarding table, e.g. RPF table building, spanning
   tree root notification, etc. If the RBridge does not receive a
   TREE-VLAN-USE APPsub-TLV from RBn, it uses the conventional way
   described in [RFC6325] to build the multicast forwarding table.

   For example, suppose there are two distribution trees, tree1 and
   tree2, in the campus. RB1 and RB2 are new RBridges that use the new
   APPsub-TLVs described in this document. RB3 is an old RBridge that is
   compatible with [RFC6325]. Assume RB2 is interested in VLANs 10 and
   11 and RB3 is interested in VLANs 100 and 101. Hence RB1 receives
   ((tree1, VLAN10), (tree2, VLAN11)) as the TREE-VLAN-USE APPsub-TLV
   and (tree1, tree2) as the TREE-USE-IDs sub-TLV from RB2 on port x.
   RB1 receives (tree1) as the TREE-USE-IDs sub-TLV and no TREE-VLAN-USE
   APPsub-TLV from RB3 on port y. RB2 and RB3 announce their interested
   VLANs in the INT-VLAN sub-TLV as usual. RB1 then builds the entries
   (tree1, VLAN10, port x) and (tree2, VLAN11, port x) based on RB2's
   LSP and the mechanism specified in this document.
RB1 also builds entry of (tree1, VLAN100, port y), 636 (tree1, VLAN101, port y), (tree2, VLAN100, port y), (tree2, VLAN101, 637 port y) based on RB3's LSP in conventional way. The multicast 638 forwarding table on RB1 with merged entry would be like the 639 following. 641 +--------------+-----+---------+ 642 |tree nickname |VLAN |port list| 643 +--------------+-----+---------+ 644 | tree 1 | 10 | x | 645 +--------------+-----+---------+ 646 | tree 1 | 100 | y | 647 +--------------+-----+---------+ 648 | tree 1 | 101 | y | 649 +--------------+-----+---------+ 650 | tree 2 | 11 | x | 651 +--------------+-----+---------+ 652 | tree 2 | 100 | y | 653 +--------------+-----+---------+ 654 | tree 2 | 101 | y | 655 +--------------+-----+---------+ 657 It is expected that the table is not as small as the one where every 658 RBridge supports the new TREE-VLAN-USE sub-TLVs. The worst case in a 659 hybrid campus is the number of entries equal to the number in current 660 practice which does not support VLAN based tree selection. Such 661 extreme case happens when the interested VLAN set from the new 662 RBridges is a subset of the interested VLAN set from the old 663 RBridges. 665 VLAN based tree selection is compatible with the current practice. 666 Its effectiveness increases with more RBridge supporting this feature 667 in the TRILL campus. 669 5. Security Considerations 671 This document does not change the general RBridge security 672 considerations of the TRILL base protocol. See Section 6 of 673 [RFC6325] for general TRILL security considerations. 675 6. IANA Considerations 677 IANA is requested to allocatehas assigned four new TRILL APPsub-TLV 678 type codes as specified in Section 3 and updated the TRILL Parameters 679 registry as shown below. 681 Type Name Reference 682 ---- ---- --------- 684 tbd1 TREE-VLANs [this document] 685 tbd2 TREE-VLAN-USE [this document] 686 tbd3 TREE-FGLs [this document] 687 tbd4 TREE-FGL-USE [this document] 689 7. 
References 691 7.1 Normative References 693 [RFC6325] Perlman, R., et.al. "RBridge: Base Protocol Specification", 694 RFC 6325, July 2011. 696 [RFC6439] Eastlake, D. et.al., "RBridge: Appointed Forwarder", RFC 697 6439, November 2011. 699 [RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and 700 D. Dutt, "Transparent Interconnection of Lots of Links 701 (TRILL): Fine-Grained Labeling", RFC 7172, May 2014, 702 . 704 [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D., 705 and A. Banerjee, "Transparent Interconnection of Lots of 706 Links (TRILL) Use of IS-IS", RFC 7176, May 2014, 707 . 709 [rfc7180bis] Eastlake 3rd, D. et. Al. draft-eastlake-trill- 710 rfc7180bis, work in progress. 712 7.2 Informative References 714 8. Acknowledgments 716 Authors wish to thank David M. Bond, Liangliang Ma, Rakesh Kumar R 717 for the valuable comments (names in alphabet order). 719 Authors' Addresses 721 Yizhou Li 722 Huawei Technologies 723 101 Software Avenue, 724 Nanjing 210012 725 China 727 Phone: +86-25-56624629 728 EMail: liyizhou@huawei.com 729 Donald Eastlake 730 Huawei R&D USA 731 155 Beaver Street 732 Milford, MA 01757 USA 734 Phone: +1-508-333-2270 735 Email: d3e3e3@gmail.com 737 Weiguo Hao 738 Huawei Technologies 739 101 Software Avenue, 740 Nanjing 210012 741 China 743 Phone: +86-25-56623144 744 Email: haoweiguo@huawei.com 746 Radia Perlman 747 EMC 748 2010 256th Avenue NE, #200 749 Bellevue, WA 98007 750 USA 752 Email: Radia@alum.mit.edu 754 Naveen Nimmu 755 Broadcom 756 9th Floor, Building no 9, Raheja Mind space 757 Hi-Tec City, Madhapur, 758 Hyderabad - 500 081, INDIA 760 Phone: +1-408-218-8893 761 Email: naveen@broadcom.com 763 Somnath Chatterjee 764 Cisco Systems, 765 SEZ Unit, Cessna Business Park, 766 Outer ring road, 767 Bangalore - 560087 768 India 770 EMail: somnath.chatterjee01@gmail.com 772 Sunny Rajagopalan 773 IBM 775 Email: sunny.rajagopalan@us.ibm.com
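
   Note (non-normative): the table-building rule in the Section 4
   example can be sketched in Python as follows. This is a minimal
   illustration under simplifying assumptions and is not part of the
   specification. Each advertising RBridge is assumed to be reached
   through a single local port on every tree, and the function name
   build_forwarding_table and the dictionary keys (port, int_vlans,
   tree_vlan_use) are hypothetical names chosen for this sketch. An
   RBridge that advertises TREE-VLAN-USE contributes only the (tree,
   VLAN) pairs it lists; an RBridge that does not is handled in the
   conventional way and contributes every interested VLAN on every
   tree.

```python
from collections import defaultdict


def build_forwarding_table(campus_trees, neighbors):
    """Return {(tree, vlan): set of ports} for one RBridge.

    Hypothetical input model (not defined by the draft):
      campus_trees - iterable of tree nicknames known in the campus
      neighbors    - list of dicts, one per advertising RBridge:
          "port"          local port through which it is reached
          "int_vlans"     VLANs from its INT-VLAN sub-TLV
          "tree_vlan_use" list of (tree, vlan) pairs from its
                          TREE-VLAN-USE sub-TLV, or None if absent
    """
    table = defaultdict(set)
    for nbr in neighbors:
        if nbr["tree_vlan_use"] is not None:
            # New RBridge: use only the advertised (tree, VLAN) pairs
            # and ignore TREE-USE-IDs / INT-VLAN for this purpose.
            pairs = list(nbr["tree_vlan_use"])
        else:
            # Old RBridge: conventional behavior, every interested
            # VLAN may arrive on every tree.
            pairs = [(t, v) for t in campus_trees for v in nbr["int_vlans"]]
        for tree, vlan in pairs:
            table[(tree, vlan)].add(nbr["port"])
    return dict(table)


# Inputs taken from the Section 4 example, as seen by RB1.
neighbors = [
    {"port": "x", "int_vlans": {10, 11},
     "tree_vlan_use": [("tree1", 10), ("tree2", 11)]},   # RB2 (new)
    {"port": "y", "int_vlans": {100, 101},
     "tree_vlan_use": None},                              # RB3 (old)
]
table = build_forwarding_table(["tree1", "tree2"], neighbors)

for (tree, vlan), ports in sorted(table.items()):
    print(tree, vlan, sorted(ports))
```

   With the inputs from the Section 4 example, the sketch yields the
   same six entries as the merged table in that section. If RB3 also
   advertised a TREE-VLAN-USE sub-TLV, its contribution would shrink
   to the pairs it lists.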