TRILL Working Group                                                Y. Li
INTERNET-DRAFT                                               D. Eastlake
Intended Status: Standards Track                                  W. Hao
                                                                 H. Chen
                                                     Huawei Technologies
                                                           S. Chatterjee
                                                                   Cisco
Expires: January 1, 2017                                   June 30, 2016

   TRILL: Data Label based Tree Selection for Multi-destination Data
                   draft-ietf-trill-tree-selection-05

Abstract

   TRILL uses distribution trees to deliver multi-destination frames.
   Multiple trees can be used by an ingress RBridge for flows regardless
   of the VLAN, Fine Grained Label (FGL), and/or multicast group of the
   flow.
   Different ingress RBridges may choose different distribution
   trees for TRILL Data packets in the same VLAN, FGL, and/or multicast
   group. To avoid unnecessary link utilization, distribution trees
   should be pruned based on VLAN and/or FGL and/or multicast
   destination address. If any VLAN, FGL, or multicast group can be sent
   on any tree, for typical fast path hardware, the amount of pruning
   information is multiplied by the number of trees, but there is a
   limited hardware capacity for such pruning information.

   This document specifies an optional facility to restrict the TRILL
   Data packets sent on particular distribution trees by VLAN, FGL,
   and/or multicast group thus reducing the total amount of pruning
   information so that it can more easily be accommodated by fast path
   hardware.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Background Description
      1.2. Terminology Used in This Document
   2. Motivations
   3. Data Label based Tree Selection
      3.1. Overview of the Mechanism
      3.2. APPsub-TLVs Supporting Tree Selection
         3.2.1. The Tree and VLANs APPsub-TLV
         3.2.2. The Tree and VLANs Used APPsub-TLV
         3.2.3. The Tree and FGLs APPsub-TLV
         3.2.4. The Tree and FGLs Used APPsub-TLV
         3.2.5. The Tree and Groups APPsub-TLV
         3.2.6. The Tree and Groups Used APPsub-TLV
      3.3. Detailed Processing
      3.4. Failure Handling
   4. Backward Compatibility
   5. Security Considerations
   6. IANA Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   8. Acknowledgments
   Authors' Addresses

1. Introduction

1.1. Background Description

   One or more distribution trees, identified by their root nickname,
   are used to distribute multi-destination data in a TRILL campus
   [RFC6325]. The RBridge having the highest tree root priority
   announces the total number of trees that should be computed for the
   campus. It may also specify the list of trees that RBridges need to
   compute using the Tree Identifiers (TREE-RT-IDs) sub-TLV [RFC7176].
   Every RBridge can specify the trees it will use for multi-destination
   TRILL data packets it originates in the Trees Used Identifiers
   (TREE-USE-IDs) sub-TLV, and the VLANs or fine grained labels (FGLs
   [RFC7172]) it is interested in are specified in Interested VLANs
   and/or Interested Labels sub-TLVs [RFC7176]. It is suggested that, by
   default, the ingress RBridge uses the distribution tree whose root is
   the closest [RFC6325]. Trees Used Identifiers sub-TLVs are used to
   build the RPF (Reverse Path Forwarding) Check table; Interested VLANs
   and Interested Labels sub-TLVs are used for distribution tree
   pruning; and the multi-destination forwarding table with pruning
   information is built based on that RPF Check table. To reduce
   unnecessary link loads, each distribution tree should be pruned per
   VLAN/FGL, eliminating branches that have no potential receivers
   downstream as specified in [RFC6325]. Further pruning based on Layer
   2 or Layer 3 multicast address is also possible.

   Defaults are provided but it depends on the implementation how many
   trees are calculated, where the tree roots are located, and which
   tree(s) are to be used by an ingress RBridge. With the increasing
   demand to use TRILL in data center networks, there are some features
   we can explore for multi-destination frames in the data center use
   case.
   In order to achieve non-blocking data forwarding, a fat tree
   structure is often used. Figure 1 shows a typical fat tree structure
   based data center network. RB1 and RB2 are aggregation switches and
   RB11 to RB14 are access switches. It is a common practice to
   configure the tree roots to be at the aggregation switches for
   efficient traffic transportation. Then all the ingress RBridges that
   are access switches have the same distance to all the tree roots.

1.2. Terminology Used in This Document

   This document uses the terminology from [RFC6325] and [RFC7172], some
   of which is repeated below for convenience, along with some
   additional terms listed below:

   Campus: Name for a TRILL network, like "bridged LAN" is a name for a
   bridged network. It does not have any academic implication.

   Data Label: VLAN or FGL.

   ECMP: Equal Cost Multi-Path [RFC6325].

   FGL: Fine Grained Label [RFC7172].

   IPTV: "Television" (video) over IP.

   RBridge: An alternative name for a TRILL switch.

   RPF: Reverse Path Forwarding.

   TRILL: Transparent Interconnection of Lots of Links (or Tunneled
   Routing in the Link Layer).

   TRILL switch: A device implementing the TRILL protocol. Sometimes
   called an RBridge.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

              +-----+                       +-----+
              | RB1 |                       | RB2 |
              +-----+                       +-----+
              | | | |                       | | | |
        one link to each of           one link to each of
        RB11, RB12, RB13, RB14        RB11, RB12, RB13, RB14

          +-----+    +-----+    +-----+    +-----+
          | RB11|    | RB12|    | RB13|    | RB14|
          +-----+    +-----+    +-----+    +-----+

           Figure 1. Fat Tree Structure based TRILL network

2. Motivations

   In the structure of Figure 1, if we choose to put the tree roots at
   RB1 and RB2, the ingress RBridge (e.g., RB11) would find more than
   one equal cost closest tree root (i.e., RB1 and RB2). An ingress
   RBridge has two options to select the tree root for multi-destination
   frames: choose one and only one as distribution tree root or use an
   ECMP-like algorithm to balance the traffic among the multiple trees
   whose roots are at the same distance.

   - For the former (one distribution tree root), a single tree used by
     each ingress RBridge can have the problem of uneven or inefficient
     link usage. For example, if RB11 chooses tree1, which is rooted at
     RB1, as the distribution tree, the link between RB11 and RB2 will
     not be used for multi-destination frames ingressed by RB11.

   - For the latter (ECMP-like algorithm), ECMP based tree selection
     results in a linear increase in multicast forwarding table size
     with the number of trees, as explained below.

   A multicast forwarding table at an RBridge is normally used to map
   the key of (distribution tree nickname + VLAN) to an index to a list
   of ports for multicast packet replication. The key used for mapping
   is simply the tree nickname when the RBridge does not prune the tree.
   The key could be the distribution tree nickname augmented by the Fine
   Grained Label (FGL) and/or Layer 2 or 3 multicast address when the
   RBridge supports FGL and/or Layer 2 or 3 pruning information.

   For any RBridge RBn, for each VLAN x, if RBn is in a distribution
   tree t used by traffic in VLAN x, there will be an entry of (t, x,
   port list) in the multicast forwarding table on RBn. Typically each
   entry contains a distinct combination of (tree nickname, VLAN) as the
   lookup key. If there are n such trees and m such VLANs, the multicast
   forwarding table size on RBn is n*m entries.
   If a fine-grained label is used [RFC7172] and/or finer pruning is
   used (for example, VLAN + multicast group address is used for
   pruning), the value of m increases. In a larger-scale data center,
   more trees would be necessary for better load balancing, and this
   results in an increased value for n. In either case, the number of
   table entries n*m will increase dramatically.

   The left hand table in Figure 2 shows an example of the multicast
   forwarding table on RB11 in the Figure 1 topology with 2 distribution
   trees in a campus using typical fast path hardware. The number of
   entries is approximately 2 * 4K in this case. If 4 distribution trees
   are used in a TRILL campus and RBn has 4K VLANs with downstream
   receivers, it consumes 16K table entries. Fast path TRILL multicast
   forwarding tables typically have a size limited by hardware. The
   table entries are a precious resource. In some implementations, the
   table is shared with Layer 3 IP multicast for a total of 16K or 8K
   table entries. Therefore we want to reduce the table size consumed
   for TRILL distribution trees as much as possible and at the same time
   maintain the load balancing among trees.

   In cases where blocks of consecutive VLANs or FGLs can be assigned to
   a tree, the multicast forwarding table could be greatly compressed if
   entries could have a Data Label value and mask with the fast path
   hardware doing the longest prefix matching. But few if any fast path
   implementations provide such logic.

   A straightforward way to alleviate the limited table entries problem
   is not to prune the distribution tree. However, this can only be used
   in restricted scenarios for the following reasons:

   - Not pruning wastes bandwidth for multi-destination packets.
     There is normally broadcast traffic, like ARP and unknown unicast,
     that can be pruned on VLAN (or FGL) so it is not sent down branches
     of a distribution tree where it is not needed. In addition, if
     there is a lot of Layer 3 multicast traffic, not pruning may result
     in the worse consequence of user data being unnecessarily flooded
     all over the campus. The volume could be very large if certain
     applications like IPTV ("Television" (video) over IP) are
     supported. More precise pruning, such as pruning based on multicast
     group, may be desirable in this case.

   - Not pruning is only useful at pure transit nodes. Edge nodes always
     need to maintain the multicast forwarding table with the key of
     (tree nickname + VLAN (or FGL)) since the edge node needs to decide
     whether and how to replicate the frame to local access ports. It is
     likely that edge nodes are relatively low end switches with a
     smaller shared table size, say 4K, available.

   - Security concerns. VLAN (or FGL) based traffic isolation is a basic
     requirement in some scenarios. Not pruning may increase the risk of
     traffic leakage. Misbehaving RBridges may take advantage of this
     leakage.

   In addition to the multicast table size concern, some silicon does
   not currently support hashing-based tree nickname selection at the
   ingress RBridge but commonly uses VLAN based tree selection. If the
   control plane of the ingress RBridge maps the incoming VLAN x to a
   tree nickname t, then the data plane will always use tree t for VLAN
   x multi-destination frames. Although such an ingress RBridge may
   choose multiple trees to be used for load sharing, it can use one and
   only one tree for each VLAN.
   If we make sure all ingress RBridges campus-wide send VLAN x
   multi-destination packets only using tree t, then there would be no
   need to store the multicast table entry with the key of
   (tree-other-than-t, x) on any RBridge.

   This document describes the TRILL control plane support for
   distribution tree selection based on VLAN, FGL, and/or multicast
   address to reduce the multicast forwarding table size. It is
   compatible with the silicon implementations mentioned in the previous
   paragraph.

3. Data Label based Tree Selection

   Data Label (VLAN or FGL) based tree selection can be used as a
   distribution tree selection mechanism, especially when the multicast
   forwarding table size is a concern. This section specifies that
   mechanism and how to extend it so that tree selection can be based on
   multicast group.

3.1. Overview of the Mechanism

   The RBridge that has the highest priority to be a tree root announces
   the tree nicknames and the Data Labels allowed on each tree. Such
   tree to Data Label correspondence announcements can be based on
   static configuration or some predefined algorithm beyond the scope of
   this document. An ingress RBridge selects the tree-VLAN
   correspondence it wishes to use from the list announced by the
   highest priority tree root. It SHOULD NOT transmit a VLAN x frame on
   tree y if the highest priority tree root does not say VLAN x is
   allowed on tree y.

   If we make sure a particular VLAN is allowed on one and only one
   tree, we can keep the number of multicast forwarding table entries on
   any RBridge fixed at 4K maximum (or up to 16M in case of fine grained
   label). Taking Figure 1 as an example, there are two trees, rooted at
   RB1 and RB2 respectively. The highest priority tree root appoints
   tree1 to carry VLAN 1-2000 and tree2 to carry VLAN 2001-4094.
   With such an announcement by the highest priority tree root, every
   RBridge which understands the announcement will not send VLAN
   2001-4094 traffic on tree1 and not send VLAN 1-2000 traffic on tree2.
   Then no RBridge would need to store the entries for
   tree1/VLAN2001-4094 or tree2/VLAN1-2000. Figure 2 shows the multicast
   forwarding table on an RBridge before and after we use VLAN based
   tree selection. The number of entries is reduced by a factor f, f
   being the number of trees used in the campus. In this example, it is
   reduced from 2*4094 to 4094. This affects both transit nodes and edge
   nodes. The data plane encoding does not change.

   +--------------+-----+---------+  +--------------+-----+---------+
   |tree nickname |VLAN |port list|  |tree nickname |VLAN |port list|
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    |  1  |         |  |    tree 1    |  1  |         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    |  2  |         |  |    tree 1    |  2  |         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    | ... |         |  |    tree 1    | ... |         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    | ... |         |  |    tree 1    | 1999|         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    | ... |         |  |    tree 1    | 2000|         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    | 4093|         |  |    tree 2    | 2001|         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 1    | 4094|         |  |    tree 2    | 2002|         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 2    |  1  |         |  |    tree 2    | ... |         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 2    |  2  |         |  |    tree 2    | 4093|         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 2    | ... |         |  |    tree 2    | 4094|         |
   +--------------+-----+---------+  +--------------+-----+---------+
   |    tree 2    | ... |         |
   +--------------+-----+---------+
   |    tree 2    | ... |         |
   +--------------+-----+---------+
   |    tree 2    | ... |         |
   +--------------+-----+---------+
   |    tree 2    | 4093|         |
   +--------------+-----+---------+
   |    tree 2    | 4094|         |
   +--------------+-----+---------+

   Figure 2. Multicast forwarding table before (left) & after (right)

3.2. APPsub-TLVs Supporting Tree Selection

   Six new APPsub-TLVs that can be carried in the TRILL GENINFO TLV
   [RFC7357] in E-L1FS FS-LSPs [RFC7780] are defined below. The first
   four can be considered analogous to finer granularity versions of the
   Tree Identifiers Sub-TLV and the Trees Used Identifiers Sub-TLV in
   [RFC7176]. Two APPsub-TLVs supporting VLAN based tree selection are
   specified in Sections 3.2.1 and 3.2.2. They are used by the highest
   priority tree root to announce the allowed VLANs on each tree in the
   campus and by an ingress RBridge to announce the tree-VLAN
   correspondence it selects from the list announced by the highest
   priority tree root. Two APPsub-TLVs supporting FGL based tree
   selection are specified in Sections 3.2.3 and 3.2.4 for the same
   purpose. Sections 3.2.5 and 3.2.6 define two APPsub-TLVs to support
   finer granularity in selecting trees based on multicast group rather
   than Data Label.
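
   As a rough illustration of the announce-then-select exchange
   described above, the following Python sketch shows how an ingress
   RBridge might pick one tree per VLAN from the ranges announced by the
   highest priority tree root and collapse its choices into range
   records. The function names, the nickname values, and the "lowest
   nickname wins" tie-break are assumptions made for this example only;
   this document leaves the actual choice algorithm out of scope.

```python
from typing import Dict, Iterable, List, Tuple

VlanRange = Tuple[int, int]          # inclusive (start, end) VLAN IDs
UseRecord = Tuple[int, int, int]     # (tree nickname, start, end)


def select_tree_vlans(announced: Dict[int, List[VlanRange]],
                      interested_vlans: Iterable[int]) -> Dict[int, int]:
    """Map each interested VLAN to one tree that the root allows for it.

    Assumption: when several trees allow a VLAN, the numerically lowest
    nickname is chosen, purely to make the result deterministic.
    """
    chosen: Dict[int, int] = {}
    for vlan in sorted(set(interested_vlans)):
        candidates = [nick for nick, ranges in announced.items()
                      if any(lo <= vlan <= hi for lo, hi in ranges)]
        if candidates:               # VLANs with no allowed tree are skipped
            chosen[vlan] = min(candidates)
    return chosen


def to_use_records(chosen: Dict[int, int]) -> List[UseRecord]:
    """Collapse per-VLAN choices into (nickname, start, end) records."""
    records: List[UseRecord] = []
    for vlan in sorted(chosen):
        nick = chosen[vlan]
        if records and records[-1][0] == nick and records[-1][2] == vlan - 1:
            # Extend the previous record when the VLAN is contiguous.
            records[-1] = (nick, records[-1][1], vlan)
        else:
            records.append((nick, vlan, vlan))
    return records


# Hypothetical nicknames 0x1001 (tree1) and 0x1002 (tree2).
announced = {0x1001: [(1, 2000)], 0x1002: [(2001, 4094)]}
chosen = select_tree_vlans(announced, [10, 11, 3000])
records = to_use_records(chosen)
```

   With the example announcement, VLANs 10 and 11 end up in a single
   record for the first tree and VLAN 3000 in a record for the second.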

   New APPsub-TLVs           Description
   =======================   =============
   Tree and VLANs            announcement by the highest priority
                             tree root of the VLANs allowed per tree
   Tree and VLANs Used       tree-VLAN correspondence an ingress
                             RBridge selects
   Tree and FGLs             announcement by the highest priority
                             tree root of the FGLs allowed per tree
   Tree and FGLs Used        tree-FGL correspondence an ingress
                             RBridge selects
   Tree and Groups           announcement by the highest priority
                             tree root of the multicast groups
                             allowed on each tree
   Tree and Groups Used      tree and multicast group correspondence
                             an ingress RBridge selects

3.2.1. The Tree and VLANs APPsub-TLV

   The RBridge that is the highest priority tree root announces the
   VLANs allowed on each tree with the Tree and VLANs (TREE-VLANS)
   APPsub-TLV. Multiple instances of this APPsub-TLV may be carried. The
   same tree nickname may occur in multiple Tree-VLAN RECORDs within
   the same or across multiple APPsub-TLVs. The APPsub-TLV format is as
   follows:

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type = tbd1                   |              (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Length                        |              (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   | Tree-VLAN RECORD (1)                      |  (6 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   | .................                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   | Tree-VLAN RECORD (N)                      |  (6 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

   where each Tree-VLAN RECORD is of the form:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Nickname                |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RESV  |        Start.VLAN             |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RESV  |        End.VLAN               |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o Type: TRILL GENINFO APPsub-TLV type, set to tbd1 (TREE-VLANS).

   o Length: 6*n bytes, where there are n Tree-VLAN RECORDs.
     Thus the value of Length can be used to determine n. If Length is
     not a multiple of 6, the APPsub-TLV is corrupt and MUST be ignored.

   o Nickname: The nickname identifying the distribution tree by its
     root.

   o RESV: 4 bits that MUST be sent as zero and ignored on receipt.

   o Start.VLAN, End.VLAN: These fields are the VLAN IDs of the allowed
     VLAN range on the tree, inclusive. To specify a single VLAN, the
     VLAN's ID appears as both the start and end VLAN. If End.VLAN is
     less than Start.VLAN, the Tree-VLAN RECORD MUST be ignored.

3.2.2. The Tree and VLANs Used APPsub-TLV

   This APPsub-TLV has the same structure as the Tree and VLANs
   APPsub-TLV (TREE-VLANS) specified in Section 3.2.1. The differences
   are that its APPsub-TLV type is set to tbd2 (TREE-VLANS-USE) and the
   Tree-VLAN correspondences in the Tree-VLAN RECORDs listed are those
   the originating RBridge wants to use for multi-destination packets.
   This APPsub-TLV is used by an ingress RBridge to distribute the
   tree-VLAN correspondence it selects from the list announced by the
   highest priority tree root.

3.2.3. The Tree and FGLs APPsub-TLV

   The RBridge that is the highest priority tree root can use the Tree
   and FGLs (TREE-FGLS) APPsub-TLV to announce the FGLs allowed on each
   tree. Multiple instances of this APPsub-TLV may be carried. The same
   tree nickname may occur in multiple Tree-FGL RECORDs within the
   same or across multiple APPsub-TLVs. Its format is as follows:

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type = tbd3                   |              (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Length                        |              (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   | Tree-FGL RECORD (1)                       |  (8 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   | .................                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   | Tree-FGL RECORD (N)                       |  (8 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

   where each Tree-FGL RECORD is of the form:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Nickname                |          (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+
   |                  Start.FGL                    |  (3 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+
   |                  End.FGL                      |  (3 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+

   o Type: TRILL GENINFO APPsub-TLV type, set to tbd3 (TREE-FGLS).

   o Length: 8*n bytes, where there are n Tree-FGL RECORDs. Thus the
     value of Length can be used to determine n. If Length is not a
     multiple of 8, the APPsub-TLV is corrupt and MUST be ignored.

   o Nickname: The nickname identifying the distribution tree by its
     root.

   o Start.FGL, End.FGL: These fields are the FGL IDs of the allowed
     FGL range on the tree, inclusive. To specify a single FGL, the
     FGL's ID appears as both the start and end FGL. If End.FGL is less
     than Start.FGL, the Tree-FGL RECORD MUST be ignored.

3.2.4. The Tree and FGLs Used APPsub-TLV

   This APPsub-TLV has the same structure as the Tree and FGLs
   APPsub-TLV (TREE-FGLS) specified in Section 3.2.3. The differences
   are that its APPsub-TLV type is set to tbd4 (TREE-FGLS-USE) and the
   Tree-FGL RECORDs listed are those the originating RBridge wants to
   use for multi-destination packets. This APPsub-TLV is used by an
   ingress RBridge to distribute the tree-FGL correspondence it selects
   from the list announced by the highest priority tree root.

3.2.5. The Tree and Groups APPsub-TLV

   Data Label based tree selection is easily extended to (Data Label +
   Layer 2 or 3 multicast group) based tree selection. We can appoint
   multicast group 1 in VLAN 10 to tree1 and appoint group 2 in VLAN 10
   to tree2 for better load sharing.
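
   The group-based extension can be pictured as a lookup keyed on
   (Data Label, multicast group). The Python sketch below is only an
   illustration: the group MAC addresses, the table contents, and the
   fallback to a per-VLAN tree when no group entry matches are all
   assumptions of this example and are not behavior defined by this
   document.

```python
from typing import Dict, Optional, Tuple

# Hypothetical example tables; addresses and tree names are made up.
GROUP_TREE: Dict[Tuple[int, str], str] = {
    (10, "01:00:5e:01:01:01"): "tree1",   # group 1 in VLAN 10
    (10, "01:00:5e:01:01:02"): "tree2",   # group 2 in VLAN 10
}
LABEL_TREE: Dict[int, str] = {10: "tree1", 20: "tree2"}


def tree_for(vlan: int, group: Optional[str] = None) -> Optional[str]:
    """Return the tree for a frame, preferring a (VLAN, group) match.

    Assumption: frames whose group has no specific entry fall back to
    the tree configured for the VLAN alone; None means no tree is known.
    """
    if group is not None and (vlan, group) in GROUP_TREE:
        return GROUP_TREE[(vlan, group)]
    return LABEL_TREE.get(vlan)
```

   Two groups in the same VLAN can thus be spread over different trees,
   which is the load sharing effect described above.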

   The RBridge that is the highest priority tree root can announce the
   multicast groups allowed on each tree for each Data Label with the
   Tree and Groups (TREE-GROUPS) APPsub-TLV. Multiple instances of this
   APPsub-TLV may be carried. The APPsub-TLV format is as follows:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type = tbd5                   |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Length                        |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Tree Nickname                 |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Group Sub-Sub-TLVs               (variable)
   +-+-+-+-+-+-+-+-+-+....

   o Type: TRILL GENINFO APPsub-TLV type, set to tbd5 (TREE-GROUPS).

   o Length: 2 + the length of the Group Sub-Sub-TLVs included.

   o Tree Nickname: The nickname identifying the distribution tree by
     its root.

   o Group Sub-Sub-TLVs: Zero or more of the TLV structures that are
     allowed as sub-TLVs of the GADDR TLV [RFC7176]. Each such TLV
     structure specifies a multicast group and either a VLAN or FGL.
     Although these TLV structures are considered sub-TLVs when they
     appear inside a GADDR TLV, they are technically sub-sub-TLVs when
     they appear inside a TREE-GROUPS APPsub-TLV, which is in turn
     inside a TRILL GENINFO TLV [RFC7357].

3.2.6. The Tree and Groups Used APPsub-TLV

   This APPsub-TLV has the same structure as the Tree and Groups
   APPsub-TLV (TREE-GROUPS) specified in Section 3.2.5. The differences
   are that its APPsub-TLV type is set to tbd6 (TREE-GROUPS-USE) and the
   tree and multicast groups listed in this APPsub-TLV are those the
   originating RBridge wants to use for multi-destination packets. This
   APPsub-TLV is used by an ingress RBridge to distribute the tree-group
   correspondence it selects from the list announced by the highest
   priority tree root.

3.3. Detailed Processing

   The highest priority tree root RBridge MUST include all the necessary
   tree related sub-TLVs defined in [RFC7176] as usual in its E-L1FS
   FS-LSP and MAY include the Tree and VLANs APPsub-TLV (TREE-VLANS)
   and/or Tree and FGLs APPsub-TLV (TREE-FGLS) in its E-L1FS FS-LSP
   [RFC7780]. In this way it MAY indicate that each VLAN and/or FGL is
   only allowed on one or some other number of trees less than the
   number of trees being calculated in the campus in order to save table
   space in the fast path forwarding hardware.

   An ingress RBridge that understands the TREE-VLANS APPsub-TLV SHOULD
   select the tree-VLAN correspondences it wishes to use and put them in
   TREE-VLANS-USE APPsub-TLVs. If there are multiple tree nicknames
   announced in the TREE-VLANS APPsub-TLV for a VLAN x, the ingress
   RBridge chooses one of them if it supports this feature. For example,
   the ingress RBridge may choose the closest (minimum cost) root among
   them. How to make such a choice is out of the scope of this document.
   It may be desirable to have some fixed algorithm to make sure all
   ingress RBridges choose the same tree for VLAN x in this case. Any
   single Data Label that the ingress RBridge is interested in should be
   related to only one tree ID in TREE-VLANS-USE to minimize the
   multicast forwarding table size on other RBridges, but as long as the
   Data Label is related to fewer than all the trees being calculated,
   it will reduce the burden on the forwarding table size.

   When an ingress RBridge encapsulates a multi-destination frame for
   Data Label x, it SHOULD use a tree nickname that it selected
   previously in TREE-VLANS-USE or TREE-FGLS-USE for Data Label x.
   However, that may not be possible because either (1) the RBridge may
   not have advertised such TREE-VLANS-USE or TREE-FGLS-USE
   APPsub-TLVs, in which case it can use any tree that has been
   advertised as permitted for the Data Label by the highest priority
   tree root RBridge, or (2) the tree or trees it advertised might be
   unavailable due to failures.

   If RBridge RBn does not perform pruning, it builds the multicast
   forwarding table as specified in [RFC6325].

   If RBn prunes the distribution tree based on VLANs, RBn uses the
   information received in TREE-VLANS-USE APPsub-TLVs to mark the set of
   VLANs reachable downstream for each adjacency and for each related
   tree. If RBn prunes the distribution tree based on FGLs, RBn uses the
   information received in TREE-FGLS-USE APPsub-TLVs to mark the set of
   FGLs reachable downstream for each adjacency and for each related
   tree.

   Logically, an ingress RBridge that does not support VLAN/FGL based
   tree selection is equivalent to one that supports it and announces
   every combination of tree-id-used and
   interested-vlan/interested-fgl as TREE-VLANS-USE or TREE-FGLS-USE.

3.4. Failure Handling

   This section discusses failure of a distribution tree root for the
   case where that is not the highest priority root and the case where
   it is the highest priority root. It also discusses some other
   transient error conditions.

   Failure of a tree root that is not the highest priority: It is the
   responsibility of the highest priority tree root to inform other
   RBridges of any change in the allowed tree-VLAN correspondence. When
   the highest priority tree root learns the root of tree t has failed,
   it should re-assign the VLANs allowed on tree t to other trees or to
   a tree replacing the failed one.
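
   A minimal sketch of such a re-assignment is shown below in Python.
   It assumes the highest priority tree root keeps its allowed tree-VLAN
   correspondence as a map from tree name to a list of inclusive VLAN
   ranges, and it spreads the failed tree's ranges over the surviving
   trees in round-robin order. Both the data layout and the round-robin
   policy are assumptions of this example; any policy that yields a
   consistent announcement would serve.

```python
from typing import Dict, List, Tuple

VlanRange = Tuple[int, int]  # inclusive (start, end) VLAN IDs


def reassign_failed_tree(allowed: Dict[str, List[VlanRange]],
                         failed: str) -> Dict[str, List[VlanRange]]:
    """Return a new map with the failed tree's ranges moved elsewhere.

    Assumption: ranges are handed to the surviving trees round-robin,
    in sorted order of tree name, so the outcome is deterministic.
    """
    survivors = sorted(name for name in allowed if name != failed)
    if not survivors:
        raise ValueError("no surviving tree to take over the VLANs")
    # Copy the surviving entries so the caller's map is left untouched.
    result = {name: list(allowed[name]) for name in survivors}
    for i, vlan_range in enumerate(allowed.get(failed, [])):
        result[survivors[i % len(survivors)]].append(vlan_range)
    return result


# Hypothetical campus with three trees; tree2's root fails.
before = {
    "tree1": [(1, 2000)],
    "tree2": [(2001, 3000)],
    "tree3": [(3001, 4094)],
}
after = reassign_failed_tree(before, "tree2")
```

   The resulting map is what the highest priority tree root would then
   announce in place of the earlier correspondence.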

   Failure of the highest priority tree root: It is suggested that the
   second highest priority tree root be pre-configured with the proper
   knowledge of the tree-VLAN correspondence allowed when the highest
   priority tree root fails. The information announced by the second
   priority tree root would be in the link state of all RBridges but
   would not take effect unless the RBridge noticed the failure of the
   highest priority tree root. When the highest priority tree root
   fails, the former second priority tree root will become the highest
   priority tree root of the campus. When an RBridge notices the failure
   of the original highest priority tree root, it can immediately use
   the stored information announced by the original second priority tree
   root. It is suggested that the tree-VLAN correspondence information
   be pre-configured on the second highest priority tree root to be the
   same as that on the highest priority tree root for the trees other
   than the highest priority tree itself. This can minimize the change
   to multicast forwarding tables in the case of highest priority tree
   root failure. For a large campus, it may make sense to pre-configure
   this information in a similar way on the third, fourth, or even lower
   priority tree root RBridges.

   In some transient conditions or in case of misbehavior by the highest
   priority tree root, an ingress RBridge may encounter the following
   scenarios:

   - No tree has been announced for which VLAN x frames are allowed.

   - An ingress RBridge is supposed to transmit VLAN x frames on tree t,
     but the root of tree t is no longer reachable.

   For the second case, an ingress RBridge may choose another reachable
   tree root which allows VLAN x according to the highest priority tree
   root announcement. If there is no such tree available, then it is the
   same as the first case above.
   In that case, the ingress RBridge should be
   'downgraded' to a conventional RBridge with behavior as specified in
   [RFC6325]. A timer should be set to allow the temporary transient
   stage to complete before the change of responsive tree or 'downgrade'
   takes effect. The value of the timer should be set to at least the
   LSP flooding time of the campus.

4. Backward Compatibility

   RBridges MUST include the TREE-USE-IDs and INT-VLAN sub-TLVs in their
   LSPs when required by [RFC6325] whether or not they support the new
   TREE-VLAN-USE or TREE-FGL-USE sub-TLVs specified by this draft.

   RBridges that understand the new TREE-VLAN-USE sub-TLV sent from
   another RBridge RBn should use it to build the multicast forwarding
   table and ignore the TREE-USE-IDs and INT-VLAN sub-TLVs sent from the
   same RBridge. TREE-USE-IDs and INT-VLAN sub-TLVs are still useful for
   purposes other than building the multicast forwarding table (e.g.,
   RPF table building and spanning tree root notification). If the
   RBridge does not receive TREE-VLAN-USE sub-TLVs from RBn, it uses the
   conventional way described in [RFC6325] to build the multicast
   forwarding table.

   For example, suppose there are two distribution trees, tree1 and
   tree2, in the campus. RB1 and RB2 are RBridges that use the new
   APPsub-TLVs described in this document. RB3 is an old RBridge that is
   compatible with [RFC6325]. Assume RB2 is interested in VLANs 10 and
   11 and RB3 is interested in VLANs 100 and 101. Hence RB1 receives
   ((tree1, VLAN10), (tree2, VLAN11)) as a TREE-VLAN-USE sub-TLV and
   (tree1, tree2) as a TREE-USE-IDs sub-TLV from RB2 on port x. RB1 also
   receives (tree1) as a TREE-USE-IDs sub-TLV and no TREE-VLAN-USE
   sub-TLV from RB3 on port y. RB2 and RB3 announce their interested
   VLANs in an INT-VLAN sub-TLV as usual.
   RB1 then builds the entries (tree1, VLAN10, port x) and (tree2,
   VLAN11, port x) based on RB2's LSP and the mechanism specified in
   this document. RB1 also builds the entries (tree1, VLAN100, port y),
   (tree1, VLAN101, port y), (tree2, VLAN100, port y), and (tree2,
   VLAN101, port y) based on RB3's LSP in the conventional way. The
   merged multicast forwarding table on RB1 would look like the
   following.

   +--------------+-----+---------+
   |tree nickname |VLAN |port list|
   +--------------+-----+---------+
   |    tree 1    | 10  |    x    |
   +--------------+-----+---------+
   |    tree 1    | 100 |    y    |
   +--------------+-----+---------+
   |    tree 1    | 101 |    y    |
   +--------------+-----+---------+
   |    tree 2    | 11  |    x    |
   +--------------+-----+---------+
   |    tree 2    | 100 |    y    |
   +--------------+-----+---------+
   |    tree 2    | 101 |    y    |
   +--------------+-----+---------+

   As expected, that table is not as small as the one that results when
   every RBridge supports the new TREE-VLAN-USE sub-TLVs. In the worst
   case, a hybrid campus has as many entries as under current practice,
   which does not support VLAN based tree selection. Such an extreme
   case happens when the interested VLAN set from the new RBridges is a
   subset of the interested VLAN set from the old RBridges.

   Data Label and multicast group based tree selection is compatible
   with the current practice. Its effectiveness increases as more
   RBridges in the TRILL campus support this feature.

5. Security Considerations

   This document does not change the general RBridge security
   considerations of the TRILL base protocol. The APPsub-TLVs specified
   can be secured using the IS-IS authentication feature [RFC5310]. See
   Section 6 of [RFC6325] for general TRILL security considerations.

6. IANA Considerations

   IANA is requested to assign six new TRILL APPsub-TLV type codes from
   the range less than 255 as specified in Section 3 and update the
   "TRILL APPsub-TLV Types under IS-IS TLV 251 Application Identifier 1"
   Registry on the IANA TRILL Parameters web page as shown below.

      Type  Name of APPsub-TLV code  Reference
      ----  -----------------------  ---------

      tbd1  Tree and VLANs           [this document 3.2.1]
      tbd2  Tree and VLANs Used      [this document 3.2.2]
      tbd3  Tree and FGLs            [this document 3.2.3]
      tbd4  Tree and FGLs Used       [this document 3.2.4]
      tbd5  Tree and Groups          [this document 3.2.5]
      tbd6  Tree and Groups Used     [this document 3.2.6]

7. References

7.1. Normative References

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC7172] Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and
             D. Dutt, "Transparent Interconnection of Lots of Links
             (TRILL): Fine-Grained Labeling", RFC 7172, May 2014.

   [RFC7357] Zhai, H., Hu, F., Perlman, R., Eastlake 3rd, D., and O.
             Stokes, "Transparent Interconnection of Lots of Links
             (TRILL): End Station Address Distribution Information
             (ESADI) Protocol", RFC 7357, September 2014.

   [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D.,
             and A. Banerjee, "Transparent Interconnection of Lots of
             Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

   [RFC7780] Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
             Ghanwani, A., and S. Gupta, "Transparent Interconnection of
             Lots of Links (TRILL): Clarifications, Corrections, and
             Updates", RFC 7780, February 2016.

7.2. Informative References

   [RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
             and M. Fanto, "IS-IS Generic Cryptographic Authentication",
             RFC 5310, February 2009.

8. Acknowledgments

   The authors wish to thank David M. Bond, Liangliang Ma, Naveen Nimmu,
   Radia Perlman, Rakesh Kumar, Robert Sparks, Daniele Ceccarelli, and
   Sunny Rajagopalan for their valuable comments and contributions.

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56624629
   Email: liyizhou@huawei.com

   Donald Eastlake
   Huawei R&D USA
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56623144
   Email: haoweiguo@huawei.com

   Hao Chen
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Email: philips.chenhao@huawei.com

   Somnath Chatterjee
   Cisco Systems,
   SEZ Unit, Cessna Business Park,
   Outer ring road,
   Bangalore - 560087
   India

   Email: somnath.chatterjee01@gmail.com