TRILL Working Group                                           Yizhou Li
Internet Draft                                               Weiguo Hao
Intended status: Standards Track                    Huawei Technologies
                                                          Radia Perlman
                                                             Intel Labs
                                                           Naveen Nimmu
                                                               Broadcom
                                                          S. Chatterjee
                                                            IP Infusion
                                                      Sunny Rajagopalan
                                                                    IBM
Expires: July 2013                                     January 15, 2013


         VLAN based Tree Selection for Multi-destination Frames
                draft-yizhou-trill-tree-selection-02.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on July 15, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   TRILL uses distribution trees to deliver multi-destination frames.
   An ingress RBridge can use multiple trees for different flows, based
   on VLAN and/or multicast group. Different ingress RBridges may
   choose different distribution trees for the same VLAN and/or
   multicast group traffic. Distribution trees are normally pruned
   based on VLAN.

   For any RBridge RBn, if RBn has downstream receivers of VLAN x in a
   distribution tree t, there will be an entry (t, x, port list) in the
   multicast forwarding table on RBn. If there are n trees and m VLANs,
   the multicast forwarding table size on RBn is typically n*m entries.
   The value of m can be up to 4096, and n is the total number of
   distribution trees in the campus. If fine-grained labeling is
   implemented, or finer granularity filtering such as VLAN plus L2/L3
   multicast address is used for pruning, the multicast forwarding
   table size increases further and dramatically. The TRILL multicast
   forwarding table size is limited by hardware, and in hardware
   implementations L3 multicast may share the same table. Multicast
   table entries are therefore a precious resource. This document
   specifies a VLAN based tree selection mechanism to reduce the TRILL
   multicast forwarding table size on RBridges.

Table of Contents

   1. Introduction
      1.1. Background
      1.2. Motivations
   2. Conventions used in this document
   3. VLAN based Tree Selection
      3.1. Overview
      3.2. Sub-TLVs for the Router Capability TLV
         3.2.1. The Tree Identifier and VLANs Sub-TLV
         3.2.2. The Tree and VLANs Used Sub-TLV
      3.3. Detailed Processing
      3.4. Failure Handling
      3.5. Extensions
   4. Backward Compatibility
   5. Aggregated Tree Nickname
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments

1. Introduction

1.1. Background

   One or more distribution trees can be used to distribute
   multi-destination frames in a TRILL campus. The RBridge with the
   highest tree root priority announces the total number of trees that
   should be computed for the campus. It may also specify, in the Tree
   Identifiers (TREE-RT-IDs) sub-TLV [RFC6326], the ordered list of tree
   root nicknames for which the other RBridges need to compute trees.
   Every RBridge specifies the trees it wants to use in the Trees Used
   Identifiers (TREE-USE-IDs) sub-TLV, and the VLANs it is interested
   in in the Interested VLANs and Spanning Tree Roots (INT-VLAN) sub-TLV
   [RFC6326]. By default, it is recommended that the ingress RBridge
   choose the tree whose root is closest [RFC6325]. The Trees Used
   Identifiers sub-TLV is used to build the RPF table used for the
   reverse path forwarding check; the Interested VLANs sub-TLV is used
   for distribution tree pruning, and the multicast forwarding table
   with pruning information is built from it. Each distribution tree
   SHOULD be pruned per VLAN, eliminating branches that have no
   potential receivers downstream [RFC6325]. Further pruning based on
   L2/L3 multicast address is also possible.

   How many trees to compute, where the tree roots are located, and
   which tree(s) an ingress RBridge uses are implementation dependent.
   With the increasing demand to use TRILL in data center networks,
   some features for multi-destination frames are worth exploring in
   the data center use case. To achieve non-blocking data forwarding, a
   fat tree structure is often used. Figure 1 shows a typical data
   center network based on a fat tree structure. RB1 and RB2 are
   aggregation switches, and RB11 to RB14 are access switches.
   It is common practice to place the tree roots at the aggregation
   switches for more efficient traffic transport. All the ingress
   RBridges, which are access switches, then have the same distance to
   all the tree roots.

             +-----+                         +-----+
             | RB1 |                         | RB2 |
             +-----+                         +-----+

       (each of RB1 and RB2 has a link to each of RB11, RB12,
        RB13 and RB14)

   +-----+          +-----+          +-----+          +-----+
   | RB11|          | RB12|          | RB13|          | RB14|
   +-----+          +-----+          +-----+          +-----+

            Figure 1 Fat Tree Structure based TRILL network

1.2. Motivations

   In the structure of Figure 1, if the tree roots are placed at RB1
   and RB2, an ingress RBridge (e.g., RB11) finds more than one closest
   tree root (i.e., RB1 and RB2). The ingress RBridge then has two
   options for selecting the tree root for multi-destination frames:
   choose one and only one tree as its distribution tree, or use an
   ECMP-like algorithm to balance the traffic among the multiple trees
   whose roots are at the same distance. The former, a single tree per
   ingress RBridge, has the obvious problem of inefficient link usage.
   For example, if RB11 chooses tree1, which is rooted at RB1, as its
   distribution tree, RB11 will never use the link between RB11 and RB2
   to ingress multi-destination frames. The latter, ECMP-based tree
   selection, makes the multicast forwarding table size grow linearly
   with the number of trees, as explained below.

   A multicast forwarding table on an RBridge normally maps the key
   (tree nickname + VLAN) to an index into a list of ports for
   multicast frame replication.
   The key is simply the tree nickname when the
   RBridge does not prune the tree, and it could be (tree nickname +
   VLAN + L2/L3 multicast address) when the RBridge has been programmed
   by the control plane with L2/L3 multicast pruning information.

   For any RBridge RBn and each VLAN x, if RBn is in a distribution
   tree t for VLAN x, there will be an entry (t, x, port list) in the
   multicast forwarding table on RBn. Each entry contains a distinct
   combination of (tree nickname, VLAN) as the lookup key. If there are
   n such trees and m such VLANs, the multicast forwarding table size
   on RBn is n*m entries. If fine-grained labeling [TrillFGL] and/or
   finer pruning (e.g., pruning on VLAN + multicast group address) is
   used, the value of m increases. In larger scale data centers, more
   trees are needed for better load balancing, which increases n. In
   either case, the number of table entries, n*m, increases
   dramatically.

   The left table in Figure 2 shows an example of the multicast
   forwarding table on RB11 in the Figure 1 topology with 2
   distribution trees in the campus. The number of entries in this case
   is approximately 2 * 4K. If 4 distribution trees are used in a TRILL
   campus and RBn has downstream receivers for 4K VLANs, it consumes
   16K table entries. The TRILL multicast forwarding table has a
   limited size in hardware implementations, so table entries are a
   precious resource. In some implementations, the table is shared with
   L3 IP multicast, for a total of 16K or 8K entries. Therefore we want
   to save as much table space as possible while maintaining load
   balancing among trees.

   A straightforward way to alleviate the limited table entries problem
   is not to prune the distribution tree.
   However, this can only be used in
   restricted scenarios, for the following reasons:

   - Unnecessary bandwidth waste for multi-destination frames. Each
     VLAN carries flooded traffic such as ARP broadcasts and unknown
     unicast. In addition, if there is heavy L3 multicast traffic in
     some VLAN, not pruning causes L3 user data to be unnecessarily
     flooded over the campus. The volume could be huge if applications
     such as IPTV are supported. Finer pruning, such as pruning based
     on multicast group, may be desirable in this case.

   - Only useful at pure transit nodes. Edge nodes always need to
     maintain the multicast forwarding table with the key (tree
     nickname + VLAN), since an edge node must decide precisely, based
     on VLAN, whether and how to replicate the frame to local access
     ports. Edge nodes are quite likely to be relatively low scale
     switches with a smaller shared table, say 4K entries.

   - Security concerns. VLAN based traffic isolation is a basic
     requirement in some scenarios. Not pruning may cause unnecessary
     traffic leakage, which a misbehaving RBridge may take advantage
     of.

   In addition to the multicast table size concern, some current
   silicon does not support hash-based tree nickname selection at the
   ingress RBridge; VLAN based tree selection is used instead. The
   control plane of the ingress RBridge maps incoming VLAN x to a tree
   nickname t, and the data plane then always uses tree t for VLAN x
   multi-destination frames. Although an ingress RBridge may use
   multiple trees for load sharing, it can use one and only one tree
   for a single VLAN. If all ingress RBridges campus-wide are
   guaranteed to send VLAN x multi-destination frames only on tree t,
   then no RBridge needs to store multicast table entries with the key
   (tree-other-than-t, x).
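   The n*m versus m arithmetic above can be illustrated with a minimal
   Python sketch. It assumes the worst case, in which RBn has
   downstream receivers for every VLAN on every tree, models each
   multicast forwarding table entry only by its (tree nickname, VLAN)
   key, and borrows the tree1/tree2 VLAN split used as an example in
   Section 3.1. The function names are hypothetical and do not come from
   any implementation.

```python
from itertools import product


def ecmp_table_keys(trees, vlans):
    """Keys needed when any ingress RBridge may send any VLAN on any
    tree (e.g. ECMP-like tree selection): every (tree, VLAN) pair."""
    return set(product(trees, vlans))


def vlan_based_table_keys(vlan_to_tree):
    """Keys needed when every ingress RBridge campus-wide sends VLAN x
    only on the single tree vlan_to_tree[x]."""
    return {(tree, vlan) for vlan, tree in vlan_to_tree.items()}


trees = ["tree1", "tree2"]
vlans = range(1, 4096)  # VLAN IDs 1..4095, as in the Figure 2 example

# Hypothetical campus-wide assignment: tree1 carries VLANs 1-2000 and
# tree2 carries VLANs 2001-4095 (the example used in Section 3.1).
vlan_to_tree = {v: "tree1" if v <= 2000 else "tree2" for v in vlans}

before = ecmp_table_keys(trees, vlans)
after = vlan_based_table_keys(vlan_to_tree)
print(len(before), len(after))  # 8190 4095
```

   The reduction comes entirely from the campus-wide agreement that
   VLAN x travels on one tree only; nothing in the per-entry format
   changes.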
   This document describes control plane support for a VLAN based tree
   selection mechanism that reduces the multicast forwarding table
   size. It is consistent with the silicon implementation mentioned in
   the previous paragraph. Here, "VLAN based tree selection" is a
   general term that also includes finer granularity cases, e.g., VLAN
   + L2/L3 multicast group based selection.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. VLAN based Tree Selection

   VLAN based tree selection can be used as a complementary
   distribution tree selection mechanism, especially when the multicast
   forwarding table size is a concern.

3.1. Overview

   The tree root with the highest priority announces the tree nicknames
   and the VLANs allowed on each tree. This tree-VLAN correspondence
   announcement can be based on static configuration or on some
   predefined algorithm. An ingress RBridge selects the tree-VLAN
   correspondences it wishes to use from the list announced by the
   highest priority tree root. It should not transmit VLAN x frames on
   tree y if the highest priority tree root does not say that VLAN x is
   allowed on tree y.

   If each VLAN is allowed on one and only one tree, the number of
   multicast forwarding table entries on any RBridge can be kept at a
   maximum of 4K (or up to 16M in the case of fine-grained labeling).
   Take Figure 1 as an example, with two trees rooted at RB1 and RB2
   respectively. The highest priority tree root appoints tree1 to carry
   VLANs 1-2000 and tree2 to carry VLANs 2001-4095. With this
   announcement from the highest priority tree root, every RBridge that
   understands the announcement will not send VLANs 2001-4095 on tree1
   or VLANs 1-2000 on tree2.
   Then no RBridge needs to store the entries for
   tree1/VLAN 2001-4095 or tree2/VLAN 1-2000. Figure 2 shows the
   multicast forwarding table on an RBridge before and after VLAN based
   tree selection is performed. The number of entries is reduced by a
   factor of f, f being the number of trees used in the campus. In this
   example, it is reduced from 2*4095 to 4095. This affects both
   transit nodes and edge nodes. The data plane does not change.

   +--------------+-----+---------+   +--------------+-----+---------+
   |tree nickname |VLAN |port list|   |tree nickname |VLAN |port list|
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    |  1  |         |   |    tree 1    |  1  |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    |  2  |         |   |    tree 1    |  2  |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | ... |         |   |    tree 1    | ... |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | ... |         |   |    tree 1    | 1999|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | ... |         |   |    tree 1    | 2000|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | 4094|         |   |    tree 2    | 2001|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | 4095|         |   |    tree 2    | 2002|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    |  1  |         |   |    tree 2    | ... |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    |  2  |         |   |    tree 2    | 4094|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    | ... |         |   |    tree 2    | 4095|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    | ... |         |
   +--------------+-----+---------+
   |    tree 2    | ... |         |
   +--------------+-----+---------+
   |    tree 2    | ... |         |
   +--------------+-----+---------+
   |    tree 2    | 4094|         |
   +--------------+-----+---------+
   |    tree 2    | 4095|         |
   +--------------+-----+---------+

    Figure 2 Multicast forwarding table before (left) and after (right)

3.2. Sub-TLVs for the Router Capability TLV

   Two new sub-TLVs that can be carried in the Router Capability TLV
   for TRILL are defined below. They can be considered finer
   granularity analogs of the Tree Identifiers sub-TLV and the Trees
   Used Identifiers sub-TLV in [RFC6326].

3.2.1. The Tree Identifier and VLANs Sub-TLV

   The Tree Identifiers and VLANs (TREE-VLANs) sub-TLV is used by the
   IS that is the highest priority tree root to announce the VLANs
   allowed on each tree. Multiple instances of this sub-TLV may be
   carried. The same tree nickname may occur in multiple Tree-VLAN
   Records, within the same sub-TLV or across multiple sub-TLVs. The
   sub-TLV format is as follows:

      +-+-+-+-+-+-+-+-+
      | Type          |                            (1 byte)
      +-+-+-+-+-+-+-+-+
      | Length        |                            (1 byte)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Tree-VLAN Record (1)                    |  (6 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | .................                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Tree-VLAN Record (N)                    |  (6 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where each Tree-VLAN Record is of the form:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Tree Nickname                           |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | RESV  |  Start.VLAN                     |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | RESV  |  End.VLAN                       |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o Type: Router Capability sub-TLV type, set to 20 (TREE-VLANs).

   o Length: 6*n bytes, where there are n Tree-VLAN Records.

   o Tree Nickname: The nickname at which a distribution tree is
     rooted.

   o RESV: 4 bits that MUST be sent as zero and ignored on receipt.
   o Start.VLAN, End.VLAN: The VLAN IDs of the allowed VLAN range on
     the tree, inclusive. To specify a single VLAN, the VLAN's ID
     appears as both the start and end VLAN.

3.2.2. The Tree and VLANs Used Sub-TLV

   This sub-TLV has the same structure as the Tree Identifiers and
   VLANs (TREE-VLANs) sub-TLV specified in Section 3.2.1. The only
   differences are that its sub-TLV type is set to 21 (TREE-VLAN-USE)
   and that the Tree-VLAN Records listed are those the originating IS
   uses.

3.3. Detailed Processing

   The highest priority tree root includes all the necessary tree
   related sub-TLVs defined in [RFC6326] as usual, and MAY also include
   the Tree Identifier and VLANs (TREE-VLANs) sub-TLV in its LSP. The
   highest priority tree root may decide to allow each VLAN on one and
   only one tree, to maximize the savings in multicast forwarding table
   size.

   An ingress RBridge that understands the TREE-VLANs sub-TLV should
   select the tree-VLAN correspondences it wishes to use and put them
   in the TREE-VLAN-USE sub-TLV. If multiple tree nicknames are
   announced in the TREE-VLANs sub-TLV for a VLAN x, the ingress
   RBridge must choose one of them; for example, it may choose the root
   at minimum distance. How to make this choice is out of the scope of
   this document. In this case, it may be desirable to have a fixed
   algorithm to ensure that all ingress RBridges choose the same tree
   for VLAN x. Each VLAN that the ingress RBridge is interested in
   should be associated with one and only one tree ID in TREE-VLAN-USE,
   to minimize the multicast forwarding table size on other RBridges.

   When an ingress RBridge encapsulates a multi-destination frame for
   VLAN x, it should use the tree nickname that it previously selected
   in TREE-VLAN-USE for VLAN x.
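   The record layout in Section 3.2.1 and the selection step described
   above can be sketched in Python as follows. This is a minimal,
   non-normative illustration that assumes network (big-endian) byte
   order, as is usual for IS-IS, and uses the type values 20 and 21
   given in Sections 3.2.1 and 3.2.2. The lowest-nickname tie-break in
   select_trees() is only a hypothetical example of the "fixed
   algorithm" mentioned above, since this document leaves the choice
   out of scope; all function names and nickname values are likewise
   hypothetical.

```python
import struct

TREE_VLANS = 20      # TREE-VLANs sub-TLV type (Section 3.2.1)
TREE_VLAN_USE = 21   # TREE-VLAN-USE sub-TLV type (Section 3.2.2)
VLAN_MASK = 0x0FFF   # low 12 bits carry the VLAN ID; top 4 bits are RESV


def encode(sub_tlv_type, records):
    """Build a sub-TLV from (tree_nickname, start_vlan, end_vlan) records.
    RESV bits are sent as zero because VLAN IDs fit in 12 bits."""
    for nick, start, end in records:
        if not (0 <= nick <= 0xFFFF and 0 <= start <= end <= VLAN_MASK):
            raise ValueError(f"bad record {(nick, start, end)}")
    body = b"".join(struct.pack(">HHH", *r) for r in records)
    if len(body) > 255:  # Length is a single byte
        raise ValueError("too many records for one sub-TLV")
    return struct.pack(">BB", sub_tlv_type, len(body)) + body


def decode(data):
    """Parse a sub-TLV into (type, records), ignoring RESV bits."""
    sub_tlv_type, length = struct.unpack_from(">BB", data)
    if length % 6 or len(data) < 2 + length:
        raise ValueError("malformed sub-TLV")
    records = []
    for off in range(2, 2 + length, 6):
        nick, start, end = struct.unpack_from(">HHH", data, off)
        records.append((nick, start & VLAN_MASK, end & VLAN_MASK))
    return sub_tlv_type, records


def select_trees(announced, interested_vlans):
    """Choose exactly one allowed tree per interested VLAN.
    Tie-break (hypothetical): lowest nickname, so every RBridge applying
    the same rule to the same announcement picks the same tree."""
    choice = {}
    for vlan in interested_vlans:
        allowed = [n for n, s, e in announced if s <= vlan <= e]
        if allowed:  # no allowed tree: see Section 3.4
            choice[vlan] = min(allowed)
    return choice


def to_ranges(choice):
    """Compress {vlan: tree} into (tree, start, end) records."""
    records = []
    for vlan in sorted(choice):
        tree = choice[vlan]
        last = records[-1] if records else None
        if last and last[0] == tree and last[2] == vlan - 1:
            records[-1] = (tree, last[1], vlan)
        else:
            records.append((tree, vlan, vlan))
    return records


# Hypothetical nicknames: 0x0001 roots tree1, 0x0002 roots tree2.
# VLAN 10 is announced on both trees to exercise the tie-break.
announced = encode(TREE_VLANS, [(0x0001, 1, 2000), (0x0002, 2001, 4095),
                                (0x0002, 10, 10)])
_, records = decode(announced)
choice = select_trees(records, [10, 11, 12, 3000])
use = encode(TREE_VLAN_USE, to_ranges(choice))
print(to_ranges(choice))  # [(1, 10, 12), (2, 3000, 3000)]
```

   Compressing consecutive VLANs into ranges keeps TREE-VLAN-USE small.
   Because the Length field is one byte, a single sub-TLV holds at most
   42 records; a longer list would presumably be split across multiple
   instances, as Section 3.2.1 allows for TREE-VLANs.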
   If RBridge RBn does not perform pruning at all, it builds the
   multicast forwarding table exactly as specified in [RFC6325].

   If RBn prunes the distribution tree based on VLANs, RBn uses the
   information received in the TREE-VLAN-USE sub-TLV to mark, for each
   adjacency and each related tree, the set of VLANs reachable
   downstream.

   Logically, an ingress RBridge that does not support VLAN based tree
   selection is equivalent to one that supports it and announces, as
   TREE-VLAN-USE, every combination of its used tree IDs and interested
   VLANs.

3.4. Failure Handling

   Failure of a tree root: It is the responsibility of the highest
   priority tree root to inform the other RBridges of changes to the
   allowed tree-VLAN correspondence. When the highest priority tree
   root learns that the root of tree t has failed, it should reassign
   the VLANs allowed on tree t to other trees or to a tree replacing
   the failed one.

   Failure of the highest priority tree root: It is recommended to
   pre-configure the second highest priority tree root with the
   tree-VLAN correspondence to be used when the highest priority tree
   root fails. The information announced by the second highest priority
   tree root is stored by all RBridges but does not take effect unless
   an RBridge notices the failure of the highest priority tree root.
   When the highest priority tree root fails, the original second
   highest priority tree root becomes the highest priority tree root of
   the campus. When an RBridge notices the failure of the original
   highest priority tree root, it can immediately use the stored
   information announced by the original second highest priority tree
   root. It is recommended that the tree-VLAN correspondence configured
   on the second highest priority tree root be the same as that on the
   highest priority tree root for all trees other than the highest
   priority tree itself.
   This minimizes the change to the multicast
   forwarding table when the highest priority tree root fails.

   During transient periods, or if the highest priority tree root
   misbehaves, an ingress RBridge may encounter the following
   scenarios:

   - No tree has been announced as allowing VLAN x frames.

   - The ingress RBridge is supposed to transmit VLAN x frames on tree
     t, but the root of tree t is no longer reachable.

   In the second case, the ingress RBridge may choose another reachable
   tree root that the highest priority tree root has announced as
   allowing VLAN x. If no such tree is available, the situation is the
   same as the first case, and the ingress RBridge should be
   'downgraded' to a conventional RBridge as specified in [RFC6325]. A
   timer should be set to allow the transient period to complete before
   the switch to another tree or the 'downgrade' takes effect. The timer
   value should be at least the LSP flooding time in the campus.

3.5. Extensions

   VLAN based tree selection can easily be extended to (VLAN + L2/L3
   multicast group) based tree selection. For example, multicast group
   1 in VLAN 10 can be appointed to tree1 and group 2 in VLAN 10 to
   tree2 for better load sharing. New sub-TLVs are specified below for
   this purpose.

      +-+-+-+-+-+-+-+-+
      | Type=         |                  (1 byte)
      +-+-+-+-+-+-+-+-+
      | Length        |                  (1 byte)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Tree Nickname                 |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | RESV  |  VLAN ID              |  (2 bytes)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Num Group Recs |                  (1 byte)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | GROUP RECORDS (1)                                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | GROUP RECORDS (2)                                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | .................                                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | GROUP RECORDS (N)                                             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where each group record has the following form, with k=4 for an
   IPv4 group address, k=6 for a MAC group address, and k=16 for an
   IPv6 group address:

      +-+-+-+-+-+-+-+-+
      | Num of Sources|                  (1 byte)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Group Address (k bytes)                                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Source 1 Address (k bytes)                                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Source 2 Address (k bytes)                                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | .....                                                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Source M Address (k bytes)                                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4. Backward Compatibility

   An RBridge should include the TREE-USE-IDs and INT-VLAN sub-TLVs
   when necessary, as per [RFC6325], regardless of whether it supports
   the new TREE-VLAN-USE sub-TLV specified by this document.

   An RBridge that understands the TREE-VLAN-USE sub-TLV sent by
   another RBridge RBn should use it to build the multicast forwarding
   table, and for that purpose ignore the TREE-USE-IDs and INT-VLAN
   sub-TLVs sent by the same RBridge. Note that the TREE-USE-IDs and
   INT-VLAN sub-TLVs are still useful for purposes other than building
   the multicast forwarding table, e.g., RPF table building and
   spanning tree root notification. If the RBridge does not receive a
   TREE-VLAN-USE sub-TLV from RBn, it uses the conventional method
   described in [RFC6325] to build the multicast forwarding table.

   For example, suppose there are two distribution trees, tree1 and
   tree2, in the campus. RB1 and RB2 are new RBridges that use the new
   sub-TLVs described in this document.
   RB3 is an old RBridge compatible
   with [RFC6325]. Assume RB2 is interested in VLANs 10 and 11 and RB3
   is interested in VLANs 100 and 101. RB1 receives ((tree1, VLAN10),
   (tree2, VLAN11)) as the TREE-VLAN-USE sub-TLV and (tree1, tree2) as
   the TREE-USE-IDs sub-TLV from RB2 on port x. RB1 receives (tree1) as
   the TREE-USE-IDs sub-TLV and no TREE-VLAN-USE sub-TLV from RB3 on
   port y. RB2 and RB3 announce their interested VLANs in the INT-VLAN
   sub-TLV as usual. RB1 then builds the entries (tree1, VLAN10, port x)
   and (tree2, VLAN11, port x) from RB2's LSP, using the mechanism
   specified in this document. RB1 also builds the entries (tree1,
   VLAN100, port y), (tree1, VLAN101, port y), (tree2, VLAN100, port y),
   and (tree2, VLAN101, port y) from RB3's LSP in the conventional way.
   The merged multicast forwarding table on RB1 would look like the
   following:

   +--------------+-----+---------+
   |tree nickname |VLAN |port list|
   +--------------+-----+---------+
   |    tree 1    |  10 |    x    |
   +--------------+-----+---------+
   |    tree 1    | 100 |    y    |
   +--------------+-----+---------+
   |    tree 1    | 101 |    y    |
   +--------------+-----+---------+
   |    tree 2    |  11 |    x    |
   +--------------+-----+---------+
   |    tree 2    | 100 |    y    |
   +--------------+-----+---------+
   |    tree 2    | 101 |    y    |
   +--------------+-----+---------+

   In a hybrid campus, the table is not expected to shrink as much as it
   would if every RBridge supported the TREE-VLAN-USE sub-TLV. In the
   worst case, the number of entries equals that of current practice
   without VLAN based tree selection. This extreme case happens when
   the set of VLANs of interest to the new RBridges is a subset of the
   set of VLANs of interest to the old RBridges.

   VLAN based tree selection is compatible with current practice. Its
   effectiveness increases as more RBridges in the TRILL campus support
   this feature.

5. Aggregated Tree Nickname

   VLAN based tree selection requires control plane enhancements. This
   section describes an alternative, data plane mechanism to solve the
   multicast forwarding table size problem.

   The idea is to aggregate the multicast forwarding table entries of
   all trees on a per-VLAN basis. If there are two trees, tree1 and
   tree2, in the campus, RBridge RBn may have two entries for VLAN x:
   (tree1, VLAN-x, port-list-1) and (tree2, VLAN-x, port-list-2). After
   aggregation on tree nickname, the entry becomes (tree = wildcard/*,
   VLAN-x, port-list-1 UNION port-list-2).

   Multi-destination frames for VLAN x received by RBn are then
   replicated to the outgoing ports of both tree1 and tree2. The next
   hop RBridge performs the RPF check. If a port is part of one tree but
   not the other, the next hop RBridge has no entry for
   (source_RBridge, treeid) pointing to that incoming port in its RPF
   check table, so it drops frames received on the wrong incoming port.
   This is therefore a one-hop-delayed pruning. In some circumstances,
   pruning may be delayed by more than one hop; this happens when an
   RBridge located in multiple trees has the same incoming port for an
   RPF check.

   The following table gives a simple comparison of the two approaches.

   +-------------+--------------------------+------------------------+
   |             |VLAN based tree selection |   aggr tree nickname   |
   +-------------+--------------------------+------------------------+
   |data plane   |            No            |Depends. Hardware must  |
   |change req?  |                          |support aggregated tree |
   |             |                          |nickname lookup         |
   +-------------+--------------------------+------------------------+
   |ctrl plane   |Yes. Needs new sub-TLVs   |           No           |
   |change req?  |and new handling          |                        |
   +-------------+--------------------------+------------------------+
   |multipathing |Inter-VLAN multipathing.  |Allows intra-VLAN tree  |
   |             |A single VLAN uses a      |based multipathing.     |
   |             |single designated tree    |Better load balancing   |
   |             |campus-wide               |within the VLAN         |
   +-------------+--------------------------+------------------------+
   |optimal      |           Yes            |Slightly less than      |
   |pruning?     |                          |optimal in certain cases|
   +-------------+--------------------------+------------------------+

   The authors seek comments and opinions on both approaches and their
   applicable scenarios.

6. Security Considerations

   This document does not change the general RBridge security
   considerations of the TRILL base protocol. See Section 6 of
   [RFC6325].

7. IANA Considerations

   IANA is requested to allocate the new sub-TLV type codes specified
   in Section 3.

8. References

8.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6325]  Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, July 2011.

   [RFC6326]  Eastlake, D., Banerjee, A., Dutt, D., Perlman, R., and A.
              Ghanwani, "TRILL Use of IS-IS", RFC 6326, July 2011.

   [6326bis]  Eastlake, D., et al., "Transparent Interconnection of Lots
              of Links (TRILL) Use of IS-IS",
              draft-eastlake-isis-rfc6326bis-07.txt, Work in Progress,
              December 2011.

   [RFC6439]  Eastlake, D., et al., "RBridge: Appointed Forwarder",
              RFC 6439, November 2011.

8.2. Informative References

   [RFC6165]  Banerjee, A. and D. Ward, "Extensions to IS-IS for Layer-2
              Systems", RFC 6165, April 2011.

   [RFC6327]  Eastlake 3rd, D., Perlman, R., Ghanwani, A., Dutt, D., and
              V. Manral, "Routing Bridges (RBridges): Adjacency",
              RFC 6327, July 2011.

   [TrillFGL] Eastlake 3rd, D., Zhang, M., Agarwal, P., Dutt, D., and
              R. Perlman, "TRILL: Fine-Grained Labeling",
              draft-ietf-trill-fine-labeling-00.txt, Work in Progress,
              December 2011.

9. Acknowledgments

   The authors wish to thank David M Bond, Donald Eastlake, Liangliang
   Ma, and Rakesh Kumar R for their valuable comments (names in
   alphabetical order).

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56625375
   Email: liyizhou@huawei.com

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56623144
   Email: haoweiguo@huawei.com

   Radia Perlman
   Intel Labs
   2200 Mission College Blvd.
   Santa Clara, CA 95054-1549 USA

   Phone: +1-408-765-8080
   Email: Radia@alum.mit.edu

   Naveen Nimmu
   Broadcom
   9th Floor, Building no 9, Raheja Mind space
   Hi-Tec City, Madhapur,
   Hyderabad - 500 081, INDIA

   Phone: +1-408-218-8893
   Email: naveen@broadcom.com

   Somnath Chatterjee
   IP Infusion
   RMZ Centennial, Block D
   Doddanakundi Industrial Area,
   Kundanahalli Main Road, Mahadevapura Post,
   Bangalore - 560 048 Karnataka, India

   Email: somnath.chatterjee01@gmail.com

   Sunny Rajagopalan
   IBM

   Email: sunny.rajagopalan@us.ibm.com