TRILL Working Group                                        Radia Perlman
INTERNET-DRAFT                                                Intel Labs
Intended status: Informational                           Donald Eastlake
                                                                  Huawei
                                                          Anoop Ghanwani
                                                                    Dell
                                                            Hongjun Zhai
                                                                     ZTE
Expires: December 29, 2014                                 June 30, 2014


                       Flexible Multilevel TRILL
            (Transparent Interconnection of Lots of Links)


Abstract

Extending TRILL to multiple levels has one challenge that is not addressed by the already-existing capability of IS-IS to have multiple levels. The issue is with TRILL switch nicknames. There have been two proposed approaches. One approach, which we refer to as the "unique nickname" approach, gives unique nicknames to all the TRILL switches in the multilevel campus, either by having the level-1/level-2 border TRILL switches advertise which nicknames are not available for assignment in the area, or by partitioning the 16-bit nickname into an "area" field and a "nickname inside the area" field. The other approach, which we refer to as the "aggregated nickname" approach, involves assigning nicknames to the areas, and allowing nicknames to be reused in different areas, by having the border TRILL switches rewrite the nickname fields when entering or leaving an area. Each of those approaches has advantages and disadvantages. The design in this document allows a choice of approach in each area, allowing the simplicity of the unique nickname approach in installations in which there is no danger of running out of nicknames, and allowing nickname rewriting to be phased into larger installations on a per-area basis.

Status of This Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Distribution of this document is unlimited. Comments should be sent to the TRILL working group mailing list.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Acknowledgements

The helpful comments of the following are hereby acknowledged: David Michael Bond and Dino Farinacci.

Table of Contents

   1. Introduction
   1.1 TRILL Scalability Issues
   1.2 Improvements Due to Multilevel
   1.3 Unique and Aggregated Nicknames
   1.4 More on Areas
   1.5 Terminology and Acronyms

   2. Multilevel TRILL Issues
   2.1 Non-zero Area Addresses
   2.2 Aggregated versus Unique Nicknames
   2.2.1 More Details on Unique Nicknames
   2.2.2 More Details on Aggregated Nicknames
   2.2.2.1 Border Learning Aggregated Nicknames
   2.2.2.2 Swap Nickname Field Aggregated Nicknames
   2.2.2.3 Comparison
   2.3 Building Multi-Area Trees
   2.4 The RPF Check for Trees
   2.5 Area Nickname Acquisition
   2.6 Link State Representation of Areas

   3. Area Partition

   4. Multi-Destination Scope
   4.1 Unicast to Multi-destination Conversions
   4.1.1 New Tree Encoding
   4.2 Selective Broadcast Domain Reduction

   5. Co-Existence with Old TRILL switches
   6. Multi-Access Links with End Stations
   7. Summary
   8. Security Considerations
   9. IANA Considerations

   Normative References
   Informative References
   Authors' Addresses

1. Introduction

The IETF TRILL (Transparent Interconnection of Lots of Links or Tunneled Routing in the Link Layer) protocol [RFC6325] provides optimal pair-wise data routing without configuration, safe forwarding even during periods of temporary loops, and support for multipathing of both unicast and multicast traffic in networks with arbitrary topology and link technology, including multi-access links. TRILL accomplishes this by using IS-IS (Intermediate System to Intermediate System [IS-IS] [RFC7176]) link state routing in conjunction with a header that includes a hop count. The design supports data labels (VLANs and Fine Grained Labels [RFC7172]) and optimization of the distribution of multi-destination data based on VLANs and multicast groups.
Devices that implement TRILL are called TRILL Switches or RBridges.

Familiarity with [RFC6325] and [RFC7180] is assumed in this document.

1.1 TRILL Scalability Issues

There are multiple issues that might limit the scalability of a TRILL-based network:

   1. the routing computation load,
   2. the volatility of the LSP (Link State PDU) database creating too much control traffic,
   3. the volatility of the LSP database causing the TRILL network to be in an unconverged state too much of the time,
   4. the size of the LSP database,
   5. the limit on the number of TRILL switches, due to the 16-bit nickname space,
   6. the traffic due to upper layer protocols' use of broadcast and multicast, and
   7. the size of the end node learning table (the table that remembers (egress TRILL switch, label/MAC) pairs).

Extending TRILL IS-IS to be multilevel (hierarchical) helps with all but the last of these issues.

IS-IS was designed to be multilevel [IS-IS]. A network can be partitioned into "areas". Routing within an area is known as "Level 1 routing". Routing between areas is known as "Level 2 routing". The Level 2 IS-IS network consists of Level 2 routers and links between the Level 2 routers. Level 2 routers may participate in one or more areas, in addition to their role as Level 2 routers.

Each area is connected to Level 2 through one or more "border routers", which participate both as a router inside the area and as a router inside the Level 2 "area". Care must be taken that it is clear, when transitioning multi-destination packets between Level 2 and a Level 1 area in either direction, which (single) border TRILL switch will transition a particular data packet between the levels; otherwise duplication of traffic can occur.

1.2 Improvements Due to Multilevel

Partitioning the network into areas solves the first four scalability issues described above, namely,

   1. the routing computation load,

   2. the volatility of the LSP (Link State PDU) database creating too much control traffic,

   3. the volatility of the LSP database causing the TRILL network to be in an unconverged state too much of the time, and

   4. the size of the LSP database.

Problem #6 in Section 1.1, namely the traffic due to upper layer protocols' use of broadcast and multicast, can be addressed by introducing a locally-scoped multi-destination delivery, limited to an area or a single link. See further discussion in Section 4.2.

Problem #5 in Section 1.1, namely the limit on the number of TRILL switches due to the 16-bit nickname space, is only addressed by the aggregated nickname approach. Since the aggregated nickname approach requires some complexity in the border TRILL switches (for rewriting the nicknames in the TRILL header), the design in this document allows a campus with a mixture of unique-nickname areas and aggregated-nickname areas. Nicknames must be unique across all unique-nickname areas, whereas nicknames inside an aggregated-nickname area are visible only inside the area. Nicknames inside an aggregated-nickname area must not conflict with nicknames assigned to TRILL switches in any unique-nickname areas, and must not conflict with the aggregated nickname given to aggregated-nickname areas, but the nicknames inside an aggregated-nickname area may be the same as nicknames used within other aggregated-nickname areas.
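
As an informal illustration of these constraints, the Python sketch below checks whether a candidate nickname could be chosen in an area of either type. It is purely illustrative; the data structures and the function name are invented for this example and are not part of any TRILL specification.

   # Illustrative sketch only; the structures and function are invented
   # for this example and are not part of the TRILL specification.

   def nickname_usable(candidate, unique_area_nicknames, area_nicknames):
       """Return True if 'candidate' may be chosen by a TRILL switch
       under the mixed unique/aggregated model.

       unique_area_nicknames: nicknames held by TRILL switches in all
                              unique-nickname areas (campus-wide visible)
       area_nicknames:        aggregated nicknames assigned to areas
       """
       # A nickname may never collide with a campus-visible nickname from
       # a unique-nickname area, nor with any area (aggregated) nickname.
       if candidate in unique_area_nicknames or candidate in area_nicknames:
           return False
       # Inside an aggregated-nickname area the nickname is purely local,
       # so it may duplicate nicknames used inside other aggregated areas;
       # in a unique-nickname area the chosen nickname also joins the
       # campus-wide set once selected (not shown here).
       return True
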
TRILL switches within an area need not be aware of whether they are in an aggregated nickname area or a unique nickname area. The border TRILL switches in area A1 will claim, in their LSPs inside area A1, which nicknames (or nickname ranges) are not available for choosing as nicknames by area A1 TRILL switches.

1.3 Unique and Aggregated Nicknames

We describe two alternatives for hierarchical or multilevel TRILL. One we call the "unique nickname" alternative. The other we call the "aggregated nickname" alternative. In the aggregated nickname alternative, border TRILL switches replace either the ingress or egress nickname field in the TRILL header of unicast packets with an aggregated nickname representing an entire area.

The unique nickname alternative has the advantage that border TRILL switches are simpler and do not need to do TRILL Header nickname modification. It also simplifies testing and maintenance operations that originate in one area and terminate in a different area.

The aggregated nickname alternative has the following advantages:

   o it solves problem #5 above, the 16-bit nickname limit, in a simple way,

   o it lessens the amount of inter-area routing information that must be passed in IS-IS, and

   o it greatly reduces the RPF (Reverse Path Forwarding) Check information (since only the area nickname needs to appear, rather than all the ingress TRILL switches in that area).

In both cases, it is possible and advantageous to compute multi-destination data packet distribution trees such that the portion computed within a given area is rooted within that area.

1.4 More on Areas

Each area is configured with an "area address", which is advertised in IS-IS messages, so as to avoid accidentally interconnecting areas. Note that, although the area address had other purposes in CLNP (IS-IS was originally designed for CLNP/DECnet), for TRILL the only purpose of the area address would be to avoid accidentally interconnecting areas.

Currently, the TRILL specification says that the area address must be zero. If we change the specification so that the area address value of zero is just a default, then most of the IS-IS multilevel machinery works as originally designed. However, there are TRILL-specific issues, which we address below in this document.

1.5 Terminology and Acronyms

This document generally uses the acronyms defined in [RFC6325] plus the additional acronym DBRB.
However, for ease of reference, most acronyms used are listed here:

   CLNP - ConnectionLess Network Protocol

   DECnet - a proprietary routing protocol that was used by Digital Equipment Corporation

   Data Label - VLAN or Fine Grained Label [RFC7172]

   DBRB - Designated Border RBridge

   IS-IS - Intermediate System to Intermediate System [IS-IS]

   LSP - Link State PDU

   PDU - Protocol Data Unit

   RBridge - Routing Bridge, an alternative name for a TRILL switch

   RPF - Reverse Path Forwarding

   TRILL - Transparent Interconnection of Lots of Links or Tunneled Routing in the Link Layer [RFC6325]

   TRILL switch - an alternative name for an RBridge

   VLAN - Virtual Local Area Network

2. Multilevel TRILL Issues

The TRILL-specific issues introduced by multilevel include the following:

a. Configuration of non-zero area addresses, encoding them in IS-IS PDUs, and possibly interworking with old TRILL switches that do not understand non-zero area addresses.

   See Section 2.1.

b. Nickname management.

   See Sections 2.5 and 2.2.

c. Advertisement of pruning information (Data Label reachability, IP multicast addresses) across areas.

   Distribution tree pruning information is only an optimization, as long as multi-destination packets are not prematurely pruned. For instance, border TRILL switches could advertise that they can reach all possible Data Labels and have an IP multicast router attached. This would cause all multi-destination traffic to be transmitted to border TRILL switches, and possibly pruned there, when the traffic could have been pruned earlier based on Data Label or multicast group if border TRILL switches advertised more detailed Data Label and/or multicast listener and multicast router attachment information.

d. Computation of distribution trees across areas for multi-destination data.

   See Section 2.3.

e. Computation of RPF information for those distribution trees.

   See Section 2.4.

f. Computation of pruning information across areas.

   See Sections 2.3 and 2.6.

g. Compatibility, as much as practical, with existing, unmodified TRILL switches.

   The most important form of compatibility is with existing TRILL fast path hardware. Changes that require upgrade to the slow path firmware/software are more tolerable. Compatibility for the relatively small number of border TRILL switches is less important than compatibility for non-border TRILL switches.

   See Section 5.

2.1 Non-zero Area Addresses

The current TRILL base protocol specification [RFC6325] [RFC7176] [RFC7180] says that the area address in IS-IS must be zero. The purpose of the area address is to ensure that different areas are not accidentally merged. Furthermore, zero is an invalid area address for Layer 3 IS-IS, so it was chosen as an additional safety mechanism to ensure that Layer 3 IS-IS would not be confused with TRILL IS-IS. However, TRILL uses other techniques to avoid such confusion, such as a different multicast address and Ethertype on Ethernet [RFC6325] and different PPP codepoints on PPP [RFC6361], so confusion with Layer 3 IS-IS is not a concern here.
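
Purely as an illustration of using the area address as a merge guard, a multilevel-aware TRILL switch might apply a check along the following lines when processing a Level 1 Hello, treating an absent area address as the legacy value of zero. The function and parameter names are invented here and nothing in this sketch is specified behavior.

   # Illustrative sketch; not part of the TRILL specification.

   DEFAULT_AREA_ADDRESS = 0      # what existing TRILL switches use today

   def accept_level1_adjacency(my_area_address, neighbor_area_addresses):
       """Refuse a Level 1 adjacency that would merge two different areas.

       neighbor_area_addresses: area addresses from the neighbor's Hello;
       an empty list is treated as the legacy default of zero.
       """
       neighbor_areas = neighbor_area_addresses or [DEFAULT_AREA_ADDRESS]
       # Form the adjacency only if the neighbor advertises our area
       # address; otherwise the two areas would be accidentally merged.
       return my_area_address in neighbor_areas
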
Since current TRILL switches will reject any IS-IS messages with non-zero area addresses, the choices are as follows:

   a.1 upgrade all TRILL switches that are to interoperate in a potentially multilevel environment to understand non-zero area addresses,

   a.2 have neighbors of old TRILL switches remove the area address from IS-IS messages when talking to an old TRILL switch (which might break IS-IS security and/or cause inadvertent merging of areas),

   a.3 ignore the problem of accidentally merging areas entirely, or

   a.4 keep the fixed "area address" field as 0 in TRILL, and add a new, optional TLV for "area name" that, if present, could be compared, by new TRILL switches, to prevent accidental area merging.

In principle, different solutions could be used in different areas, but it would be much simpler to adopt one of these choices uniformly.

2.2 Aggregated versus Unique Nicknames

In the unique nickname alternative, all nicknames across the campus must be unique. In the aggregated nickname alternative, TRILL switch nicknames within an aggregated area are only of local significance, and the only nickname externally (outside that area) visible is the "area nickname" (or nicknames), which aggregates all the internal nicknames.

The unique nickname approach simplifies border TRILL switches.

The aggregated nickname approach eliminates the potential problem of nickname exhaustion, minimizes the amount of nickname information that would need to be forwarded between areas, minimizes the size of the forwarding table, and simplifies RPF calculation and RPF information.

2.2.1 More Details on Unique Nicknames

With unique cross-area nicknames, it would be intractable to have a flat nickname space with TRILL switches in different areas contending for the same nicknames. Instead, each area would need to be configured with a block of nicknames. Either some TRILL switches would need to announce that all the nicknames other than that block are taken (to prevent the TRILL switches inside the area from choosing nicknames outside the area's nickname block), or a new TLV would be needed to announce the allowable nicknames, and all TRILL switches in the area would need to understand that new TLV. An example of the second approach is given in [NickFlags].

Currently, the encoding of nickname information in TLVs is by listing individual nicknames; this would make it painful for a border TRILL switch to announce into an area that it is holding all other nicknames in order to limit the nicknames available within that area. The information could be encoded as ranges of nicknames to make this somewhat manageable [NickFlags]; however, a new TLV for announcing nickname ranges would not be intelligible to old TRILL switches.

There is also an issue with the unique nicknames approach in building distribution trees, as follows:

   With unique nicknames in the TRILL campus and TRILL header nicknames not rewritten by the border TRILL switches, there would have to be globally known nicknames for the trees. Suppose there are k trees. For all of the trees with root nicknames located outside an area, the local trees would be rooted at a border TRILL switch or switches.
   Therefore, there would be either no splitting of multi-destination traffic within the area, or splitting restricted to trees rooted at a highly restricted set of TRILL switches.

   As an alternative, just the "egress nickname" field of multi-destination TRILL Data packets could be mapped at the border, leaving known unicast packets un-mapped. However, this surrenders much of the unique nickname advantage of simpler border TRILL switches.

Scaling to a very large campus with unique nicknames might exhaust the 16-bit TRILL nickname space. One method of expanding that to a 24-bit space is given in [MoreNicks]; however, that technique would require all TRILL switches in the campus to understand larger nicknames.

For an example of a more specific multilevel proposal using unique nicknames, see [DraftUnique].

2.2.2 More Details on Aggregated Nicknames

The aggregated nickname approach enables passing far less nickname information. It works as follows, assuming both the source and destination areas are using aggregated nicknames:

   Each area would be assigned a 16-bit nickname. This would not be the nickname of any actual TRILL switch. Instead, it would be the nickname of the area itself. Border TRILL switches would know the area nickname for their own area(s).

   The TRILL Header nickname fields in TRILL Data packets being transported through a multilevel TRILL campus with aggregated nicknames are as follows:

   - When both the ingress and egress TRILL switches are in the same area, there need be no change from the existing base TRILL protocol standard in the TRILL Header nickname fields.

   - When being transported in Level 2, the ingress nickname is the nickname of the ingress TRILL switch's area, while the egress nickname is either the nickname of the egress TRILL switch's area or a tree nickname.

   - When being transported from Level 1 to Level 2, the ingress nickname is the nickname of the ingress TRILL switch itself, while the egress nickname is either the nickname of the area of the egress TRILL switch or a tree nickname.

   - When being transported from Level 2 to Level 1, the ingress nickname is the nickname of the ingress TRILL switch's area, while the egress nickname is either the nickname of the egress TRILL switch itself or a tree nickname.

There are two variations of the aggregated nickname approach. The first is the Border Learning approach, which is described in Section 2.2.2.1. The second is the Swap Nickname Field approach, which is described in Section 2.2.2.2. Section 2.2.2.3 compares the advantages and disadvantages of these two variations of the aggregated nickname approach.

2.2.2.1 Border Learning Aggregated Nicknames

This section provides an illustrative example and description of the border learning variation of aggregated nicknames.

In the following picture, RB2 and RB3 are area border TRILL switches (RBridges). A source S is attached to RB1. The two areas have nicknames 15961 and 15918, respectively. RB1 has a nickname, say 27, and RB4 has a nickname, say 44 (and in fact they could even have the same nickname, since a TRILL switch nickname will not be visible outside these aggregated areas).

       Area 15961              level 2             Area 15918
   +-------------------+  +-----------------+  +--------------+
   |                   |  |                 |  |              |
   | S--RB1---Rx--Rz----RB2---Rb---Rc--Rd---Re--RB3---Rk--RB4---D |
   |     27            |  |                 |  |          44  |
   |                   |  |                 |  |              |
   +-------------------+  +-----------------+  +--------------+

Let's say that S transmits a frame to destination D, which is connected to RB4, and let's say that D's location has already been learned by the relevant TRILL switches. Those switches have learned the following:

   1) RB1 has learned that D is connected to nickname 15918.
   2) RB3 has learned that D is attached to nickname 44.

The following sequence of events will occur:

- S transmits an Ethernet frame with source MAC = S and destination MAC = D.

- RB1 encapsulates the frame with a TRILL header with ingress RBridge = 27 and egress = 15918, producing a TRILL Data packet.

- RB2 has announced, in the Level 1 IS-IS instance in area 15961, that it is attached to all the area nicknames, including 15918. Therefore, IS-IS routes the packet to RB2. Alternatively, if a distinguished range of nicknames is used for Level 2, Level 1 TRILL switches seeing such an egress nickname will know to route to the nearest border router, which can be indicated by the IS-IS attached bit.

- RB2, when transitioning the packet from Level 1 to Level 2, replaces the ingress TRILL switch nickname with the area nickname, so it replaces 27 with 15961. Within Level 2, the ingress RBridge field in the TRILL header will therefore be 15961, and the egress RBridge field will be 15918. Also, RB2 learns that S is attached to nickname 27 in area 15961, to accommodate return traffic.

- The packet is forwarded through Level 2 to RB3, which has advertised, in Level 2, reachability to the nickname 15918.

- RB3, when forwarding into area 15918, replaces the egress nickname in the TRILL header with RB4's nickname (44). So, within the destination area, the ingress nickname will be 15961 and the egress nickname will be 44.

- RB4, when decapsulating, learns that S is attached to nickname 15961, which is the area nickname of the ingress.

Now suppose that D's location has not been learned by RB1 and/or RB3. What will happen, as it would in TRILL today, is that RB1 will forward the packet as multi-destination, choosing a tree. As the multi-destination packet transitions into Level 2, RB2 replaces the ingress nickname with the area nickname. If RB1 does not know the location of D, the packet must be flooded, subject to possible pruning, in Level 2 and, subject to possible pruning, from Level 2 into every Level 1 area that it reaches on the Level 2 distribution tree.

Now suppose that RB1 has learned the location of D (attached to nickname 15918), but RB3 does not know where D is. In that case, RB3 must turn the packet into a multi-destination packet within area 15918. In this case, care must be taken so that, if RB3 is not the designated transitioner between Level 2 and its area for that multi-destination packet but was on the unicast path, another border TRILL switch in that area does not forward the now multi-destination packet back into Level 2. Therefore, it would be desirable to have a marking, somehow, that indicates the scope of this packet's distribution to be "only this area" (see also Section 4).
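
The border learning behavior in the example above can be summarized in the following illustrative Python sketch. It assumes a single area nickname per area; the header fields, table, and function names are invented for this example and are not defined by TRILL.

   from dataclasses import dataclass

   # Illustrative sketch of the border learning rewrite; the header class,
   # table, and function names are invented and not defined by TRILL.

   @dataclass
   class TrillHeader:
       ingress_nickname: int
       egress_nickname: int
       label: int            # VLAN or Fine Grained Label
       src_mac: str
       dst_mac: str

   MY_AREA_NICKNAME = 15961  # aggregated nickname of this border switch's area
   local_mac_table = {}      # (Data Label, MAC) -> nickname inside this area

   def level1_to_level2(pkt):
       # Learn where the source lives inside the area (for return traffic),
       # then hide the real ingress nickname behind the area nickname.
       local_mac_table[(pkt.label, pkt.src_mac)] = pkt.ingress_nickname
       pkt.ingress_nickname = MY_AREA_NICKNAME
       return pkt

   def level2_to_level1(pkt):
       # Replace the egress area nickname with the nickname of the TRILL
       # switch to which the destination is attached, if it has been learned.
       nickname = local_mac_table.get((pkt.label, pkt.dst_mac))
       if nickname is None:
           # Unknown destination: the packet would instead be converted to
           # multi-destination and flooded, restricted to this area
           # (see Section 4).
           return None
       pkt.egress_nickname = nickname
       return pkt
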
In cases where there are multiple transitioners for unicast packets, the border learning mode of operation requires that the address learning between them be shared by some protocol, such as running ESADI [RFCesadi] for all Data Labels of interest, to avoid excessive unknown unicast flooding.

The potential issue described at the end of Section 2.2.1 with trees in the unique nickname alternative is eliminated with aggregated nicknames. With aggregated nicknames, each border TRILL switch that will transition multi-destination packets can have a mapping between Level 2 tree nicknames and Level 1 tree nicknames. There need not even be agreement about the total number of trees; it is enough that the border TRILL switch has some mapping and replaces the egress TRILL switch nickname (the tree nickname) when transitioning levels.

2.2.2.2 Swap Nickname Field Aggregated Nicknames

As a variant, two additional fields could exist in TRILL Data packets, which we call the "ingress swap nickname field" and the "egress swap nickname field". The changes in the example above would be as follows:

- RB1 will have learned the area nickname of D and the TRILL switch nickname of RB4 to which D is attached. In encapsulating a frame to D, it puts the area nickname of D (15918) in the egress nickname field of the TRILL Header and puts the nickname of RB4 (44) in the egress swap nickname field.

- RB2 moves the ingress nickname to the ingress swap nickname field and inserts 15961, the area nickname for S, into the ingress nickname field.

- RB3 swaps the egress nickname and the egress swap nickname fields, which sets the egress nickname to 44.

- RB4 learns the correspondence between the source MAC/VLAN of S and the { ingress nickname, ingress swap nickname field } pair as it decapsulates and egresses the frame.
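
A corresponding illustrative sketch for this variant is shown below. The two swap fields are hypothetical extensions to the TRILL header, and all names here are invented for the example.

   from dataclasses import dataclass

   # Illustrative sketch of the Swap Nickname Field variant; the two swap
   # fields are hypothetical header extensions and all names are invented.

   @dataclass
   class TrillHeaderWithSwap:
       ingress_nickname: int
       egress_nickname: int
       ingress_swap_nickname: int = 0
       egress_swap_nickname: int = 0

   def ingress_encapsulate(pkt, dest_area_nickname, dest_switch_nickname):
       # RB1: the egress nickname names the destination area; the egress
       # swap nickname names the destination TRILL switch inside that area.
       pkt.egress_nickname = dest_area_nickname        # e.g., 15918
       pkt.egress_swap_nickname = dest_switch_nickname # e.g., 44 (RB4)
       return pkt

   def border_level1_to_level2(pkt, my_area_nickname):
       # RB2: preserve the real ingress nickname in the swap field and
       # expose only the area nickname (e.g., 15961) in Level 2.
       pkt.ingress_swap_nickname = pkt.ingress_nickname
       pkt.ingress_nickname = my_area_nickname
       return pkt

   def border_level2_to_level1(pkt):
       # RB3: swap the egress fields so that the egress nickname again
       # names the actual egress TRILL switch (44) inside the area.
       pkt.egress_nickname, pkt.egress_swap_nickname = (
           pkt.egress_swap_nickname, pkt.egress_nickname)
       return pkt
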
See [DraftAggregated] for a multilevel proposal using aggregated swap nicknames.

2.2.2.3 Comparison

The Border Learning variant described in Section 2.2.2.1 above minimizes the change in non-border TRILL switches but imposes on border TRILL switches the burden of learning, and doing lookups in, all the end station MAC addresses within their area(s) that are used for communication outside the area. This burden could be reduced by decreasing the area size and increasing the number of areas.

The Swap Nickname Field variant described in Section 2.2.2.2 eliminates the extra address learning burden on border TRILL switches but requires more extensive changes to non-border TRILL switches. In particular, they must learn to associate both a TRILL switch nickname and an area nickname with end station MAC/label pairs (except for addresses that are local to their area).

The Swap Nickname Field alternative is more scalable but less backward compatible for non-border TRILL switches. It would be possible for border and other Level 2 TRILL switches to support both Border Learning, for support of legacy Level 1 TRILL switches, and Swap Nickname, to support Level 1 TRILL switches that understand the Swap Nickname method.

2.3 Building Multi-Area Trees

It is easy to build a multi-area tree by building a tree in each area separately (including the Level 2 "area") and then having only a single border TRILL switch, say RBx, in each area attach to the Level 2 area. RBx would forward all multi-destination packets between that area and Level 2.

People might find this unacceptable, however, because of the desire to path split (not always sending all multi-destination traffic through the same border TRILL switch).

This is the same issue as with multiple ingress TRILL switches injecting traffic from a pseudonode, and it can be solved with the mechanism that was adopted for that purpose: the Affinity TLV [DraftCMT]. For each tree in the area, at most one border RBridge announces itself in an Affinity TLV with that tree name.

2.4 The RPF Check for Trees

For multi-destination data originating locally in an area, computation of the RPF check is done as today. For multi-destination packets originating outside that area, computation of the RPF check must be done based on which one of the area's border TRILL switches (say RB1, RB2, or RB3) injected the packet into the area.

A TRILL switch, say RB4, located inside the area must be able to know which of RB1, RB2, or RB3 transitioned the packet into the area from Level 2 (or into Level 2 from the area).

This could be done based on having the DBRB announce the transitioner assignments to all the TRILL switches in the area, or the Affinity TLV mechanism given in [DraftCMT], or the New Tree Encoding mechanism discussed in Section 4.1.1.

2.5 Area Nickname Acquisition

In the aggregated nickname alternative, each area must acquire a unique area nickname. It is probably simpler to allocate a block of nicknames (say, the top 4000) to be area nicknames, not used by any individual TRILL switches.

The area nicknames need to be advertised and acquired through Level 2.

Within an area, all the border TRILL switches must discover each other through the Level 1 link state database, by advertising, in their LSPs, "I am a border RBridge".

Of the border TRILL switches, one will have the highest priority (say RB7). RB7 can dynamically participate, in Level 2, to acquire a nickname for the area. Alternatively, RB7 could give the area a pseudonode IS-IS ID, such as RB7.5, within Level 2. So an area would appear, in Level 2, as a pseudonode, and the pseudonode can participate, in Level 2, to acquire a nickname for the area.

Within Level 2, all the border TRILL switches for an area can advertise reachability to the pseudonode, which would mean connectivity to the area nickname.

2.6 Link State Representation of Areas

Within an area, say area A1, there is an election for the DBRB (Designated Border RBridge), say RB1. This can be done through LSPs within area A1. The border TRILL switches announce themselves, together with their DBRB priority. (Note that the election of the DBRB cannot be done based on Hello messages, because the border TRILL switches are not necessarily physical neighbors of each other. They can, however, reach each other through connectivity within the area, which is why it will work to find each other through Level 1 LSPs.)

RB1 acquires the area nickname (in the aggregated nickname approach) and gives the area a pseudonode IS-IS ID (just as the DRB would give a pseudonode IS-IS ID to a link). RB1 advertises, in area A1, what the pseudonode IS-IS ID for the area is (and the area nickname that RB1 has acquired).
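
As an illustration only, DBRB election from the Level 1 link state database might look like the following sketch; the advertisement format and the tie-break on System ID are assumptions made for this example, not specified behavior.

   # Illustrative sketch of DBRB election from Level 1 LSP contents; the
   # advertisement format and the System ID tie-break are assumptions.

   def elect_dbrb(border_advertisements):
       """border_advertisements: iterable of (dbrb_priority, system_id)
       pairs taken from LSPs whose originators announced "I am a border
       RBridge".  Returns the System ID of the elected DBRB."""
       # Highest priority wins; ties are assumed to be broken by the
       # numerically highest System ID.
       priority, system_id = max(border_advertisements)
       return system_id
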
The pseudonode LSP initiated by RB1 for the area includes any information external to area A1 that should be input into area A1 (such as area nicknames of external areas or perhaps, in the unique nickname variant, all the nicknames of external TRILL switches in the TRILL campus, and pruning information such as multicast listeners and labels). All the other border TRILL switches for the area announce (in their LSPs) attachment to that pseudonode.

Within Level 2, RB1 generates a Level 2 LSP on behalf of the area, also represented as a pseudonode. The same pseudonode ID could be used within Level 1 and Level 2 for the area. (There does not seem to be any reason why it would be useful for it to be different, but there is also no reason why it would need to be the same.) Likewise, all the area A1 border TRILL switches would announce, in their Level 2 LSPs, connection to the pseudonode.

3. Area Partition

It is possible for an area to become partitioned, so that there is still a path from one section of the area to the other, but that path is via the Level 2 area.

With multilevel TRILL, an area will naturally break into two areas in this case.

An area address might be configured to ensure that two areas are not inadvertently connected. That area address appears in Hellos and LSPs within the area. If two chunks, connected only via Level 2, were configured with the same area address, this would not cause any problems. (They would just operate as separate Level 1 areas.)

A more serious problem occurs if the Level 2 area is partitioned in such a way that it could be healed by using a path through a Level 1 area. TRILL will not attempt to solve this problem. Within the Level 1 area, a single border RBridge will be the DBRB and will be in charge of deciding which (single) RBridge will transition any particular multi-destination packet between that area and Level 2. If the Level 2 area is partitioned, this will result in multi-destination data only reaching the portion of the TRILL campus reachable through the partition attached to the TRILL switch that transitions that packet. It will not cause a loop.

4. Multi-Destination Scope

There are at least two reasons it would be desirable to be able to mark a multi-destination packet with a scope that indicates the packet should not exit the area, as follows:

   1. To address an issue in the border learning variant of the aggregated nickname alternative, when a unicast packet turns into a multi-destination packet when transitioning from Level 2 to Level 1, as discussed in Section 4.1.

   2. To constrain the broadcast domain for certain discovery, directory, or service protocols, as discussed in Section 4.2.

Multi-destination packet distribution scope restriction could be done in a number of ways. For example, there could be a flag in the packet that means "for this area only". However, the technique that might require the least change to TRILL switch fast path logic would be to indicate this in the egress nickname that designates the distribution tree being used. There could be two general tree nicknames for each tree, one being for distribution restricted to the area and the other being for multi-area trees.
Or there could be a set of N (perhaps 16) special, currently reserved nicknames used to specify the N highest priority trees, with the variation that, if the special nickname is used for the tree, the packet is not transitioned between areas. Or one or more special trees could be built that were restricted to the local area.

4.1 Unicast to Multi-destination Conversions

In the border learning variant of the aggregated nickname alternative, a unicast packet might have a known destination at the Level 1 to Level 2 transition, be forwarded as a unicast packet to the least cost border TRILL switch advertising connectivity to the destination area, but turn out to have an unknown destination { MAC, Data Label } pair when it arrives at that border TRILL switch.

In this case, the packet must be converted into a multi-destination packet and flooded in the destination area. However, if the border TRILL switch doing the conversion is not the border TRILL switch designated to transition the resulting multi-destination packet, there is the danger that the designated transitioner may pick up the packet and flood it back into Level 2, from which it may be flooded into multiple areas. This danger can be avoided by restricting any multi-destination packet that results from such a conversion to the destination area, through a flag in the packet, through distributing it on a distribution tree that is limited to the area, or through other techniques (see Section 4).

Alternatively, a multi-destination packet intended only for the area could be tunneled (within the area) to the RBridge RBx that is the appointed transitioner for that form of packet (say, based on VLAN or FGL), with instructions that RBx only transmit the packet within the area, and RBx could initiate the multi-destination packet within the area. Since RBx introduced the packet, and is the only one allowed to transition that packet to Level 2, this would accomplish scoping of the packet to within the area. Since this occurs only in the unusual case when unicast packets need to be turned into multi-destination packets as described above, the suboptimality of tunneling between the border TRILL switch that receives the unicast packet and the appointed level transitioner for that packet would not be an issue.

4.1.1 New Tree Encoding

The current encoding of a tree in a TRILL header is the nickname of the tree root. This requires all 16 bits of the egress nickname field. TRILL could instead, for example, use the bottom 6 bits to encode the tree number (allowing 64 trees), leaving 10 bits to encode information such as:

   o scope: a flag indicating whether distribution should be restricted to a single area or cover the entire campus

   o border injector: an indicator of which of the k border TRILL switches injected this packet

If TRILL were to adopt this new encoding, it would also avoid the limitations of the Affinity sub-TLV [DraftCMT] in the single area case; any of the TRILL switches attached to a pseudonode could inject a multi-destination packet. This would require all TRILL switches to be changed to understand the new encoding for a tree, and it would require a TLV in the LSP to indicate which number each of the TRILL switches attached to the pseudonode would be.
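
As an illustration of such an encoding, the 16-bit egress nickname field could be packed and unpacked roughly as follows. The exact placement of the scope flag and border injector bits below is an assumption made for this sketch, not something this document specifies.

   # Illustrative encoding sketch; which of the upper 10 bits carry the
   # scope flag and the border-injector index is an assumption made here.

   TREE_MASK = 0x3F       # bottom 6 bits: tree number (up to 64 trees)
   SCOPE_BIT = 1 << 6     # assumed position of the "this area only" flag

   def encode_tree_nickname(tree_number, area_only, injector_index):
       assert 0 <= tree_number < 64
       value = tree_number
       if area_only:
           value |= SCOPE_BIT
       value |= injector_index << 7   # assumed: identifies which of the k
                                      # border TRILL switches injected it
       return value & 0xFFFF

   def decode_tree_nickname(value):
       return (value & TREE_MASK,         # tree number
               bool(value & SCOPE_BIT),   # restricted to this area?
               value >> 7)                # border injector index
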
4.2 Selective Broadcast Domain Reduction

There are a number of service, discovery, and directory protocols that, for convenience, are accessed via multicast or broadcast frames. Examples are DHCP, the NetBIOS Service Location Protocol, and multicast DNS.

Some such protocols provide means to restrict distribution to an IP subnet or equivalent, to reduce the size of the broadcast domain they are using, and then provide a proxy that can be placed in that subnet to use unicast to access a service elsewhere. In cases where a proxy mechanism is not currently defined, it may be possible to create one that references a central server or cache. With multilevel TRILL, it is possible to construct very large IP subnets that could become saturated with multi-destination traffic of this type unless packets can be further restricted in their distribution. Such restricted distribution can be accomplished for some protocols, say protocol P, as follows:

- Either (1) at all ingress TRILL switches in an area, place all protocol P multi-destination packets on a distribution tree in such a way that the packets are restricted to the area, or (2) at all border TRILL switches between that area and Level 2, detect protocol P multi-destination packets and do not transition them.

- Then place one, or a few for redundancy, protocol P proxies inside each area. These proxies unicast protocol P requests or other messages to the actual campus server(s) for P. They also receive unicast responses or other messages from those servers and deliver them within the area via unicast, multicast, or broadcast as appropriate. Such proxies would not be needed if it were acceptable for all protocol P traffic to be restricted to an area.

While it might seem logical to connect the campus servers to TRILL switches in Level 2, they could be placed within one or more areas so that, in some cases, those areas might not require a local proxy server.

5. Co-Existence with Old TRILL switches

TRILL switches that are not multilevel aware may have a problem with calculating RPF Check and filtering information, since they would not be aware of the assignment of border TRILL switch transitioning.

A possible solution, as long as any old TRILL switches exist within an area, is to have the border TRILL switches elect a single DBRB (Designated Border RBridge) and have all inter-area traffic go through the DBRB (unicast as well as multi-destination). If that DBRB goes down, a new one will be elected, but at any one time all inter-area traffic (unicast as well as multi-destination) would go through that one DBRB. However, this eliminates load splitting at level transition.

6. Multi-Access Links with End Stations

Care must be taken, in the case where there are multiple TRILL switches on a link with end stations, that only one TRILL switch ingresses/egresses any given data packet from/to the end nodes. With existing, single level TRILL, this is done by electing a single Designated RBridge per link, which appoints a single Appointed Forwarder per VLAN [RFC7177] [RFC6439]. But suppose there are two (or more) TRILL switches on a link, RB1 in area 1000 and RB2 in area 2000, and that the link contains end nodes. If RB1 and RB2 ignore each other's Hellos, then they will both ingress/egress end node traffic from the link.

A simple rule is to use the TRILL switch or switches having the lowest numbered area, comparing area numbers as unsigned integers, to handle native traffic. This would automatically give multilevel-ignorant legacy TRILL switches, which would be using area number zero, the highest priority for handling end stations, which they would try to do anyway.
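
This rule can be expressed compactly, as in the following illustrative sketch; the representation of the TRILL switches on the link is invented for the example.

   # Illustrative sketch; legacy TRILL switches advertise area number zero,
   # so comparing areas as unsigned integers automatically gives them the
   # highest priority for handling end-station traffic.

   def native_traffic_handlers(switches_on_link):
       """switches_on_link: iterable of (area_number, switch_id) pairs.
       Returns the switches in the lowest-numbered area; among themselves
       they then run the usual DRB / Appointed Forwarder procedures."""
       lowest_area = min(area for area, _ in switches_on_link)
       return [sw for area, sw in switches_on_link if area == lowest_area]
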
Other methods are possible, for example, doing the selection of Appointed Forwarders, and of the TRILL switch in charge of that selection, across all TRILL switches on the link regardless of area. However, a special case would then have to be made for legacy TRILL switches using area number zero.

7. Summary

This draft discusses issues and possible approaches to multilevel TRILL. The alternative using area nicknames for aggregation has significant advantages in terms of scalability over using campus-wide unique nicknames, not just in avoiding nickname exhaustion but also in allowing RPF Checks to be aggregated on the basis of an entire area; however, the alternative using unique nicknames is simpler and avoids the changes in border TRILL switches required to support aggregated nicknames. It is possible to support both. For example, a TRILL campus could use simpler unique nicknames until scaling begins to cause problems and then start to introduce areas with aggregated nicknames.

Some issues are not difficult, such as dealing with partitioned areas. Some issues are more difficult, especially dealing with old TRILL switches.

8. Security Considerations

This informational document explores alternatives for the use of multilevel IS-IS in TRILL. It does not consider security issues. For general TRILL Security Considerations, see [RFC6325].

9. IANA Considerations

This document requires no IANA actions. RFC Editor: Please delete this section before publication.

Normative References

[IS-IS] - ISO/IEC 10589:2002, Second Edition, "Intermediate System to Intermediate System Intra-Domain Routing Exchange Protocol for use in Conjunction with the Protocol for Providing the Connectionless-mode Network Service (ISO 8473)", 2002.

[RFC6325] - Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011.

[RFC6439] - Perlman, R., Eastlake, D., Li, Y., Banerjee, A., and F. Hu, "Routing Bridges (RBridges): Appointed Forwarders", RFC 6439, November 2011.

[RFC7172] - Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and D. Dutt, "Transparent Interconnection of Lots of Links (TRILL): Fine-Grained Labeling", RFC 7172, May 2014.

[RFC7176] - Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D., and A. Banerjee, "Transparent Interconnection of Lots of Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

[RFC7177] - Eastlake 3rd, D., Perlman, R., Ghanwani, A., Yang, H., and V. Manral, "Transparent Interconnection of Lots of Links (TRILL): Adjacency", RFC 7177, May 2014.

[RFC7180] - Eastlake 3rd, D., Zhang, M., Ghanwani, A., Manral, V., and A. Banerjee, "Transparent Interconnection of Lots of Links (TRILL): Clarifications, Corrections, and Updates", RFC 7180, May 2014.

Informative References

[RFC6361] - Carlson, J. and D. Eastlake 3rd, "PPP Transparent Interconnection of Lots of Links (TRILL) Protocol Control Protocol", RFC 6361, August 2011.

[RFCesadi] - Zhai, H., Hu, F., Perlman, R., Eastlake 3rd, D., and O. Stokes, draft-ietf-trill-esadi, Work in Progress (in the RFC Editor's queue).

[DraftAggregated] - Bhikkaji, B., Venkataswami, B. V., and N. P. Swamy, "Connecting Disparate Data Center/PBB/Campus TRILL sites using BGP", draft-balaji-trill-over-ip-multi-level, Work in Progress.

[DraftCMT] - Senevirathne, T., Pathangi, J., and J. Hudson, "Coordinated Multicast Trees (CMT) for TRILL", draft-tissa-trill-cmt, Work in Progress.

[DraftUnique] - Senevirathne, T., Ginsberg, L., Pathangi, J., Hudson, J., Aldrin, S., Banerjee, A., and S. Merchant, "Default Nickname Based Approach for Multilevel TRILL", draft-tissa-trill-multilevel, Work in Progress.

[MoreNicks] - draft-tissa-trill-mt-encode, Work in Progress.

[NickFlags] - Eastlake, D. and W. Hao, draft-eastlake-trill-nick-label-prop, Work in Progress.

Authors' Addresses

   Radia Perlman
   Intel Labs
   2200 Mission College Blvd.
   Santa Clara, CA 95054-1549 USA

   Phone: +1-408-765-8080
   Email: Radia@alum.mit.edu

   Donald Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134 USA

   Phone: +1-408-571-3500
   Email: anoop@alumni.duke.edu

   Hongjun Zhai
   ZTE
   68 Zijinghua Road, Yuhuatai District
   Nanjing, Jiangsu 210012 China

   Phone: +86 25 52877345
   Email: zhai.hongjun@zte.com.cn

Copyright and IPR Provisions

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. The definitive version of an IETF Document is that published by, or under the auspices of, the IETF. Versions of IETF Documents that are published by third parties, including those that are translated into other languages, should not be considered to be definitive versions of IETF Documents. The definitive version of these Legal Provisions is that published by, or under the auspices of, the IETF. Versions of these Legal Provisions that are published by third parties, including those that are translated into other languages, should not be considered to be definitive versions of these Legal Provisions. For the avoidance of doubt, each Contributor to the IETF Standards Process licenses each Contribution that he or she makes as part of the IETF Standards Process to the IETF Trust pursuant to the provisions of RFC 5378. No language to the contrary, or terms, conditions or rights that differ from or are inconsistent with the rights and licenses granted under RFC 5378, shall have any effect and shall be null and void, whether published or posted by such Contributor, or included with or in such Contribution.