TRILL Working Group                                        Radia Perlman
INTERNET-DRAFT                                                       EMC
Intended status: Informational                           Donald Eastlake
                                                            Mingui Zhang
                                                                  Huawei
                                                          Anoop Ghanwani
                                                                    Dell
                                                            Hongjun Zhai
                                                                     JIT
Expires: October 18, 2016                                 April 19, 2016

                   Alternatives for Multilevel TRILL
             (Transparent Interconnection of Lots of Links)

Abstract

   Extending TRILL to multiple levels has challenges that are not
   addressed by the already-existing capability of IS-IS to have
   multiple levels.
   One issue is with the handling of multi-destination
   packet distribution trees. Another issue is with TRILL switch
   nicknames. There have been two proposed approaches. One approach,
   which we refer to as the "unique nickname" approach, gives unique
   nicknames to all the TRILL switches in the multilevel campus, either
   by having the level-1/level-2 border TRILL switches advertise which
   nicknames are not available for assignment in the area, or by
   partitioning the 16-bit nickname into an "area" field and a "nickname
   inside the area" field. The other approach, which we refer to as the
   "aggregated nickname" approach, involves hiding the nicknames within
   areas, allowing nicknames to be reused in different areas, by having
   the border TRILL switches rewrite the nickname fields when entering
   or leaving an area. Each of those approaches has advantages and
   disadvantages. This informational document suggests allowing a choice
   of approach in each area. This allows the simplicity of the unique
   nickname approach in installations in which there is no danger of
   running out of nicknames and allows the complexity of hiding the
   nicknames in an area to be phased into larger installations on a per-
   area basis.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79. Distribution of this document is
   unlimited. Comments should be sent to the TRILL working group
   mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Table of Contents

   1. Introduction
      1.1 TRILL Scalability Issues
      1.2 Improvements Due to Multilevel
      1.3 Unique and Aggregated Nicknames
      1.3 More on Areas
      1.4 Terminology and Acronyms

   2. Multilevel TRILL Issues
      2.1 Non-zero Area Addresses
      2.2 Aggregated versus Unique Nicknames
         2.2.1 More Details on Unique Nicknames
         2.2.2 More Details on Aggregated Nicknames
            2.2.2.1 Border Learning Aggregated Nicknames
            2.2.2.2 Swap Nickname Field Aggregated Nicknames
            2.2.2.3 Comparison
      2.3 Building Multi-Area Trees
      2.4 The RPF Check for Trees
      2.5 Area Nickname Acquisition
      2.6 Link State Representation of Areas

   3. Area Partition

   4. Multi-Destination Scope
      4.1 Unicast to Multi-destination Conversions
         4.1.1 New Tree Encoding
      4.2 Selective Broadcast Domain Reduction

   5. Co-Existence with Old TRILL switches
   6. Multi-Access Links with End Stations
   7. Summary

   8. Security Considerations
   9. IANA Considerations

   Normative References
   Informative References

   Acknowledgements
   Authors' Addresses

1. Introduction

   The IETF TRILL (Transparent Interconnection of Lots of Links or
   Tunneled Routing in the Link Layer) protocol [RFC6325] [RFC7177]
   provides optimal pair-wise data routing without configuration, safe
   forwarding even during periods of temporary loops, and support for
   multipathing of both unicast and multicast traffic in networks with
   arbitrary topology and link technology, including multi-access links.
   TRILL accomplishes this by using IS-IS (Intermediate System to
   Intermediate System [IS-IS] [RFC7176]) link state routing in
   conjunction with a header that includes a hop count. The design
   supports data labels (VLANs and Fine Grained Labels [RFC7172]) and
   optimization of the distribution of multi-destination data based on
   VLANs and multicast groups. Devices that implement TRILL are called
   TRILL Switches or RBridges.

   Familiarity with [IS-IS], [RFC6325], and [RFC7780] is assumed in this
   document.

1.1 TRILL Scalability Issues

   There are multiple issues that might limit the scalability of a
   TRILL-based network:

   1. the routing computation load,
   2. the volatility of the link state database (LSDB) creating too much
      control traffic,
   3. the volatility of the LSDB causing the TRILL network to be in an
      unconverged state too much of the time,
   4. the size of the LSDB,
   5. the limit on the number of TRILL switches, due to the 16-bit
      nickname space,
   6. the traffic due to upper-layer protocols' use of broadcast and
      multicast, and
   7. the size of the end node learning table (the table that remembers
      (egress TRILL switch, label/MAC) pairs).

   Extending TRILL IS-IS to be multilevel (hierarchical) helps with all
   but the last of these issues.

   IS-IS was designed to be multilevel [IS-IS]. A network can be
   partitioned into "areas". Routing within an area is known as "Level
   1 routing". Routing between areas is known as "Level 2 routing".
   The Level 2 IS-IS network consists of Level 2 routers and links
   between the Level 2 routers. Level 2 routers may participate in one
   or more Level 1 areas, in addition to their role as Level 2 routers.

   Each area is connected to Level 2 through one or more "border
   routers", which participate both as a router inside the area and as
   a router inside the Level 2 "area". Care must be taken, when
   transitioning multi-destination packets between Level 2 and a Level
   1 area in either direction, that exactly one border TRILL switch
   transitions a particular data packet between the levels; otherwise,
   duplication or loss of traffic can occur.

1.2 Improvements Due to Multilevel

   Partitioning the network into areas solves the first four scalability
   issues described above, namely,

   1. the routing computation load,

   2. the volatility of the LSDB creating too much control traffic,

   3. the volatility of the LSDB causing the TRILL network to be in an
      unconverged state too much of the time,

   4. the size of the LSDB.

   Problem #6 in Section 1.1, namely, the traffic due to upper-layer
   protocols' use of broadcast and multicast, can be addressed by
   introducing a locally-scoped multi-destination delivery, limited to
   an area or a single link. See further discussion in Section 4.2.

   Problem #5 in Section 1.1, namely, the limit on the number of TRILL
   switches due to the 16-bit nickname space, is addressed only by the
   aggregated nickname approach.
   Since the aggregated nickname
   approach requires some complexity in the border TRILL switches (for
   rewriting the nicknames in the TRILL header), the design in this
   document allows a campus with a mixture of unique-nickname areas and
   aggregated-nickname areas. Nicknames must be unique across all Level
   2 and unique-nickname area TRILL switches, whereas nicknames inside
   an aggregated-nickname area are visible only inside the area.
   Nicknames inside an aggregated-nickname area must not conflict with
   nicknames visible in Level 2 (which includes all nicknames inside
   unique-nickname areas), but the nicknames inside an aggregated-
   nickname area may be the same as nicknames used within other
   aggregated-nickname areas.

   TRILL switches within an area need not be aware of whether they are
   in an aggregated nickname area or a unique nickname area. The border
   TRILL switches in area A1 will announce, in their LSPs inside area
   A1, which nicknames (or nickname ranges) are not available for
   choosing as nicknames by area A1 TRILL switches.

1.3 Unique and Aggregated Nicknames

   We describe two alternatives for hierarchical or multilevel TRILL.
   One we call the "unique nickname" alternative. The other we call the
   "aggregated nickname" alternative. In the aggregated nickname
   alternative, border TRILL switches replace either the ingress or
   egress nickname field in the TRILL header of unicast packets with an
   aggregated nickname representing an entire area.

   The unique nickname alternative has the advantage that border TRILL
   switches are simpler and do not need to do TRILL Header nickname
   modification. It also simplifies testing and maintenance operations
   that originate in one area and terminate in a different area.

   The aggregated nickname alternative has the following advantages:

      o  it solves problem #5 above, the 16-bit nickname limit, in a
         simple way,
      o  it lessens the amount of inter-area routing information that
         must be passed in IS-IS, and
      o  it logically reduces the RPF (Reverse Path Forwarding) Check
         information (since only the area nickname needs to appear,
         rather than all the ingress TRILL switches in that area).

   In both cases, it is possible and advantageous to compute multi-
   destination data packet distribution trees such that the portion
   computed within a given area is rooted within that area.

1.3 More on Areas

   Each area is configured with an "area address", which is advertised
   in IS-IS messages, so as to avoid accidentally interconnecting areas.
   Although the area address had other purposes in CLNP (Connectionless
   Network Protocol; IS-IS was originally designed for CLNP/DECnet),
   for TRILL the only purpose of the area address would be to avoid
   accidentally interconnecting areas.

   Currently, the TRILL specification says that the area address must be
   zero. If we change the specification so that the area address value
   of zero is just a default, then most of the IS-IS multilevel
   machinery works as originally designed. However, there are
   TRILL-specific issues, which we address below in this document.

1.4 Terminology and Acronyms

   This document generally uses the acronyms defined in [RFC6325] plus
   the additional acronym DBRB. However, for ease of reference, most
   acronyms used are listed here:

   CLNP - ConnectionLess Network Protocol

   DECnet - a proprietary routing protocol that was used by Digital
      Equipment Corporation. "DECnet Phase 5" was the origin of IS-IS.

   Data Label - VLAN or Fine Grained Label [RFC7172]

   DBRB - Designated Border RBridge

   ESADI - End Station Address Distribution Information

   IS-IS - Intermediate System to Intermediate System [IS-IS]

   LSDB - Link State Database

   LSP - Link State PDU

   PDU - Protocol Data Unit

   RBridge - Routing Bridge, an alternative name for a TRILL switch

   RPF - Reverse Path Forwarding

   TLV - Type Length Value

   TRILL - Transparent Interconnection of Lots of Links or Tunneled
      Routing in the Link Layer [RFC6325]

   TRILL switch - a device that implements the TRILL protocol
      [RFC6325], sometimes called an RBridge

   VLAN - Virtual Local Area Network

2. Multilevel TRILL Issues

   The TRILL-specific issues introduced by multilevel include the
   following:

   a. Configuration of non-zero area addresses, encoding them in IS-IS
      PDUs, and possibly interworking with old TRILL switches that do
      not understand non-zero area addresses.

         See Section 2.1.

   b. Nickname management.

         See Sections 2.2 and 2.5.

   c. Advertisement of pruning information (Data Label reachability, IP
      multicast addresses) across areas.

         Distribution tree pruning information is only an optimization,
         as long as multi-destination packets are not prematurely
         pruned. For instance, border TRILL switches could advertise
         that they can reach all possible Data Labels and have an IP
         multicast router attached. This would cause all multi-
         destination traffic to be transmitted to border TRILL switches,
         and possibly pruned there, when the traffic could have been
         pruned earlier based on Data Label or multicast group if border
         TRILL switches advertised more detailed Data Label and/or
         multicast listener and multicast router attachment information.

   d. Computation of distribution trees across areas for multi-
      destination data.

         See Section 2.3.

   e. Computation of RPF information for those distribution trees.

         See Section 2.4.

   f. Computation of pruning information across areas.

         See Sections 2.3 and 2.6.

   g. Compatibility, as much as practical, with existing, unmodified
      TRILL switches.

         The most important form of compatibility is with existing TRILL
         fast path hardware. Changes that require an upgrade to the slow
         path firmware/software are more tolerable. Compatibility for
         the relatively small number of border TRILL switches is less
         important than compatibility for non-border TRILL switches.

         See Section 5.

2.1 Non-zero Area Addresses

   The current TRILL base protocol specification [RFC6325] [RFC7177]
   [RFC7780] says that the area address in IS-IS must be zero. The
   purpose of the area address is to ensure that different areas are not
   accidentally merged. Furthermore, zero is an invalid area address
   for layer 3 IS-IS, so it was chosen as an additional safety mechanism
   to ensure that layer 3 IS-IS would not be confused with TRILL IS-IS.
   However, TRILL uses other techniques to avoid such confusion, such as
   different multicast addresses and Ethertypes on Ethernet [RFC6325],
   different PPP (Point-to-Point Protocol) codepoints on PPP [RFC6361],
   and the like. Thus, using an area address in TRILL that might be used
   in layer 3 IS-IS is not a problem.

   Since current TRILL switches will reject any IS-IS messages with
   non-zero area addresses, the choices are as follows:

   a.1 upgrade all TRILL switches that are to interoperate in a
       potentially multilevel environment to understand non-zero area
       addresses,
   a.2 have neighbors of old TRILL switches remove the area address from
       IS-IS messages when talking to an old TRILL switch (which might
       break IS-IS security and/or cause inadvertent merging of areas),
   a.3 ignore the problem of accidentally merging areas entirely, or
   a.4 keep the fixed "area address" field as 0 in TRILL, and add a new,
       optional TLV for "area name" to Hellos that, if present, could be
       compared, by new TRILL switches, to prevent accidental area
       merging.

   In principle, different solutions could be used in different areas,
   but it would be much simpler to adopt one of these choices uniformly.

2.2 Aggregated versus Unique Nicknames

   In the unique nickname alternative, all nicknames across the campus
   must be unique. In the aggregated nickname alternative, TRILL switch
   nicknames within an aggregated area are only of local significance,
   and the only nickname externally (outside that area) visible is the
   "area nickname" (or nicknames), which aggregates all the internal
   nicknames.

   The unique nickname approach simplifies border TRILL switches.

   The aggregated nickname approach eliminates the potential problem of
   nickname exhaustion, minimizes the amount of nickname information
   that would need to be forwarded between areas, minimizes the size of
   the forwarding table, and simplifies RPF calculation and RPF
   information.

2.2.1 More Details on Unique Nicknames

   With unique cross-area nicknames, it would be intractable to have a
   flat nickname space with TRILL switches in different areas contending
   for the same nicknames. Instead, each area would need to be
   configured with a block of nicknames.
   Either some TRILL switches
   would need to announce that all the nicknames other than that block
   are taken (to prevent the TRILL switches inside the area from
   choosing nicknames outside the area's nickname block), or a new TLV
   would be needed to announce the allowable nicknames, and all TRILL
   switches in the area would need to understand that new TLV. An
   example of the second approach is given in [NickFlags].

   Currently, nickname information is encoded in TLVs as a list of
   individual nicknames; this would make it painful for a border
   TRILL switch to announce into an area that it is holding all other
   nicknames to limit the nicknames available within that area. The
   information could be encoded as ranges of nicknames to make this
   somewhat manageable [NickFlags]; however, a new TLV for announcing
   nickname ranges would not be intelligible to old TRILL switches.

   There is also an issue with the unique nicknames approach in building
   distribution trees, as follows:

      With unique nicknames in the TRILL campus and TRILL header
      nicknames not rewritten by the border TRILL switches, there would
      have to be globally known nicknames for the trees. Suppose there
      are k trees. For all of the trees with nicknames located outside
      an area, the local trees would be rooted at a border TRILL switch
      or switches. Therefore, there would be either no splitting of
      multi-destination traffic within the area or restricted splitting
      of multi-destination traffic between trees rooted at a highly
      restricted set of TRILL switches.

      As an alternative, just the "egress nickname" field of multi-
      destination TRILL Data packets could be mapped at the border,
      leaving known unicast packets un-mapped. However, this surrenders
      much of the unique nickname advantage of simpler border TRILL
      switches.
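
   As a rough illustration of the nickname-block announcement discussed
   in this section, the following Python sketch computes the ranges a
   border TRILL switch would announce as held so that switches in its
   area choose only from the area's block. The function names, the
   inclusive (low, high) range representation, and the assumed
   assignable nickname space of 0x0001 through 0xFFBF (our reading of
   the reserved values in [RFC6325]) are illustrative assumptions, not
   part of any TRILL specification.

```python
from typing import List, Tuple

# Inclusive (low, high) nickname range; a hypothetical representation.
Range = Tuple[int, int]

# Assumed assignable nickname space (assumption: 0x0000 and
# 0xFFC0-0xFFFF are reserved, per our reading of RFC 6325).
ASSIGNABLE: Range = (0x0001, 0xFFBF)


def held_ranges(area_block: Range, assignable: Range = ASSIGNABLE) -> List[Range]:
    """Return the ranges outside area_block that a border switch would
    announce as taken, so in-area switches pick only from area_block."""
    low, high = area_block
    first, last = assignable
    if not (first <= low <= high <= last):
        raise ValueError("area block must lie within the assignable space")
    ranges: List[Range] = []
    if low > first:
        ranges.append((first, low - 1))   # everything below the block
    if high < last:
        ranges.append((high + 1, last))   # everything above the block
    return ranges


def nickname_count(ranges: List[Range]) -> int:
    """Number of individual nicknames covered by a list of ranges."""
    return sum(high - low + 1 for low, high in ranges)


example = held_ranges((0x1000, 0x1FFF))
print(example, nickname_count(example))
```

   For an area block of 0x1000 through 0x1FFF, the sketch yields two
   ranges that together cover 61375 nicknames, which suggests why a
   range encoding is far more compact than listing held nicknames
   individually.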

   Scaling to a very large campus with unique nicknames might exhaust
   the 16-bit TRILL nickname space. One method might be to expand
   nicknames to 24 bits; however, that technique would require TRILL
   message format changes and would require that all TRILL switches in
   the campus understand larger nicknames.

2.2.2 More Details on Aggregated Nicknames

   The aggregated nickname approach enables passing far less nickname
   information. It works as follows, assuming both the source and
   destination areas are using aggregated nicknames:

      There are two ways areas could be identified.

      One method would be to assign each area a 16-bit nickname. This
      would not be the nickname of any actual TRILL switch. Instead, it
      would be the nickname of the area itself. Border TRILL switches
      would know the area nickname for their own area(s). For an
      example of a more specific multilevel proposal using unique
      nicknames, see [DraftUnique].

      Alternatively, areas could be identified by the set of nicknames
      that identify the border routers for that area. (See [SingleName]
      for a multilevel proposal using such a set of nicknames.)

      The TRILL Header nickname fields in TRILL Data packets being
      transported through a multilevel TRILL campus with aggregated
      nicknames are as follows:

      - When both the ingress and egress TRILL switches are in the same
        area, there need be no change from the existing base TRILL
        protocol standard in the TRILL Header nickname fields.

      - When being transported in Level 2, the ingress nickname is the
        nickname of the ingress TRILL switch's area while the egress
        nickname is either the nickname of the egress TRILL switch's
        area or a tree nickname.

      - When being transported from Level 1 to Level 2, the ingress
        nickname is the nickname of the ingress TRILL switch itself
        while the egress nickname is either a nickname for the area of
        the egress TRILL switch or a tree nickname.

      - When being transported from Level 2 to Level 1, the ingress
        nickname is a nickname for the ingress TRILL switch's area while
        the egress nickname is either the nickname of the egress TRILL
        switch itself or a tree nickname.

   There are two variations of the aggregated nickname approach. The
   first is the Border Learning approach, which is described in Section
   2.2.2.1. The second is the Swap Nickname Field approach, which is
   described in Section 2.2.2.2. Section 2.2.2.3 compares the advantages
   and disadvantages of these two variations of the aggregated nickname
   approach.

2.2.2.1 Border Learning Aggregated Nicknames

   This section provides an illustrative example and description of the
   border learning variation of aggregated nicknames where a single
   nickname is used to identify an area.

   In the following picture, RB2 and RB3 are area border TRILL switches
   (RBridges). A source S is attached to RB1. The two areas have
   nicknames 15961 and 15918, respectively. RB1 has a nickname, say 27,
   and RB4 has a nickname, say 44 (and in fact, they could even have the
   same nickname, since the TRILL switch nickname will not be visible
   outside these aggregated areas).

        Area 15961             level 2          Area 15918
   +-------------------+ +-----------------+ +--------------+
   |                   | |                 | |              |
   | S--RB1---Rx--Rz----RB2---Rb---Rc--Rd---Re--RB3---Rk--RB4---D |
   |    27             | |                 | |            44|
   |                   | |                 | |              |
   +-------------------+ +-----------------+ +--------------+

   Let's say that S transmits a frame to destination D, which is
   connected to RB4, and let's say that D's location has already been
   learned by the relevant TRILL switches. These relevant switches have
   learned the following:

   1) RB1 has learned that D is connected to nickname 15918.
   2) RB3 has learned that D is attached to nickname 44.

   The following sequence of events will occur:

   -  S transmits an Ethernet frame with source MAC = S and destination
      MAC = D.

   -  RB1 encapsulates with a TRILL header with ingress RBridge = 27,
      and egress = 15918 producing a TRILL Data packet.

   -  RB2 has announced in the Level 1 IS-IS instance in area 15961,
      that it is attached to all the area nicknames, including 15918.
      Therefore, IS-IS routes the packet to RB2. Alternatively, if a
      distinguished range of nicknames is used for Level 2, Level 1
      TRILL switches seeing such an egress nickname will know to route
      to the nearest border router, which can be indicated by the IS-IS
      attached bit.

   -  RB2, when transitioning the packet from Level 1 to Level 2,
      replaces the ingress TRILL switch nickname with the area nickname,
      so replaces 27 with 15961. Within Level 2, the ingress RBridge
      field in the TRILL header will therefore be 15961, and the egress
      RBridge field will be 15918. Also RB2 learns that S is attached to
      nickname 27 in area 15961 to accommodate return traffic.

   -  The packet is forwarded through Level 2, to RB3, which has
      advertised, in Level 2, reachability to the nickname 15918.

   -  RB3, when forwarding into area 15918, replaces the egress nickname
      in the TRILL header with RB4's nickname (44). So, within the
      destination area, the ingress nickname will be 15961 and the
      egress nickname will be 44.

   -  RB4, when decapsulating, learns that S is attached to nickname
      15961, which is the area nickname of the ingress.

   Now suppose that D's location has not been learned by RB1 and/or RB3.
   What will happen, as it would in TRILL today, is that RB1 will
   forward the packet as multi-destination, choosing a tree. As the
   multi-destination packet transitions into Level 2, RB2 replaces the
   ingress nickname with the area nickname.
   If RB1 does not know the
   location of D, the packet must be flooded, subject to possible
   pruning, in Level 2 and, subject to possible pruning, from Level 2
   into every Level 1 area that it reaches on the Level 2 distribution
   tree.

   Now suppose that RB1 has learned the location of D (attached to
   nickname 15918), but RB3 does not know where D is. In that case, RB3
   must turn the packet into a multi-destination packet within area
   15918. Care must then be taken so that, if RB3 is not the designated
   transitioner between Level 2 and its area for that multi-destination
   packet but was on the unicast path, the border TRILL switch in that
   area that is the designated transitioner does not forward the now
   multi-destination packet back into Level 2. Therefore, it would be
   desirable to have a marking, somehow, that indicates the scope of
   this packet's distribution to be "only this area" (see also Section
   4).

   In cases where there are multiple transitioners for unicast packets,
   the border learning mode of operation requires that the address
   learning between them be shared by some protocol, such as running
   ESADI [RFC7357] for all Data Labels of interest, to avoid excessive
   unknown unicast flooding.

   The potential issue described at the end of Section 2.2.1 with trees
   in the unique nickname alternative is eliminated with aggregated
   nicknames. With aggregated nicknames, each border TRILL switch that
   will transition multi-destination packets can have a mapping between
   Level 2 tree nicknames and Level 1 tree nicknames. There need not
   even be agreement about the total number of trees; it is enough that
   each border TRILL switch has some mapping and replaces the egress
   TRILL switch nickname (the tree name) when transitioning levels.
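
   The header rewriting in the known-unicast example of this section can
   be traced with the following Python sketch. It models only the two
   TRILL Header nickname fields and the learned tables; the class and
   function names and the dictionary-based tables are hypothetical,
   chosen for illustration, and the sketch omits forwarding, hop counts,
   and multi-destination handling.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class TrillHeader:
    """Only the two nickname fields of the TRILL Header are modeled."""
    ingress: int
    egress: int


# Values taken from the example in this section.
AREA_OF_S = 15961
AREA_OF_D = 15918
RB1_NICK = 27
RB4_NICK = 44


def encapsulate(ingress_nick: int, dst_mac: str,
                learned: Dict[str, int]) -> TrillHeader:
    """Ingress switch (RB1): egress is whatever nickname it learned for D."""
    return TrillHeader(ingress=ingress_nick, egress=learned[dst_mac])


def level1_to_level2(hdr: TrillHeader, src_mac: str, area_nick: int,
                     learned: Dict[str, int]) -> TrillHeader:
    """Border switch (RB2): remember the source's in-area nickname for
    return traffic, then replace the ingress with the area nickname."""
    learned[src_mac] = hdr.ingress
    return replace(hdr, ingress=area_nick)


def level2_to_level1(hdr: TrillHeader, dst_mac: str,
                     learned: Dict[str, int]) -> TrillHeader:
    """Border switch (RB3): replace the area nickname in the egress field
    with the nickname of the switch to which the destination attaches."""
    return replace(hdr, egress=learned[dst_mac])


rb1_learned = {"D": AREA_OF_D}   # RB1 knows D only by its area nickname
rb2_learned: Dict[str, int] = {}
rb3_learned = {"D": RB4_NICK}    # RB3 knows D is behind nickname 44
rb4_learned: Dict[str, int] = {}

in_area_s = encapsulate(RB1_NICK, "D", rb1_learned)
in_level2 = level1_to_level2(in_area_s, "S", AREA_OF_S, rb2_learned)
in_area_d = level2_to_level1(in_level2, "D", rb3_learned)
rb4_learned["S"] = in_area_d.ingress   # RB4 learns S by area nickname

for stage, hdr in (("area 15961", in_area_s),
                   ("level 2", in_level2),
                   ("area 15918", in_area_d)):
    print(f"{stage}: ingress={hdr.ingress} egress={hdr.egress}")
```

   The trace shows why only border TRILL switches need the extra
   learned state in this variation: RB1 and RB4 each see one ordinary
   nickname per remote end station, while RB2 and RB3 hold the mapping
   between area nicknames and in-area nicknames.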

2.2.2.2 Swap Nickname Field Aggregated Nicknames

   As a variant, two additional fields could exist in TRILL Data
   packets, which we call the "ingress swap nickname field" and the
   "egress swap nickname field". The changes in the example above would
   be as follows:

   -  RB1 will have learned the area nickname of D and the TRILL switch
      nickname of RB4 to which D is attached. In encapsulating a frame
      to D, it puts an area nickname of D (15918) in the egress nickname
      field of the TRILL Header and puts a nickname of RB4 (44) in an
      egress swap nickname field.

   -  RB2 moves the ingress nickname to the ingress swap nickname field
      and inserts 15961, an area nickname for S, into the ingress
      nickname field.

   -  RB3 swaps the egress nickname and the egress swap nickname fields,
      which sets the egress nickname to 44.

   -  RB4 learns the correspondence between the source MAC/VLAN of S and
      the { ingress nickname, ingress swap nickname field } pair as it
      decapsulates and egresses the frame.

   See [DraftAggregated] for a multilevel proposal using aggregated swap
   nicknames with a single nickname representing an area.

2.2.2.3 Comparison

   The Border Learning variant described in Section 2.2.2.1 above
   minimizes the change in non-border TRILL switches but imposes the
   burden on border TRILL switches of learning and doing lookups in all
   the end station MAC addresses within their area(s) that are used for
   communication outside the area. This burden could be reduced by
   decreasing the area size and increasing the number of areas.

   The Swap Nickname Field variant described in Section 2.2.2.2
   eliminates the extra address learning burden on border TRILL switches
   but requires more extensive changes to non-border TRILL switches.
   In
   particular, they must learn to associate both a TRILL switch nickname
   and an area nickname with end station MAC/label pairs (except for
   addresses that are local to their area).

   The Swap Nickname Field alternative is more scalable but less
   backward compatible for non-border TRILL switches. It would be
   possible for border and other Level 2 TRILL switches to support both
   Border Learning, for support of legacy Level 1 TRILL switches, and
   Swap Nickname, to support Level 1 TRILL switches that understand the
   Swap Nickname method.

2.3 Building Multi-Area Trees

   It is easy to build a multi-area tree by building a tree in each area
   separately (including the Level 2 "area"), and then having only a
   single border TRILL switch, say RBx, in each area, attach to the
   Level 2 area. RBx would forward all multi-destination packets
   between that area and Level 2.

   People might find this unacceptable, however, because of the desire
   to path split (not always sending all multi-destination traffic
   through the same border TRILL switch).

   This is the same issue as with multiple ingress TRILL switches
   injecting traffic from a pseudonode, and can be solved with the
   mechanism that was adopted for that purpose: the affinity TLV
   [RFC7783]. For each tree in the area, at most one border RB
   announces itself in an affinity TLV with that tree name.

2.4 The RPF Check for Trees

   For multi-destination data originating locally in RBx's area,
   computation of the RPF check is done as today. For multi-destination
   packets originating outside RBx's area, computation of the RPF check
   must be done based on which one of the border TRILL switches (say
   RB1, RB2, or RB3) injected the packet into the area.

   A TRILL switch, say RB4, located inside an area, must be able to know
   which of RB1, RB2, or RB3 transitioned the packet into the area from
   Level 2 (or into Level 2 from an area).

   This knowledge could be provided by having the DBRB announce the
   transitioner assignments to all the TRILL switches in the area, by
   the Affinity TLV mechanism given in [RFC7783], or by the New Tree
   Encoding mechanism discussed in Section 4.1.1.

2.5 Area Nickname Acquisition

   In the aggregated nickname alternative, each area must acquire a
   unique area nickname. It is probably simpler to allocate a block of
   nicknames (say, the top 4000) to be used as area nicknames and not
   by any TRILL switches.

   The nicknames used for area identification need to be advertised and
   acquired through Level 2.

   Within an area, all the border TRILL switches can discover each other
   through the Level 1 link state database, by using the IS-IS attach
   bit or by explicitly advertising in their LSP "I am a border
   RBridge".

   Of the border TRILL switches, one will have highest priority (say
   RB7). RB7 can dynamically participate, in Level 2, to acquire a
   nickname for identifying the area. Alternatively, RB7 could give the
   area a pseudonode IS-IS ID, such as RB7.5, within Level 2. So an
   area would appear, in Level 2, as a pseudonode and the pseudonode
   could participate, in Level 2, to acquire a nickname for the area.

   Within Level 2, all the border TRILL switches for an area can
   advertise reachability to the area, which would mean connectivity to
   a nickname identifying the area.

2.6 Link State Representation of Areas

   Within an area, say area A1, there is an election for the DBRB
   (Designated Border RBridge), say RB1. This can be done through LSPs
   within area A1. The border TRILL switches announce themselves,
   together with their DBRB priority. (Note that the election of the
   DBRB cannot be done based on Hello messages, because the border TRILL
   switches are not necessarily physical neighbors of each other.
   They can, however, reach each other through connectivity within the
   area, which is why it will work to find each other through Level 1
   LSPs.)

   RB1 acquires an area nickname (in the aggregated nickname approach)
   and may give the area a pseudonode IS-IS ID (just like the DRB would
   give a pseudonode IS-IS ID to a link) depending on how the area
   nickname is handled. RB1 advertises, in area A1, an area nickname
   that RB1 has acquired (and what the pseudonode IS-IS ID for the area
   is if needed).

   Level 1 LSPs (possibly pseudonode) initiated by RB1 for the area
   include any information external to area A1 that should be input into
   area A1 (such as nicknames of external areas, or perhaps (in the
   unique nickname variant) all the nicknames of external TRILL switches
   in the TRILL campus and pruning information such as multicast
   listeners and labels). All the other border TRILL switches for the
   area announce (in their LSP) attachment to that area.

   Within Level 2, RB1 generates a Level 2 LSP on behalf of the area.
   The same pseudonode ID could be used within Level 1 and Level 2, for
   the area. (There does not seem to be any reason why it would be
   useful for it to be different, but there is also no reason why it
   would need to be the same.) Likewise, all the area A1 border TRILL
   switches would announce, in their Level 2 LSPs, connection to the
   area.

3. Area Partition

   It is possible for an area to become partitioned, so that there is
   still a path from one section of the area to the other, but that path
   is via the Level 2 area.

   With multilevel TRILL, an area will naturally break into two areas in
   this case.

   Area addresses might be configured to ensure two areas are not
   inadvertently connected. Area addresses appear in Hellos and LSPs
   within the area.
   If two chunks, connected only via Level 2, were configured with the
   same area address, this would not cause any problems. (They would
   just operate as separate Level 1 areas.)

   A more serious problem occurs if the Level 2 area is partitioned in
   such a way that it could be healed by using a path through a Level 1
   area. TRILL will not attempt to solve this problem. Within the Level
   1 area, a single border RBridge will be the DBRB, and will be in
   charge of deciding which (single) RBridge will transition any
   particular multi-destination packets between that area and Level 2.
   If the Level 2 area is partitioned, this will result in
   multi-destination data only reaching the portion of the TRILL campus
   reachable through the partition attached to the TRILL switch that
   transitions that packet. It will not cause a loop.

4. Multi-Destination Scope

   There are at least two reasons it would be desirable to be able to
   mark a multi-destination packet with a scope that indicates the
   packet should not exit the area, as follows:

   1. To address an issue in the border learning variant of the
      aggregated nickname alternative, when a unicast packet turns into
      a multi-destination packet when transitioning from Level 2 to
      Level 1, as discussed in Section 4.1.

   2. To constrain the broadcast domain for certain discovery,
      directory, or service protocols as discussed in Section 4.2.

   Multi-destination packet distribution scope restriction could be done
   in a number of ways. For example, there could be a flag in the packet
   that means "for this area only". However, the technique that might
   require the least change to TRILL switch fast path logic would be to
   indicate this in the egress nickname that designates the distribution
   tree being used.
   There could be two general tree nicknames for each tree, one being
   for distribution restricted to the area and the other being for
   multi-area trees. Or there could be a set of N (perhaps 16) special
   currently reserved nicknames used to specify the N highest priority
   trees but with the variation that if the special nickname is used
   for the tree, the packet is not transitioned between areas. Or one
   or more special trees could be built that were restricted to the
   local area.

4.1 Unicast to Multi-destination Conversions

   In the border learning variant of the aggregated nickname
   alternative, the following situation may occur:
   - the destination of a unicast packet might be known at the Level 1
     to Level 2 transition, so the packet is forwarded as a unicast
     packet to the least cost border TRILL switch advertising
     connectivity to the destination area, but
   - upon arriving at the border TRILL switch, it turns out to have an
     unknown destination { MAC, Data Label } pair.

   In this case, the packet must be converted into a multi-destination
   packet and flooded in the destination area. However, if the border
   TRILL switch doing the conversion is not the border TRILL switch
   designated to transition the resulting multi-destination packet,
   there is the danger that the designated transitioner may pick up the
   packet and flood it back into Level 2 from which it may be flooded
   into multiple areas. This danger can be avoided by restricting any
   multi-destination packet that results from such a conversion to the
   destination area through a flag in the packet, through distributing
   it on a tree that is restricted to the area, or through other
   techniques (see Section 4).
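
   The scope restriction just described can be sketched as follows,
   using the "flag in the packet" option. This is a minimal sketch
   under stated assumptions, not a definition of TRILL behavior: the
   Scope, Packet, and BorderSwitch names and their fields are
   hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import FrozenSet, Tuple


class Scope(Enum):
    CAMPUS = "campus"        # may be transitioned between areas
    AREA_ONLY = "area-only"  # must not exit the area


@dataclass(frozen=True)
class Packet:
    dest: Tuple[str, int]            # { MAC, Data Label } pair
    multi_destination: bool = False
    scope: Scope = Scope.CAMPUS


@dataclass(frozen=True)
class BorderSwitch:
    name: str
    known_dests: FrozenSet[Tuple[str, int]]  # learned { MAC, Data Label } pairs
    designated_transitioner: bool

    def from_level2(self, pkt: Packet) -> Packet:
        """Handle a packet arriving from Level 2 for this area."""
        if pkt.multi_destination or pkt.dest in self.known_dests:
            return pkt
        # Unknown unicast destination: convert to multi-destination and
        # mark the result so that it stays inside the destination area.
        return replace(pkt, multi_destination=True, scope=Scope.AREA_ONLY)

    def transitions_to_level2(self, pkt: Packet) -> bool:
        """Would this switch flood the in-area packet into Level 2?"""
        return (self.designated_transitioner
                and pkt.multi_destination
                and pkt.scope is Scope.CAMPUS)
```

   In this sketch, a packet converted by a non-designated border TRILL
   switch carries the area-only scope, so the designated transitioner
   declines to flood it back into Level 2, while ordinary campus-scope
   multi-destination packets are still transitioned.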

   Alternatively, a multi-destination packet intended only for the area
   could be tunneled (within the area) to the RBridge RBx that is the
   appointed transitioner for that form of packet (say, based on VLAN or
   FGL), with instructions that RBx only transmit the packet within the
   area, and RBx could initiate the multi-destination packet within the
   area. Since RBx introduced the packet, and is the only one allowed
   to transition that packet to Level 2, this would accomplish scoping
   of the packet to within the area. Since this situation arises only
   in the unusual case when unicast packets need to be turned into
   multi-destination packets as described above, the suboptimality of
   tunneling between the border TRILL switch that receives the unicast
   packet and the appointed level transitioner for that packet would
   not be an issue.

4.1.1 New Tree Encoding

   The current encoding of a tree in a TRILL Header is the nickname of
   the tree root. This requires all 16 bits of the egress nickname
   field. TRILL could instead, for example, use the bottom 6 bits to
   encode the tree number (allowing 64 trees), leaving 10 bits to
   encode information such as:

   o  scope: a flag indicating whether it should be single area only, or
      entire campus
   o  border injector: an indicator of which of the k border TRILL
      switches injected this packet

   If TRILL were to adopt this new encoding, any of the TRILL switches
   in an edge group could inject a multi-destination packet. This would
   require all TRILL switches to be changed to understand the new
   encoding for a tree, and it would require a TLV in the LSP to
   indicate which number each of the TRILL switches in an edge group
   would be.

4.2 Selective Broadcast Domain Reduction

   There are a number of service, discovery, and directory protocols
   that, for convenience, are accessed via multicast or broadcast
   frames.
   Examples are DHCP (Dynamic Host Configuration Protocol), the
   NetBIOS Service Location Protocol, and multicast DNS (Domain Name
   System).

   Some such protocols provide means to restrict distribution to an IP
   subnet or equivalent to reduce the size of the broadcast domain they
   are using and then provide a proxy that can be placed in that subnet
   to use unicast to access a service elsewhere. In cases where a proxy
   mechanism is not currently defined, it may be possible to create one
   that references a central server or cache. With multilevel TRILL, it
   is possible to construct very large IP subnets that could become
   saturated with multi-destination traffic of this type unless packets
   can be further restricted in their distribution. Such restricted
   distribution can be accomplished for some protocols, say protocol P,
   in a variety of ways including the following:

   - Either (1) at all ingress TRILL switches in an area place all
     protocol P multi-destination packets on a distribution tree in
     such a way that the packets are restricted to the area or (2) at
     all border TRILL switches between that area and Level 2, detect
     protocol P multi-destination packets and do not transition them.

   - Then place one, or a few for redundancy, protocol P proxies inside
     each area where protocol P may be in use. These proxies unicast
     protocol P requests or other messages to the actual campus
     server(s) for P. They also receive unicast responses or other
     messages from those servers and deliver them within the area via
     unicast, multicast, or broadcast as appropriate. (Such proxies
     would not be needed if it were acceptable for all protocol P
     traffic to be restricted to an area.)

   While it might seem logical to connect the campus servers to TRILL
   switches in Level 2, they could be placed within one or more areas so
   that, in some cases, those areas might not require a local proxy
   server.

5. Co-Existence with Old TRILL Switches

   TRILL switches that are not multilevel aware may have a problem with
   calculating RPF Check and filtering information, since they would not
   be aware of the assignment of border TRILL switch transitioning.

   A possible solution, as long as any old TRILL switches exist within
   an area, is to have the border TRILL switches elect a single DBRB
   (Designated Border RBridge), and have all inter-area traffic go
   through the DBRB (unicast as well as multi-destination). If that
   DBRB goes down, a new one will be elected, but at any one time, all
   inter-area traffic (unicast as well as multi-destination) would go
   through that one DBRB. However, this eliminates load splitting at
   level transition.

6. Multi-Access Links with End Stations

   Care must be taken, in the case where there are multiple TRILL
   switches on a link with end stations, that only one TRILL switch
   ingresses/egresses any given data packet from/to the end nodes. With
   existing, single level TRILL, this is done by electing a single
   Designated RBridge per link, which appoints a single Appointed
   Forwarder per VLAN [RFC7177] [RFC6439]. But suppose there are two
   (or more) TRILL switches on a link in different areas, say RB1 in
   area 1000 and RB2 in area 2000, and that the link contains end nodes.
   If RB1 and RB2 ignore each other's Hellos then they will both
   ingress/egress end node traffic from the link.

   A simple rule is to use the TRILL switch or switches having the
   lowest numbered area, comparing area numbers as unsigned integers, to
   handle native traffic. This would automatically give
   multilevel-ignorant legacy TRILL switches, which would be using area
   number zero, highest priority for handling end stations, which they
   would try to do anyway.

   Other methods are possible.
   For example, the selection of Appointed Forwarders and of the TRILL
   switch in charge of that selection could be done across all TRILL
   switches on the link regardless of area. However, a special case
   would then have to be made in any case for legacy TRILL switches
   using area number zero.

   Any of these techniques requires multilevel aware RBridges to take
   actions based on Hellos from RBridges in other areas even though they
   will not form an adjacency with such RBridges.

7. Summary

   This draft discusses issues and possible approaches to multilevel
   TRILL. The alternative using aggregated areas has significant
   advantages in terms of scalability over using campus-wide unique
   nicknames, not just in avoiding nickname exhaustion, but by allowing
   RPF Checks to be aggregated based on an entire area. However, the
   alternative of using unique nicknames is simpler and avoids the
   changes in border TRILL switches required to support aggregated
   nicknames. It is possible to support both. For example, a TRILL
   campus could use simpler unique nicknames until scaling begins to
   cause problems and then start to introduce areas with aggregated
   nicknames.

   Some issues are not difficult, such as dealing with partitioned
   areas. Other issues are more difficult, especially dealing with old
   TRILL switches.

8. Security Considerations

   This informational document explores alternatives for the use of
   multilevel IS-IS in TRILL. It does not consider security issues. For
   general TRILL Security Considerations, see [RFC6325].

9. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

Normative References

   [IS-IS] - ISO/IEC 10589:2002, Second Edition, "Intermediate System to
         Intermediate System Intra-Domain Routing Exchange Protocol for
         use in Conjunction with the Protocol for Providing the
         Connectionless-mode Network Service (ISO 8473)", 2002.

   [RFC6325] - Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
         Ghanwani, "Routing Bridges (RBridges): Base Protocol
         Specification", RFC 6325, July 2011.

   [RFC6439] - Perlman, R., Eastlake, D., Li, Y., Banerjee, A., and F.
         Hu, "Routing Bridges (RBridges): Appointed Forwarders", RFC
         6439, November 2011.

   [RFC7177] - Eastlake 3rd, D., Perlman, R., Ghanwani, A., Yang, H.,
         and V. Manral, "Transparent Interconnection of Lots of Links
         (TRILL): Adjacency", RFC 7177, May 2014.

   [RFC7780] - Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
         Ghanwani, A., and S. Gupta, "Transparent Interconnection of
         Lots of Links (TRILL): Clarifications, Corrections, and
         Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016.

Informative References

   [RFC6361] - Carlson, J. and D. Eastlake 3rd, "PPP Transparent
         Interconnection of Lots of Links (TRILL) Protocol Control
         Protocol", RFC 6361, August 2011.

   [RFC7172] - Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R.,
         and D. Dutt, "Transparent Interconnection of Lots of Links
         (TRILL): Fine-Grained Labeling", RFC 7172, May 2014.

   [RFC7176] - Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt,
         D., and A. Banerjee, "Transparent Interconnection of Lots of
         Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

   [RFC7357] - Zhai, H., Hu, F., Perlman, R., Eastlake 3rd, D., and O.
         Stokes, "Transparent Interconnection of Lots of Links (TRILL):
         End Station Address Distribution Information (ESADI) Protocol",
         RFC 7357, September 2014.

   [RFC7783] - Senevirathne, T., Pathangi, J., and J. Hudson,
         "Coordinated Multicast Trees (CMT) for Transparent
         Interconnection of Lots of Links (TRILL)", RFC 7783, DOI
         10.17487/RFC7783, February 2016.

   [DraftAggregated] - Bhargav Bhikkaji, Balaji Venkat Venkataswami,
         Narayana Perumal Swamy, "Connecting Disparate Data
         Center/PBB/Campus TRILL sites using BGP",
         draft-balaji-trill-over-ip-multi-level, Work In Progress.

   [DraftUnique] - Tissa Senevirathne, Les Ginsberg, Janardhanan
         Pathangi, Jon Hudson, Sam Aldrin, Ayan Banerjee, Sameer
         Merchant, "Default Nickname Based Approach for Multilevel
         TRILL", draft-tissa-trill-multilevel, Work In Progress.

   [NickFlags] - Eastlake, D., W. Hao,
         draft-eastlake-trill-nick-label-prop, Work In Progress.

   [SingleName] - Mingui Zhang, et al., "Single Area Border RBridge
         Nickname for TRILL Multilevel",
         draft-zhang-trill-multilevel-single-nickname, Work in Progress.

Acknowledgements

   The helpful comments of the following are hereby acknowledged: David
   Michael Bond, Dino Farinacci, and Gayle Noble.

   The document was prepared in raw nroff. All macros used were defined
   within the source file.

Authors' Addresses

   Radia Perlman
   EMC
   2010 256th Avenue NE, #200
   Bellevue, WA 98007 USA

   EMail: radia@alum.mit.edu

   Donald Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   EMail: zhangmingui@huawei.com

   Anoop Ghanwani
   Dell
   5450 Great America Parkway
   Santa Clara, CA 95054 USA

   EMail: anoop@alumni.duke.edu

   Hongjun Zhai
   Jinling Institute of Technology
   99 Hongjing Avenue, Jiangning District
   Nanjing, Jiangsu 211169 China

   EMail: honjun.zhai@tom.com

Copyright and IPR Provisions

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License. The definitive version of
   an IETF Document is that published by, or under the auspices of, the
   IETF. Versions of IETF Documents that are published by third parties,
   including those that are translated into other languages, should not
   be considered to be definitive versions of IETF Documents. The
   definitive version of these Legal Provisions is that published by, or
   under the auspices of, the IETF. Versions of these Legal Provisions
   that are published by third parties, including those that are
   translated into other languages, should not be considered to be
   definitive versions of these Legal Provisions. For the avoidance of
   doubt, each Contributor to the IETF Standards Process licenses each
   Contribution that he or she makes as part of the IETF Standards
   Process to the IETF Trust pursuant to the provisions of RFC 5378. No
   language to the contrary, or terms, conditions or rights that differ
   from or are inconsistent with the rights and licenses granted under
   RFC 5378, shall have any effect and shall be null and void, whether
   published or posted by such Contributor, or included with or in such
   Contribution.