1 TRILL Working Group Radia Perlman 2 INTERNET-DRAFT EMC 3 Intended status: Informational Donald Eastlake 4 Mingui Zhang 5 Huawei 6 Anoop Ghanwani 7 Dell 8 Hongjun Zhai 9 JIT 10 Expires: December 3, 2016 June 4, 2016 12 Alternatives for Multilevel TRILL 13 (Transparent Interconnection of Lots of Links) 14 16 Abstract 18 Extending TRILL to multiple levels has challenges that are not 19 addressed by the already-existing capability of IS-IS to have 20 multiple levels. One issue is with the handling of multi-destination 21 packet distribution trees. Another issue is with TRILL switch 22 nicknames. There have been two proposed approaches. One approach, 23 which we refer to as the "unique nickname" approach, gives unique 24 nicknames to all the TRILL switches in the multilevel campus, either 25 by having the Level-1/Level-2 border TRILL switches advertise which 26 nicknames are not available for assignment in the area, or by 27 partitioning the 16-bit nickname into an "area" field and a "nickname 28 inside the area" field. The other approach, which we refer to as the 29 "aggregated nickname" approach, involves hiding the nicknames within 30 areas, allowing nicknames to be reused in different areas, by having 31 the border TRILL switches rewrite the nickname fields when entering 32 or leaving an area. Each of those approaches has advantages and 33 disadvantages. 35 This informational document suggests allowing a choice of approach in 36 each area. This allows the simplicity of the unique nickname approach 37 in installations in which there is no danger of running out of 38 nicknames and allows the complexity of hiding the nicknames in an 39 area to be phased into larger installations on a per-area basis. 41 Status of This Memo 43 This Internet-Draft is submitted to IETF in full conformance with the 44 provisions of BCP 78 and BCP 79. Distribution of this document is 45 unlimited. Comments should be sent to the TRILL working group 46 mailing list. 48 Internet-Drafts are working documents of the Internet Engineering 49 Task Force (IETF), its areas, and its working groups.
Note that 50 other groups may also distribute working documents as Internet- 51 Drafts. 53 Internet-Drafts are draft documents valid for a maximum of six months 54 and may be updated, replaced, or obsoleted by other documents at any 55 time. It is inappropriate to use Internet-Drafts as reference 56 material or to cite them other than as "work in progress." 58 The list of current Internet-Drafts can be accessed at 59 http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft 60 Shadow Directories can be accessed at 61 http://www.ietf.org/shadow.html. 63 Table of Contents 65 1. Introduction............................................4 66 1.1 TRILL Scalability Issues...............................4 67 1.2 Improvements Due to Multilevel.........................5 68 1.3 Unique and Aggregated Nicknames........................6 69 1.4 More on Areas..........................................7 70 1.5 Terminology and Acronyms...............................7 72 2. Multilevel TRILL Issues.................................9 73 2.1 Non-zero Area Addresses...............................10 74 2.2 Aggregated versus Unique Nicknames....................10 75 2.2.1 More Details on Unique Nicknames....................11 76 2.2.2 More Details on Aggregated Nicknames................12 77 2.2.2.1 Border Learning Aggregated Nicknames..............13 78 2.2.2.2 Swap Nickname Field Aggregated Nicknames..........15 79 2.2.2.3 Comparison........................................15 80 2.3 Building Multi-Area Trees.............................16 81 2.4 The RPF Check for Trees...............................16 82 2.5 Area Nickname Acquisition.............................17 83 2.6 Link State Representation of Areas....................17 85 3. Area Partition.........................................19 87 4. Multi-Destination Scope................................20 88 4.1 Unicast to Multi-destination Conversions..............20 89 4.1.1 New Tree Encoding...................................21 90 4.2 Selective Broadcast Domain Reduction..................21 92 5. Co-Existence with Old TRILL switches...................23 93 6. Multi-Access Links with End Stations...................24 94 7. Summary................................................25 96 8. Security Considerations................................26 97 9. IANA Considerations....................................26 99 Normative References......................................27 100 Informative References....................................27 102 Acknowledgements..........................................29 103 Authors' Addresses........................................30 105 1. Introduction 107 The IETF TRILL (Transparent Interconnection of Lots of Links or 108 Tunneled Routing in the Link Layer) protocol [RFC6325] [RFC7177] 109 [RFC7780] provides optimal pair-wise data routing without 110 configuration, safe forwarding even during periods of temporary 111 loops, and support for multipathing of both unicast and multicast 112 traffic in networks with arbitrary topology and link technology, 113 including multi-access links. TRILL accomplishes this by using IS-IS 114 (Intermediate System to Intermediate System [IS-IS] [RFC7176]) link 115 state routing in conjunction with a header that includes a hop count. 116 The design supports data labels (VLANs and Fine Grained Labels 117 [RFC7172]) and optimization of the distribution of multi-destination 118 data based on VLANs and multicast groups. Devices that implement 119 TRILL are called TRILL Switches or RBridges.
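Because the discussion below turns on how the nickname fields in the TRILL Header are set and rewritten, the following minimal sketch (in Python, not part of any TRILL specification) shows the fixed six-octet TRILL Header defined in [RFC6325]. It is purely illustrative; the function name and dictionary keys are ours.

   import struct

   def parse_trill_header(octets):
       """Parse the fixed six-octet TRILL Header of [RFC6325].
       Field names are illustrative only."""
       first, egress, ingress = struct.unpack("!HHH", octets[:6])
       return {
           "version":          (first >> 14) & 0x3,   # V, 2 bits
           "reserved":         (first >> 12) & 0x3,   # R, 2 bits
           "multi_dest":       (first >> 11) & 0x1,   # M: egress holds a tree root
           "op_length":        (first >> 6) & 0x1F,   # options length, 4-octet units
           "hop_count":        first & 0x3F,          # 6-bit hop count
           "egress_nickname":  egress,   # egress RBridge or distribution tree
           "ingress_nickname": ingress,  # ingress RBridge (or, below, its area)
       }

The aggregated nickname alternatives described below rewrite or reinterpret the two nickname values; the Swap Nickname Field variant in Section 2.2.2.2 would add two more nickname fields.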
121 Familiarity with [IS-IS], [RFC6325], and [RFC7780] is assumed in this 122 document. 124 1.1 TRILL Scalability Issues 126 There are multiple issues that might limit the scalability of a 127 TRILL-based network as follows: 129 1. The routing computation load 130 2. The volatility of the link state database (LSDB) creating too much 131 control traffic 132 3. The volatility of the LSDB causing the TRILL network to be in an 133 unconverged state too much of the time 134 4. The size of the LSDB 135 5. The limit of the number of TRILL switches, due to the 16-bit 136 nickname space 137 6. The traffic due to upper layer protocols use of broadcast and 138 multicast 139 7. The size of the end node learning table (the table that remembers 140 (egress TRILL switch, label/MAC) pairs) 142 As discussed below, extending TRILL IS-IS to be multilevel 143 (hierarchical) helps with all but the last of these issues. 145 IS-IS was designed to be multilevel [IS-IS]. A network can be 146 partitioned into "areas". Routing within an area is known as "Level 147 1 routing". Routing between areas is known as "Level 2 routing". 148 The Level 2 IS-IS network consists of Level 2 routers and links 149 between the Level 2 routers. Level 2 routers may participate in one 150 or more Level 1 areas, in addition to their role as Level 2 routers. 152 Each area is connected to Level 2 through one or more "border 153 routers", which participate both as a router inside the area, and as 154 a router inside the Level 2 "area". Care must be taken that it is 155 clear, when transitioning multi-destination packets between Level 2 156 and a Level 1 area in either direction, that exactly one border TRILL 157 switch will transition a particular data packet between the levels or 158 else duplication or loss of traffic can occur. 160 1.2 Improvements Due to Multilevel 162 Partitioning the network into areas solves the first four scalability 163 issues described above as listed below. In this section, N indicates 164 the number of TRILL switches in a TRILL campus. 166 1. The routing computation load 168 The optimized computational effort to calculate least cost routes 169 at a TRILL switch in a single level campus is on the order of 170 N*log(N). In an optimized multi-level campus, it is on the order 171 of sqrt(N)*log(N). So, for example, assuming N is 3,000, the level 172 of computational effort would be reduced by about a factor of 50. 174 2. The volatility of the LSDB creating too much control traffic 176 The rate of LSDB changes would be approximately proportional to 177 the number of routers/links in the TRILL campus for a single level 178 campus. With an optimized multi-level campus, each area would have 179 about sqrt(N) routers reducing volatility by about a factor of 180 sqrt(N). 182 3. The volatility of the LSDB causing the TRILL network to be in an 183 unconverged state too much of the time 185 With the simplifying assumption that routing converges after each 186 change before the next change, the fraction of time that routing 187 is unconverged is proportional to the product of the volatility 188 and the convergence time and the convergence time is approximately 189 proportional to the computation involved at each router. Thus, 190 based on these simplifying assumptions, the fraction of time 191 routing at a router is not converged with the network would 192 improve, in going from single to multi-level, by about a factor of 193 N. 195 4. 
The size of the LSDB 197 The size of the LSDB is also approximately proportional to the 198 number of routers/links and so, as with item 2 above, should 199 improve by about a factor of sqrt(N) in going from single to 200 multi-level. 202 Problem #6 in Section 1.1, namely, the traffic due to upper layer 203 protocols' use of broadcast and multicast, can be addressed by 204 introducing a locally-scoped multi-destination delivery, limited to 205 an area or a single link. See further discussion in Section 4.2. 207 Problem #5 in Section 1.1, namely, the limit of the number of TRILL 208 switches, due to the 16-bit nickname space, will only be addressed 209 with the aggregated nickname approach. Since the aggregated nickname 210 approach requires some complexity in the border TRILL switches (for 211 rewriting the nicknames in the TRILL header), the design in this 212 document allows a campus with a mixture of unique-nickname areas and 213 aggregated-nickname areas. Nicknames must be unique across all Level 214 2 and unique-nickname area TRILL switches, whereas nicknames inside 215 an aggregated-nickname area are visible only inside the area. 216 Nicknames inside an aggregated-nickname area must not conflict with 217 nicknames visible in Level 2 (which includes all nicknames inside 218 unique nickname areas), but the nicknames inside an aggregated- 219 nickname area may be the same as nicknames used within other 220 aggregated-nickname areas. 222 TRILL switches within an area need not be aware of whether they are 223 in an aggregated nickname area or a unique nickname area. The border 224 TRILL switches in area A1 will claim, in their LSP inside area A1, 225 which nicknames (or nickname ranges) are not available for choosing 226 as nicknames by area A1 TRILL switches. 228 1.3 Unique and Aggregated Nicknames 230 We describe two alternatives for hierarchical or multilevel TRILL. 231 One we call the "unique nickname" alternative. The other we call the 232 "aggregated nickname" alternative. In the aggregated nickname 233 alternative, border TRILL switches replace either the ingress or 234 egress nickname field in the TRILL header of unicast packets with an 235 aggregated nickname representing an entire area. 237 The unique nickname alternative has the advantage that border TRILL 238 switches are simpler and do not need to do TRILL Header nickname 239 modification. It also simplifies testing and maintenance operations 240 that originate in one area and terminate in a different area. 242 The aggregated nickname alternative has the following advantages: 244 o it solves problem #5 above, the 16-bit nickname limit, in a 245 simple way, 246 o it lessens the amount of inter-area routing information that 247 must be passed in IS-IS, and 248 o it logically reduces the RPF (Reverse Path Forwarding) Check 249 information (since only the area nickname needs to appear, 250 rather than all the ingress TRILL switches in that area). 252 In both cases, it is possible and advantageous to compute multi- 253 destination data packet distribution trees such that the portion 254 computed within a given area is rooted within that area. 256 1.4 More on Areas 258 Each area is configured with an "area address", which is advertised 259 in IS-IS messages, so as to avoid accidentally interconnecting areas.
260 Although the area address had other purposes in CLNP (ConnectionLess 261 Network Protocol; IS-IS was originally designed for 262 CLNP/DECnet), for TRILL the only purpose of the area address would be 263 to avoid accidentally interconnecting areas. 265 Currently, the TRILL specification says that the area address must be 266 zero. If we change the specification so that the area address value 267 of zero is just a default, then most of IS-IS multilevel machinery 268 works as originally designed. However, there are TRILL-specific 269 issues, which we address below in this document. 271 1.5 Terminology and Acronyms 273 This document generally uses the acronyms defined in [RFC6325] plus 274 the additional acronym DBRB. However, for ease of reference, most 275 acronyms used are listed here: 277 CLNP - ConnectionLess Network Protocol 279 DECnet - a proprietary routing protocol that was used by Digital 280 Equipment Corporation. "DECnet Phase 5" was the origin of IS-IS. 282 Data Label - VLAN or Fine Grained Label [RFC7172] 284 DBRB - Designated Border RBridge 286 ESADI - End Station Address Distribution Information 288 IS-IS - Intermediate System to Intermediate System [IS-IS] 290 LSDB - Link State Data Base 292 LSP - Link State PDU 294 PDU - Protocol Data Unit 295 RBridge - Routing Bridge, an alternative name for a TRILL switch 297 RPF - Reverse Path Forwarding 299 TLV - Type Length Value 301 TRILL - Transparent Interconnection of Lots of Links or Tunneled 302 Routing in the Link Layer [RFC6325] [RFC7780] 304 TRILL switch - a device that implements the TRILL protocol 305 [RFC6325] [RFC7780], sometimes called an RBridge 307 VLAN - Virtual Local Area Network 309 2. Multilevel TRILL Issues 311 The TRILL-specific issues introduced by multilevel include the 312 following: 314 a. Configuration of non-zero area addresses, encoding them in IS-IS 315 PDUs, and possibly interworking with old TRILL switches that do 316 not understand non-zero area addresses. 318 See Section 2.1. 320 b. Nickname management. 322 See Sections 2.5 and 2.2. 324 c. Advertisement of pruning information (Data Label reachability, IP 325 multicast addresses) across areas. 327 Distribution tree pruning information is only an optimization, 328 as long as multi-destination packets are not prematurely 329 pruned. For instance, border TRILL switches could advertise 330 they can reach all possible Data Labels, and have an IP 331 multicast router attached. This would cause all multi- 332 destination traffic to be transmitted to border TRILL switches, 333 and possibly pruned there, when the traffic could have been 334 pruned earlier based on Data Label or multicast group if border 335 TRILL switches advertised more detailed Data Label and/or 336 multicast listener and multicast router attachment information. (An illustrative sketch of this trade-off follows this list.) 338 d. Computation of distribution trees across areas for multi- 339 destination data. 341 See Section 2.3. 343 e. Computation of RPF information for those distribution trees. 345 See Section 2.4. 347 f. Computation of pruning information across areas. 349 See Sections 2.3 and 2.6. 351 g. Compatibility, as much as practical, with existing, unmodified 352 TRILL switches. 354 The most important form of compatibility is with existing TRILL 355 fast path hardware. Changes that require upgrade to the slow 356 path firmware/software are more tolerable. Compatibility for 357 the relatively small number of border TRILL switches is less 358 important than compatibility for non-border TRILL switches. 360 See Section 5.
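As a small illustration of item (c) above, the sketch below (Python; the data structures are hypothetical stand-ins for information that would be carried in IS-IS) contrasts a border TRILL switch that advertises detailed Data Label reachability with one that coarsely claims reachability to all Data Labels. In the coarse case, nothing can be pruned before the border.

   def must_send_toward_border(pkt_label, border_labels, claims_all_labels):
       """Decide whether a multi-destination packet in 'pkt_label' has to be
       forwarded toward a given border TRILL switch.  'border_labels' and
       'claims_all_labels' model that switch's (hypothetical) advertisements."""
       if claims_all_labels:
           return True                         # coarse advertisement: no early pruning
       return pkt_label in border_labels       # detailed advertisement: prune early

   # With detailed advertisements, VLAN 30 traffic can be pruned inside the
   # area; with a coarse "all Data Labels" claim it is carried to the border.
   assert must_send_toward_border(30, {10, 20}, claims_all_labels=False) is False
   assert must_send_toward_border(30, {10, 20}, claims_all_labels=True) is True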
362 2.1 Non-zero Area Addresses 364 The current TRILL base protocol specification [RFC6325] [RFC7177] 365 [RFC7780] says that the area address in IS-IS must be zero. The 366 purpose of the area address is to ensure that different areas are not 367 accidentally merged. Furthermore, zero is an invalid area address 368 for layer 3 IS-IS, so it was chosen as an additional safety mechanism 369 to ensure that layer 3 IS-IS would not be confused with TRILL IS-IS. 370 However, TRILL uses other techniques to avoid such confusion, such as 371 different multicast addresses and Ethertypes on Ethernet [RFC6325], 372 different PPP (Point-to-Point Protocol) code points on PPP [RFC6361], 373 and the like. Thus, using an area address in TRILL that might be used 374 in layer 3 IS-IS is not a problem. 376 Since current TRILL switches will reject any IS-IS messages with non- 377 zero area addresses, the choices are as follows: 379 a.1 upgrade all TRILL switches that are to interoperate in a 380 potentially multilevel environment to understand non-zero area 381 addresses, 382 a.2 neighbors of old TRILL switches must remove the area address from 383 IS-IS messages when talking to an old TRILL switch (which might 384 break IS-IS security and/or cause inadvertent merging of areas), 385 a.3 ignore the problem of accidentally merging areas entirely, or 386 a.4 keep the fixed "area address" field as 0 in TRILL, and add a new, 387 optional TLV for "area name" to Hellos that, if present, could be 388 compared, by new TRILL switches, to prevent accidental area 389 merging. 391 In principle, different solutions could be used in different areas, 392 but it would be much simpler to adopt one of these choices uniformly. 394 2.2 Aggregated versus Unique Nicknames 396 In the unique nickname alternative, all nicknames across the campus 397 must be unique. In the aggregated nickname alternative, TRILL switch 398 nicknames within an aggregated area are only of local significance, 399 and the only nickname externally (outside that area) visible is the 400 "area nickname" (or nicknames), which aggregates all the internal 401 nicknames. 403 The unique nickname approach simplifies border TRILL switches. 405 The aggregated nickname approach eliminates the potential problem of 406 nickname exhaustion, minimizes the amount of nickname information 407 that would need to be forwarded between areas, minimizes the size of 408 the forwarding table, and simplifies RPF calculation and RPF 409 information. 411 2.2.1 More Details on Unique Nicknames 413 With unique cross-area nicknames, it would be intractable to have a 414 flat nickname space with TRILL switches in different areas contending 415 for the same nicknames. Instead, each area would need to be 416 configured with a block of nicknames. Either some TRILL switches 417 would need to announce that all the nicknames other than that block 418 are taken (to prevent the TRILL switches inside the area from 419 choosing nicknames outside the area's nickname block), or a new TLV 420 would be needed to announce the allowable nicknames, and all TRILL 421 switches in the area would need to understand that new TLV. An 422 example of the second approach is given in [NickFlags]. 424 Currently, the encoding of nickname information in TLVs is by listing 425 of individual nicknames; this would make it painful for a border 426 TRILL switch to announce into an area that it is holding all other 427 nicknames to limit the nicknames available within that area.
The 428 information could be encoded as ranges of nicknames to make this 429 somewhat manageable [NickFlags]; however, a new TLV for announcing 430 nickname ranges would not be intelligible to old TRILL switches. 432 There is also an issue with the unique nicknames approach in building 433 distribution trees, as follows: 435 With unique nicknames in the TRILL campus and TRILL header 436 nicknames not rewritten by the border TRILL switches, there would 437 have to be globally known nicknames for the trees. Suppose there 438 are k trees. For all of the trees with nicknames located outside 439 an area, the local trees would be rooted at a border TRILL switch 440 or switches. Therefore, there would be either no splitting of 441 multi-destination traffic within the area or restricted splitting 442 of multi-destination traffic between trees rooted at a highly 443 restricted set of TRILL switches. 445 As an alternative, just the "egress nickname" field of multi- 446 destination TRILL Data packets could be mapped at the border, 447 leaving known unicast packets un-mapped. However, this surrenders 448 much of the unique nickname advantage of simpler border TRILL 449 switches. 451 Scaling to a very large campus with unique nicknames might exhaust 452 the 16-bit TRILL nicknames space particularly if (1) additional 453 nicknames are consumed to support active-active end station groups at 454 the TRILL edge using the techniques standardized in [RFC7781] and (2) 455 use of the nickname space is less efficient due to the allocation of, 456 for example, power-of-two size blocks of nicknames to areas in the 457 same way that use of the IP address space is made less efficient by 458 hierarchical allocation (see [RFC3194]). One method to avoid nickname 459 exhaustion might be to expand nicknames to 24 bits; however, that 460 technique would require TRILL message format changes and that all 461 TRILL switches in the campus understand larger nicknames. 463 2.2.2 More Details on Aggregated Nicknames 465 The aggregated nickname approach enables passing far less nickname 466 information. It works as follows, assuming both the source and 467 destination areas are using aggregated nicknames: 469 There are two ways areas could be identified. 471 One method would be to assign each area a 16-bit nickname. This 472 would not be the nickname of any actual TRILL switch. Instead, it 473 would be the nickname of the area itself. Border TRILL switches 474 would know the area nickname for their own area(s). For an 475 example of a more specific multilevel proposal using unique 476 nicknames, see [DraftUnique]. 478 Alternatively, areas could be identified by the set of nicknames 479 that identify the border routers for that area. (See [SingleName] 480 for a multilevel proposal using such a set of nicknames.) 482 The TRILL Header nickname fields in TRILL Data packets being 483 transported through a multilevel TRILL campus with aggregated 484 nicknames are as follows: 486 - When both the ingress and egress TRILL switches are in the same 487 area, there need be no change from the existing base TRILL 488 protocol standard in the TRILL Header nickname fields. 490 - When being transported in Level 2, the ingress nickname is a 491 nickname of the ingress TRILL switch's area while the egress 492 nickname is either a nickname of the egress TRILL switch's area 493 or a tree nickname. 
495 - When being transported from Level 1 to Level 2, the ingress 496 nickname is the nickname of the ingress TRILL switch itself 497 while the egress nickname is either a nickname for the area of 498 the egress TRILL switch or a tree nickname. 500 - When being transported from Level 2 to Level 1, the ingress 501 nickname is a nickname for the ingress TRILL switch's area while 502 the egress nickname is either the nickname of the egress TRILL 503 switch itself or a tree nickname. 505 There are two variations of the aggregated nickname approach. The 506 first is the Border Learning approach, which is described in Section 507 2.2.2.1. The second is the Swap Nickname Field approach, which is 508 described in Section 2.2.2.2. Section 2.2.2.3 compares the advantages 509 and disadvantages of these two variations of the aggregated nickname 510 approach. 512 2.2.2.1 Border Learning Aggregated Nicknames 514 This section provides an illustrative example and description of the 515 border learning variation of aggregated nicknames where a single 516 nickname is used to identify an area. 518 In the following picture, RB2 and RB3 are area border TRILL switches 519 (RBridges). A source S is attached to RB1. The two areas have 520 nicknames 15961 and 15918, respectively. RB1 has a nickname, say 27, 521 and RB4 has a nickname, say 44 (and in fact, they could even have the 522 same nickname, since the TRILL switch nickname will not be visible 523 outside these aggregated areas). 525 Area 15961 level 2 Area 15918 526 +-------------------+ +-----------------+ +--------------+ 527 | | | | | | 528 | S--RB1---Rx--Rz----RB2---Rb---Rc--Rd---Re--RB3---Rk--RB4---D | 529 | 27 | | | | 44 | 530 | | | | | | 531 +-------------------+ +-----------------+ +--------------+ 533 Let's say that S transmits a frame to destination D, which is 534 connected to RB4, and let's say that D's location has already been 535 learned by the relevant TRILL switches. These relevant switches have 536 learned the following: 538 1) RB1 has learned that D is connected to nickname 15918 539 2) RB3 has learned that D is attached to nickname 44. 541 The following sequence of events will occur: 543 - S transmits an Ethernet frame with source MAC = S and destination 544 MAC = D. 546 - RB1 encapsulates with a TRILL header with ingress RBridge = 27, 547 and egress = 15918 producing a TRILL Data packet. 549 - RB2 has announced in the Level 1 IS-IS instance in area 15961, 550 that it is attached to all the area nicknames, including 15918. 551 Therefore, IS-IS routes the packet to RB2. Alternatively, if a 552 distinguished range of nicknames is used for Level 2, Level 1 553 TRILL switches seeing such an egress nickname will know to route 554 to the nearest border router, which can be indicated by the IS-IS 555 attached bit. 557 - RB2, when transitioning the packet from Level 1 to Level 2, 558 replaces the ingress TRILL switch nickname with the area nickname, 559 so replaces 27 with 15961. Within Level 2, the ingress RBridge 560 field in the TRILL header will therefore be 15961, and the egress 561 RBridge field will be 15918. Also RB2 learns that S is attached to 562 nickname 27 in area 15961 to accommodate return traffic. 564 - The packet is forwarded through Level 2, to RB3, which has 565 advertised, in Level 2, reachability to the nickname 15918. 567 - RB3, when forwarding into area 15918, replaces the egress nickname 568 in the TRILL header with RB4's nickname (44). 
So, within the 569 destination area, the ingress nickname will be 15961 and the 570 egress nickname will be 44. 572 - RB4, when decapsulating, learns that S is attached to nickname 573 15961, which is the area nickname of the ingress. 575 Now suppose that D's location has not been learned by RB1 and/or RB3. 576 What will happen, as it would in TRILL today, is that RB1 will 577 forward the packet as multi-destination, choosing a tree. As the 578 multi-destination packet transitions into Level 2, RB2 replaces the 579 ingress nickname with the area nickname. If RB1 does not know the 580 location of D, the packet must be flooded, subject to possible 581 pruning, in Level 2 and, subject to possible pruning, from Level 2 582 into every Level 1 area that it reaches on the Level 2 distribution 583 tree. 585 Now suppose that RB1 has learned the location of D (attached to 586 nickname 15918), but RB3 does not know where D is. In that case, RB3 587 must turn the packet into a multi-destination packet within area 588 15918. In this case, care must be taken so that, when RB3 is not the 589 Designated transitioner between Level 2 and its area for that 590 multi-destination packet but was on the unicast path, the border TRILL 591 switch in that area that is the Designated transitioner does not 592 forward the now multi-destination packet back into Level 2. Therefore, it would be 593 desirable to have a marking, somehow, that indicates the scope of 594 this packet's distribution to be "only this area" (see also Section 595 4). 597 In cases where there are multiple transitioners for unicast packets, 598 the border learning mode of operation requires that the address 599 learning between them be shared by some protocol such as running 600 ESADI [RFC7357] for all Data Labels of interest to avoid excessive 601 unknown unicast flooding. 603 The potential issue described at the end of Section 2.2.1 with trees 604 in the unique nickname alternative is eliminated with aggregated 605 nicknames. With aggregated nicknames, each border TRILL switch that 606 will transition multi-destination packets can have a mapping between 607 Level 2 tree nicknames and Level 1 tree nicknames. There need not 608 even be agreement about the total number of trees; just that the 609 border TRILL switch have some mapping, and replace the egress TRILL 610 switch nickname (the tree name) when transitioning levels. 612 2.2.2.2 Swap Nickname Field Aggregated Nicknames 614 As a variant, two additional fields could exist in TRILL Data packets, 615 which we call the "ingress swap nickname field" and the "egress swap 616 nickname field". The changes in the example above would be as 617 follows: 619 - RB1 will have learned the area nickname of D and the TRILL switch 620 nickname of RB4 to which D is attached. In encapsulating a frame 621 to D, it puts an area nickname of D (15918) in the egress nickname 622 field of the TRILL Header and puts the nickname of RB4 (44) in the 623 egress swap nickname field. 625 - RB2 moves the ingress nickname to the ingress swap nickname field 626 and inserts 15961, an area nickname for S, into the ingress 627 nickname field. 629 - RB3 swaps the egress nickname and the egress swap nickname fields, 630 which sets the egress nickname to 44. 632 - RB4 learns the correspondence between the source MAC/VLAN of S and 633 the { ingress nickname, ingress swap nickname field } pair as it 634 decapsulates and egresses the frame.
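The nickname handling just described can be summarized in a short sketch. The following Python fragment (illustrative only; the function and field names are ours, and the learning tables are hypothetical) shows what a border TRILL switch of area 15961 would do to the header fields in the two aggregated nickname variants, using the example of Section 2.2.2.1.

   AREA_NICKNAME = 15961       # an area nickname of this border switch's area

   def l1_to_l2_border_learning(hdr):
       """Level 1 -> Level 2, border learning (Section 2.2.2.1): hide the
       real ingress RBridge (e.g., 27) behind the area nickname."""
       hdr["ingress_nickname"] = AREA_NICKNAME
       return hdr

   def l2_to_l1_border_learning(hdr, dest_mac, label, learned):
       """Level 2 -> Level 1, border learning: 'learned' maps (MAC, Data
       Label) pairs, learned from data or ESADI, to egress nicknames in this
       area (e.g., 44 for D).  An unknown pair would instead force the
       packet to become multi-destination, as discussed above."""
       hdr["egress_nickname"] = learned[(dest_mac, label)]
       return hdr

   def l1_to_l2_swap(hdr):
       """Level 1 -> Level 2, Swap Nickname Field variant (Section 2.2.2.2):
       preserve the real ingress nickname in the ingress swap field."""
       hdr["ingress_swap_nickname"] = hdr["ingress_nickname"]
       hdr["ingress_nickname"] = AREA_NICKNAME
       return hdr

   def l2_to_l1_swap(hdr):
       """Level 2 -> Level 1, Swap Nickname Field variant: the true egress
       nickname (44) was carried in the egress swap field; swap it back."""
       hdr["egress_nickname"], hdr["egress_swap_nickname"] = (
           hdr["egress_swap_nickname"], hdr["egress_nickname"])
       return hdr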
636 See [DraftAggregated] for a multilevel proposal using aggregated swap 637 nicknames with a single nickname representing an area. 639 2.2.2.3 Comparison 641 The Border Learning variant described in Section 2.2.2.1 above 642 minimizes the change in non-border TRILL switches but imposes the 643 burden on border TRILL switches of learning and doing lookups in all 644 the end station MAC addresses within their area(s) that are used for 645 communication outside the area. This burden could be reduced by 646 decreasing the area size and increasing the number of areas. 648 The Swap Nickname Field variant described in Section 2.2.2.2 649 eliminates the extra address learning burden on border TRILL switches 650 but requires more extensive changes to non-border TRILL switches. In 651 particular they must learn to associate both a TRILL switch nickname 652 and an area nickname with end station MAC/label pairs (except for 653 addresses that are local to their area). 655 The Swap Nickname Field alternative is more scalable but less 656 backward compatible for non-border TRILL switches. It would be 657 possible for border and other level 2 TRILL switches to support both 658 Border Learning, for support of legacy Level 1 TRILL switches, and 659 Swap Nickname, to support Level 1 TRILL switches that understood the 660 Swap Nickname method. 662 2.3 Building Multi-Area Trees 664 It is easy to build a multi-area tree by building a tree in each area 665 separately, (including the Level 2 "area"), and then having only a 666 single border TRILL switch, say RBx, in each area, attach to the 667 Level 2 area. RBx would forward all multi-destination packets 668 between that area and Level 2. 670 People might find this unacceptable, however, because of the desire 671 to path split (not always sending all multi-destination traffic 672 through the same border TRILL switch). 674 This is the same issue as with multiple ingress TRILL switches 675 injecting traffic from a pseudonode, and can be solved with the 676 mechanism that was adopted for that purpose: the affinity TLV 677 [RFC7783]. For each tree in the area, at most one border RB 678 announces itself in an affinity TLV with that tree name. 680 2.4 The RPF Check for Trees 682 For multi-destination data originating locally in RBx's area, 683 computation of the RPF check is done as today. For multi-destination 684 packets originating outside RBx's area, computation of the RPF check 685 must be done based on which one of the border TRILL switches (say 686 RB1, RB2, or RB3) injected the packet into the area. 688 A TRILL switch, say RB4, located inside an area, must be able to know 689 which of RB1, RB2, or RB3 transitioned the packet into the area from 690 Level 2 (or into Level 2 from an area). 692 This could be done based on having the DBRB announce the transitioner 693 assignments to all the TRILL switches in the area, or the Affinity 694 TLV mechanism given in [RFC7783], or the New Tree Encoding mechanism 695 discussed in Section 4.1.1. 697 2.5 Area Nickname Acquisition 699 In the aggregated nickname alternative, each area must acquire a 700 unique area nickname or can be identified by the set of border TRILL 701 switches. It is probably simpler to allocate a block of nicknames 702 (say, the top 4000) to either (1) represent areas and not specific 703 TRILL switches or (2) used by border TRILL switches if the set of 704 such border TRILL switches represent the area. 
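If a block of nickname values were set aside in this way, recognizing an area nickname is a simple range check. The block boundaries below are purely hypothetical; they are only chosen to stay clear of the nickname values that [RFC6325] already reserves (0x0000 and 0xFFC0 through 0xFFFF).

   # Hypothetical block of roughly 4000 nickname values reserved to identify
   # areas (or their border TRILL switches) rather than individual RBridges.
   AREA_NICK_LOW = 0xF000
   AREA_NICK_HIGH = 0xFFBF

   def is_area_nickname(nickname):
       """True if 'nickname' falls inside the (hypothetical) area block."""
       return AREA_NICK_LOW <= nickname <= AREA_NICK_HIGH

   # A Level 1 TRILL switch could use such a test to recognize an egress
   # nickname that names an area rather than a specific TRILL switch.
   assert is_area_nickname(0xF123) and not is_area_nickname(27)

Note that the example area nicknames used in Section 2.2.2.1 (15961 and 15918) were not drawn from such a block; a reserved block is only one of the options mentioned above.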
706 The nicknames used for area identification need to be advertised and 707 acquired through Level 2. 709 Within an area, all the border TRILL switches can discover each other 710 through the Level 1 link state database, by using the IS-IS attach 711 bit or by explicitly advertising in their LSP "I am a border 712 RBridge". 714 Of the border TRILL switches, one will have highest priority (say 715 RB7). RB7 can dynamically participate, in Level 2, to acquire a 716 nickname for identifying the area. Alternatively, RB7 could give the 717 area a pseudonode IS-IS ID, such as RB7.5, within Level 2. So an 718 area would appear, in Level 2, as a pseudonode and the pseudonode 719 could participate, in Level 2, to acquire a nickname for the area. 721 Within Level 2, all the border TRILL switches for an area can 722 advertise reachability to the area, which would mean connectivity to 723 a nickname identifying the area. 725 2.6 Link State Representation of Areas 727 Within an area, say area A1, there is an election for the DBRB, 728 (Designated Border RBridge), say RB1. This can be done through LSPs 729 within area A1. The border TRILL switches announce themselves, 730 together with their DBRB priority. (Note that the election of the 731 DBRB cannot be done based on Hello messages, because the border TRILL 732 switches are not necessarily physical neighbors of each other. They 733 can, however, reach each other through connectivity within the area, 734 which is why it will work to find each other through Level 1 LSPs.) 736 RB1 can acquire an area nickname (in the aggregated nickname 737 approach) and may give the area a pseudonode IS-IS ID (just like the 738 DRB would give a pseudonode IS-IS ID to a link) depending on how the 739 area nickname is handled. RB1 advertises, in area A1, an area 740 nickname that RB1 has acquired (and what the pseudonode IS-IS ID for 741 the area is if needed). 743 Level 1 LSPs (possibly pseudonode) initiated by RB1 for the area 744 include any information external to area A1 that should be input into 745 area A1 (such as nicknames of external areas, or perhaps (in the 746 unique nickname variant) all the nicknames of external TRILL switches 747 in the TRILL campus and pruning information such as multicast 748 listeners and labels). All the other border TRILL switches for the 749 area announce (in their LSP) attachment to that area. 751 Within Level 2, RB1 generates a Level 2 LSP on behalf of the area. 752 The same pseudonode ID could be used within Level 1 and Level 2, for 753 the area. (There does not seem any reason why it would be useful for 754 it to be different, but there's also no reason why it would need to 755 be the same). Likewise, all the area A1 border TRILL switches would 756 announce, in their Level 2 LSPs, connection to the area. 758 3. Area Partition 760 It is possible for an area to become partitioned, so that there is 761 still a path from one section of the area to the other, but that path 762 is via the Level 2 area. 764 With multilevel TRILL, an area will naturally break into two areas in 765 this case. 767 Area addresses might be configured to ensure two areas are not 768 inadvertently connected. Area addresses appear in Hellos and LSPs 769 within the area. If two chunks, connected only via Level 2, were 770 configured with the same area address, this would not cause any 771 problems. (They would just operate as separate Level 1 areas.) 
773 A more serious problem occurs if the Level 2 area is partitioned in 774 such a way that it could be healed by using a path through a Level 1 775 area. TRILL will not attempt to solve this problem. Within the Level 776 1 area, a single border RBridge will be the DBRB, and will be in 777 charge of deciding which (single) RBridge will transition any 778 particular multi-destination packets between that area and Level 2. 779 If the Level 2 area is partitioned, this will result in multi- 780 destination data only reaching the portion of the TRILL campus 781 reachable through the partition attached to the TRILL switch that 782 transitions that packet. It will not cause a loop. 784 4. Multi-Destination Scope 786 There are at least two reasons it would be desirable to be able to 787 mark a multi-destination packet with a scope that indicates the 788 packet should not exit the area, as follows: 790 1. To address an issue in the border learning variant of the 791 aggregated nickname alternative, when a unicast packet turns into 792 a multi-destination packet when transitioning from Level 2 to 793 Level 1, as discussed in Section 4.1. 795 2. To constrain the broadcast domain for certain discovery, 796 directory, or service protocols as discussed in Section 4.2. 798 Multi-destination packet distribution scope restriction could be done 799 in a number of ways. For example, there could be a flag in the packet 800 that means "for this area only". However, the technique that might 801 require the least change to TRILL switch fast path logic would be to 802 indicate this in the egress nickname that designates the distribution 803 tree being used. There could be two general tree nicknames for each 804 tree, one being for distribution restricted to the area and the other 805 being for multi-area trees. Or there could be a set of N (perhaps 16) 806 special currently reserved nicknames used to specify the N highest 807 priority trees but with the variation that if the special nickname is 808 used for the tree, the packet is not transitioned between areas. Or 809 one or more special trees could be built that were restricted to the 810 local area. 812 4.1 Unicast to Multi-destination Conversions 814 In the border learning variant of the aggregated nickname 815 alternative, the following situation may occur: 816 - a unicast packet might be known at the Level 1 to Level 2 817 transition and be forwarded as a unicast packet to the least cost 818 border TRILL switch advertising connectivity to the destination 819 area, but 820 - upon arriving at the border TRILL switch, it turns out to have an 821 unknown destination { MAC, Data Label } pair. 823 In this case, the packet must be converted into a multi-destination 824 packet and flooded in the destination area. However, if the border 825 TRILL switch doing the conversion is not the border TRILL switch 826 designated to transition the resulting multi-destination packet, 827 there is the danger that the designated transitioner may pick up the 828 packet and flood it back into Level 2, from which it may be flooded 829 into multiple areas. This danger can be avoided by restricting any 830 multi-destination packet that results from such a conversion to the 831 destination area through a flag in the packet or through distributing 832 it on a tree that is restricted to the area, or other techniques (see 833 Section 4).
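Of the scope restriction options listed in Section 4, the per-tree scoped nickname is easy to visualize. In the sketch below (Python; the nickname values and table contents are invented for illustration), each distribution tree is reachable under two egress nicknames, and a border TRILL switch simply refuses to transition a multi-destination packet whose tree nickname is the area-restricted one.

   # Hypothetical per-tree scope table: each distribution tree root is known
   # under a campus-wide nickname and an area-restricted nickname.
   TREE_SCOPE = {
       0x8001: "campus",      # tree 1, may be transitioned between levels
       0x8081: "area-only",   # tree 1, must never leave this area
       0x8002: "campus",      # tree 2
       0x8082: "area-only",   # tree 2
   }

   def may_transition(hdr):
       """Border TRILL switch check: transition a packet between Level 1 and
       Level 2 only if it is unicast or on a campus-scoped tree."""
       if not hdr["multi_dest"]:
           return True                        # known unicast: no tree scope
       return TREE_SCOPE.get(hdr["egress_nickname"]) == "campus"

   # A packet that a border switch had to convert from unicast to
   # multi-destination would be put on an area-only tree nickname so that no
   # other border switch re-injects it into Level 2.
   assert may_transition({"multi_dest": 1, "egress_nickname": 0x8081}) is False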
835 Alternatively, a multi-destination packet intended only for the area 836 could be tunneled (within the area) to the RBridge RBx, that is the 837 appointed transitioner for that form of packet (say, based on VLAN or 838 FGL), with instructions that RBx only transmit the packet within the 839 area, and RBx could initiate the multi-destination packet within the 840 area. Since RBx introduced the packet, and is the only one allowed 841 to transition that packet to Level 2, this would accomplish scoping 842 of the packet to within the area. Since this case only occurs in the 843 unusual case when unicast packets need to be turned into multi- 844 destination as described above, the suboptimality of tunneling 845 between the border TRILL switch that receives the unicast packet and 846 the appointed level transitioner for that packet, would not be an 847 issue. 849 4.1.1 New Tree Encoding 851 The current encoding, in a TRILL header, of a tree, is of the 852 nickname of the tree root. This requires all 16 bits of the egress 853 nickname field. TRILL could instead, for example, use the bottom 6 854 bits to encode the tree number (allowing 64 trees), leaving 10 bits 855 to encode information such as: 857 o scope: a flag indicating whether it should be single area only, or 858 entire campus 859 o border injector: an indicator of which of the k border TRILL 860 switches injected this packet 862 If TRILL were to adopt this new encoding, any of the TRILL switches 863 in an edge group could inject a multi-destination packet. This would 864 require all TRILL switches to be changed to understand the new 865 encoding for a tree, and it would require a TLV in the LSP to 866 indicate which number each of the TRILL switches in an edge group 867 would be. 869 4.2 Selective Broadcast Domain Reduction 871 There are a number of service, discovery, and directory protocols 872 that, for convenience, are accessed via multicast or broadcast 873 frames. Examples are DHCP, (Dynamic Host Configuration Protocol) the 874 NetBIOS Service Location Protocol, and multicast DNS (Domain Name 875 Service). 877 Some such protocols provide means to restrict distribution to an IP 878 subnet or equivalent to reduce size of the broadcast domain they are 879 using and then provide a proxy that can be placed in that subnet to 880 use unicast to access a service elsewhere. In cases where a proxy 881 mechanism is not currently defined, it may be possible to create one 882 that references a central server or cache. With multilevel TRILL, it 883 is possible to construct very large IP subnets that could become 884 saturated with multi-destination traffic of this type unless packets 885 can be further restricted in their distribution. Such restricted 886 distribution can be accomplished for some protocols, say protocol P, 887 in a variety of ways including the following: 889 - Either (1) at all ingress TRILL switches in an area place all 890 protocol P multi-destination packets on a distribution tree in 891 such a way that the packets are restricted to the area or (2) at 892 all border TRILL switches between that area and Level 2, detect 893 protocol P multi-destination packets and do not transition them. 895 - Then place one, or a few for redundancy, protocol P proxies inside 896 each area where protocol P may be in use. These proxies unicast 897 protocol P requests or other messages to the actual campus 898 server(s) for P. 
They also receive unicast responses or other 899 messages from those servers and deliver them within the area via 900 unicast, multicast, or broadcast as appropriate. (Such proxies 901 would not be needed if it were acceptable for all protocol P 902 traffic to be restricted to an area.) 904 While it might seem logical to connect the campus servers to TRILL 905 switches in Level 2, they could be placed within one or more areas so 906 that, in some cases, those areas might not require a local proxy 907 server. 909 5. Co-Existence with Old TRILL switches 911 TRILL switches that are not multilevel aware may have a problem with 912 calculating RPF Check and filtering information, since they would not 913 be aware of the assignment of border TRILL switch transitioning. 915 A possible solution, as long as any old TRILL switches exist within 916 an area, is to have the border TRILL switches elect a single DBRB 917 (Designated Border RBridge), and have all inter-area traffic go 918 through the DBRB (unicast as well as multi-destination). If that 919 DBRB goes down, a new one will be elected, but at any one time, all 920 inter-area traffic (unicast as well as multi-destination) would go 921 through that one DBRB. However, this eliminates load splitting at 922 level transition. 924 6. Multi-Access Links with End Stations 926 Care must be taken, in the case where there are multiple TRILL 927 switches on a link with end stations, that only one TRILL switch 928 ingresses/egresses any given data packet from/to the end nodes. With 929 existing, single level TRILL, this is done by electing a single 930 Designated RBridge per link, which appoints a single Appointed 931 Forwarder per VLAN [RFC7177] [RFC6439]. But suppose there are two 932 (or more) TRILL switches on a link in different areas, say RB1 in 933 area A1 and RB2 in area A2, and that the link contains end nodes. If 934 RB1 and RB2 ignore each other's Hellos, then they will both 935 ingress/egress end node traffic from the link. 937 A simple rule is to use the TRILL switch or switches having the 938 lowest numbered area, comparing area numbers as unsigned integers, to 939 handle native traffic. This would automatically give multilevel- 940 ignorant legacy TRILL switches, which would be using area number zero, 941 the highest priority for handling end stations, which they would try to 942 do anyway. 944 Other methods are possible. For example, the selection of 945 Appointed Forwarders, and of the TRILL switch in charge of that 946 selection, could be done across all TRILL switches on the link regardless 947 of area. However, a special case would then have to be made for legacy TRILL 948 switches using area number zero. 950 Any of these techniques require multilevel aware RBridges to take 951 actions based on Hellos from RBridges in other areas even though they 952 will not form an adjacency with such RBridges. 954 7. Summary 956 This draft discusses issues and possible approaches to multilevel 957 TRILL. The alternative using aggregated areas has significant 958 advantages in terms of scalability over using campus wide unique 959 nicknames, not just in avoiding nickname exhaustion, but by allowing 960 RPF Checks to be aggregated based on an entire area. However, the 961 alternative of using unique nicknames is simpler and avoids the 962 changes in border TRILL switches required to support aggregated 963 nicknames. It is possible to support both.
For example, a TRILL 964 campus could use simpler unique nicknames until scaling begins to 965 cause problems and then start to introduce areas with aggregated 966 nicknames. 968 Some issues are not difficult, such as dealing with partitioned 969 areas. Other issues are more difficult, especially dealing with old 970 TRILL switches. 972 8. Security Considerations 974 This informational document explores alternatives for the use of 975 multilevel IS-IS in TRILL and generally does not consider security 976 issues. 978 If aggregated nicknames are used in two areas that have the same area 979 address and those areas merge, there is a possibility of a transient 980 nickname collision that would not occur with unique nicknames. Such a 981 collision could cause a data packet to be delivered to the wrong 982 egress TRILL switch but it would still not be delivered to any end 983 station in the wrong Data Label; thus such delivery would still 984 conform to security policies. 986 For general TRILL Security Considerations, see [RFC6325]. 988 9. IANA Considerations 990 This document requires no IANA actions. RFC Editor: Please remove 991 this section before publication. 993 Normative References 995 [IS-IS] - ISO/IEC 10589:2002, Second Edition, "Intermediate System to 996 Intermediate System Intra-Domain Routing Exchange Protocol for 997 use in Conjunction with the Protocol for Providing the 998 Connectionless-mode Network Service (ISO 8473)", 2002. 1000 [RFC6325] - Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A. 1001 Ghanwani, "Routing Bridges (RBridges): Base Protocol 1002 Specification", RFC 6325, July 2011. 1004 [RFC6439] - Perlman, R., Eastlake, D., Li, Y., Banerjee, A., and F. 1005 Hu, "Routing Bridges (RBridges): Appointed Forwarders", RFC 1006 6439, November 2011. 1008 [RFC7177] - Eastlake 3rd, D., Perlman, R., Ghanwani, A., Yang, H., 1009 and V. Manral, "Transparent Interconnection of Lots of Links 1010 (TRILL): Adjacency", RFC 7177, May 2014. 1013 [RFC7780] - Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A., 1014 Ghanwani, A., and S. Gupta, "Transparent Interconnection of 1015 Lots of Links (TRILL): Clarifications, Corrections, and 1016 Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016. 1019 Informative References 1021 [RFC3194] - Durand, A. and C. Huitema, "The Host-Density Ratio for 1022 Address Assignment Efficiency: An Update on the H ratio", RFC 1023 3194, DOI 10.17487/RFC3194, November 2001. 1026 [RFC6361] - Carlson, J. and D. Eastlake 3rd, "PPP Transparent 1027 Interconnection of Lots of Links (TRILL) Protocol Control 1028 Protocol", RFC 6361, August 2011. 1030 [RFC7172] - Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., 1031 and D. Dutt, "Transparent Interconnection of Lots of Links 1032 (TRILL): Fine-Grained Labeling", RFC 7172, May 2014. 1034 [RFC7176] - Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, 1035 D., and A. Banerjee, "Transparent Interconnection of Lots of 1036 Links (TRILL) Use of IS-IS", RFC 7176, May 2014. 1038 [RFC7357] - Zhai, H., Hu, F., Perlman, R., Eastlake 3rd, D., and O. 1039 Stokes, "Transparent Interconnection of Lots of Links (TRILL): 1041 End Station Address Distribution Information (ESADI) Protocol", 1042 RFC 7357, September 2014. 1045 [RFC7781] - Zhai, H., Senevirathne, T., Perlman, R., Zhang, M., and 1046 Y. Li, "Transparent Interconnection of Lots of Links (TRILL): 1047 Pseudo-Nickname for Active-Active Access", RFC 7781, DOI 1048 10.17487/RFC7781, February 2016.
1051 [RFC7783] - Senevirathne, T., Pathangi, J., and J. Hudson, 1052 "Coordinated Multicast Trees (CMT) for Transparent 1053 Interconnection of Lots of Links (TRILL)", RFC 7783, DOI 1054 10.17487/RFC7783, February 2016, . 1057 [DraftAggregated] - Bhargav Bhikkaji, Balaji Venkat Venkataswami, 1058 Narayana Perumal Swamy, "Connecting Disparate Data 1059 Center/PBB/Campus TRILL sites using BGP", draft-balaji-trill- 1060 over-ip-multi-level, Work In Progress. 1062 [DraftUnique] - Tissa Senevirathne, Les Ginsberg, Janardhanan 1063 Pathangi, Jon Hudson, Sam Aldrin, Ayan Banerjee, Sameer 1064 Merchant, "Default Nickname Based Approach for Multilevel 1065 TRILL", draft-tissa-trill-multilevel, Work In Progress. 1067 [NickFlags] - Eastlake, D., W. Hao, draft-eastlake-trill-nick-label- 1068 prop, Work In Progress. 1070 [SingleName] - Mingui Zhang, et. al, "Single Area Border RBridge 1071 Nickname for TRILL Multilevel", draft-zhang-trill-multilevel- 1072 single-nickname, Work in Progress. 1074 Acknowledgements 1076 The helpful comments and contributions of the following are hereby 1077 acknowledged: 1079 David Michael Bond, Dino Farinacci, Gayle Noble, Alexander 1080 Vainshtein, and Stig Venaas. 1082 The document was prepared in raw nroff. All macros used were defined 1083 within the source file. 1085 Authors' Addresses 1087 Radia Perlman 1088 EMC 1089 2010 256th Avenue NE, #200 1090 Bellevue, WA 98007 USA 1092 EMail: radia@alum.mit.edu 1094 Donald Eastlake 1095 Huawei Technologies 1096 155 Beaver Street 1097 Milford, MA 01757 USA 1099 Phone: +1-508-333-2270 1100 Email: d3e3e3@gmail.com 1102 Mingui Zhang 1103 Huawei Technologies 1104 No.156 Beiqing Rd. Haidian District, 1105 Beijing 100095 P.R. China 1107 EMail: zhangmingui@huawei.com 1109 Anoop Ghanwani 1110 Dell 1111 5450 Great America Parkway 1112 Santa Clara, CA 95054 USA 1114 EMail: anoop@alumni.duke.edu 1116 Hongjun Zhai 1117 Jinling Institute of Technology 1118 99 Hongjing Avenue, Jiangning District 1119 Nanjing, Jiangsu 211169 China 1121 EMail: honjun.zhai@tom.com 1123 Copyright and IPR Provisions 1125 Copyright (c) 2016 IETF Trust and the persons identified as the 1126 document authors. All rights reserved. 1128 This document is subject to BCP 78 and the IETF Trust's Legal 1129 Provisions Relating to IETF Documents 1130 (http://trustee.ietf.org/license-info) in effect on the date of 1131 publication of this document. Please review these documents 1132 carefully, as they describe your rights and restrictions with respect 1133 to this document. Code Components extracted from this document must 1134 include Simplified BSD License text as described in Section 4.e of 1135 the Trust Legal Provisions and are provided without warranty as 1136 described in the Simplified BSD License. The definitive version of 1137 an IETF Document is that published by, or under the auspices of, the 1138 IETF. Versions of IETF Documents that are published by third parties, 1139 including those that are translated into other languages, should not 1140 be considered to be definitive versions of IETF Documents. The 1141 definitive version of these Legal Provisions is that published by, or 1142 under the auspices of, the IETF. Versions of these Legal Provisions 1143 that are published by third parties, including those that are 1144 translated into other languages, should not be considered to be 1145 definitive versions of these Legal Provisions. 
For the avoidance of 1146 doubt, each Contributor to the IETF Standards Process licenses each 1147 Contribution that he or she makes as part of the IETF Standards 1148 Process to the IETF Trust pursuant to the provisions of RFC 5378. No 1149 language to the contrary, or terms, conditions or rights that differ 1150 from or are inconsistent with the rights and licenses granted under 1151 RFC 5378, shall have any effect and shall be null and void, whether 1152 published or posted by such Contributor, or included with or in such 1153 Contribution.