TRILL Working Group                                        Radia Perlman
INTERNET-DRAFT                                                Intel Labs
Intended status: Informational                           Donald Eastlake
                                                                  Huawei
                                                          Anoop Ghanwani
                                                                 Brocade
                                                            Hongjun Zhai
                                                                     ZTE
Expires: April 30, 2012                                 October 31, 2011


                       RBridges: Multilevel TRILL
              draft-perlman-trill-rbridge-multilevel-03.txt


Abstract

   This is an informational document describing issues and comparing
   advantages and disadvantages of various possible approaches to
   extending TRILL to use multiple levels of IS-IS.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79. Distribution of this document is
   unlimited. Comments should be sent to the TRILL working group
   mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Acknowledgements

   The helpful comments of the following are hereby acknowledged: David
   Michael Bond and Dino Farinacci.


Table of Contents

   1. Introduction
      1.1 TRILL Scalability Issues
      1.2 Improvements Due to Multilevel
      1.3 More on Areas
      1.4 Terminology and Acronyms

   2. Multilevel TRILL Issues
      2.1 Non-zero Area Addresses
      2.2 Aggregated versus Unique Nicknames
         2.2.1 More Details on Unique Nicknames
         2.2.2 More Details on Aggregated Nicknames
            2.2.2.1 Border Learning Aggregated Nicknames
            2.2.2.2 Swap Nickname Field Aggregated Nicknames
            2.2.2.3 Comparison
      2.3 Building Multi-Area Trees
      2.4 The RPF Check for Trees
      2.5 Area Nickname Acquisition
      2.6 Link State Representation of Areas

   3. Area Partition

   4. Multi-Destination Scope
      4.1 Unicast to Multi-destination Conversions
      4.2 Selective Broadcast Domain Reduction

   5. Co-Existence with Old RBridges
   6. Summary

   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1 Normative References
      9.2 Informative References


1. Introduction

   The IETF TRILL protocol [RFC6325] provides optimal pair-wise data
   frame forwarding without configuration, safe forwarding even during
   periods of temporary loops, and support for multipathing of both
   unicast and multicast traffic. TRILL accomplishes this by using IS-IS
   link state routing and encapsulating traffic using a header that
   includes a hop count.
   The design supports VLANs and optimization of
   the distribution of multi-destination frames based on VLANs and
   IP-derived multicast groups. Devices that implement TRILL are called
   RBridges.

   Familiarity with [RFC6325] is assumed in this document.

1.1 TRILL Scalability Issues

   There are multiple issues that might limit the scalability of a
   TRILL-based network:

   1. the routing computation load,
   2. the volatility of the LSP database creating too much control
      traffic,
   3. the volatility of the LSP database causing the TRILL network to be
      in an unconverged state too much of the time,
   4. the size of the LSP database,
   5. the hard limit on the number of RBridges, due to the 16-bit
      nickname space,
   6. the traffic due to upper layer protocols' use of broadcast and
      multicast, and
   7. the size of the end node learning table (the table that remembers
      (egress RBridge, VLAN/MAC) pairs).

   Extending TRILL IS-IS to be multilevel (hierarchical) helps with some
   of these issues.

   IS-IS was designed to be multilevel [IS-IS] [RFC1195]. A network can
   be partitioned into "areas". Routing within an area is known as
   "Level 1 routing". Routing between areas is known as "Level 2
   routing". The Level 2 IS-IS network consists of Level 2 routers and
   links between the Level 2 routers. Level 2 routers may participate
   in one or more areas, in addition to their role as Level 2 routers.

   Each area is connected to Level 2 through one or more "border
   routers", which participate both as a router inside the area, and as
   a router inside the Level 2 "area". Care must always be taken that
   it is clear, when transitioning between Level 2 and a Level 1 area in
   either direction, which (single) border RBridge will transition the
   frame between the levels.

1.2 Improvements Due to Multilevel

   Partitioning the network into areas reduces the size of the LSP
   database in each router, and stops volatility of the topology in one
   area from disrupting other areas. Allowing TRILL to utilize IS-IS's
   hierarchy addresses issues 1 through 4 above, but does not
   necessarily help with the other three issues (hard limit of 16-bit
   RBridge nicknames, traffic due to upper layer protocols using
   multicast, and size of end node learning table). However,
   partitioning the network into areas facilitates techniques described
   in Section 4 to limit the broadcast domain for some traffic, thus
   reducing problem 6 (traffic due to upper layer protocols' use of
   broadcast and multicast).

   We propose two alternatives for hierarchical or multilevel TRILL. One
   we call the "unique nickname" alternative. The other we call the
   "aggregated nickname" alternative. In the aggregated nickname
   alternative, border RBridges replace either the ingress or egress
   nickname field in the TRILL header of unicast frames with an
   aggregated nickname representing an entire area.

   The aggregated nickname alternative has the following advantages:

      o  it solves the 16-bit RBridge nickname limit,
      o  it lessens the amount of inter-area routing information that
         must be passed in IS-IS,
      o  it greatly reduces the RPF information (since only the area
         nickname needs to appear, rather than all the ingress RBridges
         in that area), and
      o  it enables computation of trees such that the portion computed
         within a given area is rooted within that area.

   The unique nickname alternative has the advantage that border
   RBridges are simpler and do not need to do TRILL Header nickname
   modification.

1.3 More on Areas

   Each area is configured with an "area address", which is advertised
   in IS-IS messages, so as to avoid accidentally interconnecting areas.
   Note that, although the area address had other purposes in CLNP
   (IS-IS was originally designed for CLNP/DECnet), for TRILL the only
   purpose of the area address would be to avoid accidentally
   interconnecting areas.

   Currently, the TRILL specification says that the area address must be
   zero. If we change the specification so that the area address value
   of zero is just a default, then most of the IS-IS multilevel
   machinery works as originally designed. However, there are
   TRILL-specific issues, which we address below in this document.

1.4 Terminology and Acronyms

   This document generally uses the acronyms defined in [RFC6325] plus
   the additional acronym DBRB. However, for ease of reference, most
   acronyms used are listed here:

      CLNP - ConnectionLess Network Protocol

      DBRB - Designated Border RBridge

      IS-IS - Intermediate System to Intermediate System

      LSP - Link State PDU

      PDU - Protocol Data Unit

      RBridge - Routing Bridge

      RPF - Reverse Path Forwarding

      TRILL - TRansparent Interconnection of Lots of Links

      VLAN - Virtual Local Area Network


2. Multilevel TRILL Issues

   The TRILL-specific issues introduced by hierarchy include the
   following:

   a. Configuration of non-zero area addresses, encoding them in IS-IS
      PDUs, and possibly interworking with old RBridges that do not
      understand non-zero area addresses.

         See Section 2.1.

   b. Nickname management.

         See Sections 2.2 and 2.5.

   c. Advertisement of pruning information (VLAN reachability, IP
      multicast addresses) across areas.

         Distribution tree pruning information is only an optimization,
         as long as multi-destination frames are not prematurely pruned.
         For instance, border RBridges could advertise they can reach
         all possible VLANs, and have an IP multicast router attached.
         This would cause all multi-destination traffic to be
         transmitted to border RBridges, and possibly pruned there, when
         the traffic could have been pruned earlier based on VLAN or
         multicast group if border RBridges advertised more detailed
         VLAN and/or multicast listener and multicast router attachment
         information.

   d. Computation of distribution trees across areas for multi-
      destination frames.

         See Section 2.3.

   e. Computation of RPF information for those distribution trees.

         See Section 2.4.

   f. Computation of filtering information across areas.

         See Sections 2.3 and 2.6.

   g. Compatibility, as much as practical, with existing, unmodified
      RBridges.

         The most important form of compatibility is with existing TRILL
         fast path hardware. Changes that require upgrade to the slow
         path firmware/software are more tolerable. Compatibility for
         the relatively small number of border RBridges is less
         important than compatibility for non-border RBridges. See
         Section 5.

2.1 Non-zero Area Addresses

   The current TRILL base protocol specification [RFC6325] [RFC6326]
   says that the area address in IS-IS must be zero. The purpose of the
   area address is to ensure that different areas are not accidentally
   hooked together. Furthermore, zero is an invalid area address for
   layer 3 IS-IS, so it was chosen as an additional safety mechanism to
   ensure that layer 3 IS-IS would not be confused with TRILL IS-IS.
   However, TRILL uses a different multicast address and an Ethertype to
   avoid such confusion, so it is not necessary to worry about this.

   Since current TRILL RBridges will reject any IS-IS messages with
   non-zero area addresses, the choices are as follows:

   a.1 upgrade all RBridges that are to interoperate in a potentially
       multi-level environment to understand non-zero area addresses,
   a.2 neighbors of old RBridges must remove the area address from IS-IS
       messages when talking to an old RBridge (which might break IS-IS
       security and/or cause inadvertent merging of areas),
   a.3 ignore the problem of accidentally merging areas entirely, or
   a.4 keep the fixed "area address" field as 0 in TRILL, and add a new,
       optional TLV for "area name" that, if present, could be compared,
       by new RBridges, to prevent accidental area merging.

   In principle, different solutions could be used in different areas
   but it would be much simpler to adopt one of these choices uniformly.

2.2 Aggregated versus Unique Nicknames

   In the unique nickname alternative, all nicknames across the campus
   must be unique. In the aggregated nickname alternative, RBridge
   nicknames are only of local significance within an area, and the only
   nickname externally (outside the area) visible is the "area nickname"
   (or nicknames), which aggregates all the internal nicknames.

   The unique nickname approach simplifies border RBridges.

   The aggregated nickname approach eliminates the potential problem of
   nickname exhaustion, minimizes the amount of nickname information
   that would need to be forwarded between areas, minimizes the size of
   the forwarding table, and simplifies RPF calculation and RPF
   information.

2.2.1 More Details on Unique Nicknames

   With unique cross-area nicknames, it would be intractable to have a
   flat nickname space with RBridges in different areas contending for
   the same nicknames. Instead, each area would need to be configured
   with a block of nicknames.
   Either some RBridges would need to
   announce that all the nicknames other than that block are taken (to
   prevent the RBridges inside the area from choosing nicknames outside
   the area's nickname block), or a new TLV would be needed to announce
   the allowable nicknames, and all RBridges in the area would need to
   understand that new TLV.

   Currently the encoding of nickname information in TLVs is by listing
   of individual nicknames; this would make it painful for a border
   RBridge to announce into an area that it is holding all other
   nicknames to limit the nicknames available within that area. The
   information could be encoded as ranges of nicknames to make this
   somewhat manageable; however, a new TLV for announcing nickname
   ranges would not be intelligible to old RBridges.

   There is also an issue with the unique nicknames approach in building
   distribution trees, as follows:

   With unique nicknames in the TRILL campus and TRILL header nicknames
   not rewritten by the border RBridges, there would have to be globally
   known nicknames for the trees. Suppose there are k trees. For all
   of the trees with nicknames located outside an area, the trees would
   be rooted at a border RBridge or RBridges. Therefore, there would be
   either no splitting of multi-destination traffic within the area or
   restricted splitting of multi-destination traffic between trees
   rooted at a highly restricted set of RBridges.

2.2.2 More Details on Aggregated Nicknames

   The aggregated nickname approach enables passing far less nickname
   information and works as follows:

   Each area would be assigned a 16-bit nickname. This would not be the
   nickname of any actual RBridge. Instead, it would be the nickname of
   the area itself. Border RBridges would know the area nickname for
   their own area(s).

   The TRILL Header nickname fields in TRILL Data frames being
   transported through a multilevel RBridge campus with aggregated
   nicknames are as follows:

   -  When being transported in Level 2, the ingress nickname is the
      nickname of the ingress RBridge's area while the egress nickname
      is either the nickname of the egress RBridge's area or a tree
      nickname.

   -  When being transported in Level 1 toward Level 2, the ingress
      nickname is the nickname of the ingress RBridge itself while the
      egress nickname is either the nickname of the area of the egress
      RBridge or a tree nickname.

   -  When being transported in Level 1 coming from Level 2, the ingress
      nickname is the nickname of the ingress RBridge's area while the
      egress nickname is either the nickname of the egress RBridge
      itself or a tree nickname.

   -  When both the ingress and egress RBridges are in the same area,
      there need be no change from the existing base TRILL protocol
      standard in the TRILL Header nickname fields.

   There are two variations of the aggregated nickname approach. The
   first is the Border Learning approach, which is described in Section
   2.2.2.1. The second is the Swap Nickname Field approach, which is
   described in Section 2.2.2.2. Section 2.2.2.3 compares the advantages
   and disadvantages of these two variations.

2.2.2.1 Border Learning Aggregated Nicknames

   This section provides an illustrative example and description of the
   border learning variation of aggregated nicknames.

   In the following picture, RB2 and RB3 are area border RBridges. A
   source S is attached to RB1. The two areas have nicknames 15961 and
   15918, respectively. RB1 has a nickname, say 27, and RB4 has a
   nickname, say 44 (and in fact, they could even have the same
   nickname, since the RBridge nickname will not be visible outside the
   area).

        Area 15961                 level 2              Area 15918
   +-------------------+     +-----------------+     +--------------+
   |                   |     |                 |     |              |
   |  S--RB1---Rx--Rz----RB2---Rb---Rc--Rd---Re--RB3---Rk--RB4---D  |
   |     27            |     |                 |     |     44       |
   |                   |     |                 |     |              |
   +-------------------+     +-----------------+     +--------------+

   Let's say that S transmits a frame to destination D, which is
   connected to RB4, and let's say that D's location is learned by the
   relevant RBridges already. The relevant RBridges have learned the
   following:

      1) RB1 has learned that D is connected to nickname 15918.
      2) RB3 has learned that D is attached to nickname 44.

   The following sequence of events will occur:

   -  S transmits an Ethernet frame with source MAC = S and destination
      MAC = D.

   -  RB1 encapsulates with a TRILL header with ingress RBridge = 27,
      and egress = 15918.

   -  RB2 has announced in the Level 1 IS-IS instance in area 15961,
      that it is attached to all the area nicknames, including 15918.
      Therefore, IS-IS routes the frame to RB2. (Alternatively, if a
      distinguished range of nicknames is used for Level 2, Level 1
      RBridges seeing such an egress nickname will know to route to the
      nearest border router, which can be indicated by the IS-IS
      attached bit.)

   -  RB2, when transitioning the frame from Level 1 to Level 2,
      replaces the ingress RBridge nickname with the area nickname, so
      replaces 27 with 15961. Within Level 2, the ingress RBridge field
      in the TRILL header will therefore be 15961, and the egress
      RBridge field will be 15918. Also RB2 learns that S is attached to
      nickname 27 in area 15961 to accommodate return traffic.

   -  The frame is forwarded through Level 2, to RB3, which has
      advertised, in Level 2, reachability to the nickname 15918.

   -  RB3, when forwarding into area 15918, replaces the egress nickname
      in the TRILL header with RB4's nickname (44).
      So, within the
      destination area, the ingress nickname will be 15961 and the
      egress nickname will be 44.

   -  RB4, when decapsulating, learns that S is attached to nickname
      15961, which is the area nickname of the ingress.

   Now suppose that D's location has not been learned by RB1 and/or RB3.
   What will happen, as it would in TRILL today, is that RB1 will
   forward the frame as a multi-destination frame, choosing a tree. As
   the multi-destination frame transitions into Level 2, RB2 replaces
   the ingress nickname with the area nickname. If RB1 does not know the
   location of D, the frame must be flooded, subject to possible
   pruning, in Level 2 and, subject to possible pruning, from Level 2
   into every Level 1 area that it reaches on the Level 2 distribution
   tree.

   Now suppose that RB1 has learned the location of D (attached to
   nickname 15918), but RB3 does not know where D is. In that case, RB3
   must turn the frame into a multi-destination frame within area 15918.

   In this case, if RB3 is not the designated transitioner between
   Level 2 and its area for that multi-destination frame but was on the
   unicast path, care must be taken that another border RBridge in that
   area does not forward the now multi-destination frame back into
   Level 2. Therefore, it would be desirable to have a marking,
   somehow, that indicates the scope of this frame's distribution to be
   "only this area" (see also Section 4).

   In cases where there are multiple transitioners for unicast frames,
   the border learning mode of operation requires that the address
   learning between them be shared by some protocol such as running
   ESADI [RFC6325] for all VLANs of interest to avoid excessive unknown
   unicast flooding.

   The issue described at the end of Section 2.2.1 with trees in the
   unique nickname alternative is eliminated with aggregated nicknames.
   With aggregated nicknames, each border RBridge that will transition
   multi-destination frames can have a mapping between Level 2 tree
   nicknames and Level 1 tree nicknames. There need not even be
   agreement about the total number of trees; just that the border
   RBridge have some mapping, and replace the egress RBridge nickname
   (the tree name) when transitioning levels.

2.2.2.2 Swap Nickname Field Aggregated Nicknames

   As a variant, two additional fields could exist in TRILL Data frames,
   which we call the "ingress swap nickname field" and the "egress swap
   nickname field". The changes in the example above would be as
   follows:

   -  RB1 will have learned the area nickname of D and the RBridge
      nickname of RB4 to which D is attached. In encapsulating a frame
      to D, it puts the area nickname of D (15918) in the egress
      nickname field of the TRILL Header and puts the nickname of RB4
      (44) in an egress swap nickname field.

   -  RB2 moves the ingress nickname to the ingress swap nickname field
      and inserts 15961, the area nickname for S, into the ingress
      nickname field.

   -  RB3 swaps the egress nickname and the egress swap nickname fields,
      which sets the egress nickname to 44.

   -  RB4 learns the correspondence between the source MAC/VLAN of S and
      the { ingress nickname, ingress swap nickname field } pair as it
      decapsulates and egresses the frame.

2.2.2.3 Comparison

   The Border Learning variant described in Section 2.2.2.1 above
   minimizes the change in non-border RBridges but imposes the burden on
   border RBridges of learning and doing lookups in all the end station
   MAC addresses within their area(s) that are used for communication
   outside the area. The burden could be somewhat reduced by decreasing
   the area size and increasing the number of areas.
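
   As a rough illustration of where that learning burden comes from,
   the nickname rewriting of Section 2.2.2.1 can be sketched as follows.
   This is only a sketch under simplifying assumptions: the class and
   method names are hypothetical, VLANs are ignored, and a real border
   RBridge would flood an unknown destination rather than return None.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class TrillFrame:
    ingress: int   # ingress nickname field of the TRILL header
    egress: int    # egress nickname field of the TRILL header
    src_mac: str
    dst_mac: str


@dataclass
class BorderRBridge:
    """Hypothetical model of a border RBridge in the border learning variant."""
    area_nickname: int
    # End stations learned inside this border RBridge's own area:
    # MAC address -> nickname of the RBridge the station is attached to.
    learned: Dict[str, int] = field(default_factory=dict)

    def level1_to_level2(self, frame: TrillFrame) -> TrillFrame:
        # Remember where the source is attached, to deliver return traffic.
        self.learned[frame.src_mac] = frame.ingress
        # Hide the ingress RBridge behind the area nickname.
        return replace(frame, ingress=self.area_nickname)

    def level2_to_level1(self, frame: TrillFrame) -> Optional[TrillFrame]:
        egress = self.learned.get(frame.dst_mac)
        if egress is None:
            # Unknown destination: the frame would have to be flooded
            # within the area (see Section 4.1); not modeled here.
            return None
        return replace(frame, egress=egress)


# The example of Section 2.2.2.1: S behind RB1 (27), D behind RB4 (44).
rb2 = BorderRBridge(area_nickname=15961)
rb3 = BorderRBridge(area_nickname=15918, learned={"D": 44})

in_source_area = TrillFrame(ingress=27, egress=15918, src_mac="S", dst_mac="D")
in_level2 = rb2.level1_to_level2(in_source_area)
in_dest_area = rb3.level2_to_level1(in_level2)
```

   Every end station that communicates outside its area ends up as an
   entry in the border RBridge's learned table, which is the burden
   that the comparison above refers to.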

   The Swap Nickname Field variant described in Section 2.2.2.2
   eliminates the extra address learning burden on border RBridges but
   requires more extensive changes to non-border RBridges. In
   particular they must learn to associate both an RBridge nickname and
   an area nickname with end station MAC/VLAN pairs (except for
   addresses that are local to their area).

   The Swap Nickname Field alternative is more scalable but less
   backward compatible for non-border RBridges.

2.3 Building Multi-Area Trees

   It is easy to build a multi-area tree by building a tree in each area
   separately (including the Level 2 "area"), and then having only a
   single border RBridge, say RB1, in each area, attach to the Level 2
   area. RB1 would forward all multi-destination frames between that
   area and Level 2.

   People might find this unacceptable, however, because of the desire
   to path split (not always sending all multi-destination traffic
   through the same border RBridge).

   Having multiple border RBridges introduces some complexities:

   a) calculating the RPF check when a multi-destination frame
      originates outside the area (which border RBridge injected the
      frame into the area?)

   b) calculating the filtering information (which border RBridge will
      transition the frame into Level 2?)

   This might be solvable if all RBridges are multilevel aware; however,
   it is difficult to imagine how to ensure that old RBridges would
   calculate RPF and filtering information sensibly.

   Ignoring old RBridges for now, various possible solutions are as
   follows:

   a) elect one border RBridge for transitioning all multi-destination
      frames between levels (call that the Designated Border RBridge
      (DBRB))

   b) allow the DBRB to appoint other border RBridges to forward some
      subset of the inter-level frames (as the DRB does, on a per-VLAN
      basis, on a link).
      Make the appointment information visible to
      the other RBridges in the area so that they can calculate their
      RPF and filtering information.

   c) add an "injection nickname" field to the TRILL Data frame which
      would indicate the RBridge that injected the frame into the
      current area and base the RPF check on this field, not on the
      ingress nickname; however, this would require changes to all
      RBridges.

   If b), then on what basis would the appointment be made? Various
   possibilities are as follows:

      o  based on VLAN
      o  based on distribution tree
      o  based on ingress RBridge nickname
      o  based on aggregated source area nickname

   The more flexibility that is allowed, the more complex the
   announcement of information becomes, and the more complex the tree
   database becomes. If appointment is made based on VLAN, then the RPF
   check would need to be based on (tree, VLAN, ingress nickname),
   rather than simply (tree, ingress nickname) as it is today.

2.4 The RPF Check for Trees

   For multi-destination frames originating locally in RB1's area,
   computation of the RPF check is done as today. For multi-destination
   frames originating outside RB1's area, computation of the RPF check
   must be done based on which one of the border RBridges (say RB1, RB2,
   or RB3) injected the frame into the area.

   An RBridge, say RB4, located inside an area, must be able to know
   which of RB1, RB2, or RB3 transitioned the frame into the area from
   Level 2 (or into Level 2 from an area).

   This could be done based on having the DBRB announce the transitioner
   assignments to all the RBridges in the area.

2.5 Area Nickname Acquisition

   In the aggregated nickname alternative, each area must acquire a
   unique area nickname. It is probably simpler to allocate a block of
   nicknames (say, the top 4000) to be area nicknames, and not used by
   any RBridges.
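
   A block reserved in this way makes it cheap to tell an area nickname
   from an RBridge nickname. The following is a minimal sketch, not a
   proposed encoding. It assumes that the "top 4000" are counted down
   from 0xFFBF, taken here as the highest nickname not reserved by the
   base protocol; the constant and function names are hypothetical.

```python
# Assumption: nicknames above 0xFFBF are reserved and 0x0000 means
# "no nickname", so the block is carved from the top of what remains.
HIGHEST_USABLE_NICKNAME = 0xFFBF
AREA_BLOCK_SIZE = 4000  # the "top 4000" suggested in the text

AREA_BLOCK_LOW = HIGHEST_USABLE_NICKNAME - AREA_BLOCK_SIZE + 1


def is_area_nickname(nickname: int) -> bool:
    """True if the nickname falls in the block set aside for areas."""
    return AREA_BLOCK_LOW <= nickname <= HIGHEST_USABLE_NICKNAME


def may_be_rbridge_nickname(nickname: int) -> bool:
    """True if an individual RBridge could hold this nickname."""
    return 0 < nickname < AREA_BLOCK_LOW
```

   With such a split, a Level 1 RBridge could recognize an egress
   nickname as an area nickname by a simple range comparison.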

   The area nicknames need to be advertised and acquired through Level
   2.

   Within an area, all the border RBridges must discover each other
   through the Level 1 link state database, by advertising, in their
   LSPs, "I am a border RBridge".

   Of the border RBridges, one will have highest priority (say RB7). RB7
   can dynamically participate, in Level 2, to acquire a nickname for
   the area. RB7 could give the area a pseudonode IS-IS ID, such as
   RB7.5, within Level 2. So an area would appear, in Level 2, as a
   pseudonode and the pseudonode can participate, in Level 2, to acquire
   a nickname for the area.

   Within Level 2, all the border RBridges [for an area] can advertise
   reachability to the pseudonode, which would mean connectivity to the
   area nickname.

2.6 Link State Representation of Areas

   Within an area, say area A, there is an election for the DBRB
   (Designated Border RBridge), say RB1. This can be done through LSPs
   within area A. The border RBridges announce themselves, together
   with their DBRB priority. (Note that the election of the DBRB cannot
   be done based on Hello messages, because the border RBridges are not
   necessarily physical neighbors of each other. They can, however,
   reach each other through connectivity within the area, which is why
   it will work to find each other through Level 1 LSPs.)

   RB1 acquires the area nickname (in the aggregated nickname approach)
   and gives the area a pseudonode IS-IS ID (just like the DRB would
   give a pseudonode IS-IS ID to a link). RB1 advertises, in area A,
   what the pseudonode IS-IS ID for the area is (and the area nickname
   that RB1 has acquired).
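
   The DBRB election described above can be sketched as a computation
   that every RBridge in the area runs over the same Level 1 link state
   database. This is a minimal sketch: the record layout and function
   name are hypothetical, and breaking priority ties in favor of the
   higher system ID is an assumption, since the text only says that the
   highest priority wins.

```python
from typing import Iterable, NamedTuple, Optional


class BorderAnnouncement(NamedTuple):
    """Hypothetical "I am a border RBridge" record taken from a Level 1 LSP."""
    system_id: str       # fixed-width IS-IS system ID, e.g. "0000.0000.0007"
    dbrb_priority: int


def elect_dbrb(
    announcements: Iterable[BorderAnnouncement],
) -> Optional[BorderAnnouncement]:
    """Return the announcement of the elected DBRB, or None if there is none.

    Highest priority wins. Ties go to the higher system ID (an assumption).
    """
    return max(
        announcements,
        key=lambda a: (a.dbrb_priority, a.system_id),
        default=None,
    )


area_a = [
    BorderAnnouncement("0000.0000.0001", dbrb_priority=64),
    BorderAnnouncement("0000.0000.0007", dbrb_priority=100),
    BorderAnnouncement("0000.0000.0003", dbrb_priority=100),
]
dbrb = elect_dbrb(area_a)
```

   Because the inputs come from the link state database rather than
   from Hellos, all RBridges in the area reach the same result without
   the border RBridges needing to be neighbors.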

   The pseudonode LSP initiated by RB1 for the area includes any
   information external to area A that should be input into area A
   (such as area nicknames of external areas, or perhaps (in the unique
   nickname variant), all the nicknames of external RBridges in the
   TRILL campus and filtering information such as IP multicast groups
   and VLANs). All the other border RBridges for the area announce (in
   their LSP) attachment to that pseudonode.

   Within Level 2, RB1 generates a Level 2 LSP on behalf of the area,
   also represented as a pseudonode. The same pseudonode name could be
   used within Level 1 and Level 2, for the area. (There does not seem
   to be any reason why it would be useful for it to be different, but
   there is also no reason why it would need to be the same.) Likewise,
   all the area A border RBridges would announce, in their Level 2 LSPs,
   connection to the pseudonode.


3. Area Partition

   It is possible for an area to become partitioned, so that there is
   still a path from one section of the area to the other, but that path
   is via the Level 2 area.

   With multi-level TRILL, an area will naturally break into two areas
   in this case.

   An area address might be configured to ensure two areas are not
   inadvertently connected. That area address appears in Hellos and
   LSPs within the area. If two chunks, connected only via Level 2,
   were configured with the same area address, this would not cause any
   problems. (They would just operate as separate Level 1 areas.)

   A more serious problem occurs if the Level 2 area is partitioned in
   such a way that it could be healed by using a path through a Level 1
   area. TRILL will not attempt to solve this problem. Within the Level
   1 area, a single border RBridge will be the DBRB, and will be in
   charge of deciding which (single) RBridge will transition any
   particular multi-destination frames between that area and Level 2.
   If the Level 2 area is partitioned, this will result in multi-
   destination frames only reaching the portion of the TRILL campus
   reachable through the partition attached to the RBridge that
   transitions that frame. It will not cause a loop.


4. Multi-Destination Scope

   There are at least two reasons it would be desirable to be able to
   mark a multi-destination frame with a scope that indicates this frame
   should not exit the area, as follows:

   1. To address an issue in the border learning variant of the
      aggregated nickname alternative, when a unicast frame turns into a
      multi-destination frame when transitioning from Level 2 to Level
      1, as discussed in Section 4.1.

   2. To constrain the broadcast domain for certain discovery,
      directory, or service protocols as discussed in Section 4.2.

   Multi-destination frame distribution scope restriction could be done
   in a number of ways. For example, there could be a flag in the frame
   that means "for this area only". However, the technique that might
   require the least change to RBridge fast path logic would be to
   indicate this in the egress nickname that designates the distribution
   tree being used. There could be two general tree nicknames for each
   tree, one being for distribution restricted to the area and the other
   being for multi-area trees. Or, alternatively, there would be a set
   of N (perhaps 16) special currently reserved nicknames used to
   specify the N highest priority trees but with the variation that if
   the special nickname is used, the frame is not transitioned between
   areas.
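
   The first of these tree nickname techniques can be sketched as the
   check a border RBridge would apply before transitioning a
   multi-destination frame between levels. This is a sketch only: the
   type and function names and the nickname values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TreeNicknames:
    """The two nicknames assumed to designate one distribution tree."""
    multi_area: int  # frames sent to this nickname may cross between levels
    area_only: int   # frames sent to this nickname must stay inside the area


def may_transition(egress_nickname: int, trees: Iterable[TreeNicknames]) -> bool:
    """True unless the egress (tree) nickname is an area-restricted one."""
    return all(egress_nickname != tree.area_only for tree in trees)


# Hypothetical nickname values for two trees in one area.
TREES = [
    TreeNicknames(multi_area=100, area_only=200),
    TreeNicknames(multi_area=101, area_only=201),
]
```

   In this sketch the decision depends only on the egress nickname
   already carried in the TRILL header, which is why the text suggests
   it may need the least change to fast path logic.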
4.1 Unicast to Multi-destination Conversions

In the border learning variant of the aggregated nickname
alternative, the destination of a unicast frame might be known at
the Level 1 to Level 2 transition, so the frame is forwarded as a
unicast frame to the least cost border RBridge advertising
connectivity to the destination area, but it may turn out to have an
unknown destination MAC/VLAN pair when it arrives at that border
RBridge.

In this case, the frame must be converted into a multi-destination
frame and flooded in the destination area. However, if the border
RBridge doing the conversion is not the border RBridge designated to
transition the resulting multi-destination frame, there is the
danger that the designated transitioner may pick up the frame and
flood it back into Level 2, from which it may be flooded into
multiple areas. This danger can be avoided by restricting any
multi-destination frame that results from such a conversion to the
area, either through a flag in the frame or by distributing it on a
tree that is restricted to the area.

Alternatively, a multi-destination frame intended only for the area
could be tunneled (within the area) to the RBridge Rx that is the
appointed transitioner for that form of frame (say, based on VLAN),
with instructions that Rx only transmit the frame within the area,
and Rx could initiate the multi-destination frame within the area.
Since Rx introduced the frame and is the only RBridge allowed to
transition that frame to Level 2, this would accomplish scoping of
the frame to within the area. Since this case only occurs when
unicast frames need to be turned into multi-destination frames as
described above, the suboptimality of tunneling between the border
RBridge that receives the unicast frame and the appointed level
transitioner for that frame would not be an issue.

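
The conversion and the loop-avoidance rule described above can be
sketched as follows. This is a minimal illustration only: the Frame
class, its area_only marker, and both function names are hypothetical,
and the sketch models the "flag in the frame" option rather than the
restricted-tree or tunneling options.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    dest_mac: str
    vlan: int
    multi_destination: bool = False
    area_only: bool = False  # hypothetical scope marker, not a real TRILL field


def deliver_into_area(
    frame: Frame, learned: Dict[Tuple[str, int], int]
) -> Tuple[Frame, Optional[int]]:
    """Border RBridge handling of a unicast frame arriving from Level 2.

    Returns the frame to send plus the Level 1 egress nickname, or None
    as the nickname when the frame has to be flooded within the area.
    """
    egress = learned.get((frame.dest_mac, frame.vlan))
    if egress is not None:
        return frame, egress
    # Unknown MAC/VLAN pair: convert to multi-destination and mark the
    # frame so that it stays inside the destination area.
    return replace(frame, multi_destination=True, area_only=True), None


def transitioner_forwards_to_level2(frame: Frame) -> bool:
    """The designated transitioner must not re-flood area-scoped frames."""
    return frame.multi_destination and not frame.area_only
```

With the marker set at conversion time, the designated transitioner
can see the flooded frame without sending it back into Level 2, which
is the failure the scope restriction is meant to prevent.
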
4.2 Selective Broadcast Domain Reduction

There are a number of service, discovery, and directory protocols
that, for convenience, are accessed via multicast or broadcast
frames. Examples are DHCP and the NetBIOS Service Location Protocol.

Some such protocols provide some means to restrict distribution to
an IP subnet or equivalent, to reduce the size of the broadcast
domain they are using, and then provide a proxy that can be placed
in that subnet to use unicast to access a service elsewhere. In
cases where a proxy mechanism is not currently defined, it may be
possible to create one that references a central server or cache.
With multilevel TRILL, it is possible to construct very large IP
subnets that could become saturated with multi-destination traffic
of this type unless frames can be further restricted in their
distribution. Such restricted distribution can be accomplished for
some protocols, say protocol P, as follows:

- Either (1) at all ingress RBridges in an area, place all protocol
  P multi-destination frames on a distribution tree restricted to
  the area or (2) at all border RBridges between that area and
  Level 2, detect protocol P multi-destination frames and do not
  transition them.

- Place one protocol P proxy (or more for back-up) inside each area.
  These proxies can then unicast protocol P requests or other
  messages to the actual campus servers for P, receive responses or
  other messages from those servers, and deliver them within the
  area via unicast, multicast, or broadcast as appropriate. Such
  proxies would not be needed if it were acceptable for all protocol
  P traffic to be restricted to an area.

While it might seem logical to connect the campus servers to
RBridges in Level 2, they could be placed within one or more areas
so that, in some cases, those areas might not require a local proxy
server.

5. Co-Existence with Old RBridges

RBridges that are not multilevel aware may have a problem with
calculating RPF check and filtering information, since they would
not be aware of the assignment of border RBridge transitioning.

A possible solution, as long as any old RBridges exist within an
area, is to have the border RBridges elect a single DBRB (Designated
Border RBridge) and have all inter-area traffic go through the DBRB
(unicast as well as multi-destination). If that DBRB goes down, a
new one will be elected, but at any one time, all inter-area traffic
(unicast as well as multi-destination) would go through that one
DBRB. However, this eliminates load splitting at level transition.

6. Summary

This draft outlines the issues and possible approaches to multilevel
TRILL. The alternative using area nicknames for aggregation has
significant advantages in terms of scalability over using
campus-wide unique nicknames, not just in avoiding nickname
exhaustion but also in allowing, for instance, RPF checks to be
aggregated based on an entire area.

Some issues are not difficult, such as dealing with partitioned
areas. Some issues are more difficult, especially dealing with old
RBridges.

7. Security Considerations

This document explores alternatives for the use of multilevel IS-IS
in TRILL. It does not consider security issues. For general TRILL
Security Considerations, see [RFC6325].

8. IANA Considerations

This document requires no IANA actions. RFC Editor: Please delete
this section before publication.

9. References

Normative and Informational references for this document are listed
below.

9.1 Normative References

[IS-IS] - ISO/IEC 10589:2002, Second Edition, "Intermediate System
      to Intermediate System Intra-Domain Routing Exchange Protocol
      for use in Conjunction with the Protocol for Providing the
      Connectionless-mode Network Service (ISO 8473)", 2002.

[RFC1195] - Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
      dual environments", RFC 1195, December 1990.

[RFC6325] - Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
      Ghanwani, "Routing Bridges (RBridges): Base Protocol
      Specification", RFC 6325, July 2011.

[RFC6326] - Eastlake, D., Banerjee, A., Dutt, D., Perlman, R., and
      A. Ghanwani, "Transparent Interconnection of Lots of Links
      (TRILL) Use of IS-IS", RFC 6326, July 2011.

9.2 Informative References

None.

Authors' Addresses

Radia Perlman
Intel Labs
2200 Mission College Blvd.
Santa Clara, CA 95054-1549 USA

Phone: +1-408-765-8080
Email: Radia@alum.mit.edu

Donald Eastlake
Huawei Technologies
155 Beaver Street
Milford, MA 01757 USA

Phone: +1-508-333-2270
Email: d3e3e3@gmail.com

Anoop Ghanwani
Brocade Communications Systems
130 Holger Way
San Jose, CA 95134 USA

Phone: +1-408-333-7149
Email: anoop@brocade.com

Hongjun Zhai
ZTE
68 Zijinghua Road, Yuhuatai District
Nanjing, Jiangsu 210012 China

Phone: +86 25 52877345
Email: zhai.hongjun@zte.com.cn

Copyright and IPR Provisions

Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License. The definitive
version of an IETF Document is that published by, or under the
auspices of, the IETF. Versions of IETF Documents that are published
by third parties, including those that are translated into other
languages, should not be considered to be definitive versions of
IETF Documents. The definitive version of these Legal Provisions is
that published by, or under the auspices of, the IETF. Versions of
these Legal Provisions that are published by third parties,
including those that are translated into other languages, should not
be considered to be definitive versions of these Legal Provisions.
For the avoidance of doubt, each Contributor to the IETF Standards
Process licenses each Contribution that he or she makes as part of
the IETF Standards Process to the IETF Trust pursuant to the
provisions of RFC 5378. No language to the contrary, or terms,
conditions or rights that differ from or are inconsistent with the
rights and licenses granted under RFC 5378, shall have any effect
and shall be null and void, whether published or posted by such
Contributor, or included with or in such Contribution.