idnits 2.17.1 draft-perlman-trill-rbridge-protocol-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 16. -- Found old boilerplate from RFC 3978, Section 5.5 on line 830. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 807. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 814. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 820. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 291: '...ARP/ND reply, R1 MUST include E in the...' RFC 2119 keyword, line 292: '... MAY remove E from L2INFO....' RFC 2119 keyword, line 550: '...on is that RBridges MUST recognize and...' RFC 2119 keyword, line 654: '... it MUST do if the target is unknown...' RFC 2119 keyword, line 675: '... target, then R1 MAY broadcast or unic...' 
(1 more instance...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 13, 2006) is 6647 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '1' is defined on line 763, but no explicit reference was found in the text == Unused Reference: '2' is defined on line 767, but no explicit reference was found in the text == Unused Reference: '3' is defined on line 771, but no explicit reference was found in the text == Unused Reference: '4' is defined on line 775, but no explicit reference was found in the text == Unused Reference: '5' is defined on line 778, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Obsolete informational reference (is this intentional?): RFC 2461 (ref. '3') (Obsoleted by RFC 4861) Summary: 4 errors (**), 0 flaws (~~), 7 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 TRILL Working Group R. Perlman 2 Internet Draft Sun 3 Expires: August 2006 J. 
Touch 4 USC/ISI 5 February 13, 2006 7 Rbridges: Base Protocol Specification 8 draft-perlman-trill-rbridge-protocol-00.txt 10 Status of this Memo 12 By submitting this Internet-Draft, each author represents that 13 any applicable patent or other IPR claims of which he or she is 14 aware have been or will be disclosed, and any of which he or she 15 becomes aware will be disclosed, in accordance with Section 6 of 16 BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html 34 This Internet-Draft will expire on August 13, 2006. 36 Abstract 38 RBridges provide the ability to have an entire campus, with multiple 39 physical links, look to IP like a single subnet. The design allows 40 for zero configuration of switches within a campus, optimal pair-wise 41 routing, safe forwarding even during periods of temporary loops, and 42 the ability to cut down on ARP/ND traffic. The design also supports 43 VLANs, and allows forwarding tables to be based on RBridge 44 destinations (rather than endnode destinations), which allows 45 internal routing tables to be substantially smaller than in 46 conventional bridge systems. 48 Table of Contents 50 1. Introduction 51 2. Detailed Rbridge Design 52 2.1. 
Link State Protocol 53 2.1.1. Separate Instances 54 2.1.2. Multiple Rbridge IS-IS Instances 55 2.2. Ingress Rbridge Tree Calculation 56 2.3. Pruning the Ingress Rbridge Tree 57 2.4. Designated Rbridge 58 2.5. Learning Endnode Location 59 2.6. Forwarding Behavior 60 2.6.1. Receipt of a Native Packet 61 2.6.2. Receipt of an In-transit Packet 62 2.6.2.1. Flooded Packet 63 2.6.2.2. Unicast Packet 64 2.7. IGMP Learning 65 2.8. Combined Bridge/Rbridge 66 2.9. Forwarding Header on 802 Links 67 2.10. Format of the Shim Header 68 2.11. Handling ARP/ND Queries 69 2.12. Assuring Freshness of Endnode Information 70 3. Rbridge Addresses, Parameters, and Constants 71 4. Security Considerations 72 5. IANA Considerations 73 6. Conclusions 74 7. Acknowledgments 75 8. References 76 8.1. Normative References 77 8.2. 
Informative References 78 Author's Addresses 79 Intellectual Property Statement 80 Disclaimer of Validity 81 Copyright Statement 82 Acknowledgment 84 1. Introduction 86 In traditional IPv4 and IPv6 networks, each link must have a unique 87 prefix. This means that a node that moves from one link to another 88 must change its IP address, and a node with multiple links must have 89 multiple addresses. It also means that a company with many links 90 (separated by routers) will have difficulty making full use of its IP 91 address block (since any link not fully populated will waste 92 addresses), and IP routers require significant configuration. Bridges 93 avoid these problems because bridges can transparently glue many 94 physical links into what appears to IP to be a single LAN. 96 However, bridge routing via the spanning tree using the layer 2 97 header has some disadvantages: 99 o The spanning tree limits which links can be used, and therefore 100 concentrates traffic onto selected links 102 o Forwarding based on a header without a TTL is dangerous, because 103 temporary loops might arise due to topology changes, lost spanning 104 tree messages, or components such as repeaters coming up 106 o Routes cannot be pair-wise shortest paths, but are instead whatever 107 path remains after the spanning tree eliminates redundant paths 109 We define the term "campus" to be the set of links connected by any 110 combination of RBridges and bridges. A campus appears to IP nodes to 111 be a single subnet. 113 This document presents the design for RBridges (routing bridges), 114 which combine the advantages of bridges and routers. Like bridges, 115 RBridges are zero configuration, and are transparent to IP nodes. 
116 Like routers, RBridges forward on pair-wise shortest paths, and do 117 not have dangerous behavior during temporary loops. RBridges have the 118 additional advantage that they can optimize ARP (IPv4) and ND (IPv6) 119 by avoiding the broadcast/multicast behavior of the queries. 121 RBridges are fully compatible with current bridges as well as current 122 IPv4 and IPv6 routers and endnodes. They are as invisible to current 123 IP routers as bridges are, and like routers, they terminate a bridged 124 spanning tree. 126 The main idea is to have RBridges run a link state protocol amongst 127 themselves. This enables them to have enough information to compute 128 pairwise optimal paths for unicast, and to calculate distribution 129 trees for delivery of packets to unknown destinations, or 130 multicast/broadcast packets. 132 RBridges must learn the location of endnodes. They learn the location 133 and layer 2 addresses of attached nodes from the source address of 134 data frames, as bridges do. Additionally, in order to facilitate proxy 135 ARP or proxy ND optimizations, RBridges also learn the (layer 3, 136 layer 2) addresses of attached IP nodes from ARP or ND replies. 138 Once an RBridge learns the location of a directly attached endnode, 139 it informs the other RBridges in its link state information. 141 RBridge forwarding can be done, as with a router, via pairwise 142 shortest paths. 144 To mitigate the temporary loop issues with bridges, RBridges must 145 always forward based on a header with a hop count. Although the hop 146 count will quickly discard looping frames, it is also desirable not 147 to spawn additional copies of frames. This can be accomplished by 148 having RBridges specify the next RBridge recipient while forwarding 149 across a shared-media link. 151 Frames must be encapsulated as they travel between RBridges for 152 several reasons: 154 1. to prevent source MAC learning from frames in transit 156 2. 
so that the frames can be directed towards the egress RBridge. 157 This enables forwarding tables of RBridges to be sized with the 158 number of RBridges rather than the total number of nodes in the 159 common broadcast domain 161 3. so that frames in transit can include a hop count (for links, like 162 Ethernet, that do not already contain a hop count) 164 In order to coexist with Ethernet bridges on Ethernet links, frames 165 in transit on Ethernet links must be encapsulated with an Ethernet 166 header. The outer header of an RBridge-forwarded frame must look, to 167 an Ethernet bridge on the path between two RBridges, like the header 168 of a normal frame that the bridge will forward. 170 Inside that header is a shim header that RBridges will add to the 171 frame that will contain: 173 o the ingress-RBridge (in the case of a broadcast/multicast/unknown 174 destination frame), or egress-RBridge (in the case of a unicast 175 frame to a known destination) 177 o a hop count 179 Inside the shim header is the original frame, as injected into the 180 campus. 182 RBridges must also support VLANs. 184 A VLAN is a broadcast domain. That means that a layer 2 broadcast 185 (multicast) frame sent to a VLAN must only be delivered to links that 186 are in that VLAN. A frame for a particular VLAN may transit any link 187 on the campus, but an unencapsulated VLAN frame must only be 188 delivered to links that RBridges know (for example, through 189 configuration) support that VLAN. 191 There are several types of frames which RBridges must deliver within 192 a broadcast domain: 194 1. frames for unknown destinations 196 2. frames for layer 2 multicast addresses derived from IP multicast 197 addresses 199 3. frames for layer 2 broadcast/multicast frames which are not 200 derived from IP multicast addresses 202 4. ARP/ND queries 204 If a frame belongs in a particular VLAN, the frame must be delivered 205 only to links in that VLAN. 
This is true for both broadcast/multicast 206 frames, and unicast frames. 208 RBridges will calculate a distribution tree for each ingress RBridge, 209 which we will refer to as the "ingress RBridge tree". In theory, 210 RBridges could have calculated a single spanning tree for the entire 211 campus. However, it was decided that the additional computation 212 necessary to compute ingress RBridge trees was warranted because: 214 1. it optimizes the distribution path and (almost always) the cost of 215 delivery when the destination links are a subset of all the 216 links on the campus. Delivery is only to a subset of links in 217 the case of VLANs and IP multicasts 219 2. for unknown destinations, out-of-order delivery is minimized 220 because in the case where a flow starts before the location of the 221 destination is known by the RBridges, the path to the destination 222 through the per-ingress-RBridge tree will be the same as the path 223 directly to the destination 225 RBridges will not use the bridge spanning tree algorithm to calculate 226 the ingress RBridge trees. Instead, the trees are calculated based on 227 the link state information. Therefore the tree calculation is done 228 without requiring any additional exchange of information between 229 RBridges. 231 2. Detailed Rbridge Design 233 2.1. Link State Protocol 235 Running a link state protocol among RBridges is straightforward. It 236 is the same as running a level 1 routing protocol in an area, with 237 endnode addresses being layer 2 addresses rather than, say, IP 238 addresses. IS-IS is a natural choice for a link state protocol because 239 it is easy in IS-IS to define new TLVs for carrying new information, 240 and because IS-IS can be run with zero configuration. All that is 241 required to run IS-IS is for each RBridge to have a unique 6-byte 242 system ID, which can be any of the RBridge's MAC addresses. 244 2.1.1. 
Separate Instances 246 The instance of IS-IS that RBridges will implement is separate from 247 any routing protocol that IP routers will implement, just as the 248 spanning tree protocol is not implemented by IP routers. 250 To prevent potential confusion between an IS-IS instance being run by 251 IP routers and the IS-IS being run by RBridges, RBridge routing 252 messages will be sent to a different layer 2 multicast address than 253 IS-IS routing messages. The RBridge IS-IS instance is also 254 differentiated by having a distinct, constant "area address" (the 255 value 0) that would never appear as a real IS-IS area address. 257 2.1.2. Multiple Rbridge IS-IS Instances 259 There are two types of information that are carried in RBridge link 260 state information: "core-RBridge information" and "endnode 261 information". In theory this information could all be contained in 262 one instance of RBridge IS-IS. However, since endnode information for 263 a particular VLAN only needs to be known to RBridges that are 264 connected to links configured to be in that VLAN, it was decided that 265 each RBridge R1 will run a "core" instance of IS-IS for the core 266 RBridge information, and an instance per VLAN that R1 is attached to, 267 for the endnode information for those VLANs. 269 The core-RBridge information, which is carried in the core-RBridge 270 instance, is: 272 1. the system IDs of RBridges which are neighbors of RBridge R1, and 273 the cost of the link to each of those neighbors 275 2. VLANs directly connected to R1 276 The endnode information for VLAN A, which is carried in the VLAN A 277 IS-IS instance injected by R1, contains: 279 1. L2INFO: layer 2 addresses of nodes on a VLAN A link attached to R1 280 which have transmitted frames but have not transmitted ARP or ND 281 replies (i.e., these are not known to be IP nodes) 283 2. 
L3andL2INFO: (layer 3, layer 2) address pairs of IP nodes attached to R1, 284 which R1 has learned through ARP/ND replies emitted by endnodes on 285 an attached VLAN A link. For data compression, only the portion 286 of the address following the campus-wide prefix need be carried. 287 (This is a more important optimization for IPv6 than for IPv4) 289 If R1 has learned endnode E's location first from a data packet (and 290 therefore has included E's layer 2 address in the L2INFO), and later E 291 transmits an ARP/ND reply, R1 MUST include E in the L3andL2INFO, and 292 MAY remove E from L2INFO. 294 Given that RBridges must already support delivery only to links 295 within a VLAN (for multicast or unknown frames marked with the VLAN's 296 tag), the same mechanism is used to advertise endnode information 297 solely to RBridges within a VLAN. 299 The per-VLAN instance of IS-IS will appear to the RBridges to consist 300 of a single link. R1 will originate a VLAN-A-specific IS-IS frame. 301 All RBridges will recognize the frame as a VLAN A multicast frame 302 (even if they are not connected to VLAN A), and prune the ingress-R1 303 tree so as to only deliver the frame along branches with VLAN A 304 links. This is the same behavior core RBridges would have for any 305 VLAN A multicast/broadcast/unknown destination frame. RBridges that 306 are connected to VLAN A links will, in addition to forwarding along 307 the ingress RBridge tree, process the frame in their VLAN-A IS-IS 308 instance. 310 Thus suppose that RBridges R1, R2, and R3 are all on VLAN A, on links 311 scattered throughout the campus. The VLAN A IS-IS will appear to be a 312 single link (broadcast domain) with R1, R2, and R3 as neighbors. The 313 only information carried in the instance is the endnode information 314 for VLAN A. 
The other RBridges on the campus facilitate delivery 315 within the VLAN A broadcast domain, and therefore may be on the path 316 between R1 and R2, but will treat the VLAN A instance link state 317 frames as ordinary datagrams. 319 The way that RBridges distinguish which IS-IS instance the link state 320 information is for is based on the VLAN tag in the inner header. 322 2.2. Ingress Rbridge Tree Calculation 324 Some frames (e.g., to unknown destinations, or multicast 325 destinations) will need to be delivered to multiple links. To 326 optimize delivery in the case where not all links are to receive the 327 frame (e.g., an IP multicast or a VLAN-tagged frame), and to avoid 328 out-of-order delivery when location of the destination is discovered 329 after a flow starts up, RBridges calculate a tree per ingress 330 RBridge, and deliver a frame along that distribution tree. The 331 ingress RBridge trees will be calculated based on the link state 332 information distributed in the core IS-IS instance. 334 In IS-IS, each "node" that initiates a link state packet has an ID. 335 If the "node" is a router, initiating the link state information on 336 behalf of itself, the ID is the router's system ID, concatenated with 337 the constant 0. If the "node" is a pseudonode, i.e., a shared link, 338 then one RBridge, say R1, on the link, is elected Designated RBridge, 339 and R1 initiates a link state packet on behalf of the pseudonode. In 340 this case the ID of the pseudonode is a 7-byte quantity which R1 can 341 be sure is unique within the campus, usually the 6-byte system ID of 342 R1, concatenated with a byte chosen by R1 to differentiate this 343 pseudonode from any other link for which R1 might also be Designated 344 RBridge. 346 So, the link state information consists of a bunch of nodes, each 347 with a unique (within the campus) ID, and the connectivity between 348 these nodes. It is essential that all RBridges calculate the same set 349 of ingress RBridge trees. 
If multiple equal 350 cost paths are encountered in the tree calculation, a tie breaker must be used to 351 ensure that all RBridges calculate the same tree. 353 When choosing where to attach a node to the tree being calculated, 354 the tie-breaker is the ID of the parent to which the node will be 355 attached (if a node can be attached to either parent P1 or P2 with 356 the same cost, choose P1 if P1's ID is lower than P2's). 358 2.3. Pruning the Ingress Rbridge Tree 360 Packets which must be flooded (e.g., multicasts, unknown 361 destinations) are flooded along a tree rooted at the ingress 362 RBridge, and pruned based on whether there are potential receivers 363 downstream. In the case of a VLAN-tagged packet, the packet need not be 364 forwarded along a branch of that tree (rooted at the 365 ingress RBridge) if no RBridges along that branch participate in that VLAN. In the 366 case of a multicast derived from an IP multicast, IGMP snooping by 367 RBridges might further narrow which links have potential receivers. 369 The actual spanning tree to forward along is chosen based on the 370 ingress-RBridge, whose identity is contained in the shim header. Say 371 the ingress RBridge is Ri. Suppose RBridge Rc knows that the set of 372 links {L1, L2, L3} is in the ingress-Ri spanning tree. 374 If the frame is received on link L3, the frame may be forwarded to 375 links L1 and L2. However, Rc can limit distribution of the frame if 376 Rc knows there are no interested receivers along a branch. 378 The way this is done is that Rc first calculates the Ri tree, 379 determining that, say, links {L1, L2, L3} are contained in that 380 tree. Furthermore, since this is an ingress RBridge tree, 381 distribution is unidirectional. So Rc will know that for this tree, 382 all traffic will be received on, say, L1, and transmitted out L2 and 383 L3. Now Rc must calculate, for each of the output links (L2 and L3), 384 the set of destinations that should be forwarded onto that link. 
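The tie-breaking rule of Section 2.2 can be illustrated with a short sketch. This is not a normative algorithm: it assumes a plain Dijkstra shortest-path computation over the core link state, strictly positive link costs, and node IDs represented as strings that compare in the same order as the real IDs. The function name ingress_tree and the shape of the links argument are hypothetical.

```python
import heapq

def ingress_tree(links, root):
    """Return {node: parent} for the shortest-path tree rooted at `root`.

    `links` maps a node ID to {neighbor ID: cost}; costs are assumed > 0.
    When a node can attach to two parents at equal cost, the parent
    with the lower ID wins, so every RBridge derives the same tree.
    """
    dist = {root: 0}
    parent = {root: None}
    done = set()
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in links.get(node, {}).items():
            if nbr in done:
                continue
            nd = d + cost
            better = nbr not in dist or nd < dist[nbr]
            # Equal cost: prefer the candidate parent with the lower ID.
            tie = nbr in dist and nd == dist[nbr] and node < parent[nbr]
            if better or tie:
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return parent

# R4 is reachable at cost 3 through either R3 or R2; R2 has the lower ID.
links = {
    "R5": {"R3": 1, "R2": 2},
    "R3": {"R5": 1, "R4": 2},
    "R2": {"R5": 2, "R4": 1},
    "R4": {"R3": 2, "R2": 1},
}
tree = ingress_tree(links, "R5")
```

Because the comparison depends only on information every RBridge already holds in the core IS-IS instance, no extra message exchange is needed to agree on the tree.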
386 For each of L2, and L3, and for each VLAN and for each IP multicast 387 address (as determined by IGMP snooping), Rc must indicate which of 388 those addresses have receivers downstream from that link. 390 So Rc will know that {L2, and L3} are output links for the Ri- 391 ingress-spanning tree. For each of the output links for this ingress 392 Rbridge tree, Rc keeps a list of layer 2 multicast addresses derived 393 from IP multicasts, and learned through IGMP snooping, and the set of 394 VLAN tags, which are reachable through that output link on this 395 ingress-RBridge tree. 397 If Rc receives a multicast frame from L1, it selects the spanning 398 tree based on the ingress RBridge as indicated in the shim header. 399 Then it prunes based on the destination address in the original 400 frame. Then it forwards on the remaining output links for that 401 ingress RBridge tree that have receivers. 403 For each link for which Rc is Designated, Rc additionally checks to 404 see if it should decapsulate the frame and send it to the link. 406 2.4. Designated Rbridge 408 One RBridge on each link needs to be elected to have special duties. 409 This elected RBridge is known as the Designated RBridge. IS-IS 410 already holds such an election. 412 The Designated RBridge is the one on the link that will learn and 413 advertise the identities of attached endnodes, encapsulate and 414 forward frames that originate on that link to the rest of the campus, 415 decapsulate and forward frames onto that link received from other 416 RBridges, initiate a distributed ARP when an ARP query is received 417 for an unknown destination, and answer ARP queries when the target 418 node is known. 420 2.5. Learning Endnode Location 422 RBridges learn endnode location from data frames. They learn (layer 423 3, layer 2) pairs (for the purpose of supporting ARP/ND optimization) 424 from listening to ARP or ND replies. 
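As an illustration of the two learning paths, the following sketch keeps the per-VLAN endnode information that a Designated RBridge would advertise (Section 2.1.2). The class and method names are hypothetical, addresses are plain strings, and removing an entry from the layer 2 list after an ARP/ND reply is the optional (MAY) behavior, chosen here only for compactness.

```python
from dataclasses import dataclass, field

@dataclass
class VlanEndnodeInfo:
    # Layer 2 addresses seen only as sources of data frames.
    l2info: set = field(default_factory=set)
    # Layer 3 address -> layer 2 address, learned from ARP/ND replies.
    l3_and_l2info: dict = field(default_factory=dict)

    def learn_from_data_frame(self, src_mac):
        # A node already known as an IP node stays out of the L2-only list.
        if src_mac not in self.l3_and_l2info.values():
            self.l2info.add(src_mac)

    def learn_from_arp_nd_reply(self, ip, mac):
        # The (layer 3, layer 2) pair MUST be recorded.
        self.l3_and_l2info[ip] = mac
        # Dropping the L2-only entry is optional (MAY).
        self.l2info.discard(mac)

info = VlanEndnodeInfo()
info.learn_from_data_frame("00:11:22:33:44:55")
info.learn_from_arp_nd_reply("192.0.2.7", "00:11:22:33:44:55")
```

After the reply, the node appears only in the (layer 3, layer 2) list, which is the state the link state protocol would then carry to the other RBridges on that VLAN.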
426 This endnode information is learned by the Designated RBridge, and distributed to 427 other RBridges through the link state protocol. 429 2.6. Forwarding Behavior 431 2.6.1. Receipt of a Native Packet 433 R1 receives a native (i.e., not RBridge-encapsulated) unicast frame. 434 R1 knows that this is a native frame because the Ethertype is not 435 "RBridge encapsulated frame". The destination in the layer 2 header 436 is D, the source is S. 438 R1 inserts a VLAN tag if required, according to the same rules as 439 bridges do. 441 Once the VLAN (if any) is established, the layer 2 address of D is 442 looked up in the destination table to find the egress RBridge R2, or 443 discover that D is unknown. 445 If D is known, with egress R2, then R1 encapsulates the packet, with 446 R2 indicated in the shim header as egress RBridge. In the outer 447 header, R1 puts "R1" as source, and next hop RBridge (in the path to 448 R2) as "destination", and "encapsulated RBridge packet" as the 449 Ethertype. 451 If D is unknown, R1 encapsulates the packet, with "R1" indicated as 452 ingress RBridge in the shim header, and outer header with source=R1, 453 destination = "all-RBridges". 455 2.6.2. Receipt of an In-transit Packet 457 RBridge R1 receives an encapsulated frame (as indicated by 458 Ethertype="RBridge-encapsulated"). 460 2.6.2.1. Flooded Packet 462 If the destination in the outer header is "all-RBridges", then R1 463 forwards along the ingress RBridge tree indicated by the shim header. 465 If the frame's inner header indicates it is for a specific VLAN, 466 links in that indicated ingress RBridge tree that do not lead to 467 links in that VLAN are pruned for this packet. Furthermore, if the 468 frame contains an IP multicast packet, then R1 forwards only on 469 branches that R1 has learned, through IGMP, lead to receivers 470 for this IP multicast. 
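The pruning step for a flooded frame can be sketched as a filter over the output links of the selected ingress RBridge tree. The data layout is an assumption made for illustration: each output link carries the set of VLAN tags and the set of IP-derived multicast groups known to be reachable through it, and the function name select_flood_links is hypothetical.

```python
def select_flood_links(output_links, vlan=None, mcast_group=None):
    """Pick the output links of one ingress RBridge tree for a flooded frame.

    `output_links` maps a link name to {"vlans": set, "groups": set},
    i.e. what is reachable downstream of that link on this tree.
    A frame with no VLAN tag and no IP multicast group is not pruned.
    """
    chosen = []
    for link, reach in output_links.items():
        if vlan is not None and vlan not in reach["vlans"]:
            continue  # no links of this VLAN behind this branch
        if mcast_group is not None and mcast_group not in reach["groups"]:
            continue  # IGMP snooping found no receivers behind this branch
        chosen.append(link)
    return chosen

# Output links for the tree rooted at some ingress RBridge Ri, as seen by Rc.
ri_tree = {
    "L2": {"vlans": {"A"}, "groups": {"G1"}},
    "L3": {"vlans": {"A", "B"}, "groups": set()},
}
vlan_a_links = select_flood_links(ri_tree, vlan="A")
vlan_a_g1_links = select_flood_links(ri_tree, vlan="A", mcast_group="G1")
```

Keeping this state per output link and per ingress tree means the forwarding decision is a pair of set lookups once the tree has been selected from the shim header.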
472 In addition, for links for which R1 is Designated, R1 decapsulates 473 the packet and transmits the packet onto those links (unless the 474 packet is IP multicast or VLAN-tagged, and the packet does not belong 475 on that link). 477 If the frame belongs in VLAN A, (based on the presence of a tag in 478 the inner header) then R1 (the ingress RBridge) looks up D's location 479 in R1's table of VLAN A endnodes. 481 If the native frame's destination is a layer 2 multicast and 482 the frame is a BPDU, the RBridge drops the frame. 484 If the native frame's destination is "all-RBridges" with Ethertype 485 "IS-IS", then R1 processes the link state packet. 487 If the packet is an IGMP announcement, which will be transmitted to 488 an IP-derived layer 2 multicast address of "all IP routers", then the 489 RBridge learns, based on the "ingress RBridge" in the shim header, 490 the mapping between egress RBridges and IP multicast address 491 listeners. 493 2.6.2.2. Unicast Packet 495 If the destination in the outer header is not R1, then R1 drops the 496 frame. 498 If the shim header indicates R1 is the egress RBridge, then R1 499 extracts the inner frame and forwards it onto the link containing the 500 destination, or processes the packet if the destination in the inner 501 frame is R1. 503 Else, R1 looks up the egress RBridge R2 indicated in the shim header, 504 in its forwarding table, and forwards the packet towards R2, by 505 replacing the outer header with one with source=R1, 506 destination=nexthop RBridge towards R2, and Ethertype "encapsulated 507 RBridge". 509 2.7. IGMP Learning 511 RBridges learn, based on seeing IGMP packets, which multicast 512 addresses should be forwarded onto which links. 514 IGMP messages have to be forwarded throughout the campus, since IP 515 routers in the broadcast domain also need to see these messages. 517 IGMP messages are forwarded by RBridges throughout the campus like 518 any layer 2 multicast. 
They are recognized by a protocol 519 value of 2 in the IP header. In addition, they are processed by RBridges 520 in order to extract, from announcements, which egress RBridges have 521 receivers for which groups. 523 2.8. Combined Bridge/Rbridge 525 RBridges do not participate in the bridge spanning tree protocol. The 526 only thing RBridges do with regard to bridge spanning tree is 527 recognize BPDUs and drop them. 529 In some cases it might be advantageous for the Designated RBridge to 530 be the Root of the bridge spanning tree calculated on a link (where 531 "link" as perceived by the RBridges might actually be a collection of 532 segments connected via bridges). Having the bridge spanning tree on 533 that link rooted at the RBridge will optimize paths that go between a 534 node on that link and a node off the link to another link within the 535 campus. However, this may make intra-link paths worse, or paths 536 between a node on the link and an IP router also on that link. 538 If it is desired for the spanning tree to be rooted at the Designated 539 RBridge, this can be accomplished by implementing a colocated 540 bridge/RBridge. This would be equivalent to two boxes: a bridge 541 directly connected to the link, and a point-to-point link to the 542 RBridge. 544 The bridge portion of the logically combined box would change its 545 priority to the numerically lowest value if the colocated RBridge is 546 elected Designated RBridge on that link. 548 Note that this section describes only an implementation possibility. 549 RBridges are not required to be implemented as combined with bridges. 550 The only requirement in this section is that RBridges MUST recognize and 551 drop BPDUs. 553 2.9. Forwarding Header on 802 Links 555 It is essential that RBridges coexist with ordinary bridges. 556 Therefore, a frame in transit must look to ordinary bridges like an 557 ordinary layer 2 frame. However, it must also be differentiable from 558 a native layer 2 frame by RBridges. 
To accomplish this, we use a new 559 layer 2 protocol type ("Ethertype"). 561 A frame in transit on an 802 link will therefore have two 802 562 headers, since the original frame (including the original 802 header) 563 will be tunneled by the RBridges. But rather than just having an 564 additional 802 header, we include additional information between the 565 two headers: at least a hop count. 567 An encapsulated frame would look as follows: 569 +--------------+-------------+-----------------+ 570 | outer header | shim header | original frame | 571 +--------------+-------------+-----------------+ 573 Figure 1 Encapsulated Frame 575 The outer header contains: 577 o L2 destination = next RBridge, or for flooded frames, a new (to be 578 assigned) multicast layer 2 address meaning "all RBridges" 580 o L2 source = transmitting RBridge (the one that most recently 581 handled this frame) 583 o protocol type = "to be assigned...RBridge encapsulated frame" 585 The shim header includes: 587 o TTL = starts at some value and is decremented by each RBridge. 588 The frame is discarded when the TTL reaches 0 590 o egress RBridge (in the case of unicast), or ingress RBridge (in 591 the case of multicast) 593 2.10. Format of the Shim Header 595 The format of the shim header is defined in draft-bryant-perlman- 596 trill-pwe-encap-00. This is an MPLS-based format, although it will 597 have the semantics stated above (that it contains a TTL and an 598 ingress or egress RBridge). However, an RBridge ID is 6 bytes, and 599 this format only allows for 19 bits to specify the RBridge ID. 601 Therefore, there is a distributed process piggybacked on the link 602 state protocol, whereby RBridges choose 19-bit nicknames. This 603 protocol is also specified in the Internet-Draft draft-bryant- 604 perlman-trill-pwe-encap-00. 606 2.11. Handling ARP/ND Queries 608 We will use the term "optimized ARP/ND response" to cover several 609 possible behaviors an RBridge might utilize. 
   Non-optimized behavior would consist of treating an ARP or ND query
   as an ordinary layer 2 broadcast/multicast, and sending the query to
   all links in the campus, allowing the target to respond as to an
   ordinary ARP/ND query. This behavior is essential when the location
   of the target is unknown, although RBridges could suppress multiple
   queries to the same target within some amount of time.

   When the target's location is assumed to be known by the first
   RBridge, it need not flood the query. Alternative behaviors of the
   first Designated RBridge that receives the ARP/ND query would be to:

   1. send a response directly to the querier, with the layer 2 address
      of the target, as believed by the RBridge

   2. encapsulate the ARP/ND query to the target's Designated RBridge,
      and have the Designated RBridge at the target forward the query to
      the target. This behavior has the advantage that a response to the
      query will be definitive. If the query does not reach the target,
      then the querier will not get a response

   3. block ARP/ND queries that occur for some time after a query to the
      same target has been launched, and then respond to the querier
      when the response to the recently-launched query to that target is
      received

   The reason not to use the most optimized behavior all the time is
   timeliness in detecting a stale cache. Also, in the case of SEND,
   cryptography might prevent behavior 1, since the RBridge would not be
   able to sign the response with the target's private key.

   It is not essential that all RBridges use the same strategy for which
   option to select for a particular query. However, once the first
   Designated RBridge decides on a strategy for a particular query, the
   other RBridges must carry that through. If the first RBridge responds
   directly to the querier, or blocks the query, then no other RBridges
   are involved.
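
   The choice among the behaviors above can be illustrated with a small
   sketch. This is not part of the design; it is one possible policy,
   written in Python under stated assumptions. The names ArpNdPolicy,
   suppress_window, allow_proxy, and uses_send are hypothetical, as is
   the 1-second default window; the draft does not specify any of them.
   The sketch floods when the target is unknown, holds queries that
   arrive shortly after one to the same target was launched, answers
   directly when that is permitted, and otherwise unicasts the query
   toward the target's Designated RBridge.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Action(Enum):
    FLOOD = "flood"              # non-optimized: send to all links in the campus
    PROXY_REPLY = "proxy_reply"  # behavior 1: answer from the RBridge's own data
    UNICAST = "unicast"          # behavior 2: tunnel to target's Designated RBridge
    HOLD = "hold"                # behavior 3: wait for the outstanding query's reply


@dataclass
class ArpNdPolicy:
    # Hypothetical parameters; the draft leaves these choices open.
    suppress_window: float = 1.0   # seconds during which repeat queries are held
    allow_proxy: bool = True       # local preference for answering directly
    # Target layer 3 address -> believed layer 2 address.
    known: Dict[str, str] = field(default_factory=dict)
    # Target layer 3 address -> time the last query was launched.
    outstanding: Dict[str, float] = field(default_factory=dict)

    def choose(self, target: str, now: float, uses_send: bool = False) -> Action:
        launched = self.outstanding.get(target)
        if launched is not None and now - launched < self.suppress_window:
            # A query to this target was launched recently: hold this one.
            return Action.HOLD
        if target not in self.known:
            # Location unknown: flooding is the only option.
            self.outstanding[target] = now
            return Action.FLOOD
        if self.allow_proxy and not uses_send:
            # With SEND the RBridge cannot sign for the target, so it
            # cannot answer directly; otherwise it may.
            return Action.PROXY_REPLY
        self.outstanding[target] = now
        return Action.UNICAST
```

   For example, with the default window, a query for an unknown target
   at time 0.0 is flooded, and a second query for the same target at
   time 0.5 is held until the first is answered. Whatever the first
   Designated RBridge chooses, the other RBridges carry it through.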

   If the first Designated RBridge R1 decides to unicast the query to
   the target's Designated RBridge R2, then R2 must decapsulate the
   query, and initiate an ARP/ND query on the target's link. When/if the
   target responds, R2 must encapsulate and unicast the response to R1,
   which will decapsulate the response and send it to the querier.

   If the first Designated RBridge R1 decides to flood the query (which
   it MUST do if the target is unknown, but MAY do if it wants to assure
   freshness of the information), the query is encapsulated to be
   flooded through the indicated VLAN.

   The distributed ARP query is carried by RBridges through the RBridge
   spanning tree. Each Designated RBridge, in addition to forwarding the
   query through the spanning tree, initiates an ARP query on its
   link(s). If a reply is received from the target D by Designated
   RBridge R2, R2 initiates a link state update to inform all the other
   RBridges of D's location, layer 3 address, and layer 2 address, in
   addition to forwarding the reply to the querier.

   It is the querier's Designated RBridge R1 that chooses which strategy
   to employ when seeing an ARP query.

   Some mix of these strategies (responding directly, unicasting the
   query to the target's Designated RBridge, or flooding the query)
   might be the best solution. For instance, even if the target's
   location and (layer 3, layer 2) correspondence is in the link state
   information R1 received from R2, if the target's location has not
   been recently verified by R1 through a broadcast ARP/ND or unicast
   query to the target, then R1 MAY broadcast or unicast the query or
   respond directly. For example, RBridges could keep track of the
   last time a broadcast ARP/ND occurred for each endnode E (by any
   source, and injected by any RBridge). Suppose the freshness parameter
   is 20 seconds.
   If a source S on RBridge R1's link does an ARP/ND for D, and
   R1 has not seen an ARP/ND for D within the last 20 seconds, R1
   unicasts the query to force a reply from the target; otherwise it
   proxies the reply.

   When R2 forwards a unicast ARP/ND query, if the target does not
   respond, then R2 MAY replay the query, and if the target still does
   not respond, R2 will remove the target from its link state
   information.

2.12. Assuring Freshness of Endnode Information

   Designated RBridge R1 can ensure freshness of its endnode information
   by doing ARP/ND queries periodically to ensure that the endnodes are
   actually there. This can be a problem if the endnodes are in
   power-saver mode, so whether R1 should "ping" the endnodes by doing
   ARP/ND queries should be a configuration parameter on R1.

3. RBridge Addresses, Parameters, and Constants

   Each RBridge needs a unique ID within the campus. The simplest such
   address is a unique 6-byte ID, since such an ID is easily obtainable
   as any of the EUI-48's owned by that RBridge. IS-IS already requires
   each router to have such an address.

   One parameter is the value to which the hop count in the envelope is
   initially set. Recommended default = 20.

   A new Ethertype must be assigned to indicate an RBridge-encapsulated
   frame.

   A layer 2 multicast address for "all RBridges" must be assigned for
   use as the destination address in flooded frames.

   To support VLANs, RBridges (like bridges today) must be configured,
   for each port, with the VLAN to which that port belongs.

   We may want a parameter to determine whether an RBridge should
   periodically do queries to ensure that the endnode information is
   fresh, and if so, with what frequency.

4. Security Considerations

   The goal is for RBridges to not add additional security issues over
   what would be present with traditional bridges.
   RBridges will not be able to prevent nodes from impersonating other
   nodes, for instance, by issuing bogus ARP replies. However, RBridges
   will not interfere with any schemes that would secure neighbor
   discovery.

   As with routing schemes, authentication of RBridge messages would be
   a simple addition to the design (and it would be accomplished the
   same way as it would be in IS-IS). However, any sort of
   authentication requires additional configuration, which might
   interfere with the perception that RBridges, like bridges, are zero
   configuration.

5. IANA Considerations

   A new Ethertype must be assigned to indicate an RBridge-encapsulated
   frame.

   A layer 2 multicast address for "all RBridges" must be assigned for
   use as the destination address in flooded frames.

6. Conclusions

   This design allows transparent interconnection of multiple links into
   a single IP subnet. Management would be just like with bridges
   (plug-and-play). But this design avoids the disadvantages of
   bridges. Temporary loops are not a problem, so failover can be as
   fast as possible, and shortest paths can be followed.

   The design is compatible with current IP nodes and routers, and with
   current bridges.

7. Acknowledgments

   In addition to Eric Grey and Erik Nordmark, we anticipate that many
   people will contribute to this design, and invite you to join the
   mailing list at http://www.postel.org/rbridge

8. References

8.1. Normative References

   [1]  IEEE 802.1d bridging standard, "IEEE 802.1d bridging standard".

8.2. Informative References

   [2]  Bryant, S., Perlman, R., Atlas, Alk, Fedyk, D., "TRILL using
        Pseudo-Wire Emulation (PWE) Encapsulation", internet draft
        draft-bryant-perlman-trill-pwe-encap-00.

   [3]  Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery
        for IP Version 6 (IPv6)", RFC 2461 (Standards Track), December
        1998.

   [4]  Perlman, R., "RBridges: Transparent Routing", Proc. Infocom
        2005, March 2004.

   [5]  Perlman, R., "Interconnection: Bridges, Routers, Switches, and
        Internetworking Protocols", Addison Wesley Chapter 3, 1999.

Author's Addresses

   Radia Perlman
   Sun Microsystems

   Email: Radia.Perlman@sun.com

   Joe Touch
   USC/ISI
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   U.S.A.

   Phone: +1 (310) 448-9151
   Email: touch@isi.edu
   URL:   http://www.isi.edu/touch

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.