TRILL WG                                              Guillermo Ibanez
Internet Draft                                          Alberto Garcia
Expires: Dec 2006                                       Arturo Azcorra
                                                          June 6, 2006

ABridges as RBridges: Transparent Routing with Simplified Multiple Spanning Trees.
draft-gibanez-trill-abridge-01.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on Dec 16, 2006.

Abstract

RBridges are link layer devices that use routing protocols as a control plane but do not target to scale up to large campus networks. This document contains an alternative proposal to link-state RBridges, named ABridges. ABridges overcome RBridges L2 network size restrictions allowing applicability to very large Ethernet campus networks while maintaining zero configuration and high performance, by assuming a topological restriction that is automatically performed. The proposal includes a two-layered network architecture with two hierarchical independent spanning tree layers. Expected convergence is fast, probably below two seconds.

G.
ibanez Informational Expires May 2006 1 INTERNET DRAFT abridge June 6, 2006

ABridges use multiple simplified spanning trees rooted at core edge bridges to achieve results comparable to RBridges with lower computational complexity. Two implementation variants of simplified multiple spanning trees are proposed: The first one is a fundamental simplification of the standard Multiple Spanning Tree protocol and the second one (still in a very preliminary stage) consists of an N-multiple simultaneous execution of the Rapid Spanning Tree protocol at each RBridge.

An optional mechanism of ARP/ABridge servers/registrars (with load splitting) is proposed to limit ARP traffic in large scale Ethernet networks and to enhance scalability and security. This mechanism can also be used for host-Designated RBridge resolution as an alternative to the interchange of Hosts Lists between RBridges.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [1].

Table of Contents

   1. Introduction
   2. Terminology
   3. Network Architecture
   4. Protocols
      4.1 RSTP Protocol
      4.2 MSTP Protocol
      4.3 Core Layer: AMSTP Protocol
      4.4 AMSTP versus MSTP
      4.5 Designated (and Root) ABridge
      4.6 Forwarding Scenario
      4.7 Learning End Node Location
      4.8 Routing versus Learning Bridges Addresses
      4.9 Header on 802 Links
      4.10 Distributed ARP Query
      4.11 ABridge Identities and Addresses
   5. ARP/ABridge Server/Registrars
   6. Issues
      6.1 Per Ingress Spanning Tree
      6.2 Symmetrical Path Problem
      6.3 Traffic Aggregation at Root bridge
      6.4 VLANs
      6.5 Optimizing ARP/ND
   7. Security Considerations
   8. IANA Considerations
   9. NRSTP
   10. Conclusions
   11. Acknowledgments
   12. References
   Author's Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Copyright Statement

1.
Introduction

Current IP-based campus networks use one address prefix per link to support routing. This implies administration and configuration of IP addresses. IP addresses are link-related, so the IP address of an end node changes when its point of attachment to the network changes. Bridges do not require this kind of configuration because they forward in the switched domain using flat layer 2 addresses. However, standard bridge protocols do not scale: the spanning tree protocol enables only some selected links to prevent loops, so network utilization is low. Also, the routes along the spanning tree are not pair-wise shortest paths, and temporary loops may produce packet proliferation across the entire switched domain.

RBridges have been proposed as a hybrid of routers and bridges, offering the advantages of routers while preserving the zero configuration capability of bridges. However, RBridges currently do not fulfill an important requirement: scaling to large Ethernet campus networks. The importance of this requirement is growing with the increasing size of campus networks and the foreseeable increase in connected devices (displays, IP phones, cameras, 802.11 PDAs, sensors, etc.). This lack of scalability derives from the use of flat MAC addresses to perform routing. Because MAC addresses cannot be aggregated, they produce long tables in RBridges when used in large campus networks.

Another potential weakness of RBridges is that, while exhibiting unrestricted topological compatibility with standard bridges, RBridges depend on the bridged links to communicate among themselves and to perform the IS-IS Designated Router election. This dependency increases their complexity and makes the whole system vulnerable to inter-RBridge communication problems. The overall convergence time is increased because the spanning tree convergence time adds to the IS-IS DR election time.
This draft proposes an architecture for Ethernet campus networks based on a new type of hierarchical Ethernet switch for campus cores. The architecture aims to provide high performance, minimal configuration, and scalability in very large Ethernet campus networks. The proposed network architecture consists of a high capacity core composed of an arbitrary mesh of switches named ABridges, and a number of access networks with standard bridges connected to the core.

The document proposes an alternative implementation for RBridges [10] (Routing Bridges), identified as AMSTP Bridges (ABridges), that combines the advantages of bridges and routers. Like bridges and RBridges, ABridges require zero configuration and are transparent to IP nodes. Like routers and RBridges, ABridges forward on pair-wise shortest paths.

We propose to use multiple L2 spanning trees between ABridges to forward via shortest paths in the core of the campus network. The AMSTP protocol is a simplification of the standard MSTP protocol, oriented to zero configuration. The core edge bridges provide backbone connectivity to lower layer (Access Layer) networks. The active topology of the Access Layer networks consists of standard spanning trees of switches (RSTP/STP). Each Edge Switch acts as the root bridge of two independent spanning trees: the spanning tree of its lower layer Access network, and one spanning tree instance of the core network.

The architecture provides shortest paths in most traffic situations for client-server traffic (for servers located in a server farm) and adapts well to traffic aggregation. Additional mechanisms can be designed to achieve high network availability.

Due to the access port mode, ABridges are compatible with current bridges as well as current IPv4 and IPv6 routers and end nodes. They are as transparent to current IP routers as bridges and RBridges are.
Like routers, they terminate a bridged spanning tree.

Packets in the Core of ABridges must be encapsulated such that:

- Forwarding is performed in the Core across per egress bridge tree instances, while maintaining the original L2 header so that end destination bridges can learn about the location of the source by learning the source address from packets.

- ABridges can learn the location of end nodes. They can learn the location and layer 2 addresses of attached nodes from the source address of data packets, as bridges and RBridges do. However, very large campus networks with tens of thousands of nodes may require more scalable and safer solutions for locating end nodes. For this case, the use of ARP/ABridge Server/Registrars is proposed.

Support of VLANs traditionally requires configuration of the bridges to know which ports and links belong to which VLANs. In order to achieve true zero configuration, we recommend that bridges do not separate per VLAN traffic in the campus core, and do not use a separate spanning tree for each broadcast domain. In a campus without VLANs, this means a single spanning tree would be used for delivery of packets with unknown or group layer 2 destination addresses.

ABridges can suppress the broadcast/multicast for Neighbour Discovery by using ARP servers/registrars or, similarly to RBridges, by conventional proxy ARP (IPv4) or proxy ND (IPv6).

ABridges are fully compatible with current IPv4 and IPv6 routers and end nodes. They are as invisible to current IP routers as bridges are, and they participate in two bridged, hierarchically linked but separate spanning trees.

2. Terminology

AMSTP: Alternative Multiple Spanning Tree Protocol.

ABridge: An RBridge implemented as an AMSTP Bridge.

Access network: subnetwork of standard bridges connected to an ABridge.
ARP/ABridge Server/Registrar: Server that provides ARP resolution and the ID of a destination host's Designated ABridge (see DR).

Campus network: set of network elements (standard bridges and ABridges) connected to one or more routers.

Core: set of ABridges directly interconnected through point to point links.

Core port: The port of an ABridge connected to another ABridge through a point to point link.

Access port: The port of an ABridge connected to a link that has active standard bridges connected. It executes the standard spanning tree protocol and provides connection to the Access Network.

DR: Designated RBridge. In the context of an ABridge, it means the Designated ABridge, which coincides with the STP/RSTP Root bridge of the Access network.

MSTP: Multiple Spanning Tree Algorithm and Protocol.

NRSTP: Variant implementation of AMSTP through execution of N independent RSTP instances.

RBridge: Routing Bridge, as defined in the proposal by Radia Perlman and the TRILL WG.

RSTP: Rapid Spanning Tree Algorithm and Protocol.

3. Network Architecture

Campus network designs are currently based on a layered architecture (core, distribution and access layers) to obtain network scalability and predictability. Segmentation of networks is obtained using routers or devices called multilayer switches that segment the network into IP segments or subnets. A similar approach is proposed here, but with the network segmentation performed at layer 2 instead of layer 3. The proposed network architecture is shown in Figure 1. It uses a two-layer hierarchical L2 network to achieve scalability to large scale Ethernet networks. The upper layer acts as a Core-Distribution Layer (Core) and the lower layer acts as an Access Layer.
The core layer uses the AMSTP protocol for interconnection between core ABridges, while the Access Layer uses the standard spanning tree protocol (RSTP or STP) to connect hosts of the access network with other hosts via their root bridge at the core (ABridge).

The ABridges constitute the core network and are interconnected by dedicated links. The point to point link requirement derives from the need for fast convergence of standard layer 2 spanning tree algorithms, but it is also required for high performance and enhanced security (802.1X). Thus, point to point links are becoming a requirement for Ethernet networks, at least at the core and distribution layers. Other bridges connect to ABridges without requiring a point to point connection, and form the Access Layer. The Access Layer is segmented into multiple access networks. Each Access network is formed by devices connected to a core ABridge; it may have an arbitrary topology, but the active topology will use the standard spanning tree as the basic forwarding mechanism. More sophisticated protocols are possible for better infrastructure usage inside each Access network, but they are out of the scope of this proposal.

                          ---------
                          | network|
                        / ---------
                       /
               A -----A
              / \    / \                    Core layer
             /   \  /   \
            A------A-----A
           /        \     \
 -------- /          \     \-----------
 |network|            \    | network |      Access Layer
 --------              \   -----------
                        \ ---------
                         | network |
                         ----------

 A: ABridge

Figure 1. Campus network topology

ABridges must auto-configure ports to participate in the Core or in the Access network. The port configuration mechanism is as follows: a port that is not connected by a point to point link to another ABridge configures itself as an access port (an ingress and egress point for traffic to/from the core). Ports directly connected to another ABridge act as core ports.
The auto-configuration of ports works as follows: each port detects, through the BPDU type (STP, RSTP or AMSTP) received on its link upon initialization, whether the device connected to the link is a standard bridge or an ABridge. If the BPDUs received are standard 802.1D BPDUs, the link is assigned to the Access Network and the port is automatically configured in access port mode. Any standard bridge connected to the ABridge is thus automatically excluded from the core function. Figure 1 shows an example of the proposed network topology. A core of ABridges constitutes the campus backbone and interconnects different area networks formed by standard 802.1D bridges.

4. Protocols

In this section the proposed protocols are described. The Alternative Multiple Spanning Tree Protocol [7] is an evolution of the standard MSTP [6] and RSTP [3] protocols. In the following paragraphs the RSTP and MSTP protocols are first briefly introduced to provide the context required to describe the AMSTP protocol. Differences between AMSTP and MSTP are summarized after the description of AMSTP.

4.1 RSTP Protocol

A standard protocol for bridges is the Rapid Spanning Tree Protocol, included in IEEE 802.1D [5]. It provides much faster convergence than the previous standard protocol, STP [4]. To achieve convergence in (typically) fractions of one second, RSTP replaces the timer based mechanism that STP uses to ensure that the algorithm has converged with a locally controlled proposal-agreement mechanism between adjacent switches, which transitions the port states to forwarding in a controlled way. This mechanism requires point to point links to operate without loops. Other mechanisms are also used to ensure rapid convergence.

4.2 MSTP Protocol

The Multiple Spanning Tree Protocol (IEEE 802.1Q) is based on RSTP (IEEE 802.1D) and creates different tree instances that are used by sets of VLANs according to the configuration of the bridge.
MSTP implements a set of multiple and independent spanning tree instances (MSTIs) in a network region. Each region is interconnected via a common spanning tree (CST) to other MST regions. Inside a region, several VLANs can be mapped to a single tree instance. Multiple tree instances in each region make it possible to improve the usage of the links. In each region there is a tree instance (IST), identified with the number 0, that acts as the basic spanning tree. The CIST, or total spanning tree, comprises the CST that connects all the regions and the IST that provides connectivity inside each region. This allows separate management of the regions, each appearing to the outside as a unique and separate "superbridge", i.e. the whole region connects to the CST via one Regional Root Bridge port and a number of designated ports, like a single bridge. Therefore, the internal topology of a region is not affected by topology changes in the outside tree.

MSTP allows more efficient network infrastructure usage by assigning different spanning trees to different sets of VLANs. But MSTP is complex to configure. Tree instances must be planned and VLANs must be mapped to those tree instances. The configuration table must be checked to be exactly the same for all bridges of the same region. Serious malfunction occurs if VLAN mapping discrepancies exist between bridges in the same region.

4.3 Core Layer: AMSTP Protocol

In the proposed architecture, the AMSTP Protocol works as a Core Layer protocol providing shortest path interconnection between Access Networks and providing network segmentation to prevent the extension of failures to the whole switched domain. The AMSTP Protocol has been proposed previously [AMSTP] for metropolitan Ethernet backbones, but it can be extended to campus networks as well, with some modifications.
AMSTP is a simplified multiple spanning tree protocol that uses one tree instance rooted at each edge bridge in the core to forward frames. A complete multi-tree is the set of all tree instances, one rooted at every edge bridge, that interconnects all bridges in the backbone. Only the ABridge ports connected to other ABridges participate in the multiple spanning tree protocol. The rest of the ports participate in the standard spanning tree protocols such as RSTP or STP (IEEE 802.1D).

To describe the AMSTP protocol, we consider its two main functionalities: building and maintaining the spanning trees (control plane), and processing and forwarding frames in the bridges (user plane).

4.3.1 Building the Trees

The process of tree building consists of two parts: building the basic (standard) RSTP tree and building the rest of the instances, called Alternate Multiple Spanning Tree Instances (AMSTI), until one tree instance per bridge is built as shown in Figure 2.

The process of building the main tree is the same as in RSTP. Every bridge autonomously emits Bridge Protocol Data Units (BPDUs) every Hello Time (configurable from milliseconds) to neighbouring bridges. First, the Bridge having the lowest Bridge ID (best configured priority, with the MAC address appended so that the lower MAC address breaks ties) is elected as Root Bridge of the main spanning tree. Every bridge receiving BPDUs from this Bridge adopts it as Root and propagates it in the BPDUs it emits. These BPDUs contain the minimum path cost from the emitting bridge to the elected Root Bridge. Every Bridge attaches to the spanning tree by selecting as root port the port that receives the "best" BPDU. The best BPDU is the one that announces the minimum path cost to the root bridge.
Each bridge builds its own BPDU from the BPDUs received from other bridges, selecting "superior" BPDUs according to the standard STP criteria (lower Bridge ID, lower path cost, lower port priority, lower port ID), and transmits it via the main tree for the continuous maintenance of the optimum main spanning tree.

                              A-----A
                             / \   / \
                            /   \ /   \
                           A-----A-----A

   A-----A        A     A        A     A        R-----A        A-----R
  /                \   /          \     \      / \     \            / \
 /                  \ /            \     \    /   \     \          /   \
R-----A-----A  A-----R-----A  A-----A-----R  A     A     A  A-----A     A

Fig. 2. A five node network and its five self-rooted AMSTP Spanning Tree Instances (R: root bridge).

The process of building all the other tree instances, one per bridge, takes place as follows: Each Core Bridge appends to the main tree BPDU the information of all AMSTI tree instances in which the bridge participates. The information appended per tree instance is called the AM-Record and contains information similar to that used by a BPDU for tree instance building. The key difference from other spanning tree protocols is that there is no root bridge election. In AMSTP each ABridge claims itself as Root Bridge of its own instance and equally accepts every other ABridge as the Root of that ABridge's own instance. The bridge is accepted as the root by other bridges without negotiation (except when a malfunction is detected). This self-rooted tree instance is identified by the bridge ID of the edge ABridge (root). The rest of the process is analogous to the building of the MSTI tree instances used by MSTP inside an MST region [4]: the tree is built by selecting tree paths at every bridge according to the same minimum path cost criteria as MSTP, using port priority and port ID for tie breaking. A flag octet, identical to the one used for building the basic tree instance, is used by the bridges to communicate and negotiate transitions of port states and roles per tree instance.
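The outcome of the tree building process described above can be sketched centrally, without modelling the BPDU exchange. The following is only an illustration under stated assumptions: the bridge names (T1, T2, B1, B2, B3), the unit link costs and the helper names are hypothetical, and ties are broken by the lower neighbour name as a stand-in for the port priority / port ID comparison. Because of that stand-in, the resulting trees need not coincide with those drawn in Fig. 2.

```python
import heapq

# Hypothetical five-bridge core loosely modelled on Fig. 2.
# Bridge names and unit link costs are assumptions for illustration.
LINKS = {
    ("T1", "T2"): 1, ("T1", "B1"): 1, ("T1", "B2"): 1,
    ("T2", "B2"): 1, ("T2", "B3"): 1, ("B1", "B2"): 1, ("B2", "B3"): 1,
}


def neighbours(links):
    """Turn the undirected link list into an adjacency map."""
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, {})[b] = cost
        adj.setdefault(b, {})[a] = cost
    return adj


def build_instance(adj, root):
    """Return {bridge: (root_path_cost, upstream_neighbour)} for one instance.

    There is no election: `root` is simply accepted as root of its own
    instance. Every other bridge keeps the neighbour offering the minimum
    path cost to the root; equal costs are resolved by the lower neighbour
    name (stand-in for port priority / port ID). Link costs must be positive.
    """
    best = {root: (0, None)}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry
        for nbr, link_cost in adj[node].items():
            cand = (cost + link_cost, node)
            if nbr not in best or cand < best[nbr]:
                best[nbr] = cand
                heapq.heappush(heap, (cand[0], nbr))
    return best


def build_multitree(links, edge_bridges):
    """One self-rooted instance per edge bridge, keyed by the root's ID."""
    adj = neighbours(links)
    return {root: build_instance(adj, root) for root in edge_bridges}


multitree = build_multitree(LINKS, ["T1", "T2", "B1", "B2", "B3"])
```

In this sketch the upstream neighbour recorded for a bridge plays the role of its root port in that instance, so `multitree["B1"]["T2"]` names the neighbour through which T2 reaches root B1.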
4.3.2 Frame Processing in Core Switches

When processing a frame, a Core Switch (ABridge) may act as an ingress, transit or egress ABridge. As ingress ABridge, the switch encapsulates the frame with an additional Layer 2 header containing its own MAC address as source address and the MAC address of the egress ABridge as destination address. The ingress ABridge forwards the encapsulated frame through the branch belonging to the spanning tree instance rooted at the egress ABridge. This path is a pair-wise shortest path because the tree is built by minimizing path cost from each root to the rest of the nodes.

Traffic forwarding in the core depends on the traffic type. Broadcast traffic, multicast traffic and traffic to unknown destinations are forwarded via the tree instance rooted at the ingress ABridge. Unicast traffic (to a known ABridge) is forwarded through the tree instance of the egress ABridge, by sending the frame through the bridge's root port for that instance.

ABridges may learn from the received frames both the MAC addresses of other ABridges and the MAC addresses of the connected end nodes by inspecting the inner and outer Ethernet MAC addresses of the encapsulated frames. This learning process is called double MAC learning and is applicable only in networks with a moderate number of end nodes, like a backbone with routers connected to it [7]. The MAC learning process is based on frames broadcast over the switched network. These broadcasts are commonly ARP packets issued by end nodes for layer 2 destination address resolution. In this process the bridges learn the originating MAC address at the receiving ports and the hosts add the IP-MAC pair to their ARP tables. In networks with a large number of end nodes, the processing of a large number of ARP requests by every end node may result in significant load on the end nodes.
A different mechanism is needed to prevent ARP packets from being broadcast or multicast in large Ethernet campus networks.

The ports of switches that are not connected to AMSTP capable Core Switches do not run AMSTP, so they are kept out of the core forwarding mechanism. For Core Switches running AMSTP to interoperate with legacy switches running STP or RSTP, a mechanism is needed, like the standard port migration protocol used by MSTP, RSTP and STP. Basically, the mechanism is that if a port of an MSTP switch receives BPDUs of protocol version 0 (STP protocol) it will emit STP BPDUs only. Recovery is not automatic; the port will not emit MSTP BPDUs until a configuration command restarts the protocol migration process, forcing renegotiation between neighbouring switches.

4.3.3 AMSTP BPDU Layout

AMSTP BPDUs have a structure that resembles MSTP BPDUs [4], since both consist essentially of a basic BPDU with several per-instance records appended (AM-Records in AMSTP). The AMSTP BPDU structure is shown in Figure 3. The basic BPDU is used for basic tree (0) negotiation between switches. Each of the appended AM-Records is used to negotiate a specific tree instance (AMSTI). As in the MSTP case, the BPDUs carrying the rapid spanning tree information distributed via instance 0 also carry the information of all the spanning tree instances, appended to the RSTP BPDU as AM-Records. This reduces broadcasting and simplifies BPDU processing at the switches.

 --------------------------
 !    Basic RSTP BPDU     !
 !    Tree instance 0     !
 --------------------------        -------------------------
 !    [AMSTP header]      !       /!     AMSTI flags       !
 !                        !      / -------------------------
 --------------------------/       ! Root bridge ID (edge) !
 !    Tree Instance 1     !        -------------------------
 !        Root 1          !        !    Root path cost     !
 !                        !        -------------------------
 --------------------------        !  Dest. Port Address   !
 !    Tree Instance 2     !\       !    of Root bridge     !
 !        Root 2          ! |      -------------------------
 !                        !  \     !    Port priority      !
 --------------------------   |    -------------------------
        ...........            \   !    Remaining hops     !
 --------------------------        -------------------------
 !    Tree Instance N     !
 !        Root N          !
 --------------------------

Fig. 3. AMSTP BPDU layout

Every AM-Record includes a flag octet identical to the one described for the RSTP tree. These flags are used to negotiate all transitions of each tree instance between connected ports of neighbouring switches.

Minimum configuration is an important requirement for Core Switches. While multiple spanning tree algorithms enable much better usage of the existing infrastructure, they are usually complex to configure because a way to assign frames to tree instances is needed. In the case of MSTP, this means that the mapping of VLANs to tree instances (MSTIs) has to be configured manually at each bridge, resulting in a complex and error-prone process. AMSTP uses self-rooted spanning tree instances instead of VLAN mapped trees and all tree instances are created automatically, so no tree configuration is needed. The parameters to configure are those common to RSTP, such as selection of the Root Bridge and configuration of the Backup Bridges for the region and their priorities.

Multicast (L2 addresses) traffic. Multicast traffic in the campus core is forwarded via the same tree instances as unicast traffic, via pair-wise shortest paths to destination ABridges. The difference from unicast traffic is that the spanning tree used is the one rooted at the ingress ABridge, instead of the tree rooted at the destination ABridge. The multicast trees are therefore always optimized for minimum hops without the construction of additional tree instances. Like RBridges, ABridges may treat multicast traffic as broadcast or may use current techniques like IGMP snooping to limit broadcast.
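The choice of tree instance by traffic type can be summarized in a few lines. This is a minimal sketch, not part of the protocol definition: the function names and the `egress_of` lookup table (destination MAC address to egress ABridge ID) are hypothetical. The only external fact relied upon is that a group (broadcast or multicast) MAC address has the least significant bit of its first octet set.

```python
def is_group_address(mac):
    """True for broadcast/multicast MACs (I/G bit of the first octet set)."""
    return int(mac.split(":")[0], 16) & 1 == 1


def select_tree_instance(ingress_id, dest_mac, egress_of):
    """Return the ID of the root ABridge whose tree instance carries the frame.

    - Group addresses and unknown unicast: tree rooted at the ingress ABridge.
    - Unicast to a known egress ABridge: tree rooted at that egress ABridge.
    `egress_of` is a hypothetical cache: destination MAC -> egress ABridge ID.
    """
    if is_group_address(dest_mac):
        return ingress_id
    egress = egress_of.get(dest_mac)
    return egress if egress is not None else ingress_id


# Example: ingress ABridge "A1" knows that 00:11:22:33:44:55 sits behind "A3".
cache = {"00:11:22:33:44:55": "A3"}
unicast_tree = select_tree_instance("A1", "00:11:22:33:44:55", cache)
broadcast_tree = select_tree_instance("A1", "ff:ff:ff:ff:ff:ff", cache)
```

Since tree instances are identified by the bridge ID of their root, returning an ABridge ID is enough to name the instance.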
4.4 AMSTP versus MSTP

Table I below shows a comparison of the main protocol differences between MSTP and AMSTP. The first difference is the criterion used for assignment of frames to a tree instance for processing, in other words, how the bridge knows which spanning tree instance to use to forward the frame. The second one is the criterion used to create a tree instance.

                              TABLE I
             MSTP VS AMSTP - MAIN PROTOCOL DIFFERENCES
--------------------------------------------------------------------------
Protocol feature        MSTP                      AMSTP
--------------------------------------------------------------------------
Criteria for frame      VLAN tag on frame         Destination MAC of
assignment to a tree    (802.1Q)                  frame (root)
instance
--------------------------------------------------------------------------
Tree instance           Configured: sets of       Automatic: one per
formation criteria      VLANs are mapped to       core bridge
                        every tree instance
--------------------------------------------------------------------------
Number of tree          Configured: 1 to 64       One per core bridge (*)
instances
--------------------------------------------------------------------------
Root bridge election    As RSTP (lower bridge     No election. Every
                        ID including bridge       bridge is the root of
                        priority)                 its tree instance
--------------------------------------------------------------------------
Bridge ID               4 MSB byte priority, 12 bit VLAN ID 6 byte MAC
--------------------------------------------------------------------------
Single or Multiple      Multiple                  Single
MST regions
--------------------------------------------------------------------------
Main application        Interconnected VLAN       Cores, backbones
environment             based regions
--------------------------------------------------------------------------

(*) An ABridge with no access ports (transit ABridge instead of edge ABridge) does not create a self rooted instance.

4.5 Designated (and Root) ABridge

Similarly to RBridges, one ABridge on each link has special duties. This ABridge acts as the Designated RBridge of that link.
The DR function combines very well with being the root bridge of the spanning tree of that link. To achieve automatic election of ABridges as roots of the respective access networks of the campus, it would suffice that the default bridge ID of ABridges has a lower value than that of standard bridges (midrange). An ABridge may in this way become the root bridge of any link. DR election and root bridge election are one and the same operation, performed according to the standard procedure [5]. In this way DR election does not depend on any external mechanism, and convergence time at links does not add to the convergence time of the IS-IS DR election as in the RBridge case. A separate DR election process is avoided entirely.

4.6 Forwarding Scenario

Now the basic forwarding scenario is described. Figure 4 shows two hosts H1 and H2 connected to different access networks. First the ARP and destination ABridge resolution are described, and then the forwarding process.

4.6.1 ARP and ABridge Resolution

Using ARP servers is the optional mechanism proposed to limit broadcast/multicast traffic. However, the standard ARP mechanism must be kept to ensure that hosts that silently move from one part of the campus to another can be located. Besides ARP for host resolution, the servers may also be used for resolution of the destination ABridge. Each server stores a table of tuples containing the IP address and L2 address of the end node and the L2 address of its Designated Bridge (Root ABridge). The set of tuples stored at a server corresponds to the IP addresses that produce the same (few bits) hash result.

The sequence for communication between H1 and H2 in Figure 4 is as follows: Host H1 first sends a broadcast ARP packet to resolve host H2's L2 address. The packet is distributed through the spanning tree of the access network and arrives at the root ABridge.
The root ABridge detects the ARP, calculates hash(IP destination address) and from the result obtains the server responsible for that IP address. The server performs a lookup using H2's IP address and obtains H2's L2 address and the (egress) ABridge ID of that access network, then sends the reply in a packet to the ingress ABridge. The ABridge extracts the information and forwards a standard ARP response packet to host H1. Host H1 can then proceed to send packets to the L2 address of host H2. The ingress ABridge also registers the originating host by sending a registration packet containing the ARP packet to the corresponding ARP/ABridge server, obtained by computing hash(IP origin).

            b---b                    b: standard bridge
           /         Access Layer    A: ABridge
       b---b                         Path: H1-b-b-b-A-A-A-b-b-b-H2
      /
  .............
     A         A
      \\        \\                   Core layer
       \\        \\
        A======A=====A
       /        \     \
  ..........
      /          \     \             Access Layer
H1---- b---b--b   b     b---b---b----H2
      /   /      / \             \
   b-/ b---b   b-   b    b-b      b---b---b

Figure 4. End to End forwarding scenario

Note: If the destination host is connected to the same access network, the host will reply directly by emitting an ARP response packet.

Note: The ABridge registers a host at the corresponding ARP Server/Registrar whenever it detects a frame from an unknown host connected to its access network.

4.6.2 Forwarding

The frame forwarding process is as follows: the standard frame sent by the host (IP(H2), L2(H2)) arrives at the Access network root bridge (ABridge). Its Ethernet DA contains the end node destination address. The root (Designated) ABridge looks in its cache for the ID of the destination end node's Designated ABridge (filled in just before from the ARP/ABridge server response). The ABridge still has in its cache the pair (L2 address, L2 egress ABridge) obtained before and encapsulates the frame with a header like this: (DA egress ABridge, SA ingress ABridge, Ethertype: AMSTP).
It then determines the applicable tree instance by looking at the
destination ABridge and forwards the frame through the port that was
elected root port for the instance rooted at the destination ABridge.
The packet arrives at the Designated Port of the next ABridge, which
inspects it and forwards it towards the outer destination MAC
address, using the corresponding tree instance to obtain the root
port of that instance. The packet is forwarded again via the root
port until the egress ABridge is reached. The egress ABridge detects
that it is the destination of the frame, removes the encapsulation
header and forwards the original frame via the access port where the
L2 host address has been learnt, or via all access ports if H2 is
unknown. The packet goes from the egress bridge (root) to H2
following a branch of the tree rooted at the egress bridge. Frame
forwarding in the access networks is performed in the standard way
with the spanning tree set up by STP or RSTP. A packet exiting the
ABridge by an access port must look to ordinary bridges like an
ordinary layer 2 packet and must not be encapsulated.

The ABridge may learn the destination ABridge by host list
interchange. The forwarding behaviour of RBridges is as follows:
"When a DR R1 receives a native packet with layer 2 address S and
layer 2 destination address D, R1 looks up the location of D. If D is
claimed by egress RBridge R2, then R1 encapsulates the packet,
directing it towards R2". ABridges may use the same behaviour, but in
this case the network might not scale to one hundred thousand end
nodes, because the Campus Transit Tables (CTT) would be too big.

In contrast to an RBridge, when an ABridge receives an encapsulated
packet, it forwards it based on the DA ABridge and does not replace
the DA with the "next-hop" address. The next hop is selected by
forwarding the frame via the root port of the tree instance rooted at
the destination ABridge.
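
The core forwarding rule described above (encapsulate at the ingress
ABridge, follow the root port of the tree instance rooted at the
destination ABridge, decapsulate at the egress ABridge) can be
sketched as follows. This is only an illustration under stated
assumptions: the bridge names, the link list and the functions
root_ports() and forward() are hypothetical, a breadth-first search
with hop-count cost stands in for a converged AMSTP tree instance,
and the string "AMSTP" is a placeholder for the Ethertype still to be
assigned.

```python
from collections import deque


def root_ports(links, root):
    """Return {bridge: next hop towards root} for the tree rooted at `root`.

    Assumption: hop-count BFS replaces the real spanning tree computation;
    ties are resolved by BFS visiting order (neighbours sorted by name).
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours[node]):
            if nxt not in parent:
                parent[nxt] = node  # nxt reaches the root through node
                queue.append(nxt)
    return parent


def forward(frame, ingress, egress, trees):
    """Encapsulate at ingress, follow root ports, decapsulate at egress."""
    encapsulated = {"da": egress, "sa": ingress,
                    "ethertype": "AMSTP", "payload": frame}
    path = [ingress]
    current = ingress
    while current != encapsulated["da"]:
        # The outer DA selects the tree instance; it is never rewritten.
        current = trees[encapsulated["da"]][current]
        path.append(current)
    return path, encapsulated["payload"]


# Hypothetical core with four ABridges.
CORE_LINKS = [("A1", "A2"), ("A2", "A3"), ("A1", "A3"), ("A3", "A4")]
TREES = {r: root_ports(CORE_LINKS, r) for r in ("A1", "A2", "A3", "A4")}

path, delivered = forward("native frame H1->H2", "A1", "A4", TREES)
print(path)       # ['A1', 'A3', 'A4']
print(delivered)  # native frame H1->H2
```

Note that each transit ABridge needs only one table lookup per frame
(destination ABridge to root port), and no per-host state is kept in
the core.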
A packet in the core must look like an Ethernet frame, but ABridges
must be able to differentiate it from a native layer 2 packet. To
accomplish this, a new layer 2 protocol type ("Ethertype") is used.

4.7 Learning End node Location

ABridges learn end node location at access ports as standard bridges
do. ABridges learn the root bridge IDs of the multiple core tree
instances from the AMSTP BPDUs received.

Similarly to RBridges, the Core (Edge) ABridge, acting as root and
Designated RBridge, might work in two modes:

- As a standard Designated RBridge, which learns the L2 addresses of
  attached end nodes, initiates a distributed ARP when an ARP query
  is received for an unknown destination, and answers ARP queries
  when the target node is known. This mechanism is an alternative to
  the use of ARP Servers/Registrars.

- From data packets. In this mode they learn (layer 3, layer 2) pairs
  (for the purpose of supporting proxy ARP/ND) by listening to ARP or
  ND replies.

4.8 Routing in ABridges vs Learning Bridges Addresses

Some recent proposals like Shortest Path Bridging (SPB), as proposed
at the IEEE [12][13], also use multiple tree instances rooted at edge
bridges. However, SPB presents the problem of asymmetrical spanning
trees. This happens when the path between A and B chosen by the tree
rooted at bridge A differs from the path between B and A chosen by
the tree rooted at B. The problem occurs when there are ties in the
path costs of tree instances. In the instance with node A as root the
tie may be resolved by choosing one path. In the instance with node B
as root the tie may be resolved by choosing a different path. But the
spanning trees must be symmetrical for address learning to work
correctly: a frame sent towards an address that B learnt at one of
its ports from a frame sent by A (via the tree rooted at A), if
forwarded via that same port on the opposite-direction tree (rooted
at B), might find the path blocked because of a different root port
election at A for the tree instance rooted at B.
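
The asymmetry caused by cost ties can be reproduced with a small
sketch. The topology, the bridge names and the function tree_path()
are hypothetical, hop count is used as path cost, and the tie-break
rule (each bridge picks the equal-cost neighbour with the lowest
name) is an assumption that only loosely mimics bridge ID comparison.
With two equal-cost paths between A and B, the tree rooted at A and
the tree rooted at B select different paths.

```python
from collections import deque


def tree_path(links, root, start):
    """Path from `start` up to `root` in the hop-count tree rooted at `root`.

    Assumed tie-break: among neighbours one hop closer to the root,
    a bridge selects the one with the lowest name.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Hop distance of every bridge from the root.
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    path = [start]
    while path[-1] != root:
        here = path[-1]
        closer = [n for n in neighbours[here] if dist[n] == dist[here] - 1]
        path.append(min(closer))  # local root port election
    return path


# Two equal-cost (three hop) paths between A and B.
LINKS = [("A", "b1"), ("b1", "b3"), ("b3", "B"),
         ("A", "b4"), ("b4", "b2"), ("b2", "B")]

b_to_a_in_tree_a = tree_path(LINKS, root="A", start="B")
a_to_b_in_tree_b = tree_path(LINKS, root="B", start="A")
print(b_to_a_in_tree_a)  # ['B', 'b2', 'b4', 'A']
print(a_to_b_in_tree_b)  # ['A', 'b1', 'b3', 'B']
```

The two trees join A and B through disjoint intermediate bridges, so
an address learnt from traffic on one tree does not describe a usable
path on the other.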
ABridges work differently because they do not learn addresses in the
core. ABridges only build spanning trees and assign traffic to them
according to the destination ABridge. AMSTP always uses the root port
to send frames to the destination bridge (instance rooted at the
destination), so the routing function for an ABridge is as follows:

- The bridge ID of the destination corresponding to the destination
  end node is obtained from the ABridge Server.

- The bridge ID of the destination is translated, using the internal
  ABridge table, to the port MAC address of the destination ABridge.

- The frame is encapsulated with an external L2 header carrying the
  Destination Bridge ID.

- ABridges forward a frame received at a designated port only
  upstream, via the root port.

The L2 external destination address can be the Destination Bridge ID
itself. When the encapsulated frame arrives at the destination
bridge, the bridge must identify its own Bridge ID in the DA, remove
the L2 encapsulation of the frame and forward it downstream to the
access network via the access port(s).

4.9 Header on 802 Links

ABridges, as RBridges, must coexist with ordinary bridges. The
encapsulated L2 format must be compatible with the Ethernet format.
No additional fields like TTL are required if the fast convergence
procedure of RSTP is used.
An encapsulated packet would look as follows:

+--------------+-----------------+-----+
| outer header | original packet | CRC |
+--------------+-----------------+-----+

Figure 5 Encapsulated packet

The outer header contains:

o L2 destination = destination (egress) ABridge
o L2 source = origin (ingress) ABridge
o protocol type = "to be assigned...ABridge encapsulated packet"
  (AMSTP)

4.10 Distributed ARP Query

ABridges may perform distributed ARP Query as RBridges do, but for
large campus networks the use of ARP/ABridge servers/registrars is
recommended to reduce multicast traffic and processing load at end
nodes.

4.11 ABridge identities and addresses.

Each ABridge needs a unique ID within the campus. The simplest such
address is a unique 6-byte ID, since such an ID is easily obtainable
as any of the EUI-48s owned by that ABridge.

A new Ethertype must be assigned to indicate an ABridge-encapsulated
packet.

A layer 2 multicast address is used as the "all ABridges" destination
address in distributed ARP queries and any other intercommunication
message.

An optional layer 2 multicast address is needed to address "all
ARP/ABridge servers" (if used), so that the servers can communicate
among themselves the available servers and the hash value(s)
supported.

The AMSTP protocol distributes BPDUs addressed to the local multicast
protocol addresses used by the spanning tree protocol (Bridge Group
Address 01-80-C2-00-00-00). Frames sent to these addresses are not
forwarded by bridges, RBridges or ABridges.

5. ARP/ABridge Servers/Registrars

ABridges, as RBridges, may suppress the broadcast/multicast for
neighbour discovery by doing proxy ARP (IPv4) or proxy ND (IPv6).
However, the mechanism proposed for large campus networks to suppress
broadcast/multicast for neighbour discovery consists of ARP
servers/registrars, where end nodes are registered upon frame
detection by the Designated ABridge.
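
The registration and lookup roles of the ARP/ABridge servers, as
introduced in Section 4.6.1, can be sketched as follows. Everything
concrete here is an assumption made for illustration: the draft does
not specify the hash function (SHA-256 is used below), the number of
hash bits (2 here), the class Registrar or the functions
hash_bucket() and server_for(). The MAC addresses are documentation
values, not real devices.

```python
import hashlib
import ipaddress

HASH_BITS = 2  # assumed hash length in bits (1..8): four possible values


def hash_bucket(ip, bits=HASH_BITS):
    """Map an IP address to a small hash value (a few bits)."""
    digest = hashlib.sha256(ipaddress.ip_address(ip).packed).digest()
    return digest[0] >> (8 - bits)


class Registrar:
    """One ARP/ABridge server, in charge of a set of hash values."""

    def __init__(self, server_id, buckets):
        self.server_id = server_id
        self.buckets = set(buckets)
        self.table = {}  # IP -> (host L2 address, Designated ABridge L2 address)

    def register(self, ip, host_mac, abridge_mac):
        self.table[ip] = (host_mac, abridge_mac)

    def resolve(self, ip):
        return self.table.get(ip)  # None when the host is not registered


def server_for(ip, servers):
    """Select the server responsible for hash(ip), as the ABridge would."""
    bucket = hash_bucket(ip)
    for server in servers:
        if bucket in server.buckets:
            return server
    raise LookupError("no server announced hash value %d" % bucket)


SERVERS = [Registrar("S1", {0, 1}), Registrar("S2", {2, 3})]

# The Designated ABridge of H2 registers H2 when it first sees its frames.
server_for("10.0.0.2", SERVERS).register(
    "10.0.0.2", "00:00:5e:00:53:02", "00:00:5e:00:53:a2")

# The ingress ABridge of H1 later resolves H2 with a single unicast query.
print(server_for("10.0.0.2", SERVERS).resolve("10.0.0.2"))
```

Because registration and resolution hash the same IP address, both
operations reach the same server without any coordination between
ABridges.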
Although all ARP/ABridge servers might work in parallel, it seems
more efficient to distribute the load statistically and uniformly
between servers, splitting the IP addresses to be resolved among the
available servers by a hashing-based mechanism.

The process is as follows: When a host issues an ARP packet, the
packet is forwarded up the spanning tree of the access network to the
root bridge (ABridge). The ABridge, acting as Designated ABridge,
hashes the destination IP address. With this hash result the ABridge
obtains the ID of the ARP/ABridge server in charge of that IP
address. This server ID was previously obtained from announcement
packets sent by the ARP servers, each containing the server's IP
address, L2 address, server ID and the hash values that it serves.
The ABridge encapsulates the ARP packet originated by the end node
with an additional L2 header carrying the destination address of the
corresponding server for ARP resolution.

The ABridge also prepares a registration packet with the IP origin in
order to register (or refresh) the host originating the ARP at the
corresponding ARP/ABridge server.

To avoid redundant load on ARP/ABridge servers, the servers must
share the load: requests are assigned to server IDs according to the
result of hash(IP destination). The total number of servers may be
dimensioned according to the length of the hash results used or by
additional grouping.

An additional protocol between ARP/ABridge servers can be designed to
handle dynamic load splitting among the available servers/registrars
as they come into and go out of service. A server coming into service
takes charge of a hash value handed over by a running server. The new
server performs the new registrations and forwards unresolved
requests to the previous server. Once the expiration time of the
first registration performed at the new server is reached, the
handover process is complete, as no valid registrations remain in the
previous server.

6. Issues

In this section the identified issues, either for RBridges, ABridges
or both, are described or commented on.

6.1 Per Ingress Spanning Tree.

A per-ingress multicast spanning tree is implemented by default with
ABridges. Multicast paths always traverse the minimum number of hops.
There is no issue here.

6.2 Symmetrical Paths Problem.

Shortest Path Bridging (SPB) [12], the current proposal at IEEE for
pair-wise shortest paths, depends on symmetrical tree instances
between bridge pairs for L2 address learning to work properly. In
case of a path cost tie during the calculation of tree instances,
different paths might be elected in opposite directions. The proposal
in [13] describes a change in the MSTP protocol to prevent this, but
convergence times increase.

ABridges are not subject to this problem because they forward unicast
traffic through one branch of the destination ABridge tree instance.
Each ABridge forwards packets via the port elected as root port of
the destination ABridge tree instance. Unicast forwarding in the
campus core always follows the path from Designated Port to root port
at each ABridge traversed until the destination is reached. No
address learning is used for filtering, as the packet is always
forwarded via one port (the root port of the ABridge).

6.3 Traffic Aggregation at Root.

A usual argument against spanning trees is that traffic accumulates
near the root bridge, provoking congestion. The real situation in
campus networks is that traffic, predominantly client-server,
distributes in a tree form. However, bridge design and Ethernet
technologies with their various speeds (100 Mbps, 1 Gbps, 10 Gbps)
currently make efficient switch designs possible (like N*100 Mbps
with two 1 Gbps uplinks) that aggregate traffic efficiently.

6.4 VLANs

VLAN usage in the campus core requires detailed configuration of
which ABridge port belongs to which VLAN.
ABridges may learn, as VLAN-aware bridges do, which port belongs to
which VLAN by inspecting the incoming VLAN tagged frames. This may
help simplify VLAN configuration in ABridges but does not eliminate
the need to configure VLANs in campus networks: tagged VLAN frames
must be generated either by manually configured bridges or by the
hosts originating the frames. In the latter case, a system to assign
a VLAN to each host must be set up via a dynamic VLAN server that
requires configuration.

VLANs are used to separate broadcast domains. Frames are broadcast in
ABridges when the destination is unknown. The tree instance used by
the ingress ABridge to broadcast is its own tree instance, rooted at
that ABridge. To limit broadcast to the ports belonging to the VLAN,
it is necessary to filter by VLAN. This means either that separate
tree instances must be built for VLAN forwarding, increasing the
complexity, or at least that additional filtering is required on the
tree instance used for broadcast, performed using the VLAN tag inside
the encapsulated frame.

The recommendation, as default behaviour, is that VLAN tagged frames
are encapsulated in the same way as non VLAN tagged frames and no
VLAN specific forwarding is performed in the ABridges.

6.5 Optimizing ARP/ND

Mechanisms proposed for RBridges for ARP/ND optimization [10] are
feasible in ABridges as well. However, if the proposed ARP/ABridge
servers are used for ARP and destination ABridge resolution, these
mechanisms become redundant.

7. Security Considerations

[To be added]

As for RBridges, the objective of ABridges is to keep at least the
same security level as bridged networks, without introducing
additional risks. However, the position of ABridges and their role as
Root Bridges, combined with the use of ARP Servers/Registrars, allow
efficient means to enhance network security: easier localization of
attackers, fast detection of spoofed MACs through successive,
duplicated or inconsistent registrations, etc.
If IEEE 802.1X is used in link ports connecting ABridges, security is
greatly enhanced in the network core, although it cannot prevent
malicious behaviour of trusted authenticated ABridges. However,
authentication requires some additional configuration, which
contradicts in part the zero configuration objective of RBridges and
ABridges.

8. IANA Considerations.

A new Ethertype must be assigned to indicate an ABridge-encapsulated
packet.

A layer 2 multicast address is used as the "all ABridges" destination
address in distributed ARP queries and any other intercommunication
message.

An optional layer 2 multicast address is needed to address "all
ARP/ABridge servers" (if used), so that the servers can communicate
among themselves the available servers and the hash value(s)
supported.

A new Ethertype is required for the AMSTP protocol.

If ARP/ABridge servers-registrars are used, an L2 group multicast
address is required.

9. NRSTP Protocol.

This concept is in its early stages and requires detailed analysis;
it is described summarily here due to its simplicity. An alternative
to implementing multiple simplified spanning trees like AMSTP might
consist of a simultaneous and independent construction of N spanning
trees (one per ABridge) by fully independent execution of N RSTP
protocols (single code, multiple data) at each ABridge. Each ABridge
executes the RSTP protocol N times simultaneously to participate in N
tree instances. In one of the N protocol executions, the ABridge
claims itself as the nonnegotiable root bridge. At the same time,
with the other N-1 RSTP protocol executions, the ABridge joins the
N-1 RST tree instances proposed by the other N-1 ABridges of the
core. As for AMSTP, the destination ABridge tree instance is used to
forward unicast frames, while for broadcast and multicast, the
originating ABridge tree instance is used.
The number of BPDUs is multiplied, but processing and implementation
may be simplified.

10. Conclusions

An alternative implementation for RBridges has been described. It
provides pair-wise shortest paths using multiple L2 spanning trees
across ABridges instead of link state L2 routing. The proposal has
lower computational complexity than RBridges and is scalable to large
scale Ethernet campus networks. A topological restriction,
automatically controlled, is introduced: core forwarding only
operates on dedicated links that interconnect ABridges. The
convergence time obtainable is likely similar to that of the standard
IEEE Rapid Spanning Tree protocol: less than 2 seconds, typically in
the range of hundreds of milliseconds. The design is compatible with
current IP nodes and routers and with standard bridges, but any
standard bridge connected to an ABridge always works outside the
network core, in the access layer.

11. Acknowledgments

This draft used the current RBridges draft as a basis for the
structure, and for some of the text, to aid comprehension and to aid
comparison between the two.

Thanks to Matt Hutton who performed the English language review.

For feedback and contributions, join the RBridge mailing list at
http://www.postel.org/rbridge

12. References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
    Levels", BCP 14, RFC 2119, March 1997.

[2] The RBridge archives.
    http://www.postel.org/pipermail/rbridge/

[3] Rapid Reconfiguration of Spanning Tree.
    http://www.ieee802.org/1/pages/802.1w.html

[4] IEEE 802.1D-1998 IEEE standard for local and metropolitan area
    networks--Common specifications--Media access control (MAC)
    Bridges.

[5] IEEE 802.1D-2004 IEEE standard for local and metropolitan area
    networks--Common specifications--Media access control (MAC)
    Bridges.
[6] IEEE 802.1Q-2003 IEEE standard for Local and Metropolitan Area
    Networks--Virtual Bridged Local Area Networks.

[7] G. Ibanez, A. Garcia, A. Azcorra. Alternative Multiple Spanning
    Tree Protocol (AMSTP) for Optical Ethernet Backbones. IEEE HSLN
    (LCN 2004). Tampa, Nov. 2004.

[8] Plummer, D., "Ethernet Address Resolution Protocol: Or converting
    network protocol addresses to 48.bit Ethernet address for
    transmission on Ethernet hardware", STD 37, RFC 826, November
    1982.

[9] Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery for
    IP Version 6 (IPv6)", RFC 2461 (Standards Track), December 1998.

[10] Perlman, R., "RBridges: Transparent Routing", Proc. Infocom
     2004.

[11] R. Perlman, J. Touch, A. Yegin. RBridges: Transparent Routing.
     draft-perlman-rbridge-03.txt, May 2005.
     http://www.ietf.org/internet-drafts/draft-perlman-rbridge-03.txt

[12] M. Seaman. Shortest Path Bridging.
     http://www.ieee802.org/1/files/public/docs2005/new-seaman-shortest-path-par-0405-02.htm

[13] N. Finn. "An Update on Networking Technologies".
     http://www.ieee802.org/802_tutorials/july05/nfinn-shortest path-bridging.pdf

[14] A. Iwata, et al., "Global Open Ethernet Architecture for a
     Cost-Effective Scalable VPN Solution," IEICE Trans. on
     Communications, E87-B, 1, pp. 142-151, Jan. 2004.
Authors' Addresses

Guillermo Ibanez
Universidad Carlos III Madrid
Email: gibanez@it.uc3m.es

Alberto Garcia
Universidad Carlos III Madrid
Email: alberto@it.uc3m.es

Arturo Azcorra
Universidad Carlos III Madrid
Email: azcorra@it.uc3m.es

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.