Network Working Group                                          X. Xu
Internet-Draft                                                Huawei
Category: Standards Track                                    H. Shah
                                                          Ciena Corp
                                                             L. Yong
                                                              Huawei
                                                              Y. Fan
                                                       China Telecom
Expires: January 2013                                  July 13, 2012


             Virtual Private LAN Service (VPLS) Using IS-IS
                      draft-xu-l2vpn-vpls-isis-04

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 13, 2013.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Xu, et al.              Expires January 13, 2013                [Page 1]

Internet-Draft             VPLS Using IS-IS                    July 2012

Abstract

   This document describes a lightweight Virtual Private LAN Service (VPLS), referred to as IS-IS VPLS, which uses IS-IS for auto-discovery and signaling. IS-IS VPLS is intended to be used as a scalable cloud data center network solution.
Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Terminology
   3. Control Plane
      3.1. VPLS Info TLV
      3.2. Auto-discovery
      3.3. Signaling
   4. Data Plane
      4.1. Data Encapsulation and Forwarding
         4.1.1. Unicast
         4.1.2. Multicast/Broadcast
      4.2. MAC Address Learning
         4.2.1. Data-plane based MAC Learning
         4.2.2. Control-plane based MAC Learning
   5. ARP Broadcast Reduction
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   10. Authors' Addresses

1. Introduction

   To leverage economies of scale, today's cloud data centers tend to contain tens to hundreds of thousands of servers. Moreover, distributed computing and server virtualization are increasingly being adopted in today's cloud data centers.
   These factors lead to significant scaling, performance, and operational challenges for cloud data center networks. Therefore, to design a scalable and sustainable cloud data center network solution, the following requirements must be taken into account:

   1) LAN Extension

      To achieve service agility and business continuity, Virtual Machine (VM) migration and High-Availability (HA) clusters are commonly used in today's cloud data centers. These two applications place special requirements on data center networks. For instance, to allow a VM to be freely migrated from one server to another within a data center while retaining its IP and MAC addresses, the LAN where the VM is located needs to be extended across multiple racks or pods within the data center. In addition, as some HA cluster applications rely on link-local multicast for cluster member discovery and heartbeat, cluster member servers are usually required to reside within the same Layer 2 domain. As a result, LAN extension becomes a fundamental requirement for cloud data center networks.

   2) VPN Instance Space Scalability

      In modern cloud data centers, tens of thousands of tenants (e.g., enterprises or governments that consume computing resources such as Infrastructure-as-a-Service (IaaS) offered by cloud service providers) could be hosted over a shared network infrastructure. For security and performance isolation, these tenants SHOULD be isolated from one another. Hence, cloud data center networks SHOULD provide a VPN instance space large enough for tenant isolation.

   3) Forwarding Table Scalability

      In a highly virtualized cloud data center environment, it is not uncommon for millions of VMs to share a common network infrastructure. Therefore, the forwarding table of each network device within a cloud data center network SHOULD scale to that number of VMs.

   4) Bandwidth Utilization Maximization

      In modern cloud data centers, distributed computing is making server-to-server traffic dominant over client-to-server traffic. To meet the growing capacity demands for server-to-server connectivity, shortest path forwarding and Equal Cost Multi-Path (ECMP) have become basic capabilities of cloud data center networks.

   5) ARP/Unknown Unicast Flood Suppression

      It is well known that flooding ARP broadcast and unknown unicast traffic within large Layer 2 networks degrades the performance of both networks and hosts. As the Layer 2 domain is extended across multiple racks or pods within data centers, this problem becomes even worse. As such, suppressing the flooding of ARP broadcast and unknown unicast traffic within cloud data centers is increasingly desirable.

   6) Flexibility for Tradeoffs between Bandwidth and State

      Tenants within a cloud data center may differ greatly in terms of the number of VMs or the volume of multicast/broadcast traffic. For example, some tenants may have few VMs while others have many, and some tenants may have a high volume of multicast/broadcast traffic while others have little or none. As such, there is no "one size fits all" VPN multicast/broadcast delivery procedure for these tenants. Hence, cloud data center networks SHOULD support both the ingress replication procedure and the multicast tree procedure for delivering VPN multicast/broadcast traffic, so as to allow an effective tradeoff between bandwidth usage and state maintenance on a per-tenant basis, according to the particular conditions of each tenant.

   7) Simplified Provisioning and Operation

      It is not unusual for a single cloud data center to have thousands of physical switches (e.g., ToR switches).
      Operating a network of this scale is a significant challenge for data center operators. Therefore, simplifying and even automating network provisioning and operation is very important for cloud data center networks.

   8) Reuse of Existing Operational Experience

      IP, as a proven routing technology, is already used in most of today's data centers. Furthermore, service providers planning to build cloud data centers have years of experience in operating MPLS-based L2VPN and/or L3VPN services, which can be transported over MPLS-enabled or IP-enabled Packet Switched Networks (PSNs). To allow data center operators to reuse their existing operational experience, cloud data center network solutions SHOULD reuse existing technologies and protocols where appropriate, rather than reinventing the wheel. This explains the growing interest in the industry in adopting or adapting existing L2VPN and/or L3VPN technologies for cloud data center networks.

   Although the existing VPLS solutions [RFC4761][RFC4762] can meet most of the above requirements, there is still room for improvement.
   First, the existing VPLS solutions require a full mesh of pseudowires (PWs) between PE routers, which poses a significant scaling challenge in the cloud data center environment, especially when thousands of ToR switches are configured as PE routers and tens of thousands of VPLS instances are provisioned on them. Second, the ingress replication procedure that the existing VPLS solutions use for delivering multicast and broadcast traffic is not optimal from the bandwidth utilization perspective. Third, the existing VPLS solutions require running one or more protocols in addition to the IGP within data centers (e.g., LDP and/or BGP) for VPLS, which adds complexity to network management and operation, especially when BGP and VPLS parameters must be configured on thousands of ToR switches and thousands of BGP peers must be configured on aggregation or core switches.

   Hence, this document describes a lightweight VPLS solution, referred to as IS-IS VPLS, which uses IS-IS [IS-IS][RFC1195] as its VPLS control protocol. IS-IS VPLS retains almost all the advantages of the existing VPLS solutions (e.g., split-horizon forwarding) while overcoming the shortcomings mentioned above. For example, IS-IS VPLS does not need a full mesh of PWs between PE routers. Furthermore, it allows data center operators to flexibly make tradeoffs between bandwidth and state on a per-tenant basis. Finally, VPLS services are provided by the IGP already deployed within cloud data centers (i.e., IS-IS) rather than by one or more dedicated protocols.

2. Terminology

   This memo makes use of the terms defined in [RFC4664], [VPLS-MCAST], [RFC4761], and [RFC4762].

3. Control Plane

   The VPLS control plane has two primary functions: auto-discovery and signaling.
   In IS-IS VPLS, both functions are accomplished by a single new IS-IS TLV, referred to as the VPLS Info TLV (see Section 3.1). As VPLS Info TLVs are propagated within the data center network, PE routers automatically discover which other PE routers are part of a given VPLS instance and which VPLS labels those PE routers have assigned to that instance. Furthermore, according to the IS-IS specifications [IS-IS][RFC1195], IS-IS routers ignore unknown TLVs in an LSP and pass them on to other neighbors unchanged. Therefore, P routers do not need to process the VPLS Info TLV; they simply synchronize the Link State PDUs (LSPs) containing it with their adjacent IS-IS neighbors as usual. In addition, to overcome the 255-byte TLV length limit, IS-IS allows multiple TLVs of a given type to be interpreted as additive rather than mutually exclusive (see Section 6.4 of [RFC5311] for more details). Therefore, the TLV length limit does not prevent IS-IS from propagating a large amount of VPLS information.

3.1. VPLS Info TLV

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |Type=VPLS Info |
   +-+-+-+-+-+-+-+-+
   |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 PE's IPv4 or IPv6 Address                     |
   |                        (128 bits)                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Resv (12 bits)    |         VPLS ID (20 bits)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Resv (12 bits)    |        VPLS Label (20 bits)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                                                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Resv (12 bits)    |         VPLS ID (20 bits)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Resv (12 bits)    |        VPLS Label (20 bits)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
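   The TLV layout above can be illustrated with a short Python sketch. It encodes a PE router's (VPLS ID, VPLS label) pairs into one or more VPLS Info TLVs and splits them across several TLVs of the same type, because a TLV value is limited to 255 bytes: after the 16-byte address field, at most (255 - 16) // 8 = 29 eight-byte entries fit. It also decodes TLVs and builds the per-VPLS membership table that auto-discovery relies on. This is a non-authoritative sketch under several assumptions. The TLV type code is still TBD, so the caller passes a placeholder value. All function names are hypothetical. IPv4 addresses are encoded with the other twelve octets set to zero, following the literal wording of the field description. A strict IPv4-mapped IPv6 address (RFC 4291) would instead carry 0xffff in octets 10 and 11, so implementations would need to agree on one form.

```python
import ipaddress
import struct

ADDR_LEN = 16        # "PE's IPv4 or IPv6 Address" field: 128 bits
ENTRY_LEN = 8        # one VPLS ID word + one VPLS Label word (2 x 32 bits)
MAX_VALUE_LEN = 255  # the IS-IS TLV Length field is a single octet
# (255 - 16) // 8 = 29 entries fit in one TLV after the address field.
MAX_ENTRIES_PER_TLV = (MAX_VALUE_LEN - ADDR_LEN) // ENTRY_LEN


def encode_pe_address(addr: str) -> bytes:
    """Return the 16-byte address field.

    Assumption: an IPv4 address goes in the last four octets and the other
    twelve octets are zero, following the draft's literal wording.
    """
    ip = ipaddress.ip_address(addr)
    return bytes(12) + ip.packed if ip.version == 4 else ip.packed


def encode_vpls_info_tlvs(tlv_type, pe_addr, entries, extended_ids=False):
    """Encode (vpls_id, vpls_label) pairs into as many VPLS Info TLVs as needed.

    tlv_type is a caller-supplied placeholder because the real code point is
    TBD. With extended_ids=True, the 12 reserved bits in front of each VPLS
    ID are used as part of a 32-bit extended VPLS ID.
    """
    id_limit = 1 << (32 if extended_ids else 20)
    addr_field = encode_pe_address(pe_addr)
    tlvs = []
    for start in range(0, len(entries), MAX_ENTRIES_PER_TLV):
        value = bytearray(addr_field)
        for vpls_id, label in entries[start:start + MAX_ENTRIES_PER_TLV]:
            if not 0 <= vpls_id < id_limit:
                raise ValueError(f"VPLS ID {vpls_id} out of range")
            if not 0 <= label < 1 << 20:
                raise ValueError(f"VPLS label {label} out of range")
            # A 20-bit value leaves the 12 reserved bits of its word at zero.
            value += struct.pack("!II", vpls_id, label)
        tlvs.append(struct.pack("!BB", tlv_type, len(value)) + bytes(value))
    return tlvs


def decode_vpls_info_tlv(tlv, extended_ids=False):
    """Return (pe_address, [(vpls_id, vpls_label), ...]) for one TLV."""
    length = tlv[1]
    value = bytes(tlv[2:2 + length])
    if len(value) != length or length < ADDR_LEN or (length - ADDR_LEN) % ENTRY_LEN:
        raise ValueError("malformed VPLS Info TLV")
    if value[:12] == bytes(12):
        # A zero prefix is read as IPv4; note this also matches IPv6 ::/96.
        pe = ipaddress.IPv4Address(value[12:ADDR_LEN])
    else:
        pe = ipaddress.IPv6Address(value[:ADDR_LEN])
    id_mask = 0xFFFFFFFF if extended_ids else 0xFFFFF  # ignore reserved bits
    entries = []
    for offset in range(ADDR_LEN, length, ENTRY_LEN):
        id_word, label_word = struct.unpack_from("!II", value, offset)
        entries.append((id_word & id_mask, label_word & 0xFFFFF))
    return str(pe), entries


def build_membership(decoded_tlvs):
    """Map VPLS ID -> {PE address: VPLS label} from decoded VPLS Info TLVs."""
    members = {}
    for pe, entries in decoded_tlvs:
        for vpls_id, label in entries:
            members.setdefault(vpls_id, {})[pe] = label
    return members


# Usage: two PE routers advertise their attached VPLS instances.
PLACEHOLDER_TYPE = 250  # hypothetical; the real type code is TBD
received = []
for pe_addr, pairs in [("192.0.2.1", [(100, 16001)]),
                       ("192.0.2.2", [(100, 17001), (200, 17002)])]:
    received += [decode_vpls_info_tlv(t)
                 for t in encode_vpls_info_tlvs(PLACEHOLDER_TYPE, pe_addr, pairs)]
print(build_membership(received))
# {100: {'192.0.2.1': 16001, '192.0.2.2': 17001}, 200: {'192.0.2.2': 17002}}
```

   Splitting at 29 entries per TLV keeps every TLV within its one-octet Length field. How those TLVs are then packed into LSP fragments is left to the IS-IS implementation and is not modeled here.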
   Type
      Type code for the VPLS Info TLV: TBD.

   Length
      Total number of bytes contained in the value field.

   PE's IPv4 or IPv6 Address
      This 128-bit field carries one of the originating PE router's IPv4 or IPv6 addresses that are reachable across the IP backbone. Remote PE routers, when acting as ingress PE routers, SHOULD use this address as the tunnel destination address when tunneling customer Ethernet frames to the originating PE router. If the address is an IPv4 address, the last four octets of this field carry the IPv4 address and the remaining octets are set to zero. In other words, the field is filled with an IPv4-mapped IPv6 address.

   VPLS ID
      This field carries a 20-bit globally unique VPLS ID identifying a particular attached VPLS instance. If a larger VPLS ID space is required, the 12-bit reserved field to its left can be combined with the VPLS ID field to form an extended VPLS ID field; that is, the whole 32-bit word carries a 32-bit extended VPLS ID value.

   VPLS Label
      This field carries a 20-bit MPLS label corresponding to the VPLS instance identified by the VPLS ID in the preceding 32-bit word.

3.2. Auto-discovery

   In IS-IS VPLS, each PE router automatically discovers which other PE routers are part of a given VPLS instance, identified by its globally unique VPLS ID. As a result, each PE router needs to be configured only with the identities of its attached VPLS instances, not with the identities of all the other PE routers belonging to those instances.

3.3. Signaling

   In IS-IS VPLS, a PE router assigns a single VPLS label to a given VPLS instance and advertises that same label to all other PE routers.
   As such, this VPLS label identifies only a particular VPLS instance, whereas a PW label identifies both a particular VPLS instance and the corresponding ingress PE router.

4. Data Plane

4.1. Data Encapsulation and Forwarding

   Since the VPLS label in IS-IS VPLS identifies only a particular VPLS instance, an IP-based tunnel (e.g., MPLS-in-IP or MPLS-in-GRE (Generic Routing Encapsulation) [RFC4023], or MPLS-in-UDP [MPLS-in-UDP]) is RECOMMENDED as the PE-to-PE tunnel technology in the data-plane based MAC learning case (see Section 4.2.1). This allows the egress PE router, during the MAC learning process, to determine the ingress PE router of a received VPLS packet from the tunnel source address of that packet. Note that in the control-plane based MAC learning case (see Section 4.2.2), there is no special requirement on the PE-to-PE tunnel technology compared with the existing VPLS solutions. The following sub-sections assume that IP tunnels are used between PE routers.

4.1.1. Unicast

   For known unicast traffic, MAC-in-MPLS-in-IP encapsulation [RFC4448] is used. For unknown unicast traffic, the encapsulation and forwarding procedures are the same as those for multicast/broadcast traffic, described in the following section.

4.1.2. Multicast/Broadcast

   There are two major modes for delivering multicast and broadcast traffic in IS-IS VPLS: ingress replication mode and P-Multicast tree mode. P-Multicast tree mode further includes two sub-options: non-aggregative P-Multicast tree mode, in which one P-Multicast distribution tree in the IP backbone is used exclusively by a single VPLS instance, and aggregative P-Multicast tree mode, in which one P-Multicast tree is shared by more than one VPLS instance. The encapsulation for each mode is described in the following sub-sections.

4.1.2.1. Ingress Replication Mode

   In the ingress replication mode, an ingress PE router forwards each received customer multicast/broadcast frame towards the remote PE routers in separate tunnels. Hence, the encapsulation in this mode is the same as that for unicast.

4.1.2.2. Non-aggregative P-Multicast Tree Mode

   In the non-aggregative P-Multicast tree mode, MAC-in-IP encapsulation is used directly, since the destination IP address (i.e., the multicast address) in the IP-based tunnel header is enough for egress PE routers to determine which VPLS instance a received VPLS packet belongs to.

4.1.2.3. Aggregative P-Multicast Tree Mode

   For the aggregative P-Multicast tree mode, MAC-in-MPLS-in-IP encapsulation SHOULD be used. Furthermore, the MPLS label here SHOULD be treated as an upstream-assigned label [RFC5331]. For example, assume that a PE router has assigned a local label L to a given VPLS instance and has advertised it in its VPLS Info TLV. When this PE router sends a multicast VPLS packet of that VPLS instance over the corresponding aggregative P-Multicast tree, the packet carries label L as an upstream-assigned label.

4.2. MAC Address Learning

   PE routers learn the MAC addresses of local CE hosts in the same way as ordinary bridges. For learning the MAC addresses of remote CE hosts, IS-IS VPLS provides two options: data-plane based MAC learning and control-plane based MAC learning. If unknown unicast flood suppression is required, even at the cost of consuming more forwarding table resources, the control-plane based MAC learning option can be considered. Otherwise, the data-plane based MAC learning option SHOULD be preferred.

4.2.1. Data-plane based MAC Learning

   Upon receiving a VPLS packet from a remote PE router, the egress PE router uses the MPLS label contained in the packet (or the tunnel destination IP address in the non-aggregative P-Multicast tree case) to determine the VPLS instance that the packet belongs to, and uses the tunnel source IP address to determine which ingress PE router sent the packet. The egress PE router can then associate the customer source MAC address of the frame with that ingress PE router in the forwarding table of that VPLS instance.

4.2.2. Control-plane based MAC Learning

   In IS-IS VPLS, the MAC addresses of remote CE hosts can also be learned in the control plane by using the MAC-Reachability TLV defined in [RFC6165]. Upon learning the MAC addresses of their local CE hosts, PE routers immediately advertise these MAC addresses to the other PE routers of the same VPLS instance using MAC-Reachability TLVs. One or more MAC-Reachability TLVs are carried in an LSP, which in turn is encapsulated with an Ethernet header. The source MAC address is the originating PE router's MAC address, whereas the destination MAC address is a to-be-defined multicast MAC address specifically identifying IS-IS VPLS PE routers. Ingress PE routers forward such LSPs towards remote PE routers as customer Ethernet frames. Egress PE routers receiving these packets SHOULD intercept and process them. The IP address of the PE router originating these MAC routes can be derived either from the "IP Interface Address" field contained in the corresponding LSPs (note that this IP address SHOULD be identical to the one contained in the VPLS Info TLV) or from the tunnel source IP address of the VPLS packet carrying these MAC routes. Since these LSPs are fully transparent to P routers, there is no impact on the control plane of P routers. More details about the control-plane based MAC learning procedure are for further study.

5. ARP Broadcast Reduction

   To suppress ARP broadcast flooding within a given VPLS instance, an ARP cache mechanism can be enabled on PE routers. For more details about the ARP cache mechanism, please refer to [ARP-Reduction].

6. Security Considerations

   This document does not introduce any additional security risk to IS-IS or VPLS, nor does it provide any additional security feature for IS-IS or VPLS.

7. IANA Considerations

   IANA is requested to assign an IS-IS TLV type code for the VPLS Info TLV.

8. Acknowledgements

   Thanks to Tony Li, Peter Ashwood-Smith, Phil Bedard, Kris Price, Shahram Davari, Adrian Farrel, Giles Heron, and Christian Jacquenet for their valuable comments on this proposal.

9. References

9.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2. Informative References

   [IS-IS]  ISO/IEC 10589, "Intermediate System to Intermediate System Intra-Domain Routing Exchange Protocol for use in Conjunction with the Protocol for Providing the Connectionless-mode Network Service (ISO 8473)", 2005.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for Routing in TCP/IP and Dual Environments", RFC 1195, 1990.

   [RFC5311]  McPherson, D., Ginsberg, L., Previdi, S., and M. Shand, "Simplified Extension of Link State PDU (LSP) Space for IS-IS", RFC 5311, 2009.

   [RFC4448]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron, "Encapsulation Methods for Transport of Ethernet over MPLS Networks", RFC 4448, April 2006.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)", RFC 4023, March 2005.

   [MPLS-in-UDP]  Xu, X., et al., "Encapsulating MPLS in UDP", draft-xu-mpls-in-udp-01.txt (work in progress), May 2012.

   [RFC6165]  Banerjee, A. and D. Ward, "Extensions to IS-IS for Layer-2 Systems", RFC 6165, February 2011.

   [VPLS-MCAST]  Aggarwal, R., Kamite, Y., and L. Fang, "Multicast in VPLS", draft-ietf-l2vpn-vpls-mcast-08.txt (work in progress), October 2010.

   [ARP-Reduction]  Shah, H., Ghanwani, A., and N. Bitar, "ARP Broadcast Reduction for Large Data Centers", draft-shah-armd-arp-reduction-02.txt (work in progress), October 2011.

   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream Label Assignment and Context-Specific Label Space", RFC 5331, August 2008.

   [RFC4664]  Andersson, L., Ed., and E. Rosen, Ed., "Framework for Layer 2 Virtual Private Networks (L2VPNs)", RFC 4664, September 2006.

   [RFC4761]  Kompella, K. and Y. Rekhter, "Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, January 2007.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN Service (VPLS) Using Label Distribution Protocol (LDP) Signaling", RFC 4762, January 2007.

10. Authors' Addresses

   Xiaohu Xu
   Huawei Technologies
   Beijing, China

   Email: xuxiaohu@huawei.com

   Himanshu Shah
   Ciena Corp

   Email: hshah@ciena.com

   Lucy Yong
   Huawei USA
   1700 Alma Dr. Suite 500
   Plano, TX 75075, US

   Email: lucyyong@huawei.com

   Yongbing Fan
   China Telecom
   Guangzhou, China

   Phone: +86 20 38639121
   Email: fanyb@gsta.com