Network Working Group                                        B. Sarikaya
Internet-Draft                                                     F. Xia
Expires: May 10, 2014                                          Huawei USA
                                                        November 06, 2013


        Virtual eXtensible Local Area Network over IEEE 802.1Qbg
                     draft-sarikaya-proxy-vxlan-00.txt

Abstract

   In data centers there is interest in offloading network functions to
   the switches in order to keep the servers focused on computation
   rather than networking.  IEEE 802.1Qbg, or Virtual Ethernet Port
   Aggregator (VEPA), at the hypervisor simply forces each VM frame to
   be sent out to the external switch regardless of destination.  In
   this case, Virtual eXtensible Local Area Network (VXLAN) operation,
   i.e. proxying, is needed at a higher-level switch.  The
   communication functions of VXLAN are moved up to the Top of Rack
   switches.  The Top of Rack switch encapsulates the packets and
   directs them to their destination.  Packets arriving from
   VXLAN-enabled servers are decapsulated before being forwarded to the
   destination proxied servers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 10, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Problem Statement
   4.  Proxy VXLAN Architecture
   5.  Overview of the Protocol
   6.  Encapsulation/Decapsulation Operation
   7.  Virtual Machine Creation
   8.  VXLAN Tunnel Endpoint Notification
   9.  Virtual Machine Mobility and Operation
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgements
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses
1.  Introduction

   Data center networks are increasingly used by telecom operators as
   well as by enterprises.  Currently these networks are organized as
   one large Layer 2 network in a single building.  In some cases such
   a network is extended geographically using virtual Local Area
   Network (VLAN) technologies, still as an even larger Layer 2 network
   connecting the virtual machines (VMs), each with its own MAC
   address.

   Another important requirement is the growing demand for
   multitenancy, i.e. multiple tenants each with their own isolated
   network domain.  In a data center hosting multiple tenants, each
   tenant may independently assign MAC addresses and VLAN IDs, which
   may lead to duplication.  What is needed is an overlay network based
   on an IP tunneling scheme, called Virtual eXtensible Local Area
   Network (VXLAN).  VXLAN overlays a Layer 2 network over a Layer 3
   network.  Each overlay is identified by its VXLAN Network Identifier
   (VNI).  This allows up to 16M VXLAN segments to coexist within the
   same administrative domain [I-D.mahalingam-dutt-dcops-vxlan].

   In VXLAN, each MAC frame is transmitted after encapsulation, i.e. an
   outer Ethernet header, an IPv4/IPv6 header, a UDP header and a VXLAN
   header are added.  The outer Ethernet header indicates an IPv4 or
   IPv6 payload.  The VXLAN header contains the 24-bit VNI.  The VXLAN
   tunnel end point (VTEP) is the hypervisor on the server that houses
   the VM.  VXLAN encapsulation is known only to the VTEP; the VM never
   sees it.  The tunneling is also stateless: each MAC frame is
   encapsulated independently of any other MAC frame.
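   The following Python fragment is a non-normative sketch of the
   8-octet VXLAN header and the stateless per-frame encapsulation
   described above.  The helper names (build_vxlan_header, encapsulate)
   are illustrative assumptions, not names defined by this document or
   by [I-D.mahalingam-dutt-dcops-vxlan].

     import struct

     VXLAN_UDP_PORT = 4789        # well-known UDP destination port for VXLAN

     def build_vxlan_header(vni):
         """Return the 8-octet VXLAN header for a given 24-bit VNI."""
         if not 0 <= vni < 1 << 24:
             raise ValueError("VNI must fit in 24 bits")
         flags = 0x08             # I flag set: a valid VNI is present
         # Word 1: flags (8 bits) followed by 24 reserved bits.
         # Word 2: VNI (24 bits) followed by 8 reserved bits.
         return struct.pack("!II", flags << 24, vni << 8)

     def encapsulate(inner_frame, vni):
         """Prepend the VXLAN header; the outer Ethernet, IPv4/IPv6 and
         UDP headers are added by the sending VTEP before transmission."""
         return build_vxlan_header(vni) + inner_frame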
   Instead of a UDP header, Generic Routing Encapsulation (GRE) can be
   used.  A 24-bit Virtual Subnet Identifier (VSID) is placed in the
   GRE key field.  The resulting encapsulation is called Network
   Virtualization using Generic Routing Encapsulation (NVGRE)
   [I-D.sridharan-virtualization-nvgre].  Note that the VSID is similar
   to the VNI.  Although VXLAN terminology is used throughout, the
   protocol defined in this document applies to NVGRE as well as VXLAN.

   One deployment strategy for VXLAN is to upgrade data center server
   hypervisors for VXLAN compatibility.  Data center servers that
   cannot be upgraded can also be given VXLAN capability using
   proxying.  For proxying to work, IEEE 802.1Qbg [IEEE802.1Qbg] or
   Virtual Ethernet Port Aggregator (VEPA) functionality is needed in
   legacy server hypervisors.

   In a virtual server environment the most common way to provide
   Virtual Machine (VM) switching connectivity is a Virtual Ethernet
   Bridge (VEB), or vSwitch.  A VEB acts similarly to a Layer 2
   hardware switch, providing inbound/outbound and inter-VM
   communication.  A VEB aggregates the traffic of multiple VMs across
   a set of links and delivers frames between VMs based on MAC address.
   However, a VEB lacks network management, monitoring and security;
   IEEE 802.1Qbg or Virtual Ethernet Port Aggregator (VEPA) provides a
   simple solution.  VEPA simply sends each VM frame out, regardless of
   destination, to be handled by an external switch, i.e. the Proxy
   VXLAN switch.

   VXLAN is a server-based network virtualization solution in which
   hypervisors are responsible for all networking work.  IEEE 802.1Qbg
   [IEEE802.1Qbg] follows a totally different philosophy: servers
   should do as little networking work as possible, and virtual
   switches send all traffic and forwarding decisions to the adjacent
   physical switch.  This removes the burden of VM forwarding decisions
   and network operations from the host CPU.  It also leverages the
   advanced management capabilities of the access or aggregation layer
   switches.

   In this document, we define Proxy VXLAN switch behavior.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The terminology in this document is based on the definitions in
   [I-D.mahalingam-dutt-dcops-vxlan].

3.  Problem Statement

   In a virtual server environment the most common way to provide
   Virtual Machine (VM) switching connectivity is a Virtual Ethernet
   Bridge (VEB), or vSwitch.  A VEB acts similarly to a Layer 2
   hardware switch, providing inbound/outbound and inter-VM
   communication.  A VEB aggregates the traffic of multiple VMs across
   a set of links and delivers frames between VMs based on MAC address.

   The VEB solution has a number of disadvantages.  First of all, the
   vSwitch consumes valuable CPU and memory bandwidth.  The higher the
   traffic load, the greater the number of CPU and memory cycles
   required to move traffic through the vSwitch, reducing the number of
   VMs a physical server can support.  Secondly, the solution lacks
   network-based visibility.  vSwitches have a limited feature set:
   they do not provide local traffic visibility and lack capabilities
   for enterprise data monitoring, security, or network management.
   Finally, it lacks network policy enforcement.  Modern external
   switches have many advanced features such as port security, quality
   of service (QoS), and access control lists (ACLs), while vSwitches
   often lack such features or support them only in a limited way.

   To solve the management challenges with VEBs, Edge Virtual Bridging
   (EVB) was proposed in the IEEE 802.1Qbg standard.  The primary goal
   of EVB is to combine the best of software and hardware vSwitches
   with the best of external Layer 2 network switches.  EVB is based on
   VEPA (Virtual Ethernet Port Aggregator) technology.  It is a way for
   virtual switches to send all traffic and forwarding decisions to the
   adjacent physical switch.  This removes the burden of VM forwarding
   decisions and network operations from the host CPU.  It also
   leverages the advanced management capabilities of the access or
   aggregation layer switches.

   VXLAN was mainly developed for the vSwitch (VEB), not for IEEE
   802.1Qbg.  To accommodate IEEE 802.1Qbg, Proxy VXLAN is proposed in
   this document.

4.  Proxy VXLAN Architecture

   Proxy VXLAN is composed of servers that can host virtual machines.
   The servers are not involved in any of the communications required
   by VXLAN.  This function is moved to the switches above the servers.
   Top of Rack (ToR) switches are examples of switches that can host
   the proxy function, i.e. the VXLAN Tunnel End Point (VTEP).  The
   servers support IEEE 802.1Qbg.  The VTEPs receive raw packets from
   the servers and send packets upstream after VXLAN encapsulation.

   The Proxy VXLAN switch is assumed to be connected to VXLAN-enabled
   servers as well.  The VTEP in a Proxy VXLAN architecture always tags
   the outgoing frames to let VXLAN-enabled servers know that these
   frames are proxied.  A given ToR switch hosting a VTEP can serve one
   or more legacy servers.  Virtual machine creation/deletion is done
   by the management center.
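   The following non-normative sketch illustrates the state that a ToR
   switch acting as a Proxy VXLAN VTEP might keep for the legacy
   servers and VMs it serves.  The class and method names (ProxyVtep,
   VmEntry, register_vm, lookup_vni) are illustrative assumptions and
   are not defined by this document.

     from dataclasses import dataclass, field
     from typing import Dict

     @dataclass
     class VmEntry:
         mac: str             # VM MAC assigned by the management center
         vni: int             # 24-bit VXLAN Network Identifier

     @dataclass
     class ProxyVtep:
         """State kept by a ToR switch acting as a Proxy VXLAN VTEP."""
         vtep_ip: str         # outer source IP used in encapsulation
         # physical port -> legacy server attached to that port
         served_servers: Dict[str, str] = field(default_factory=dict)
         # VM MAC address -> (MAC, VNI) entry
         vm_table: Dict[str, VmEntry] = field(default_factory=dict)

         def register_vm(self, mac: str, vni: int) -> None:
             """Record a VM announced by the management center."""
             self.vm_table[mac] = VmEntry(mac=mac, vni=vni)

         def lookup_vni(self, mac: str) -> int:
             """VNI used when encapsulating frames from this VM."""
             return self.vm_table[mac].vni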
5.  Overview of the Protocol

   The steps involved in the protocol are explained below:

   Encapsulation/Decapsulation of Frames
      In a hybrid Proxy VXLAN, when a frame is received on the VXLAN-
      connected interface, the proxy switch decapsulates the frame and
      forwards the packet to the non-VXLAN server.  When an incoming
      frame from the non-VXLAN interface is received, the proxy switch
      encapsulates it and forwards it to the VXLAN server.

   Virtual Machine Creation
      Virtual machine creation is initiated by the management center.
      The management center notifies a given non-VXLAN server to create
      a VM.  The center assigns a MAC address and a VXLAN Network
      Identifier to the VM.

   VTEP Notification
      The management center notifies the ToR switch that is responsible
      for the server of this newly created virtual machine.  The center
      sends the MAC address and VNI of the virtual machine to the ToR
      switch, which will act as the VTEP for this VM.

   Virtual Machine Operation
      Virtual machine operation usually starts with an ARP/ND Request
      to resolve the MAC address of the destination virtual machine.
      After ARP/ND, the virtual machine enters into IP communication
      with the destination virtual machine.

                             Core switches
             +----+              +----+              +----+
             |    |              |    |              |    |
             +----+              +----+              +----+
                 \                  |                  /
                  \                 |                 /
    Management     \                |                /
      Center ---- +-----------+ +-----------+ +-----------+
                  | ToR switch| | ToR switch| | ToR switch|
                  |   VTEP1   | |   VTEP2   | |   VTEP3   |
                  +-----------+ +-----------+ +-----------+
                    /       \         |              \
           +---------+ +---------+ +---------+ +---------+
           |+--+ +--+| |+--+ +--+| |+--+ +--+| |+--+ +--+|
           ||vm| |vm|| ||vm| |vm|| ||vm| |vm|| ||vm| |vm||
           |+--+ +--+| |+--+ +--+| |+--+ +--+| |+--+ +--+|
           +---------+ +---------+ +---------+ +---------+

                   Figure 1: Proxy VXLAN Architecture

6.  Encapsulation/Decapsulation Operation

   In a hybrid Proxy VXLAN, when a frame is received on the VXLAN-
   connected interface, the proxy switch removes the VXLAN header.  It
   checks the destination MAC address of the inner Ethernet frame and
   forwards the packet to a physical port based on this MAC address.

   When an incoming frame from the non-VXLAN interface is received, the
   proxy switch first adds a VXLAN header.  The VXLAN Network
   Identifier (VNI) is set to the value provided by the management
   center.  The I flag is set to 1.  A new flag, the P flag, is set to
   1 to indicate that this frame comes from a Proxy VXLAN switch.  The
   source port is assigned by the proxy switch.  The destination port
   is set to 4789.  The UDP checksum is set to zero.  The source
   IPv4/IPv6 address is the proxy switch's IPv4/IPv6 address.  The
   destination IPv4/IPv6 address is obtained based on the inner
   destination MAC address: it is a multicast address if the incoming
   frame belongs to ARP/ND or multicast communication; otherwise the
   proxy switch looks up its ARP/ND cache to find the IP address
   corresponding to the inner destination MAC address and places the
   result in the destination IPv4/IPv6 address field.

   When an incoming frame from the non-VXLAN interface is received, the
   proxy switch also checks whether the destination is within the same
   host (another VM in the same VLAN).  In that case the frame is
   forwarded back down the port it was received on.
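   The following Python fragment is a non-normative sketch of the
   encapsulation step described above.  The bit position chosen for the
   P flag (0x04) is an assumption made for illustration only; this
   document does not assign it.  The helper names (encapsulate,
   is_broadcast_or_multicast, nd_cache, vni_multicast_group) are also
   illustrative.

     import random
     import struct

     VXLAN_UDP_PORT = 4789
     I_FLAG = 0x08      # valid-VNI flag in the VXLAN header
     P_FLAG = 0x04      # ASSUMED bit position for the new "proxied" P flag

     def is_broadcast_or_multicast(inner_dst_mac: bytes) -> bool:
         # The group bit of the destination MAC covers broadcast (ARP)
         # and multicast (ND, multicast applications).
         return bool(inner_dst_mac[0] & 0x01)

     def encapsulate(inner_frame: bytes, vni: int, vtep_ip: str,
                     nd_cache: dict, vni_multicast_group: str):
         """Encapsulate a frame received from the non-VXLAN side."""
         inner_dst_mac = inner_frame[0:6]
         flags = I_FLAG | P_FLAG
         vxlan_header = struct.pack("!II", flags << 24, vni << 8)

         if is_broadcast_or_multicast(inner_dst_mac):
             outer_dst_ip = vni_multicast_group      # ARP/ND, multicast
         else:
             outer_dst_ip = nd_cache[inner_dst_mac]  # ARP/ND cache lookup

         udp = {
             "src_port": random.randint(49152, 65535),  # assigned by switch
             "dst_port": VXLAN_UDP_PORT,
             "checksum": 0,                             # UDP checksum zero
         }
         # Outer Ethernet and IPv4/IPv6 headers (source = vtep_ip,
         # destination = outer_dst_ip) are added before transmission.
         return outer_dst_ip, udp, vxlan_header + inner_frame

   Implementations commonly derive the UDP source port from a hash of
   the inner frame to aid load balancing, although the text above only
   requires that the proxy switch assign it.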
7.  Virtual Machine Creation

   Virtual machines are created by the management center.  The
   management center creates a virtual machine and assigns a server to
   it.  A server may host more than one virtual machine.  When a
   virtual machine is created, the management center assigns it a MAC
   address and a VXLAN Network Identifier.  The center sends the MAC
   address and VNI to the server.

8.  VXLAN Tunnel Endpoint Notification

   When the virtual machine is created, the management center also
   notifies the ToR switch hosting the VTEP that is responsible for the
   server in which the VM was created.  The ToR switch receives the MAC
   address and VNI value for this virtual machine.  The ToR switch is
   responsible for keeping the MAC address/VNI values of each virtual
   machine that it serves.  These values are used when encapsulating
   the packets coming from the virtual machines and during virtual
   machine operation.

9.  Virtual Machine Mobility and Operation

   In Proxy VXLAN, virtual machine mobility can be achieved using the
   following steps:

   Step 1.  The source VTEP is notified of the destination VTEP of the
            moving VM.

   Step 2.  The source VTEP tunnels all packets for the VM to the
            destination VTEP.

   Step 3.  When the VM is ready, it sends a gratuitous ARP to all VMs.

   Step 4.  When the source VTEP receives the gratuitous ARP, it
            removes the VM MAC from its forwarding table and stops
            tunneling for this virtual machine.

   When a VM is created, or after it has moved, the VM starts its
   operation, e.g. by sending ARP/ND packets.  The non-VXLAN server
   sends the packet to the upstream switch, and it finally reaches the
   ToR switch hosting the VTEP (see Section 4).  The VTEP normally
   converts this packet, i.e. a broadcast packet, into a multicast
   packet, encapsulates it, and sends it out to the VXLAN-enabled
   servers.  How ARP/ND packets are processed further is out of scope.
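   The following non-normative sketch illustrates how the source VTEP
   might handle Steps 1-4 above.  The class SourceVtep and its methods
   are illustrative assumptions, not part of this specification.

     from dataclasses import dataclass
     from typing import Dict, Optional

     @dataclass
     class ForwardingEntry:
         vni: int
         local_port: Optional[str] = None       # set while VM is local
         redirect_vtep_ip: Optional[str] = None  # set while VM is moving

     class SourceVtep:
         """Mobility handling at the source VTEP (Steps 1-4)."""

         def __init__(self) -> None:
             self.forwarding_table: Dict[str, ForwardingEntry] = {}

         def start_redirect(self, vm_mac: str, dest_vtep_ip: str) -> None:
             # Steps 1-2: after being notified of the destination VTEP,
             # tunnel all packets for the moving VM to that VTEP.
             entry = self.forwarding_table[vm_mac]
             entry.local_port = None
             entry.redirect_vtep_ip = dest_vtep_ip

         def on_gratuitous_arp(self, sender_mac: str) -> None:
             # Step 4: the gratuitous ARP from the moved VM tells the
             # source VTEP to remove the VM MAC from its forwarding
             # table and stop tunneling for it.
             self.forwarding_table.pop(sender_mac, None)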
10.  Security Considerations

   The security considerations in [RFC2131], [RFC2132] and [RFC3315]
   apply.  The security considerations in
   [I-D.mahalingam-dutt-dcops-vxlan] are also applicable.

11.  IANA Considerations

   IANA is requested to assign option codes for the VXLAN Network
   Identifier option (OPTION_VNI) and the ARP/ND Directory Server IP
   Address option (OPTION_DSA) in the registries maintained for DHCPv4
   and DHCPv6.

12.  Acknowledgements

13.  References

13.1.  Normative References

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol",
              RFC 2131, March 1997.

   [RFC2132]  Alexander, S. and R. Droms, "DHCP Options and BOOTP
              Vendor Extensions", RFC 2132, March 1997.

   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
              and M. Carney, "Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6)", RFC 3315, July 2003.

   [RFC4511]  Sermersheim, J., "Lightweight Directory Access Protocol
              (LDAP): The Protocol", RFC 4511, June 2006.

   [RFC4513]  Harrison, R., "Lightweight Directory Access Protocol
              (LDAP): Authentication Methods and Security Mechanisms",
              RFC 4513, June 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [IEEE802.1Qbg]
              IEEE, "Edge Virtual Bridging", IEEE Std 802.1Qbg-2012,
              May 2012.

13.2.  Informative References

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A
              Framework for Overlaying Virtualized Layer 2 Networks
              over Layer 3 Networks", draft-mahalingam-dutt-dcops-
              vxlan-05 (work in progress), October 2013.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Greenberg, A., Wang, Y., Garg, P.,
              Venkataramiah, N., Duda, K., Ganga, I., Lin, G., Pearson,
              M., Thaler, P., and C. Tumuluri, "NVGRE: Network
              Virtualization using Generic Routing Encapsulation",
              draft-sridharan-virtualization-nvgre-03 (work in
              progress), August 2013.

Authors' Addresses

   Behcet Sarikaya
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Phone: +1 972-509-5599
   Email: sarikaya@ieee.org


   Frank Xia
   Huawei USA
   Nanjing, China

   Phone: +1 972-509-5599
   Email: xiayangsong@huawei.com