NVo3                                                           L. Dunbar
Internet Draft                                                    Huawei
Intended status: Informational                             June 28, 2012
Expires: December 2012


                Issues of Mobility in DC Overlay network
            draft-dunbar-nvo3-overlay-mobility-issues-00.txt

Abstract

   This draft describes the issues introduced by VM mobility in data
   center overlay networks.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on December 28, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

                      Expires December 28, 2012                 [Page 1]

Internet-Draft        Mobility Issues in Overlay             Nov 1, 2011

Table of Contents

   1. Introduction
   2. Terminology
   3. Issues associated with Multicast in Overlay Network
   4. Issues associated with more than 4k Tenant Separation
      4.1. Collision of local VLAN Identifiers when VMs Move
         4.1.1. Local VIDs Managed by External Controller
         4.1.2. Local VIDs Managed by NVE
      4.2. Tenant Virtual Network separation at the physical gateway
           routers
   5. Summary and Recommendations
   6. Manageability Considerations
   7. Security Considerations
   8. IANA Considerations
   9. Acknowledgments
   10. References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

   Overlay networks, such as VXLAN and NVGRE, have been proposed to
   scale data center networks that carry a massive number of hosts as
   a result of server virtualization and business demand. An overlay
   network can hide the massive number of VM addresses from the
   switches/routers in the core (i.e. the underlay network).

   One of the key requirements stated in [NVo3-Problem] is the ability
   to move VMs across a wider range of locations, which could be
   multiple server racks, PODs, or locations, without changing the
   VMs' IP/MAC addresses. That means the association of VMs to their
   corresponding NVEs changes as VMs migrate. This dynamic nature of
   VM mobility in the data center introduces new challenges and
   complications to overlay networks.

   This draft describes some of the issues introduced by VM migration
   in an overlay environment. The purpose of the draft is to ensure
   that those issues will be addressed by future solutions.

2. Terminology

   CE:    VPN Customer Edge Device

   DC:    Data Center

   DA:    Destination Address

   EOR:   End of Row switch in a data center

   VNID:  Virtual Network Identifier

   NVE:   Network Virtualization Edge

   PE:    VPN Provider Edge Device

   SA:    Source Address

   ToR:   Top of Rack switch, also known as an access switch

   VM:    Virtual Machine

   VPLS:  Virtual Private LAN Service

3. Issues associated with Multicast in Overlay Network

   Some data centers avoid the use of IP multicast, primarily because
   of the perceived configuration/protocol complexity and multicast
   scaling limits. For many other data center operators, multicast is
   critical. Among the latter group, multicast is used for Internet
   Television (IPTV), market data, cluster load balancing, and gaming,
   to name a few.

   The use of multicast in an overlay environment can cause problems
   for the network when VMs move, in particular:

   - The association between multicast group members and NVEs becomes
     dynamic as VMs move. At one moment, all members of a multicast
     group could be attached to one NVE. At another moment, some
     members of the group could be attached to different NVEs.

   - Among the VMs attached to one NVE, some can send, while others
     can only receive.

   In addition, the overlay, which hides the VM addresses, introduces
   an IGMP snooping issue in the core. Because the NVE adds an outer
   header to data frames from VMs (i.e. applications), multicast
   addresses are hidden from the underlay network, so switches in the
   underlay network cannot snoop on the IGMP reports from multicast
   members.

   For unicast data frames, an overlay network edge (e.g. a TRILL
   edge) can learn the inner-outer address mapping by observing the
   data frames passing by. Since a multicast address is never placed
   in the SA field of the inner header of a data frame, the learning
   approach used for unicast does not work for multicast in an
   overlay.

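   The learning limitation described above can be illustrated with a
   minimal Python sketch. The EncapFrame and EdgeLearner names, the
   NVE names, the MAC addresses, and the VNID value are all
   hypothetical and chosen only for illustration; they are not taken
   from TRILL or from any NVo3 document. The sketch assumes an edge
   that learns only from the inner source address of frames it
   observes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EncapFrame:
    """Hypothetical, simplified view of an encapsulated data frame."""
    outer_sa: str  # address of the ingress NVE
    vnid: int      # virtual network identifier from the outer header
    inner_sa: str  # MAC address of the sending VM
    inner_da: str  # MAC address of the target VM, or a group MAC


def is_group_mac(mac: str) -> bool:
    # The least significant bit of the first octet marks a group address.
    return int(mac.split(":")[0], 16) & 0x01 == 0x01


class EdgeLearner:
    """Learns (VNID, inner MAC) -> NVE from frames seen at an edge."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[int, str], str] = {}

    def learn(self, frame: EncapFrame) -> None:
        # Only the inner source address is learned. A VM that has moved
        # overwrites its old entry once it sends from behind the new NVE.
        self.table[(frame.vnid, frame.inner_sa)] = frame.outer_sa

    def lookup(self, vnid: int, inner_da: str) -> Optional[str]:
        return self.table.get((vnid, inner_da))


learner = EdgeLearner()
group = "01:00:5e:01:02:03"
learner.learn(EncapFrame("nve1", 5000, "00:00:00:00:00:0a", group))
learner.learn(EncapFrame("nve2", 5000, "00:00:00:00:00:0b", group))
# The VM with MAC ...:0a migrates and now sends from behind nve3.
learner.learn(EncapFrame("nve3", 5000, "00:00:00:00:00:0a", group))

print(learner.lookup(5000, "00:00:00:00:00:0a"))  # nve3
print(is_group_mac(group), learner.lookup(5000, group))  # True None
```

   The unicast entry follows the VM to its new NVE, but the group
   address never appears as an inner source address, so no entry for
   it is ever created by this kind of learning.
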
   TRILL solves the multicast inner-outer address learning issue by
   creating common multicast trees in the TRILL domain. If TRILL's
   multicast approach is used in a data center with VM mobility, the
   multicast states maintained by switches/routers in the underlay
   network have to change as VMs move, which means that switches in
   the underlay network have to be aware of VM mobility and change
   their multicast states accordingly.

   Overall, VM mobility in an overlay environment makes multicast more
   complicated for switches/routers in the underlay network and for
   NVEs.

4. Issues associated with more than 4k Tenant Separation

   [NVo3-framework] has a good figure showing the logical network seen
   by each tenant: L2 domains connected by an L3 infrastructure. Each
   tenant can have multiple virtual networks under its logical routers
   (Rtr), each identified by an IEEE 802.1Q-compliant 12-bit VLAN ID.
   Any VM communicating with peers in a different subnet, either
   within or outside the data center, sends its frames to the L2 MAC
   address of its local router (Rtr in the figure below).

                  +----- L3 Infrastructure ----+
                  |                            |
               ,--+-'.                      ;--+--.
          ..... Rtr1 )......               . Rtr2 )
          |    '-----'     |                '-----'
          |    Tenant1     |LAN12       Tenant1|
          |LAN11       ....|........           |LAN13
      '':'''''''':'      |        |      '':'''''''':'
       ,'.      ,'.     ,+.      ,+.      ,'.      ,'.
      (VM ) .. (VM )   (VM ) .. (VM )    (VM ) .. (VM )
       `-'      `-'     `-'      `-'      `-'      `-'

      Figure 1: Logical Service Connectivity for a single tenant

   The overlay described in [NVo3-Problem] keeps the forwarding tables
   of the switches/routers in the core (i.e. the underlay network)
   from being impacted when VMs belonging to different tenants are
   placed or moved anywhere, as shown in the figure below (copied from
   [NVo3-framework]).

      +--------+                                  +--------+----+
      | Tenant |                                  | TES:   |VM1 |
      | End    +--+                           +---| Blade  |VM2 |
      | System |  |                           |   | server |..  |
      +--------+  |    ...................    |   +--------+----+
                  |  +-+--+           +--+-+  |----VM-a
                  |  | NVE|           |NVE |  |----VM
                  +--| #1 |           |#2  |--+----VM
                     +-+--+           +--+-+
                     / .    L3 Overlay   . \
      +--------+    /  .    Network      .  \     +--------+
      | Tenant +--+    .                 .   +----| Tenant |
      | End    |       .                 .        | End    |
      | System |       .    +----+       .        | System |
      +--------+       .....|NVE |........        +--------+
                            |#3  |
                            +----+
                              |
                              |
                          +--------+
                          | Tenant |--VM-b
                          | End    |--VM
                          | System |
                          +--------+

                     Figure 2: Overlay example

   For client traffic from "VM-a" to "VM-b", the ingress NVE
   encapsulates the client payload with an outer header which includes
   at least the egress NVE as DA, the ingress NVE as SA, and a VNID.
   The VNID is a 24-bit identifier proposed by [NVo3-Problem] to
   separate tens of thousands of tenant virtual networks. When the
   egress NVE receives the data frame on its ports facing the underlay
   network, it removes the outer header and then forwards the
   decapsulated data frame to the attached VMs.

   When "VM-b" is on the same subnet (or VLAN) as "VM-a" and located
   within the same data center, the corresponding egress NVE is
   usually on a virtual switch in a server, on a ToR switch, or on a
   blade switch.

   When "VM-b" is on a different subnet (or VLAN), the corresponding
   egress NVE should be next to (or located on) the logical Rtr
   (Figure 1), which is most likely located on the data center gateway
   router(s).

4.1. Collision of local VLAN Identifiers when VMs Move

   Since the VMs attached to one NVE could belong to different virtual
   networks, the traffic under each NVE has to be identified by local
   network identifiers, which are usually VLAN IDs (VIDs) if the VMs
   are attached to NVE access ports via L2.

   A 12-bit VID cannot be unique across tens of thousands of virtual
   networks. To support that many virtual networks, the local VID
   associated with the client payload under each NVE therefore has to
   be locally significant.

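   The encapsulation and the locally significant VIDs described above
   can be sketched as follows. This is a minimal illustration under
   stated assumptions, not a definition of any NVo3 data plane: the
   Nve and OuterHeader classes, the NVE names, and all VID and VNID
   values are hypothetical. Each NVE is assumed to hold its own table
   mapping local VIDs to VNIDs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OuterHeader:
    da: str    # egress NVE address
    sa: str    # ingress NVE address
    vnid: int  # 24-bit virtual network identifier


class Nve:
    """Hypothetical NVE holding a locally significant VID <-> VNID map."""

    def __init__(self, name: str, vid_to_vnid: Dict[int, int]) -> None:
        self.name = name
        self.vid_to_vnid = dict(vid_to_vnid)
        self.vnid_to_vid = {vnid: vid for vid, vnid in vid_to_vnid.items()}

    def encapsulate(self, local_vid: int, egress: "Nve") -> OuterHeader:
        # Ingress side: the local VID selects the VNID for the outer header.
        vnid = self.vid_to_vnid[local_vid]
        if not 0 <= vnid < 2 ** 24:
            raise ValueError("VNID does not fit in 24 bits")
        return OuterHeader(da=egress.name, sa=self.name, vnid=vnid)

    def decapsulate(self, header: OuterHeader) -> int:
        # Egress side: return this NVE's own local VID for the VNID.
        return self.vnid_to_vid[header.vnid]


# VID 120 means virtual network 70001 under nve1 but 70002 under nve2.
nve1 = Nve("nve1", {120: 70001, 121: 70002})
nve2 = Nve("nve2", {120: 70002, 300: 70001})

header = nve1.encapsulate(120, nve2)
print(header)                    # OuterHeader(da='nve2', sa='nve1', vnid=70001)
print(nve2.decapsulate(header))  # 300
```

   The same VID value identifies different virtual networks under the
   two NVEs, so only the VNID in the outer header is meaningful
   between them.
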
   If the ingress NVE simply adds an outer header to the data frames
   received from VMs and forwards the encapsulated data frames to the
   egress NVE via the underlay network, the egress NVE cannot simply
   remove the outer header and send the decapsulated data frames to
   the attached VMs, as is done by TRILL. The egress NVE needs to
   convert the VID carried in the data frame to its own local VID for
   the virtual network before forwarding the data frame to the
   attached VMs.

   In VPLS, the operator has to configure the local VIDs under each PE
   for specific VPN instances, and the mapping of local VIDs to VPN
   instance IDs does not change very much. In addition, a CE is most
   likely not shared by multiple tenants, so the VIDs on one physical
   port between PE and CE belong to only one tenant. On the rare
   occasion that multiple tenants share one CE, the CE can convert the
   tuple [local customer VID & tenant access port] to the VID
   designated by the VPN operator for each VPN instance on the shared
   link between the CE port and the PE port. For example, in the
   figure below, the VIDs under CE#21 and the VIDs under CE#22 can be
   duplicated as long as the CEs can convert the local VIDs from their
   downstream links to the VIDs given by the VPN operator for the
   links between PE and CEs.

   +--------+                                  +--------+
   |   CE   |                                  |   CE   |-> local VIDs
   |  #11   +--+                           +---|        |
   |        |  |                           |   |  #21   |
   +--------+  |    ...................    |   +--------+
               |  +-+--+           +--+-+  |
               |  | PE |           | PE |  |<-VIDs configured by VPN operator
               +--| 1  |           | 2  |--+
                  +-+--+           +--+-+
                  / .    VPLS         . \
   +--------+    /  .    Network      .  \     +--------+
   |   CE   +--+    .                 .   +----|   CE   |-> Local VIDs
   |  #12   |       .                 .        |  #22   |
   |        |       .    +----+       .        |        |
   +--------+       .....| PE |........        +--------+
                         | 3  |
                         +----+
                           |
                           |
                       +--------+
                       |   CE   |
                       |  #31   |
                       |        |
                       +--------+

                       Figure 3: VPLS example

   When all VMs of one virtual network are moved away from an NVE, the
   local VID that was designated for this virtual network might need
   to be reused for a different virtual network whose VMs are moved in
   later.

   In the figure below, NVE#1 may have local VIDs #100~#200 assigned
   to some of the attached virtual networks, while NVE#2 may have
   local VIDs #100~#150 assigned to different virtual networks. With
   the VNID encoded in the outer header of data frames, the traffic in
   the L3 overlay network is strictly separated.

   +--------+                                  +--------+
   | Tenant |                                  | TES:   |
   | End    +--+                           +---| Blade  |
   | System |  |                           |   | server |
   +--------+  |    ...................    |   +--------+
               |  +-+--+           +--+-+  |
               |  | NVE|           |NVE |  |
               +--| #1 |           |#2  |--+
                  +-+--+           +--+-+ <-local VID to global VNID mapping
                  / .    L3 Overlay   . \    becomes dynamic
   +--------+    /  .    Network      .  \     +--------+
   | Tenant +--+    .                 .   +----| Tenant |
   | End    |       .                 .        | End    |
   | System |       .    +----+       .        | System |
   +--------+       .....|NVE |........        +--------+
                         |#3  | <-May not be aware of VMs added/removed
                         +----+
                           |
                           |
                       +--------+
                       | Tenant |
                       | End    |
                       | System |
                       +--------+

                     Figure 4: Overlay example

   When some VMs associated with Virtual Network X, which uses VID 120
   under NVE1, are moved to NVE2, a new local VID must be assigned to
   Virtual Network X under NVE2, since VID 120 may already be in use
   there for a different virtual network.

   It gets complicated when the local VIDs are tagged by non-NVE
   devices, e.g. the VMs themselves, blade server switches, or virtual
   switches within servers.

   The devices which add a VID to untagged frames need to be informed
   of the local VID. If data frames from VMs already carry a VID, then
   there has to be a mechanism to notify the first switch port facing
   the VMs to convert the VID encoded by the VMs to the local VID
   which is assigned to the virtual network under the new NVE.

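   The bookkeeping implied by the paragraphs above can be sketched in
   Python. The LocalVidAllocator class, the "lowest free VID" policy,
   the VID ranges, and the VM and VNID values are assumptions made for
   illustration only; the draft does not prescribe any particular
   allocation scheme.

```python
from typing import Dict, Set


class LocalVidAllocator:
    """Hypothetical per-NVE allocator of locally significant VIDs."""

    def __init__(self, first_vid: int, last_vid: int) -> None:
        self.vid_range = range(first_vid, last_vid + 1)
        self.vnid_to_vid: Dict[int, int] = {}
        self.members: Dict[int, Set[str]] = {}

    def attach(self, vm: str, vnid: int) -> int:
        """Record a VM arriving under this NVE; return its local VID."""
        if vnid not in self.vnid_to_vid:
            used = set(self.vnid_to_vid.values())
            # Assumed policy: pick the lowest VID not currently in use.
            vid = next((v for v in self.vid_range if v not in used), None)
            if vid is None:
                raise RuntimeError("no free local VID under this NVE")
            self.vnid_to_vid[vnid] = vid
            self.members[vnid] = set()
        self.members[vnid].add(vm)
        return self.vnid_to_vid[vnid]

    def detach(self, vm: str, vnid: int) -> None:
        """Record a VM leaving; free the VID when the last VM is gone."""
        self.members[vnid].discard(vm)
        if not self.members[vnid]:
            del self.members[vnid]
            del self.vnid_to_vid[vnid]


net_x, net_y, net_z = 70001, 70002, 70003
nve1 = LocalVidAllocator(100, 200)
nve2 = LocalVidAllocator(100, 150)

vid_x_nve1 = nve1.attach("vm-a", net_x)  # 100
nve2.attach("vm-c", net_y)               # VID 100 is taken under nve2

# vm-a moves from nve1 to nve2.
nve1.detach("vm-a", net_x)
vid_x_nve2 = nve2.attach("vm-a", net_x)  # 101, not 100

# A VM of another virtual network later arrives under nve1.
vid_z_nve1 = nve1.attach("vm-d", net_z)  # 100 is reused
print(vid_x_nve1, vid_x_nve2, vid_z_nve1)  # 100 101 100
```

   Whichever device tags the frames (NVE, virtual switch, or the first
   switch port facing the VM) would need to be told the value returned
   by attach() each time a VM lands under a new NVE.
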
   When a VM is moved to a new location, its immediately adjacent
   switch port therefore has to be informed of the local VID so that
   it can convert the VID encoded in the data frames from the VM.

   The NVE will need the mapping between the local VID and the VNID
   to be used toward the L3 underlay network.

4.1.1. Local VIDs Managed by External Controller

   Most likely the assignment of VMs to physical locations is managed
   by a non-networking entity, e.g. a VM manager or a server manager.
   NVEs may not be aware of VMs being added or deleted unless the NVEs
   have a northbound interface to a controller which can communicate
   with the VM/server manager(s).

   When an NVE can be informed, via its controller, of VMs being
   added or deleted and of their associated tenant virtual networks,
   the NVE should be able to get the specific VNID from its controller
   for untagged data frames arriving at its Virtual Access Points
   ([NVo3-framework], Section 3.1.1).

   Since local VIDs under each NVE are only locally significant, it
   might be less confusing to the egress NVE if the ingress NVE
   removes the local VID attached to the data frame, so that the
   egress NVE always has to assign its own local VID to the data frame
   before sending the decapsulated data frame to the attached VMs.

   If, for whatever reason, it is necessary to have a local VID in the
   data frames before adding the outer header of EgressNVE-DA /
   IngressNVE-SA / VNID, the NVE should get the specific local VID
   from the external controller for the untagged data frames arriving
   at each Virtual Access Point.

   If the data frame is tagged before reaching the NVE's Virtual
   Access Point (e.g. tagged data frames from VMs) and the NVE is more
   than one hop away from the VMs, the first (virtual) port facing the
   VMs has to be informed by the external controller of the new local
   VID that replaces the VID encoded in the data frames. In the
   reverse direction, i.e. for data frames coming from the core toward
   the VMs, the first switching port facing the VMs has to convert the
   VIDs encoded in the data frames to the VIDs used by the VMs.

   The VDP (Virtual Station Interface (VSI) Discovery and
   Configuration Protocol) of IEEE 802.1Qbg requires the hypervisor to
   send the VM profile when a new VM is instantiated. However, not all
   hypervisors support this function.

4.1.2. Local VIDs Managed by NVE

   If NVEs do not have an interface to any controller which can be
   informed of VMs being added to or deleted from NVEs, then the NVEs
   have to learn the new VMs/VLANs being attached, figure out to which
   tenant virtual network those VMs/VLANs belong, and/or age out
   VMs/VLANs after a specified timer expires. The network management
   system has to assist NVEs in making this decision, even if the
   network management system does not have an interface to the
   VM/server managers.

   When an NVE receives a tagged data frame with a new VM address
   (e.g. MAC) from its Virtual Access Point, the new VM could be from
   an existing local virtual network, from a different virtual network
   (brought in as the VM is added), or an illegal VM.

   When the NVE learns of a new VM being added, either by learning a
   new MAC address or a new VID, it needs its management system to
   confirm the validity of the new VID and/or new address. If the new
   address or VID is from an invalid or illegal source, the data frame
   has to be dropped.

4.2. Tenant Virtual Network separation at the physical gateway routers

   When a VM communicates with peers in a different subnet, data
   frames are sent to the tenant's logical router (Rtr1 or Rtr2 in
   Figure 1). Very often, the logical routers of all tenants in a data
   center are just logical entities (e.g. VRFs) on the gateway
   router(s). That means that all the VLANs of all tenants are
   terminated at the data center gateway router(s), as shown in the
   figure below.

                   ,---------.
                 ,'           `.
                (  IP/MPLS WAN  )
                 `.           ,'
                   `-+------+'
                  +--+--+   +-+---+
                  |DC GW|+-+|DC GW|
                  +-+---+   +-----+
                   /             \    <---- All VLANs of all tenants
                  /               \
            +-------+         +------+
          +/------+ |       +/-----+ |
          | Aggr11| + ----- |AggrN1| +      Aggregation
          +---+---+/        +------+/
            /   \             /   \
           /     \           /     \
        +---+    +---+    +---+    +---+
        |T11|... |T1x|    |T21| .. |T2y|    Access Layer
        +---+    +---+    +---+    +---+
          |        |        |        |
        +-|-+    +-|-+    +-|-+    +-|-+
        |   |... |   |    |   | .. |   |
        +---+    +---+    +---+    +---+    Server racks
        |   |... |   |    |   | .. |   |
        +---+    +---+    +---+    +---+
        |   |... |   |    |   | .. |   |
        +---+    +---+    +---+    +---+

              Figure 5: Data Center Physical topology

   Gateway routers can mitigate the overwhelming number of virtual
   network instances by integrating the NVE function within the
   router(s). That requires the routers to map VNIDs to VRFs directly
   if the routers' outbound connection to the external network is VPN
   based. It also requires the routers to support tens of thousands of
   VRF instances, which can be challenging for routers.

   A data center can also use multiple gateway routers, each handling
   a subset of the tenants in the data center. That means that each
   tenant's VMs are only reachable through their designated routers or
   router ports. With the typical DC design shown in Figure 5, the
   number of server racks reachable by each gateway router is limited
   by the number of router ports enabled for the tenant virtual
   networks. That means the range of locations across which each
   tenant's VMs can be moved is limited.

   When VMs in a data center communicate with external peers, data
   frames have to go through the gateway. Even though the majority of
   data centers carry much more east-west traffic than north-south
   traffic, the majority (as high as 90%) of applications (hosted on
   servers or VMs) in a data center still communicate with external
   peers. It is only the volume of north-south traffic that is much
   lower in many data centers.

5. Summary and Recommendations

   An overlay network can hide individual VM addresses, making the
   switches/routers in the core scalable. However, the overlay
   introduces other challenges, especially when VMs move across a wide
   range of NVEs.

   This draft identifies the issues introduced by mobility in an
   overlay environment, to ensure that they will be addressed by
   future solutions.

6. Manageability Considerations

7. Security Considerations

   Security will be addressed in a separate document.

8. IANA Considerations

   None.

9. Acknowledgments

   We want to acknowledge the following people for their valuable
   comments on this draft: David Black, Ben Mack-Crane, Peter
   Ashwood-Smith, Lucy Yong and Young Lee.

   This document was prepared using 2-Word-v2.0.template.dot.

10. References

   [NVo3-Problem]   Narten, et al, "Problem Statement: Overlays for
                    Network Virtualization",
                    draft-narten-nvo3-overlay-problem-statement-02,
                    June 2012.

   [NVo3-framework] Lasserre, et al, "Framework for DC Network
                    Virtualization", draft-lasserre-nvo3-framework-02,
                    June 2012.

   [IEEE802.1Qbg]   "MAC Bridges and Virtual Bridged Local Area
                    Networks - Edge Virtual Bridging",
                    IEEE802.1Qbg/D2.2, Feb 2012. Work in progress.

   [ARMD-Problem]   Narten, et al,
                    "draft-ietf-armd-problem-statement", work in
                    progress, Oct 2011.

   [ARMD-Multicast] McBride, Lui,
                    "draft-mcbride-armd-mcast-overview-01", work in
                    progress, March 10, 2012.

   [Gratuitous ARP] S. Cheshire, "IPv4 Address Conflict Detection",
                    RFC 5227, July 2008.

Authors' Addresses

   Linda Dunbar
   Huawei Technologies
   5340 Legacy Drive, Suite 175
   Plano, TX 75024, USA
   Phone: (469) 277 5840
   Email: ldunbar@huawei.com

Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers
   or users of this specification can be obtained from the IETF
   on-line IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET
   SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE
   DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
   LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN
   WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.