| < draft-fang-vpn4dc-problem-statement-00.txt | draft-fang-vpn4dc-problem-statement-01.txt > | |||
|---|---|---|---|---|
| Network Working Group Luyuan Fang | Network Working Group Maria Napierala | |||
| Internet Draft Cisco Systems | Internet Draft AT&T | |||
| Intended status: Informational | Intended status: Informational Luyuan Fang | |||
| Expires: April 24, 2012 | Expires: December 12, 2012 Dennis Cai | |||
| Cisco Systems | ||||
| October 24, 2011 | June 12, 2012 | |||
| VPN4DC Problem Statement | IP-VPN Data Center Problem Statement and Requirements | |||
| draft-fang-vpn4dc-problem-statement-00.txt | draft-fang-vpn4dc-problem-statement-01.txt | |||
| Abstract | Abstract | |||
| Provider Provisioned IP VPNs are commonly used to interconnect | Network Service Providers commonly use BGP/MPLS VPNs [RFC 4364] as | |||
| multiple locations of a private network, such as an enterprise with | the control plane for virtual networks. This technology has proven | |||
| multiple offices. Current developments in data center operations | to scale to a large number of VPNs and attachment points, and it is | |||
| create the need to consider additional connectivity and | well suited for Data Center connectivity, especially when | |||
| connectivity management problems described in this document. | supporting all IP applications. | |||
| The Data Center environment presents new challenges and imposes | ||||
| additional requirements on IP VPN technologies, including multi- | |||
| tenancy support, high scalability, VM mobility, security, and | ||||
| orchestration. This document describes the problems and defines the | ||||
| new requirements. | ||||
| Status of this Memo | Status of this Memo | |||
| This Internet-Draft is submitted to IETF in full conformance with | This Internet-Draft is submitted to IETF in full conformance with | |||
| the provisions of BCP 78 and BCP 79. | the provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
| months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
| at any time. It is inappropriate to use Internet-Drafts as reference | at any time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress". | material or to cite them other than as "work in progress". | |||
| This Internet-Draft will expire on April 24, 2012. | This Internet-Draft will expire on December 12, 2012. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with | carefully, as they describe your rights and restrictions with | |||
| respect to this document. Code Components extracted from this | respect to this document. Code Components extracted from this | |||
| document must include Simplified BSD License text as described in | document must include Simplified BSD License text as described in | |||
| Section 4.e of the Trust Legal Provisions and are provided without | Section 4.e of the Trust Legal Provisions and are provided without | |||
| warranty as described in the Simplified BSD License. | warranty as described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction 2 | 1. Introduction 3 | |||
| 2. Terminology 3 | 2. Terminology 4 | |||
| 3. VPN4DC: A problem Definition 3 | 3. IP-VPN in Data Center Network 4 | |||
| 4. Private network connectivity between data centers 4 | 3.1. Data Center Connectivity Scenarios 5 | |||
| 5. Private Networks within a public data center 5 | 4. Data Center Virtualization Requirements 6 | |||
| 6. Connectivity between different VPNs 5 | 5. Decoupling of Virtualized Networking from Physical | |||
| 7. Mobile connectivity 5 | Infrastructure 6 | |||
| 8. Security Considerations 6 | 6. Encapsulation/Decapsulation Device for Virtual Network | |||
| 9. IANA Considerations 6 | Payloads 7 | |||
| 10. Normative References 6 | 7. Decoupling of Layer 3 Virtualization from Layer 2 Topology 8 | |||
| 11. Informative References 6 | 8. Requirements for Optimal Forwarding of Data Center Traffic 9 | |||
| 12. Author's Address 6 | 9. Virtual Network Provisioning Requirements 9 | |||
| 10. Application of BGP/MPLS VPN Technology to Data Center Network 10 | ||||
| 10.1. Data Center Transport Network 12 | ||||
| 10.2. BGP Requirements in a Data Center Environment 12 | ||||
| 11. Virtual Machine Migration Requirement 14 | ||||
| 12. IP-VPN Data Center Use Case: Virtualization of Mobile Network 15 | ||||
| 13. Security Considerations 17 | ||||
| 14. IANA Considerations 17 | ||||
| 15. Normative References 17 | ||||
| 16. Informative References 17 | ||||
| 17. Authors' Addresses 17 | ||||
| 18. Acknowledgements 18 | ||||
| Requirements Language | Requirements Language | |||
| Although this document is not a protocol specification, the key | Although this document is not a protocol specification, the key | |||
| words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in | |||
| this document are to be interpreted as described in RFC 2119 [RFC | this document are to be interpreted as described in RFC 2119 [RFC | |||
| 2119]. | 2119]. | |||
| 1. Introduction | 1. Introduction | |||
| Data centers are increasingly being consolidated and outsourced in | Data Centers are increasingly being consolidated and outsourced in | |||
| an effort both to improve the deployment time of applications as | an effort both to improve the deployment time of applications as | |||
| well as reduce operational costs. This coincides with an increasing | well as reduce operational costs. This coincides with an increasing | |||
| requirement for compute, storage and network connectivity from | demand for compute, storage, and network resources from | |||
| applications. | applications. In order to scale compute, storage, and network | |||
| resources, physical resources are being abstracted from their | ||||
| The consolidation and virtualization of data centers, either public | logical representation. This is referred to as server, storage, and | |||
| or private, has consequences in terms of network requirements. It | network virtualization. Virtualization can be implemented in | |||
| creates several new problems for private network connectivity, | various layers of computer systems or networks. | |||
| which are incremental to the existing L3VPN technology. It is helpful | ||||
| to identify and analyze these problems separately: | ||||
| - Private network connectivity between different data centers, | ||||
| either public or private. | ||||
| - Private network connectivity between different compute | ||||
| resources within a public data center. | ||||
| - Connectivity between different private networks within or | ||||
| across data centers. | ||||
| - Content distribution between centralized public or private | The compute loads of many different customers are executed over a | |||
| data centers and enterprise branch offices. | common infrastructure. Compute nodes are often executed as Virtual | |||
| Machines in an "Infrastructure as a Service" (IaaS) Data Center. | ||||
| The set of virtual machines corresponding to a particular customer | ||||
| should be constrained to a private network. | ||||
| - Private network connectivity for mobile devices. | New network requirements arise from the consolidation and | |||
| virtualization of Data Center resources, public, private, or | ||||
| hybrid. Large scale server virtualization (i.e., IaaS) requires | ||||
| scalable and robust Layer 3 network support. It also requires | ||||
| scalable local and global load balancing. This creates several new | ||||
| problems for network connectivity, namely elasticity, location | ||||
| independence (also referred to as Virtual Machine mobility), and an | |||
| extremely large number of virtual resources. | |||
| This document defines the problems under the assumption that | In the Data Center networks, the VMs of a specific customer or | |||
| applications require IP unicast connectivity but no layer 2 direct | application are often configured to belong to the same IP subnet. | |||
| adjacencies. Applications with layer 2 requirements are likely to | Many solutions proposed for large Data Center networks rely on the | |||
| also have assumptions of other media characteristics such as round | assumption that layer-2 inter-server connectivity is required, | |||
| trip time, for instance. | especially to support VM mobility within a virtual IP subnet. Given | |||
| that VM mobility consists of moving VMs anywhere within (and even | |||
| across) Data Centers, the virtual subnet locality associated with | ||||
| small scale deployments cannot be preserved. A Data Center solution | ||||
| should not prevent grouping of virtual resources into IP subnets | ||||
| but virtual subnets derive no benefit from locality across a large | |||
| data-center. | ||||
| This document is also under the assumption that both IPv4 and IPv6 | While some applications may expect to find other peers in a | |||
| unicast are in scope, but multicast service is a topic for further | particular user defined IP subnet, this does not imply the need to | |||
| discussion and outside the scope of this document. | provide a Layer 2 service that preserves MAC addresses. A network | |||
| virtualization solution should be able to provide IP unicast | ||||
| connectivity between hosts in the same and different subnets | ||||
| without any assumptions regarding the underlying media layer. A | ||||
| solution should also be able to provide a multicast service that | ||||
| implements IP subnet broadcast as well as IP multicast. | ||||
| The private network service to be provided must provide traffic | One of the main goals in designing a Data Center network is to | |||
| isolation between different VPNs allowing the use of a common | minimize the cost and complexity of its core/"fabric" network. The | |||
| infrastructure and take into account the need to reduce operational | cost and complexity of a Data Center network is a function of the | |||
| costs. | number of virtualized resources, that is, the number of "closed | |||
| user-groups". Data Centers use VPNs to isolate compute resources | ||||
| associated with a specific "closed user-group". Some use VLANs as a | ||||
| VPN technology, others use Layer 3 based solutions often with | ||||
| proprietary control planes. Service Providers are interested in | ||||
| interoperability and in openly documented protocols rather than in | ||||
| proprietary solutions. | ||||
| 2. Terminology | 2. Terminology | |||
| AS Autonomous Systems | AS Autonomous Systems | |||
| DC Data Center | DC Data Center | |||
| DCI Data Center Interconnect | ||||
| EPC Evolved Packet Core | ||||
| End-System A device where Guest OS and Host OS/Hypervisor reside | ||||
| IaaS Infrastructure as a Service | IaaS Infrastructure as a Service | |||
| LTE Long Term Evolution | LTE Long Term Evolution | |||
| PCEF Policy Charging and Enforcement Function | ||||
| RT Route Target | RT Route Target | |||
| ToR Top-of-Rack switch | ToR Top-of-Rack switch | |||
| VM Virtual Machine | VM Virtual Machine | |||
| VMM Virtual Machine Manager, Hypervisor | Hypervisor Virtual Machine Manager | |||
| SDN Software Defined Network | ||||
| VPN Virtual Private Network | VPN Virtual Private Network | |||
| 3. VPN4DC: A problem Definition | 3. IP-VPN in Data Center Network | |||
| A VPN4DC solution needs to address the following problems that are | In this document, we define the problem statement and requirements | |||
| incremental to existing IPVPN solutions: | for Data Center connectivity based on the assumption that | |||
| applications require IP connectivity but no Layer 2 direct | ||||
| adjacencies. Applications do not send or receive Ethernet frames | ||||
| directly. They are restricted to IP services due to several reasons | ||||
| such as privileges, address discovery, portability, APIs, etc. IP | ||||
| service can be unicast, VPN broadcast, or multicast. | ||||
| - IP only data center: defined by a data center where VM, | An IP-VPN DC solution is meant to address an IP-only Data Center, | |||
| applications, and hypervisors require only IP connectivity | defined by a Data Center where VMs, applications, and appliances | |||
| and the underlying DC infrastructure is IP only. | require only IP connectivity and the underlying DC core | |||
| infrastructure is IP only. Non-IP applications are addressed by | ||||
| other solutions and are not in scope of this document. | ||||
| - Network isolation among tenants or applications sharing the | It is also assumed that both IPv4 and IPv6 unicast communication is | |||
| same data centers. | to be supported. Furthermore, multicast transmission, i.e., | |||
| allowing IP applications to send packets to a group of IP addresses, | |||
| should also be supported. The most typical multicast applications | |||
| are service, network, and device discovery, as well as content | |||
| distribution. While there are simpler and more effective ways to | ||||
| provide discovery services or reliable content delivery, a Data | ||||
| Center solution should support multicast transmission to | ||||
| applications. A Data Center solution should cover the case where | ||||
| the Data Center transport network does not support IP multicast | ||||
| transmission service. | ||||
| - Hypervisors may not support BGP as a control protocol. | The Data Center multicast service should also support delivery of | |||
| traffic to all endpoints of a given VPN even if those endpoints | ||||
| have not sent any control messages indicating the need to receive | ||||
| that traffic. In other words, the multicast service should be | ||||
| capable of delivering the IP broadcast traffic in a virtual | ||||
| topology. | ||||
| - Fast and secure provisioning of VPN connectivity for a VM | 3.1. Data Center Connectivity Scenarios | |||
| with low operational complexity within a data center and | ||||
| across data centers. This includes the ability to connect a | ||||
| VM to a customer VPN outside the data center, thus requiring | ||||
| the ability to provision the communication path within the | ||||
| data center to the customer VPN. It also includes | ||||
| interconnecting VMs within and across physical data centers | ||||
| in the context of a virtual data center. The customer VPN | ||||
| service could be provided by a BGP/MPLS VPN [RFC 4364] | |||
| network service provider. The VPN connectivity provisioning | ||||
| is targeted to be done via in-band signaling rather than an | ||||
| out-of-band control infrastructure. The Software Defined | ||||
| Network (SDN) is addressing the latter approach. It is | ||||
| expected that both in-band and out-of-band provisioning | ||||
| control will have applicability in different environments. | ||||
| 4. Private network connectivity between data centers | There are three different cases of Data Center (DC) network | |||
| connectivity: | ||||
| Private data centers attach to the VPN network via a CE device, | 1. Intra-DC connectivity: Private network connectivity between | |||
| which advertises the respective IP address prefixes to the network. | compute resources within a public (or private) Data Center. | |||
| In this space, the requirements remain unchanged from current | ||||
| private networks, unless we assume the ability to migrate Virtual | ||||
| Machines (VMs) between different data centers. | ||||
| In the case that VMs are allowed to migrate between distinct data | 2. Inter-DC connectivity: Private network connectivity between | |||
| centers, this requires each specific IP host prefix for a VM | 2. Inter-DC connectivity: Private network connectivity between | |||
| to be advertised to the VPN network, or a "home agent" approach | |||
| that can redirect traffic from one data center to another (with | ||||
| potential negative consequences to latency). | ||||
| When private networks interconnect with public data centers, the | 3. Client-to-DC connectivity: Connectivity between client and a | |||
| VPN provider must interconnect with the public data center | private or public Data Center. The latter includes | |||
| provider. In this case we are in the presence of an Inter-Provider | interconnection between a service provider and a public Data | |||
| VPN in which the VPN service provider manages part of the | Center (which may belong to the same or different service | |||
| connectivity and in which the data center provider provides network | provider). | |||
| attachment points for multiple common customers. | ||||
| As with existing Inter-AS BGP/MPLS VPN scenarios, the Route Target | Private network connectivity within the Data Center requires a | |||
| (RT) associated with a specific VPN (in a symmetrical VPN) must be | network virtualization solution. In this document we define Layer 3 | |||
| coordinated between the two entities (service provider and data | VPN requirements for Data Center network virtualization. The Layer 3 | |||
| center provider). The data center provider services (e.g. the API | VPN technology (i.e., MPLS/BGP VPN) also applies to the | |||
| portal to its orchestration system) must also be accessible to all | interconnection of different data-centers. | |||
| the carriers' VPNs. | |||
| As data center providers often have different operational | When private networks interconnect with public Data Centers, the | |||
| procedures than network service providers, it is important to | VPN provider must interconnect with the public Data Center | |||
| identify potential solutions, from operational procedures to | provider. In this case we are in the presence of inter-provider | |||
| application APIs that can exchange the necessary information | VPNs. The Inter-AS MPLS/BGP VPN Options A, B, or C [RFC 4364] | |||
| between the VPN network service provider and data center provider. | provide network-to-network interconnection service and they | |||
| constitute the basis of SP network to public Data Center network | ||||
| connectivity. There might be incremental improvements to the existing | |||
| inter-AS solutions, pertaining to scalability and security, for | ||||
| example. | ||||
| 5. Private Networks within a public data center | Service Providers can leverage their existing Layer 3 VPN services | |||
| and provide private VPN access from client's branch sites to | ||||
| client's own private Data Center or to SP's own Data Center. The | ||||
| service provider-based VPN access can provide additional value | ||||
| compared with public internet access, such as security, QoS, OAM, | ||||
| and troubleshooting. | ||||
| Public data centers achieve efficiencies by executing the compute | 4. Data Center Virtualization Requirements | |||
| loads of many different customers over a common infrastructure for | ||||
| compute, storage and network. | ||||
| Compute nodes are often executed as Virtual Machines, in an | Private network connection service in a Data Center must provide | |||
| "Infrastructure as a Service" (IaaS) data center. The set of | traffic isolation between different virtual instances that share a | |||
| virtual machines corresponding to a particular customer should be | common physical infrastructure. A collection of compute resources | |||
| constrained to a private network. L3VPN technologies have proven to | dedicated to a process or application is referred to as a "closed | |||
| be able to scale to a large number of customer routes while | user-group". Each "closed user-group" is a VPN in the terminology | |||
| providing for aggregated management capability. It is important to | used by IP VPNs. | |||
| document the applicability of BGP/MPLS L3VPN technology to VMs in a | ||||
| data center. | ||||
| It must take into account that MPLS itself is not a common | Any DC solution needs to ensure network isolation among tenants or | |||
| technology within data centers and as such the solution must | applications sharing the same Data Center physical resources. A DC | |||
| provide for IP based forwarding. It is also important to consider | solution should allow a VM or application end-point to belong to | |||
| whether the end-system itself can contain the routing information | multiple closed user-groups/VPNs. A closed user-group should be able | |||
| corresponding to the VPN overlay networks without the assistance of | to communicate with other closed-user groups according to specified | |||
| the Top-of-Rack (ToR) switch, which may be constrained in terms of | routing policies. A customer or tenant should be able to define | |||
| its routing table size. | multiple closed user-groups. | |||
| 6. Connectivity between different VPNs | Typically VPNs that belong to different tenants do not communicate | |||
| with each other directly but they should be allowed to access | ||||
| common appliances such as storage, database services, security | ||||
| services, etc. It is also common for tenants to deploy a VPN per | ||||
| "application tier" (e.g. a VPN for web front-ends and a different | ||||
| VPN for the logic tier). In that scenario most of the traffic | ||||
| crosses VPN boundaries. That is also the case when "network | ||||
| attached storage" (NAS) is used or when databases are deployed as- | ||||
| a-service. | ||||
| Within a data center, the VMs within a private network will need to | Another reason for Data Center network virtualization is the | |||
| communicate with data center common services such as storage or | need to support VM mobility. Since the IP addresses used for | |||
| data-base services. These services often imply high traffic | communication within or between applications may be anywhere across | |||
| volumes. | the data-center, using a virtual topology is an effective way to | |||
| solve this problem. | ||||
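As one illustration of why a virtual topology helps here, a per-VPN table of host routes can track a VM wherever it moves, with no reliance on subnet locality. The class and method names below are invented for this sketch:

```python
# Hypothetical sketch: per-VPN host routes tracking VM location.
class VpnRib:
    """Per-VPN routing state keyed by (vpn_id, vm_ip) host routes."""
    def __init__(self):
        # (vpn_id, vm_ip) -> address of the end-system currently hosting the VM
        self.host_routes = {}

    def advertise_host_route(self, vpn_id, vm_ip, end_system):
        # Called on VM placement or move; a move simply re-advertises.
        self.host_routes[(vpn_id, vm_ip)] = end_system

    def resolve(self, vpn_id, vm_ip):
        return self.host_routes.get((vpn_id, vm_ip))

rib = VpnRib()
rib.advertise_host_route("vpn-blue", "10.0.1.5", "192.0.2.10")  # initial placement
rib.advertise_host_route("vpn-blue", "10.0.1.5", "192.0.2.99")  # VM moved
```

After the move, traffic to 10.0.1.5 resolves to the new end-system regardless of where the VM's IP subnet is physically attached.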
| The traditional approach is to deploy stateful service appliances | 5. Decoupling of Virtualized Networking from Physical | |||
| between different VPNs. That may become cost prohibitive for | Infrastructure | |||
| services with high volume of traffic. It is important to consider | ||||
| whether pushing the desired traffic control rules to the ingress | ||||
| points of the network (traffic sources) may assist in addressing | ||||
| this operational issue. | ||||
| 7. Mobile connectivity | The Data Center switching infrastructure (access, aggregation, and | |||
| core switches) should not maintain any information that pertains to | ||||
| the virtual networks. Decoupling of virtualized networking from the | ||||
| physical infrastructure has the following advantages: 1) provides | ||||
| better scalability; 2) simplifies the design and operation; 3) | ||||
| reduces the cost of a Data Center network. It has been proven (in | ||||
| Internet and in large BGP IP VPN deployments) that moving | ||||
| complexity associated with virtual entities to network edge while | ||||
| keeping network core simple has very good scaling properties. | ||||
| There should be a total separation between the virtualized segments | ||||
| (virtual network interfaces that are associated with VMs) and the | ||||
| physical network (i.e., physical interfaces that are associated | ||||
| with the data-center switching infrastructure). This separation | ||||
| should include the separation of the virtual network IP address | ||||
| space from the physical network IP address space. The physical | ||||
| infrastructure addresses should be routable in the underlying Data | ||||
| Center transport network, while the virtual network addresses | ||||
| should be routable on the VPN network only. Not only should the | ||||
| virtual network data plane be fully decoupled from the physical | ||||
| network, but its control plane should be decoupled as well. | ||||
| In order to decouple virtual and physical networks, the virtual | ||||
| networking should be treated as an "infrastructure" application. | ||||
| Only solutions that meet these requirements can provide truly | |||
| scalable virtual networking. | |||
| MPLS labels provide the necessary information to implement VPNs. | ||||
| When crossing the Data Center infrastructure the virtual network | ||||
| payloads should be encapsulated in IP or GRE [RFC 4023], or native | ||||
| MPLS envelopes. | ||||
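As a rough illustration of such an envelope, the sketch below builds an MPLS-over-GRE encapsulation in the style of [RFC 4023] for a virtual network payload; the label value and payload are placeholders, and the outer IP header (addressed from the physical underlay space) is omitted:

```python
import struct

def mpls_over_gre(payload: bytes, label: int, ttl: int = 64) -> bytes:
    """Wrap a virtual network payload in an MPLS shim inside a minimal
    GRE header; illustrative only, not a complete implementation."""
    # GRE (RFC 2784): no optional fields, protocol 0x8847 (MPLS unicast)
    gre = struct.pack("!HH", 0x0000, 0x8847)
    # MPLS label stack entry: label(20) | traffic class(3) | bottom-of-stack(1) | TTL(8)
    shim = struct.pack("!I", (label << 12) | (1 << 8) | ttl)
    return gre + shim + payload

# Placeholder payload; in practice this is the tenant's inner IP packet.
frame = mpls_over_gre(b"inner-ip-packet", label=1001)
```

The MPLS label carries the VPN forwarding context, while the GRE/IP outer header is all the Data Center core needs to understand.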
| 6. Encapsulation/Decapsulation Device for Virtual Network Payloads | ||||
| In order to scale a virtualized Data Center infrastructure, the | ||||
| encapsulation (and decapsulation) of virtual network payloads | ||||
| should be implemented on a device as close to virtualized resources | ||||
| as possible. Since the hypervisors in the end-systems are the | ||||
| devices at the edge of a Data Center network, they are the optimal | |||
| location for the VPN encap/decap functionality. | |||
| The data-plane device that implements the VPN encap/decap | |||
| functionality acts as the first-hop router in the virtual topology. | |||
| The IP-VPN solution for Data Center should also support deployments | ||||
| where it is not possible or not desirable to implement VPN | ||||
| encapsulation in the hypervisor/Host OS. In such deployments | ||||
| encap/decap functionality may be implemented in an external | ||||
| physical switch such as aggregation switch or top-of-rack switch. | ||||
| The external device implementing VPN tunneling functionality should | ||||
| be as close as possible to the end-system itself. The same DC | |||
| solution should support deployments with both internal (in a | |||
| hypervisor) and external (outside of a hypervisor) encap/decap | ||||
| devices. | ||||
| Whenever the VPN forwarding functionality (i.e., the data-plane | ||||
| device that encapsulates packets into, e.g., MPLS-over-GRE header) | ||||
| is implemented in an external device, the VPN service itself must | ||||
| be delivered to the virtual interfaces visible to the guest OS. | ||||
| However, the switching elements connecting the end-system to the | ||||
| encap/decap device should not be aware of the virtual topology. | ||||
| Instead, the VPN endpoint membership information might be, for | ||||
| example, communicated by the end-system using a signaling protocol. | ||||
| Furthermore, for an all-IP solution, the Layer 2 switching elements | ||||
| connecting the end-system to the encap/decap device should have no | ||||
| knowledge of the VM/application endpoints. In particular, the MAC | ||||
| addresses known to the guest OS should not appear on the wire. | ||||
| 7. Decoupling of Layer 3 Virtualization from Layer 2 Topology | ||||
| The IP-VPN approach to Data Center network design dictates that the | ||||
| virtualized communication should be routed, not bridged. The Layer | ||||
| 3 virtualization solution should be decoupled from the Layer 2 | ||||
| topology. Thus, there should be no dependency on VLANs or Layer 2 | ||||
| broadcast. | ||||
| In solutions that depend on Layer 2 broadcast domains, the VM-to-VM | ||||
| communication is established based on flooding and data plane MAC | ||||
| learning. Layer 2 MAC information has to be maintained on every | ||||
| switch where a given VLAN is present. Even if some solutions are | ||||
| able to eliminate data plane MAC learning and/or unicast flooding | ||||
| across Data Center core network, they still rely on VM MAC learning | ||||
| at the network edge and on maintaining the VM MAC addresses on | ||||
| every (edge) switch where the Layer 2 VPN is present. | ||||
| The MAC addresses known to the guest OS in an end-system are not relevant | |||
| to IP services and introduce unnecessary overhead. Hence, the MAC | ||||
| addresses associated with virtual machines should not be used in | ||||
| the virtual Layer 3 networks. Rather, only what is significant to | ||||
| IP communication, namely the IP addresses of the VMs and | ||||
| application endpoints should be maintained by the virtual networks. | ||||
| An IP-VPN solution should forward VM traffic based on their IP | |||
| addresses and not on their MAC addresses. | ||||
| From a Layer 3 virtual network perspective, IP packets should reach | ||||
| the first-hop router in one-hop, regardless of whether the first- | ||||
| hop router is a hypervisor/Host OS or an external device. The | |||
| VPN first-hop router should always perform an IP lookup on every | ||||
| packet it receives from a VM or an application. The first-hop | ||||
| router should encapsulate the packets and route them towards the | ||||
| destination end-system. Every IP packet should be forwarded along | ||||
| the shortest path towards a destination host or appliance, | ||||
| regardless of whether the packet's source and destination are in | ||||
| the same or different subnets. | ||||
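The per-packet IP lookup described above can be sketched as a per-VPN forwarding table doing longest-prefix match; all prefixes, next hops, and labels below are invented for illustration:

```python
import ipaddress

class VrfTable:
    """Illustrative per-VPN forwarding table for a VPN first-hop router."""
    def __init__(self):
        self.routes = []  # (prefix, (next_hop_end_system, vpn_label))

    def add_route(self, prefix, next_hop, label):
        self.routes.append((ipaddress.ip_network(prefix), (next_hop, label)))

    def lookup(self, dst):
        """Longest-prefix match performed on every packet from a VM,
        whether the destination is in the same subnet or a different one."""
        addr = ipaddress.ip_address(dst)
        matches = [(net, nh) for net, nh in self.routes if addr in net]
        return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

vrf = VrfTable()
vrf.add_route("10.0.0.0/16", "192.0.2.1", 3000)    # subnet route
vrf.add_route("10.0.1.7/32", "192.0.2.42", 3001)   # host route (e.g. a moved VM)
```

A lookup for 10.0.1.7 prefers the /32 host route, so the packet is encapsulated directly toward its current end-system in one virtual hop.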
| 8. Requirements for Optimal Forwarding of Data Center Traffic | ||||
| Data Center solutions that optimize for maximum utilization | |||
| of compute and storage resources require that those resources can be | |||
| located anywhere in the data-center. The physical and logical | ||||
| spreading of appliances and computations implies a very significant | ||||
| increase in data-center infrastructure bandwidth consumption. Hence, | ||||
| it is important that DC solutions are efficient in terms of traffic | ||||
| forwarding and assure that packets traverse Data Center switching | ||||
| infrastructure only once. This is not possible in DC solutions where | ||||
| a virtual network boundary between bridging (Layer 2) and routing | ||||
| (Layer 3) exists anywhere within the Data Center transport network. | ||||
| If a VM can be placed in an arbitrary location, mixing of the Layer | ||||
| 2 and the Layer 3 solutions may cause the VM traffic to traverse the | |||
| Data Center core multiple times before reaching the destination | ||||
| host. | ||||
| It must also be possible to send traffic directly from one VM | |||
| to another VM (within or between subnets) without traversing | |||
| a midpoint router. This is important given that most of the | |||
| traffic in a Data Center is within the VPNs. | ||||
| 9. Virtual Network Provisioning Requirements | ||||
| IP-VPN DC has to provide fast and secure provisioning (with low | ||||
| operational complexity) of VPN connectivity for a VM within a Data | ||||
| Center and across Data Centers. This includes interconnecting VMs | ||||
| within and across physical Data Centers in the context of a virtual | ||||
| networking. It also includes the ability to connect a VM to a | ||||
| customer VPN outside the Data Center, thus requiring the ability to | ||||
| provision the communication path within the Data Center to the | ||||
| customer VPN. | ||||
| VM provisioning should be performed by an orchestration system. | ||||
| The orchestration system should have a notion of a closed user- | ||||
| group/tenant and information about the services the tenant is | ||||
| allowed to access. The orchestration system should allocate an IP | ||||
| address to a VM. When the VM is provisioned, its IP address and | ||||
| the closed user-group/VPN identifier (VPN-ID) should be | ||||
| communicated to the host OS on the end-system. There should be a | ||||
| centralized database system (possibly with a distributed | ||||
| implementation) that contains the provisioning information | ||||
| regarding VPN-IDs and the services the corresponding VPNs can | ||||
| access. This information should be accessible to the virtual | ||||
| network control plane. | ||||
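A minimal sketch of the provisioning flow just described, assuming a simple in-memory database; the schema and function names are illustrative assumptions, not defined by this document.

```python
# Central database: VPN-ID -> set of services that VPN may access.
vpn_services = {}
# Per end-system state pushed to the host OS: VM -> (IP address, VPN-ID).
host_os_state = {}

def provision_vm(vm, vpn_id, ip, end_system, services):
    """Orchestrator allocates the IP address and communicates
    (IP, VPN-ID) to the host OS on the end-system; the virtual
    network control plane can later query vpn_services."""
    vpn_services.setdefault(vpn_id, set()).update(services)
    host_os_state.setdefault(end_system, {})[vm] = (ip, vpn_id)

provision_vm("vm-1", "vpn-42", "10.0.1.5", "server-9", {"storage"})
```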
| The orchestration system should be able to support the | ||||
| specification of fine-grained forwarding policies (such as | ||||
| filtering, redirection, and rate limiting) to be injected as | ||||
| traffic flow rules into the virtual network. | ||||
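Such fine-grained policies might be expressed as match/action flow rules, as in the sketch below; the field names, action strings, and addresses are invented for illustration.

```python
# Illustrative flow rules injected by the orchestration system.
RULES = [
    {"match": {"dst_port": 23}, "action": "drop"},                    # filtering
    {"match": {"dst_ip": "10.0.2.9"}, "action": "redirect:fw-1"},     # redirection
    {"match": {"src_ip": "10.0.1.5"}, "action": "rate-limit:10Mbps"}, # rate limiting
]

def classify(pkt):
    """Return the action of the first rule whose match fields all agree;
    unmatched traffic is simply forwarded."""
    for rule in RULES:
        if all(pkt.get(k) == v for k, v in rule["match"].items()):
            return rule["action"]
    return "forward"

print(classify({"dst_port": 23}))       # drop
print(classify({"dst_ip": "10.0.2.9"})) # redirect:fw-1
```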
| Common APIs can be a simple and useful step toward facilitating | ||||
| the provisioning process. Authentication is required when a VM is | ||||
| being provisioned to join an IP VPN. | ||||
| An IP-VPN Data Center networking solution should seamlessly support | ||||
| VM connectivity to other network devices (such as service | ||||
| appliances or routers) that use the traditional BGP/MPLS VPN | ||||
| technology. | ||||
| 10. Application of BGP/MPLS VPN Technology to Data Center | ||||
| Network | ||||
| BGP IP VPN technologies (based on [RFC 4364]) have proven to be | ||||
| able to scale to a large number of VPNs (tens of thousands) and | ||||
| customer routes (millions) while providing for aggregated | ||||
| management capability. Data Center networks could use the same | ||||
| transport mechanisms as used today in many Service Provider | ||||
| networks, specifically the MPLS/BGP VPNs that often overlay huge | ||||
| transport areas. | ||||
| MPLS/BGP VPNs use BGP as a signaling protocol to exchange VPN | ||||
| routes. An IP-VPN DC solution should consider that it might not be | ||||
| feasible to run the BGP protocol on a hypervisor or an external | ||||
| switch such as a top-of-rack switch. This includes functions like | ||||
| BGP route selection and processing of routing policies, as well as | ||||
| handling MP-BGP structures like Route Distinguishers and Route | ||||
| Targets. Rather, it might be preferable to use a signaling | ||||
| mechanism that is more familiar and compatible with the methods | ||||
| used in application software development. While network devices | ||||
| (such as routers and appliances) may choose to receive VPN | ||||
| signaling information directly via BGP, the end-systems/switches | ||||
| may choose another type of interface or protocol to exchange | ||||
| virtual end-point information. The IP VPN solution for the Data | ||||
| Center should specify the mapping between the signaling messages | ||||
| used by the hypervisors/switches and the MP-BGP routes used by | ||||
| MP-BGP speakers participating in the virtual network. | ||||
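Such a mapping might look like the sketch below, where a gateway translates a simple end-system announcement into the fields an MP-BGP VPN route carries (Route Distinguisher, Route Target, prefix, label, next-hop); all field names and values are assumptions for illustration.

```python
# Per-VPN identifiers a BGP speaker would use; values are invented.
RD = {"vpn-42": "64500:42"}   # Route Distinguisher
RT = {"vpn-42": "64500:42"}   # Route Target

def to_vpn_route(msg):
    """Translate a hypervisor's endpoint announcement into the
    MP-BGP VPN route advertised on its behalf."""
    return {
        "rd": RD[msg["vpn"]],
        "rt": RT[msg["vpn"]],
        "prefix": msg["ip"] + "/32",     # host route to the endpoint
        "label": msg["label"],
        "next_hop": msg["end_system"],
    }

route = to_vpn_route({"vpn": "vpn-42", "ip": "10.0.1.5",
                      "label": 1001, "end_system": "192.0.2.7"})
```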
| In traditional WAN deployments of BGP IP VPNs [RFC 4364], the | ||||
| forwarding function and control function of a Provider Edge (PE) | ||||
| device have co-existed within a single physical router. In a Data | ||||
| Center network, the PE plays the role of the first-hop router in a | ||||
| virtual domain. The signaling exchanged between the forwarding and | ||||
| control planes in a PE has been proprietary to a specific PE | ||||
| router/vendor. When BGP IP VPNs are applied to a Data Center | ||||
| network, the signaling used between the control plane and the | ||||
| forwarding plane should be open to provisioning and | ||||
| standardization. We explore this requirement in more detail below. | ||||
| When MPLS/BGP VPNs [RFC 4364] are used to connect VMs or | ||||
| application endpoints, it might be desirable for a hypervisor's | ||||
| host or an external switch (such as TOR) to support only the | ||||
| forwarding aspect of a Provider Edge (PE) function. The VMs or | ||||
| applications would act as Customer Edges (CEs) and the virtual | ||||
| network interfaces associated with the VMs/applications as CE | ||||
| interfaces. More specifically, a hypervisor/first-hop switch would | ||||
| support only the creation and population of VRF tables that store | ||||
| the forwarding information to the VMs and applications. The | ||||
| forwarding information should include a 20-bit label associated | ||||
| with a virtual interface (i.e., a specific VM/application | ||||
| endpoint) and assigned by the destination PE. This label has only | ||||
| local significance within the destination PE. A hypervisor/first- | ||||
| hop switch would not need to support BGP, a protocol familiar to | ||||
| network devices. | ||||
| When a PE forwarding function is implemented on an external switch, | ||||
| such as aggregation or top-of-rack switch, the end-system must be | ||||
| able to communicate the endpoint and its VPN membership information | ||||
| to the external switch. It should be able to convey the endpoint's | ||||
| instantiation as well as removal events. | ||||
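The event interface could be as simple as the following sketch, in which an external (e.g., top-of-rack) switch maintains per-VPN VRF tables in response to instantiate/remove messages from the end-system, without running BGP itself; the message fields are assumptions.

```python
vrfs = {}   # VPN-ID -> {endpoint IP: local virtual interface}

def on_endpoint_event(msg):
    """React to end-system signaling conveying an endpoint's VPN
    membership and its instantiation or removal."""
    table = vrfs.setdefault(msg["vpn"], {})
    if msg["event"] == "instantiate":
        table[msg["ip"]] = msg["vif"]
    elif msg["event"] == "remove":
        table.pop(msg["ip"], None)

on_endpoint_event({"event": "instantiate", "vpn": "vpn-42",
                   "ip": "10.0.1.5", "vif": "vif-7"})
on_endpoint_event({"event": "remove", "vpn": "vpn-42",
                   "ip": "10.0.1.5", "vif": "vif-7"})
```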
| An IP-VPN Data Center networking solution should be able to support | ||||
| a mixture of internal PEs (implemented in hypervisors/Host OS) and | ||||
| external PEs (implemented on devices external to the end-system). | ||||
| The IP-VPN DC solution should allow BGP/MPLS VPN-capable network | ||||
| devices, such as routers or appliances, to participate directly in | ||||
| a virtual network with the Virtual Machines and applications. Those | ||||
| network devices can participate in isolated collections of VMs, | ||||
| i.e., in isolated VPNs, as well as in overlapping VPNs (called | ||||
| "extranets" in BGP/MPLS VPN terminology). | ||||
| The device performing the PE forwarding function should be capable | ||||
| of supporting multiple Virtual Routing and Forwarding (VRF) tables | ||||
| representing distinct "closed user groups". It should also be able | ||||
| to associate a virtual interface (corresponding to a VM or | ||||
| application endpoint) with a specific VRF. | ||||
| The first-hop router has to be capable of encapsulating outgoing | ||||
| traffic (end-system towards Data Center network) in IP/GRE or MPLS | ||||
| envelopes, including the per-prefix 20-bit VPN label. The first- | ||||
| hop router must also be capable of associating incoming packets | ||||
| from a Data Center network with a virtual interface, based on the | ||||
| 20-bit VPN label contained in the packets. | ||||
| The protocol used by the VPN first-hop routers to signal VPNs | ||||
| should be independent of the transport network protocol as long as | ||||
| the transport encapsulation has the ability to carry a 20-bit VPN | ||||
| label. | ||||
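For concreteness, the 20-bit label travels in a 32-bit MPLS label stack entry (per the RFC 3032 layout: 20-bit label, 3-bit traffic class, 1-bit bottom-of-stack, 8-bit TTL). A sketch of the packing:

```python
def mpls_lse(label, tc=0, s=1, ttl=64):
    """Pack one MPLS label stack entry (RFC 3032 layout):
    label(20) | TC(3) | S(1) | TTL(8)."""
    assert 0 <= label < 1 << 20, "VPN label must fit in 20 bits"
    return (label << 12) | (tc << 9) | (s << 8) | ttl

entry = mpls_lse(1001)
print(entry >> 12)       # 1001 -- recover the VPN label
print((entry >> 8) & 1)  # 1 -- bottom of stack
```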
| 10.1. Data Center Transport Network | ||||
| MPLS/VPN technology based on [RFC 4364] specifies several different | ||||
| encapsulation methods for connecting PE routers, namely Label | ||||
| Switched Paths (LSPs), IP tunneling, and GRE tunneling. If LSPs are | ||||
| used in the transport network they could be signaled with LDP, in | ||||
| which case host (/32) routes to all PE routers must be propagated | ||||
| throughout the network, or with RSVP-TE, in which case a full mesh | ||||
| of RSVP-TE tunnels is required, generating substantial state in | ||||
| the network core. If the number of LSPs is expected to be high, | ||||
| due to the large size of a Data Center network, then IP or GRE | ||||
| encapsulation can be used instead; in that case the above- | ||||
| mentioned scalability concerns do not arise, owing to the route | ||||
| aggregation property of IP routing. | ||||
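A back-of-envelope comparison of the control state implied by the two LSP signaling choices, under the simplifying assumptions of one /32 host route per PE (LDP) and a full mesh of unidirectional PE-to-PE tunnels (RSVP-TE):

```python
def ldp_host_routes(n_pe):
    # LDP: one /32 host route per PE flooded through the network.
    return n_pe

def rsvp_te_tunnels(n_pe):
    # RSVP-TE: full mesh of unidirectional PE-to-PE tunnels.
    return n_pe * (n_pe - 1)

# For 200 PE-like forwarders, the tunnel mesh dwarfs the host routes:
print(ldp_host_routes(200), rsvp_te_tunnels(200))  # 200 39800
```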
| 10.2. BGP Requirements in a Data Center Environment | ||||
| 10.2.1. BGP Convergence and Routing Consistency | ||||
| BGP was designed to carry a very large amount of routing | ||||
| information, but it is not a fast-converging protocol. In addition, the | ||||
| routing protocols, including BGP, have traditionally favored | ||||
| convergence (i.e., responsiveness to route change due to failure or | ||||
| policy change) over routing consistency. Routing consistency means | ||||
| that a router forwards a packet strictly along the path adopted by | ||||
| the upstream routers. When responsiveness is favored, a router | ||||
| applies a received update immediately to its forwarding table | ||||
| before propagating the update to other routers, including those | ||||
| that potentially depend upon the outcome of the update. The route | ||||
| change responsiveness comes at the cost of routing blackholes and | ||||
| loops. | ||||
| Routing consistency across a Data Center is important because in | ||||
| large Data Centers thousands of Virtual Machines can be moved | ||||
| simultaneously between server racks, for example due to | ||||
| maintenance. If packets sent by the Virtual Machines that are | ||||
| being moved are dropped (because they do not follow a live path), | ||||
| the active network connections on those VMs will be dropped. To | ||||
| minimize the disruption to established communications during VM | ||||
| migration, live path continuity is required. | ||||
| 10.2.2. VM Mobility Support | ||||
| To overcome BGP convergence and route consistency limitations, | ||||
| forwarding plane techniques that support fast convergence should | ||||
| be used. There exist forwarding plane techniques that support fast | ||||
| convergence by removing a locally learned route from the | ||||
| forwarding table and instantaneously using already installed | ||||
| alternative routing information to a given destination. This | ||||
| technique is often referred to as "local repair". It allows | ||||
| traffic to be forwarded (almost) continuously to a VM that has | ||||
| migrated to a new physical location, using an indirect forwarding | ||||
| path or tunnel via the VM's old location (i.e., the old VM | ||||
| forwarder). The traffic path is restored locally at the VM's old | ||||
| location while the network converges to the new location of the | ||||
| migrated VM. Eventually, the network converges to the optimal path | ||||
| and bypasses the local repair. | ||||
| BGP should assist local repair techniques by advertising multiple | ||||
| paths, not only the best path, to a given destination. | ||||
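A toy sketch of local repair with a pre-installed additional path (as a BGP multiple-path advertisement would supply); the names and prefix are illustrative assumptions.

```python
# RIB per prefix: ordered list of usable next-hops, best first.
# Both paths are already installed thanks to multiple-path advertisement.
rib = {"10.0.1.5/32": ["pe-old", "pe-new"]}
fib = {prefix: paths[0] for prefix, paths in rib.items()}

def withdraw(prefix, path):
    """On withdrawal of the current path, repair locally: instantly
    fall back to the already-installed alternative, with no wait for
    network-wide re-convergence."""
    rib[prefix].remove(path)
    fib[prefix] = rib[prefix][0]

withdraw("10.0.1.5/32", "pe-old")   # VM left its old location
print(fib["10.0.1.5/32"])           # pe-new
```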
| 10.2.3. Optimizing Route Distribution | ||||
| When virtual networks are triggered based on IP communication (as | ||||
| proposed in this document), the Route Target Constraint extension | ||||
| of BGP [RFC 4684] should be used to optimize route distribution | ||||
| for sparsely distributed virtual networks. This technique ensures | ||||
| that only those VPN forwarders that have local participants in a | ||||
| particular virtual network receive its routing information. This | ||||
| also decreases the total load on the upstream BGP speakers. | ||||
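In effect, the upstream BGP speaker filters advertisements by Route Target interest, as in this sketch (the forwarder names and RT values are invented):

```python
# Route Targets each VPN forwarder has expressed interest in via
# Route Target Constraint advertisements [RFC 4684].
interest = {
    "forwarder-1": {"rt:64500:10"},
    "forwarder-2": {"rt:64500:10", "rt:64500:20"},
    "forwarder-3": {"rt:64500:30"},
}

def receivers(route_rt):
    """Only forwarders with local participants in the VPN get the route."""
    return sorted(f for f, rts in interest.items() if route_rt in rts)

print(receivers("rt:64500:10"))  # ['forwarder-1', 'forwarder-2']
```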
| 10.2.4. Inter-operability with MPLS/BGP VPNs | ||||
| As stated in section 10, the IP-VPN DC solution should be fully | ||||
| inter-operable with MPLS/BGP VPNs. MPLS/BGP VPN technology is | ||||
| widely supported on routers and other appliances. When connecting | ||||
| a Data Center virtual network with other services/networks, it is | ||||
| not necessary to advertise the specific VM host routes but rather | ||||
| the aggregated routing information. A router or appliance within a | ||||
| Data Center can be used to aggregate a VPN's IP routing | ||||
| information and advertise the aggregated prefixes. The aggregated | ||||
| prefixes would be advertised with the router/appliance IP address | ||||
| as the BGP next-hop and with a locally assigned aggregate 20-bit | ||||
| label. The aggregate label will trigger a destination IP lookup in | ||||
| the corresponding VRF for all packets entering the virtual | ||||
| network. | ||||
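The difference between per-prefix and aggregate label disposition can be sketched as follows; the label table, VRF contents, and interface names are illustrative assumptions.

```python
# Incoming label -> disposition on the aggregating router.
label_table = {
    1001: ("per-prefix", "vif-7"),    # forward straight to the endpoint
    2000: ("aggregate", "vrf-blue"),  # aggregate: do an IP lookup instead
}
vrf_blue = {"10.1.2.3": "vif-9"}      # host routes inside the VRF

def dispose(label, inner_dst_ip):
    """Per-prefix labels identify the endpoint directly; an aggregate
    label triggers a destination IP lookup in its corresponding VRF."""
    kind, target = label_table[label]
    if kind == "per-prefix":
        return target
    return vrf_blue[inner_dst_ip]

print(dispose(1001, "10.1.2.3"))  # vif-7
print(dispose(2000, "10.1.2.3"))  # vif-9
```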
| 11. Virtual Machine Migration Requirement | ||||
| The "Virtual Machine live migration" (a.k.a. VM mobility) is highly | ||||
| desirable for many reasons such as efficient and flexible resource | ||||
| sharing, Data Center migration, disaster recovery, server | ||||
| redundancy, or service bursting. VM live migration consists of | ||||
| moving a virtual machine from one physical server to another while | ||||
| preserving the VM's active network connections (e.g., TCP and | ||||
| higher-level sessions). | ||||
| VM live mobility primarily happens within the same physical Data | ||||
| Center, but VM live mobility between Data Centers might also be | ||||
| required. IP-VPN Data Center solutions need to address both intra- | ||||
| Data Center and inter-Data Center VM live mobility. | ||||
| Traditional Data Center deployments have followed the IP subnet | ||||
| boundary, i.e., hosts often stayed in the same IP subnet, and a | ||||
| host had to change its IP address when it moved to a different | ||||
| location. Such an architecture worked well when hosts were | ||||
| dedicated to an application and resided in physical proximity to | ||||
| each other. These assumptions do not hold in the IaaS environment, | ||||
| where compute resources associated with a given application can be | ||||
| spread across, and dynamically move within, a large Data Center. | ||||
| Many DC design proposals try to address VM mobility with data- | ||||
| center-wide VLANs, i.e., Data Center-wide Layer 2 broadcast | ||||
| domains. With such VLANs, a VM move is handled by generating a | ||||
| gratuitous ARP reply to update all ARP caches and switch learning | ||||
| tables. Since virtual subnet locality cannot be preserved | ||||
| in a large Data Center, a virtual subnet (VLAN) must be present on | ||||
| every Data Center switch, limiting the number of virtual networks to | ||||
| 4094. Even if a Layer 2 Data Center solution is able to minimize or | ||||
| eliminate the ARP flooding across Data Center core, all edge | ||||
| switches still have to perform dynamic VM MAC learning and maintain | ||||
| VM's MAC-to-IP mappings. | ||||
| Since in large Data Centers physical proximity of computing | ||||
| resources cannot be assumed, grouping of hosts into subnets does | ||||
| not provide any VM mobility benefits. Rather, VM mobility in a | ||||
| large Data Center should be based on a collection of host routes | ||||
| spread randomly across a large physical area. | ||||
| When dealing with IP-only applications it is not only sufficient but | ||||
| optimal to forward the traffic based on Layer 3 rather than on Layer | ||||
| 2 information. The MAC addresses of Virtual Machines are irrelevant | ||||
| to IP services and introduce unnecessary overhead (i.e., maintaining | ||||
| ARP caches of VM MACs) and complications when VMs move (e.g., when | ||||
| VM's MAC address is changed in its new location). An IP-based VPN | ||||
| connectivity solution is a cost-effective and scalable approach to | ||||
| solving the VM mobility problem. In IP-VPN DC, a VM move is | ||||
| handled by a route advertisement. | ||||
| To accommodate live migration of Virtual Machines, it is desirable | ||||
| to assign a permanent IP address to a VM that remains with the VM | ||||
| after it moves. Typically, a VM/application reaches the off-subnet | ||||
| destinations via a default gateway, which should be the first-hop | ||||
| router (in the virtual topology). A VM/application should reach the | ||||
| on-subnet destinations via an ARP proxy which again should be the | ||||
| VPN first-hop router. A VM/application cannot change the default | ||||
| gateway's IP and MAC addresses during live migration, as that | ||||
| would require changes to the TCP/IP stack in the guest OS. Hence, | ||||
| the first-hop VPN router should use a common, locally significant | ||||
| IP address and a common virtual MAC address to support VM live | ||||
| mobility. More | ||||
| specifically, this IP address and the MAC address should be the | ||||
| same on all first-hop VPN routers in order to support the VM moves | ||||
| between different physical machines. Moreover, in order to preserve | ||||
| virtual network and infrastructure separation, the IP and MAC | ||||
| addresses of the first-hop routers should be shared among all | ||||
| virtual IP-subnets/VPNs. Since the first-hop router always performs | ||||
| an IP lookup on every packet destination IP address, the VM traffic | ||||
| is forwarded on the optimal path and traverses the Data Center | ||||
| network only once. | ||||
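The shared-gateway behavior can be illustrated as follows; the gateway IP and virtual MAC are invented example values (a locally significant address), not values mandated by the text.

```python
# Every first-hop VPN router, on every physical machine, answers
# ARP for the same gateway IP with the same virtual MAC, so a VM's
# cached gateway entry stays valid across live migration.
GW_IP = "169.254.0.1"          # common, locally significant gateway IP
GW_VMAC = "00:00:5e:00:52:13"  # common virtual MAC (illustrative)

def proxy_arp_reply(router_id, queried_ip):
    """The first-hop router proxy-ARPs for the gateway address only."""
    return GW_VMAC if queried_ip == GW_IP else None

# Identical answer regardless of which physical machine hosts the VM:
print(proxy_arp_reply("fhr-rack1", GW_IP) == proxy_arp_reply("fhr-rack2", GW_IP))
```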
| The VM live migration has to be transparent to applications and any | ||||
| external entity interacting with the applications. This implies | ||||
| that the VM's network connectivity restoration time is critical. | ||||
| Transport sessions can typically survive several seconds of | ||||
| disruption; applications, however, may have sub-second latency | ||||
| requirements for their correct operation. | ||||
| To minimize the disruption to established communications during VM | ||||
| migration, the control plane of a DC solution should be able to | ||||
| distinguish VM activation in a new location from the advertisement | ||||
| of the VM's host route to the network. This will enable the VPN | ||||
| first-hop forwarders to install a route to the VM's new location | ||||
| prior to its migration, allowing the traffic to be tunneled via | ||||
| the first-hop router at the VM's old location. There are | ||||
| techniques available in BGP as well as in the forwarding plane | ||||
| that support fast convergence upon withdrawal or replacement of | ||||
| current or less preferred forwarding information (see section 10.2 | ||||
| for a more detailed description of such techniques). | ||||
| 12. IP-VPN Data Center Use Case: Virtualization of Mobile | ||||
| Network | ||||
| Application access is being done increasingly from clients such as | Application access is being done increasingly from clients such as | |||
| cell phones or tablets that may come in via a private WiFi access | cell phones or tablets connecting via private or public WiFi access | |||
| point or a public WiFi or 3G/LTE access. These clients must have | points, or 3G/LTE wireless access. Enterprises with a mobile | |||
| access to application which servers reside on a private network. | workforce need to access resources in the enterprise VPN while they | |||
| are traveling, e.g., sales data from a corporate database. The | ||||
| mobile workforce might also, for security reasons, be equipped with | ||||
| disk-less notebooks which rely on the enterprise VPN for all file | ||||
| accesses. The mobile workforce applications may occasionally need | ||||
| to utilize the compute resources and other functions (e.g., | ||||
| storage) that the enterprise hosts on the infrastructure of a cloud | ||||
| computing provider. The mobile devices might require simultaneous | ||||
| access to resources in both the cloud infrastructure and the | ||||
| enterprise VPN. | ||||
| The enterprise wide area network may use a provider-based MPLS/BGP | ||||
| VPN service. The wireless service providers already use MPLS/BGP | ||||
| VPNs for enterprise customer isolation in the mobile packet core | ||||
| elements. Using the same VPN technology in the service provider Data | ||||
| Center network (or in a public Data Center network) is a natural | ||||
| extension. | ||||
| It is important to consider whether it is possible to connect | Furthermore, there is a need to instantiate mobile applications | |||
| applications in mobile clients to provider provisioned VPNs. For | themselves as virtual networks in order to improve application | |||
| instance by using IPSec tunnels; or whether these applications are | performance (e.g., latency, Quality-of-Service) or to enable new | |||
| best served by content caches running in the service provider | applications with specialized requirements. In addition, it might | |||
| infrastructure. | be required that the application's computing resources be made | |||
| part of the mobility network itself and placed as close as possible | ||||
| to a mobile user. Since LTE data and voice applications use IP | ||||
| protocols only, the IP-VPN solution to virtualization of compute | ||||
| resources in mobile networks would be the optimal approach. | ||||
| The solution should assume that client, VPN provider and data | The infrastructure of a large scale mobility network could itself | |||
| center may be in different Autonomous Systems. | be virtualized and made available in the form of virtual private | |||
| networks to organizations that do not want to spend the required | ||||
| capital. The Mobile Core functions can be realized via software | ||||
| running on virtual machines in a service-provider-class compute | ||||
| environment. The functional entities such as Service-Gateways (S- | ||||
| GW), Packet-Gateways (P-GW), or Policy Charging and Enforcement | ||||
| Function (PCEF) of the LTE system can be run as applications on | ||||
| virtual machines, coordinated by an orchestrator and managed by a | ||||
| hypervisor. Virtualized packet core network elements (PCEF, S-GW, | ||||
| P-GW) could be placed anywhere in the mobile network | ||||
| infrastructure, as long as the IP connectivity is provided. The | ||||
| virtualization of the Mobile Core functions running on a private | ||||
| computing environment has many benefits, including faster service | ||||
| delivery, better economies of scale, and simpler operations. | ||||
| Since the LTE (Long Term Evolution) and Evolved Packet Core (EPC) | ||||
| system are all-IP networks, the IP-VPN solution to mobile network | ||||
| virtualization is the best fit. | ||||
| 8. Security Considerations | 13. Security Considerations | |||
| The document presents the problems need to be addressed in the | This document presents the problems that need to be addressed in | |||
| L3VPN for data center space. The requirements and solutions will be | the L3VPN for Data Center space. The requirements and solutions | |||
| documented separately. | will be documented separately. | |||
| The security considerations for general requirements or individual | The security considerations for general requirements or individual | |||
| solutions will be documented in the relevant documents. | solutions will be documented in the relevant documents. | |||
| 9. IANA Considerations | 14. IANA Considerations | |||
| This document contains no new IANA considerations. | This document contains no new IANA considerations. | |||
| 10. Normative References | 15. Normative References | |||
| [RFC 4363] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | [RFC 4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | |||
| Networks (VPNs)", RFC 4364, February 2006. | Networks (VPNs)", RFC 4364, February 2006. | |||
| 11. Informative References | [RFC 4023] Worster, T., Rekhter, Y. and E. Rosen, "Encapsulating in | |||
| IP or Generic Routing Encapsulation (GRE)", RFC 4023, March | ||||
| 2005. | ||||
| [RFC 4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, | ||||
| R., Patel, K. and J. Guichard, "Constrained Route Distribution for | ||||
| Border Gateway Protocol/Multiprotocol Label Switching (BGP/MPLS) | ||||
| Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, | ||||
| November 2006. | ||||
| 16. Informative References | ||||
| [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| 12. Author's Address | 17. Authors' Addresses | |||
| Maria Napierala | ||||
| AT&T | ||||
| 200 Laurel Avenue | ||||
| Middletown, NJ 07748 | ||||
| Email: mnapierala@att.com | ||||
| Luyuan Fang | Luyuan Fang | |||
| Cisco Systems | Cisco Systems | |||
| 111 Wood Avenue South | 111 Wood Avenue South | |||
| Iselin, NJ 08830 | Iselin, NJ 08830, USA | |||
| USA | ||||
| Email: lufang@cisco.com | Email: lufang@cisco.com | |||
| 13. Acknowledgement | Dennis Cai | |||
| Cisco Systems | ||||
| 725 Alder Drive | ||||
| Milpitas, CA 95035, USA | ||||
| Email: dcai@cisco.com | ||||
| The author would like to thank Pedro Marques for his helpful | 18. Acknowledgements | |||
| comments/input. | ||||
| The authors would like to thank Pedro Marques for his helpful | ||||
| comments and input. | ||||