Network Working Group                                             L. Jin
Internet-Draft                                                       ZTE
Intended status: Informational                             B. Khasnabish
Expires: November 12, 2012                                       ZTE USA
                                                            May 11, 2012


         Architecture of PSN Independent Overlay Network (PION)
                 draft-kj-nvo3-pion-architecture-00.txt

Abstract

This draft introduces the PSN independent overlay network (PION) architecture for intra- and inter-datacenter (DC) connections. It also discusses the motivations, protocol layers, and applications of PION. PION provides a virtualized network that is independent of the underlying PSN, in order to maximize the reuse of IETF protocol definitions and implementations. The inter- and intra-DC connections provided by PION could be from endpoint to endpoint, endpoint to network, or network to network. The packet transport capabilities provided by the overlay network are determined by the capabilities of the underlying PSN.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 12, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.

Jin & Khasnabish          Expires November 12, 2012               [Page 1]
Internet-Draft    draft-kj-nvo3-pion-architecture-00            May 2012
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1.  Introduction
2.  List of Acronyms
3.  PION Definition
4.  PION Motivation
5.  PION Protocol model
  5.1.  Protocol Layers
  5.2.  Encapsulation Layer
  5.3.  Tenant Network Identifier Layer
  5.4.  PSN Layer Encapsulation
  5.5.  Associate tenant and PSN Layer
6.  Network Architecture
7.  Applicability of PION
  7.1.  PION over IP PSN
  7.2.  PION over MPLS PSN
8.  Control Plane Consideration
9.  Acknowledgments
10. Informative References
Authors' Addresses

1.  Introduction

This draft introduces the architecture of the PSN independent overlay network (PION) for intra- and inter-datacenter connections, together with its motivation, protocol layers, and applications. In this draft, PSN refers to an IP or MPLS network.
PION provides a virtualized network that is independent of the underlying PSN, so as to maximize the reuse of IETF protocol definitions and implementations. That means the overlay network could work on any underlying PSN layer and reuse the capabilities of that layer. The inter- and intra-DC connections provided by PION could be from endpoint to endpoint, endpoint to network, or network to network. The packet transport capabilities provided by the overlay network are determined by the capabilities of the underlying PSN. Making the overlay network independent of the underlying PSN allows it to benefit from the capabilities of different kinds of underlying PSN.

2.  List of Acronyms

   PSN:  Packet Switched Network
   PION:  PSN Independent Overlay Network
   PION Header:  includes an encapsulation layer and a tenant ID layer
   Tenant Packet:  a customer packet encapsulated with a PION header
   BW:  Bandwidth
   ECMP:  Equal Cost Multi-Path
   MPLS:  Multiprotocol Label Switching
   NVE:  Network Virtualization Edge
   QoS:  Quality of Service
   SLA:  Service Level Agreement

3.  PION Definition

The PSN independent overlay network (PION) provides an overlay network for the datacenter, offering intra- and inter-datacenter connections between end-points over various kinds of underlying PSN. PION could provide service bandwidth and QoS assurance, multicast, traffic engineering, security, and other capabilities by relying on the capabilities of different kinds of underlying PSN. PION is designed to isolate traffic and addresses among different tenants, and to be scalable enough to accommodate millions of end-points.

PION provides the following functionalities:

   1.  Be PSN independent, to maximize reuse of existing IETF-defined PSN technologies;

   2.  Provide traffic and address isolation for each tenant;

   3.
Provide good scalability, to accommodate two million VMs running on more than one hundred thousand physical servers;

   4.  Provide differentiated service for different tenants, including bandwidth and QoS.

4.  PION Motivation

As a core requirement, emerging datacenters need to support both multi-tenancy and high scalability. The packet transport capabilities provided by the overlay network are determined by the capabilities of the underlying PSN, so making the overlay network independent of the underlying PSN allows it to benefit from the capabilities of different kinds of underlying PSN.

An IP PSN could provide the maximum connection availability for the overlay network, and it also provides stateless IP connections, which ease the operation of PSN tunnels. Approaches such as VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE [I-D.sridharan-virtualization-nvgre], or STT [I-D.davie-stt] allow a Layer 2 overlay network to be set up over UDP, IP, or even a TCP-like encapsulation.

The MPLS PSN is now widely deployed in wide area networks, and has proven able to provide connections with bandwidth guarantees, differentiated QoS assurance, high resiliency, and so on. There are some capabilities that an MPLS PSN has but an IP PSN does not. One typical example is as follows:

   1.  A bandwidth guarantee is required per tenant, rather than bandwidth provisioning shared among tenants. The IP connection resources among NVEs serve all tenants, and connections dedicated to individual tenants cannot be set up. Some "Gold" class tenants may require a bandwidth guarantee for the service. If tenants of other categories (e.g., Silver, Bronze, etc.) are mixed with Gold category tenants, and the traffic flows from all categories of tenants are transferred over the same connection, the desired bandwidth of the Gold-class tenants may not be guaranteed.
When the overlay network crosses a WAN, the bandwidth guarantee problem would be exacerbated by the limited bandwidth in the WAN.

The MPLS PSN has the capability to provide tenant-aware traffic transport. For example, when the connection provided by the overlay network crosses an IP/MPLS-enabled WAN, the traffic of a specified tenant could be traffic engineered by the IP/MPLS network, which would greatly improve the transport quality of the tenant service.

The purpose of the PSN independent overlay network (PION) is to reuse various kinds of existing IETF-defined PSN technologies, while keeping the tenant packet encapsulation uniform over different types of PSN connections/tunnels. The PSN here mainly refers to IP and MPLS; Layer 2 PSN technologies are excluded.

5.  PION Protocol model

5.1.  Protocol Layers

The PION protocol layering model is shown below:

+-------------------------------------------+
|             Customer Payload              |
|                    ~~~                    |
/===========================================\
H         Tenant Network Identifier         H
H-------------------------------------------H <--Tenant Header
H               Encapsulation               H
\===========================================/
|                 PSN Layer                 |
+-------------------------------------------+
|                 Data-Link                 |
+-------------------------------------------+
|              Physical Layer               |
+-------------------------------------------+

                  Figure 1

The customer payload in a datacenter would typically be an Ethernet payload, but other types of payload, e.g., an IP payload, are not precluded.

The encapsulation layer provides packet transport with some capabilities that other layers cannot provide. For more detail, see Section 5.2.

The Tenant Network Identifier (TNI) layer provides customer traffic and address isolation among different tenants. This identifier may be unique per NVE, but MUST be unique per connection between two NVEs.
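As a rough illustration of the tenant header in Figure 1, the sketch below prepends an encapsulation layer and a TNI to a customer payload and parses them back. The draft does not define a wire format, so the field widths, the code points, and the names `build_tenant_packet` and `parse_tenant_packet` are all assumptions made only for this example.

```python
import struct

# Hypothetical field layout: the draft does not define a wire format.
ENCAP_FMT = "!BBH"  # payload type, flags (unused here), flow entropy
TNI_FMT = "!I"      # tenant network identifier, assumed to be 32 bits

PAYLOAD_ETHERNET = 1  # assumed code points, not taken from the draft
PAYLOAD_IP = 2


def build_tenant_packet(tni, payload_type, entropy, payload):
    """Prepend the tenant header to a customer payload.

    Wire order follows Figure 1 from the PSN layer upward:
    encapsulation layer, then TNI, then customer payload.
    """
    encap = struct.pack(ENCAP_FMT, payload_type, 0, entropy)
    return encap + struct.pack(TNI_FMT, tni) + payload


def parse_tenant_packet(packet):
    """Return (tni, payload_type, entropy, payload) from a tenant packet."""
    payload_type, _flags, entropy = struct.unpack_from(ENCAP_FMT, packet, 0)
    offset = struct.calcsize(ENCAP_FMT)
    (tni,) = struct.unpack_from(TNI_FMT, packet, offset)
    offset += struct.calcsize(TNI_FMT)
    return tni, payload_type, entropy, packet[offset:]
```

The resulting bytes would then be handed to whichever PSN layer (IP or MPLS) the NVE has associated with the tenant; that step is outside this sketch.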
The PSN layer provides physical network transport for the virtualized network in the datacenter, and reuses IETF-defined protocols as far as possible. The ECMP transport capability of the PSN layer should be able to hash the traffic per flow per tenant.

The data-link and physical layers are out of the scope of this document.

5.2.  Encapsulation Layer

There are several functions/services that the encapsulation layer could provide. This draft lists the following:

   1.  Customer payload indication, to distinguish different types of customer payload.

   2.  Packet sequencing and fragmentation capability.

   3.  Flow entropy value, to add flow-based entropy by tagging all the packets of a flow with an entropy label.

The customer payload could be Ethernet in many cases, but an IP payload is not precluded. An indication value in the encapsulation layer could be provided to indicate the customer payload type.

Some applications that use UDP transport require packets to be transmitted in sequence and need information about packet loss. Some applications require larger packets to improve efficiency; packet fragmentation is then required, and is preferably performed in hardware. The encapsulation layer has the capability to carry packet fragmentation information.

Some PSN connections used by PION, e.g., GRE, do not provide ECMP capability. The encapsulation layer would provide such ECMP capability by adding a flow entropy value; all packets of one flow must be tagged with the same entropy value.

When the PSN layer uses UDP encapsulation, the entropy value could be carried in the UDP source port, and the flow entropy value in the encapsulation layer could then be omitted.

When the PSN layer uses a GRE tunnel, the flow entropy value in the encapsulation layer should be added if per-flow ECMP is required.
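One way to picture the flow entropy value is the minimal sketch below: it hashes the tenant identifier together with the customer flow's 5-tuple, so that every packet of a flow gets the same value, and maps that value into a UDP source port for the UDP-encapsulation case. The hash choice (CRC-32), the 16-bit width, the port range, and the function names are assumptions for illustration only; the draft does not specify them.

```python
import zlib

EPHEMERAL_LOW = 49152   # assumed range for the outer UDP source port
EPHEMERAL_HIGH = 65535


def flow_entropy(tni, src_ip, dst_ip, proto, src_port, dst_port):
    """Derive a 16-bit entropy value per flow per tenant.

    The same inputs always produce the same value, so all packets
    of one flow are tagged identically.
    """
    key = f"{tni}|{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) & 0xFFFF


def udp_source_port(entropy):
    """Map an entropy value into the assumed outer UDP source port range."""
    span = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1
    return EPHEMERAL_LOW + (entropy % span)
```

With a GRE tunnel, the value returned by `flow_entropy` would instead be carried in the encapsulation layer itself, since there is no outer port field to hold it.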
When the PSN layer uses TCP-like [I-D.davie-stt] encapsulation, sequencing and fragmentation could be provided by the IP layer, and the sequencing and fragmentation capability in the tenant header could then be omitted. The entropy value could be carried in the TCP source port, and the flow entropy value in the encapsulation layer could then be omitted.

When the PSN layer uses an MPLS tunnel, sequencing and fragmentation in the tenant header would be applied if required. The entropy value could be carried in the MPLS flow label, and the flow entropy value in the encapsulation layer could then be omitted.

5.3.  Tenant Network Identifier Layer

The tenant network identifier (TNI) could be an integer that indicates the membership of each customer packet. One example is to use an explicit integer, like a VLAN ID. Taking the datacenter as an example, an explicit tenant ID will simplify interoperation in the inter-datacenter connection environment. Whichever way the control plane allocates the TNI, by static configuration or by dynamic allocation, overlay networks with different control planes can always interoperate if they use the same TNI. That would be particularly useful when interconnecting two datacenters with different control planes: the operator only needs to ensure that the same TNI is used (or apply TNI translation) to interoperate.

5.4.  PSN Layer Encapsulation

The PSN layer required for PION could be any kind of PSN connection that is capable of transmitting tenant packets. In general, two kinds of PSN connection could be provided: IP and MPLS.

5.5.  Associate tenant and PSN Layer

It is the NVE's responsibility to associate the tenant with the PSN connection to a peer NVE, which could be done by configuration or in another implementation-specific way. Different types of PSN connection could be used between different NVEs within one tenant.
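The association described above can be sketched as a lookup table on the NVE, keyed by tenant and peer NVE. This is only an illustration under stated assumptions: the class names, the connection-type labels, and the `bandwidth_mbps` field are hypothetical and are not defined by the draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PsnConnection:
    kind: str                # illustrative labels: "ip-udp", "ip-gre", "mpls"
    peer_nve: str
    bandwidth_mbps: int = 0  # 0 means no bandwidth guarantee requested


class NveAssociationTable:
    """Maps (TNI, peer NVE) to the PSN connection used to reach that peer."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], PsnConnection] = {}

    def associate(self, tni: int, conn: PsnConnection) -> None:
        self._table[(tni, conn.peer_nve)] = conn

    def lookup(self, tni: int, peer_nve: str) -> Optional[PsnConnection]:
        return self._table.get((tni, peer_nve))


# One tenant using different PSN connection types toward different peers.
table = NveAssociationTable()
table.associate(100, PsnConnection("ip-udp", "nve2"))
table.associate(100, PsnConnection("mpls", "nve3", bandwidth_mbps=500))
```

Keying on the (TNI, peer) pair is what lets a single tenant use, for example, a plain IP connection inside the datacenter and a bandwidth-guaranteed MPLS connection toward a remote one.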
The NVE should have the capability to set up the specified PSN connection if required. For example, if only IP connections are required between or among NVEs, the NVEs need the capability to set up IP connections. If one NVE requires a BW-guaranteed connection to a peer NVE located in another datacenter across a WAN, the NVE should set up a hierarchical MPLS LSP as specified in Section 7.2, and specify the bandwidth required.

6.  Network Architecture

One important application of the overlay network is to provide intra- and inter-datacenter connections between end-point and end-point, or between end-point and network; see the figure below.

/------DC1------\ /-------WAN-------\ /------DC2------\
+---+           | |                 | |           +---+
|NVE|--\        | |                 | |        /--|NVE|
| 1 |   \                                     /   | 3 |
+---+    \   +-------+           +-------+   /    +---+
          \--| Edge  |           | Edge  |--/
          /--|Router1|-----------|Router2|--\
         /   +-------+           +-------+   \
+---+   /                                     \   +---+
|NVE|--/                                       \--|NVE|
| 2 |           | |                 | |           | 4 |
+---+           | |                 | |           +---+
\------DC1------/ \-------WAN-------/ \------DC2------/

                       Figure 2

There are two datacenters, DC1 and DC2, managed by the same administrators, and the two datacenters are connected by a WAN, which would be an IP/MPLS network. The intra-datacenter overlay network connects end-points within a datacenter (e.g., between NVE1 and NVE2, or between NVE3 and NVE4), and should be scalable without being restricted by the topology of the underlying datacenter network.

The inter-datacenter overlay network connects end-points in two different datacenters (e.g., between NVE1 and NVE3 if NVE1 and NVE3 are the gateways). In this case, the overlay network would cross a wide area network, where network resources are always limited. The overlay network should be able to provide a QoS/BW guarantee per tenant, even when crossing a WAN, where large network providers have already widely deployed MPLS technology.
In addition, the MPLS network has the capability to provide traffic engineering, QoS/BW guarantees, and higher reliability. The overlay network should be able to benefit from the underlying network to provide high-quality service.

7.  Applicability of PION

7.1.  PION over IP PSN

Most datacenters have IP transport capability, and the end-points can reasonably be assumed to be IP reachable in most cases. The PSN layer would be an IP layer with UDP encapsulation, GRE encapsulation, or TCP-like [I-D.davie-stt] encapsulation. If security is required, IPsec could be used as a PSN tunnel.

The IP connection is indexed by destination IP address, and all tenants would share the same IP connection when sending to the same destination. To provide a PSN tunnel per destination per tenant, see Section 7.2.

7.2.  PION over MPLS PSN

An MPLS LSP could be set up per destination per tenant, and could provide optimized traffic transmission for the overlay network, which would greatly improve the service quality, especially when interconnecting different datacenters. A hierarchical MPLS LSP could also provide flexible connections across different domains.

PION over MPLS PSN does not require all the nodes in the network to be MPLS enabled. A hierarchical MPLS LSP over IP would be used to adapt to the IP environment inside a datacenter. Most deployments would only require the NVE and the edge router in the datacenter to be MPLS capable.

One typical deployment use case for an MPLS tunnel per destination per tenant would be a PION deployment across a WAN. See the figure below.
 /----DC1----\     /------WAN------\     /----DC2----\
/             \   /                 \   /             \
+---+        +-------+           +-------+        +---+
|NVE|        | Edge  |           | Edge  |        |NVE|
| 1 |--------|Router1|-----------|Router2|--------| 2 |
+---+        +-------+           +-------+        +---+
||               |                   |               ||
||<=============>|<=================>|<=============>||
|     T1(IP)           T2(MPLS)            T3(IP)     |
|<--------------------------------------------------->|
                End to End MPLS Tunnel

                       Figure 3

The two interconnected datacenters would be connected across a WAN that is MPLS enabled. The MPLS tunnel per destination address per tenant ID, provided for a "Gold" class tenant, could be served by dedicated network resources.

Assume that only the WAN, where congestion most often occurs, is required to provide a bandwidth guarantee. When a connection is set up from NVE1 to NVE2, where the two NVEs are the gateways for the specified tenant, there would be a hierarchical LSP from NVE1 to NVE2. The underlying Tunnel1 (T1 in the figure) between NVE1 and Edge Router1, and the underlying Tunnel3 (T3 in the figure) between Edge Router2 and NVE2, could be IP connections within the datacenters (e.g., GRE encapsulated). The underlying Tunnel2 (T2 in the figure) between Edge Router1 and Edge Router2 could be an MPLS-TE tunnel, which would provide the QoS/BW guarantee. The allocated MPLS label is an inner label that associates the different underlying tunnels in the different domains. The inner MPLS label is switched only at the stitching points of the underlying tunnels, e.g., Edge Router1 and Edge Router2.

In the above case, only the NVEs and the edge routers are required to be MPLS capable, and the MPLS network in the WAN could be optimized to provide high-quality service to the tenant. The edge router in the above case is designed to be tenant-aware, so that it can optimize the tenant traffic using standard IETF mechanisms.

8.  Control Plane Consideration

There are three kinds of control plane functions for PION:

   1.
One is between the end-point and the NVE, and is used to signal the behavior requirements of the end-point to the NVE. One option for implementing this first control plane is to reuse VDP, defined by IEEE.

   2.  The second is among NVEs, and synchronizes the mapping between end-points and PION connections among the NVEs. Refer to the requirements in [I-D.kreeger-nvo3-overlay-cp] for more detail.

   One option for implementing the second control plane within one datacenter is to use a centralized server; a standard interface between the centralized server and the NVE should be defined. The centralized server would collect all the mapping information from each NVE through this standard interface. When the NVE receives a packet for which it has no forwarding entry, it would request the correct forwarding entry from the centralized server and install it with an appropriate lifetime.

   It is also possible to employ two or more centralized servers in one datacenter. Different centralized servers should be able to synchronize the mapping information, and a standard interface between centralized servers should be defined.

   When interconnecting two datacenters, a standard interface between the two corresponding centralized servers should also be defined; this interface would be the same as the one used within a datacenter. All the PION mapping information would be exchanged between the two centralized servers through this standard interface.

   3.  An additional control plane function for PION is to set up PSN connections. IP connections within an IP PSN are set up by normal routing protocols, and the IETF-defined control plane could be reused. The control plane for per-destination, per-tenant MPLS connections remains to be defined; one possible way is to reuse MP-BGP or XMPP.

9.
Acknowledgments

The authors would like to thank Igor Gashinsky, David McDysan, Patricia Thaler, Thomas Morin, and Vishwas Manral for their review and contributions.

10.  Informative References

[I-D.davie-stt]
   Davie, B. and J. Gross, "A Stateless Transport Tunneling Protocol for Network Virtualization (STT)", draft-davie-stt-01 (work in progress), March 2012.

[I-D.kreeger-nvo3-overlay-cp]
   Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T. Narten, "Network Virtualization Overlay Control Protocol Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in progress), January 2012.

[I-D.mahalingam-dutt-dcops-vxlan]
   Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright, C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01 (work in progress), February 2012.

[I-D.sridharan-virtualization-nvgre]
   Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin, G., Pearson, M., Thaler, P., Tumuluri, C., and Y. Wang, "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre-00 (work in progress), September 2011.

[RFC6513]
   Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP VPNs", RFC 6513, February 2012.

Authors' Addresses

Lizhong Jin
ZTE
889, Bibo Road
Shanghai, 201203, China

Email: lizhong.jin@zte.com.cn, lizho.jin@gmail.com


Bhumip Khasnabish
ZTE USA, Inc.
55 Madison Avenue, Suite 160
Morristown, NJ 07960
USA

Email: bhumip.khasnabish@zteusa.com, vumip1@gmail.com