Network Working Group                                          L. Dunbar
Internet Draft                                                 Futurewei
Intended status: Informational                                Andy Malis
Expires: August 5, 2020                                      Independent
                                                            C. Jacquenet
                                                                  Orange
                                                                  M. Toy
                                                                 Verizon
                                                        February 5, 2020


         Dynamic Networks to Hybrid Cloud DCs Problem Statement
            draft-ietf-rtgwg-net2cloud-problem-statement-06

Abstract

   This document describes the problems that enterprises face today
   when interconnecting their branch offices with dynamic workloads in
   third-party data centers (a.k.a. Cloud DCs). There can be many
   problems associated with network connections to or among Clouds,
   many of which are probably out of the IETF scope. The objective of
   this document is to identify some of the problems that need
   additional work in the IETF Routing area. Other problems are out of
   the scope of this document.

   This document examines some of the approaches to interconnecting
   cloud DCs with enterprises' on-premises DCs and branch offices. It
   also describes some of the network problems that many enterprises
   face when they have workloads, applications, and data split among
   different data centers, especially for those enterprises with
   multiple sites that are already interconnected by VPNs (e.g., MPLS
   L2VPN/L3VPN).

   Current operational problems are examined to determine whether there
   is a need to improve existing protocols or whether a new protocol is
   necessary to solve them.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 5, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Key Characteristics of Cloud Services
      1.2. Connecting to Cloud Services
      1.3. The role of SD-WAN in connecting to Cloud Services
   2. Definition of terms
   3. High Level Issues of Connecting to Multi-Cloud
      3.1. Security Issues
      3.2. Authorization and Identity Management
      3.3. API abstraction
      3.4. DNS for Cloud Resources
      3.5. NAT for Cloud Services
      3.6. Cloud Discovery
   4. Interconnecting Enterprise Sites with Cloud DCs
      4.1. Sites to Cloud DC
      4.2. Inter-Cloud Interconnection
   5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs
   6. Problem with using IPsec tunnels to Cloud DCs
      6.1. Scaling Issues with IPsec Tunnels
      6.2. Poor performance over long distance
   7. Problems of Using SD-WAN to connect to Cloud DCs
      7.1. More Complexity to Edge Nodes
      7.2. Edge WAN Port Management
      7.3. Forwarding based on Application
   8. End-to-End Security Concerns for Data Flows
   9. Requirements for Dynamic Cloud Data Center VPNs
   10. Security Considerations
   11. IANA Considerations
   12. References
      12.1. Normative References
      12.2. Informative References
   13. Acknowledgments

1. Introduction

1.1. Key Characteristics of Cloud Services

   The key characteristics of Cloud Services are that they are
   on-demand, scalable, and highly available, with usage-based
   billing. Cloud Services, such as compute, storage, network functions
   (most likely virtual), and third-party managed applications, are
   usually hosted and managed by third-party Cloud Operators. Examples
   of Cloud network functions include virtual firewall services,
   virtual private network services, and virtual PBX services
   (including voice and video conferencing systems). A Cloud Data
   Center (DC) is shared infrastructure that hosts Cloud Services for
   many customers.

1.2. Connecting to Cloud Services

   With the advent of widely available third-party cloud DCs and
   services in diverse geographic locations and the advancement of
   tools for monitoring and predicting application behaviors, it is
   very attractive for enterprises to instantiate applications and
   workloads in locations that are geographically closest to their
   end-users. Such proximity can improve end-to-end latency and overall
   user experience.
   Conversely, an enterprise can easily shut down applications and
   workloads whenever end-users are in motion (thereby modifying the
   networking connection of subsequently relocated applications and
   workloads). In addition, enterprises may wish to take advantage of
   more and more business applications offered by cloud operators.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:

   - High availability to access all workloads in the desired cloud
     DCs. Many enterprises include cloud infrastructures in their
     disaster recovery strategy, such as enforcing periodic backup
     policies within the cloud or running backup applications in the
     Cloud.

   - Global reachability from different geographical zones, thereby
     facilitating the proximity of applications as a function of the
     end users' location, to improve latency.

   - Elasticity: prompt connection to newly instantiated applications
     at Cloud DCs when usage increases, and prompt release of
     connections when applications are removed because demand
     changes.

   - Scalable security management.

1.3. The role of SD-WAN in connecting to Cloud Services

   Some of the characteristics of SD-WAN [SDWAN-BGP-USAGE], such as
   network augmentation and forwarding based on application IDs
   instead of destination IP addresses, are essential for connecting
   to on-demand Cloud services. Issues associated with using SD-WAN
   for connecting to Cloud services are also discussed in this
   document.

2. Definition of terms

   Cloud DC: Third-party data centers that usually host applications
         and workloads owned by different organizations or tenants.

   Controller: Used interchangeably with SD-WAN controller, which
         manages SD-WAN overlay path creation/deletion and monitors the
         path conditions between two or more sites.

   DSVPN: Dynamic Smart Virtual Private Network. DSVPN is a secure
         network that exchanges data between sites without needing to
         pass traffic through an organization's headquarters virtual
         private network (VPN) server or router.

   Heterogeneous Cloud: applications and workloads split among Cloud
         DCs owned or managed by different operators.

   Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
         on-premises DCs in addition to Cloud services provided by one
         or more cloud operators (e.g., AWS, Azure, Google, Salesforce,
         SAP, etc.).

   SD-WAN: Software Defined Wide Area Network.
         In this document, "SD-WAN" refers to solutions that pool WAN
         bandwidth from multiple underlay networks to get better WAN
         bandwidth management, visibility, and control. When the
         underlay networks are private networks, traffic can traverse
         them without additional encryption; when the underlay networks
         are public, such as the Internet, some traffic needs to be
         encrypted when traversing them (depending on user-provided
         policies).

   VPC: Virtual Private Cloud is a virtual network dedicated to one
         client account. It is logically isolated from other virtual
         networks in a Cloud DC. Each client can launch its desired
         resources, such as compute, storage, or network functions,
         into its VPC. Most Cloud operators' VPCs only support private
         addresses; some support IPv4 only, while others support
         IPv4/IPv6 dual stack.

3. High Level Issues of Connecting to Multi-Cloud

   There are many problems associated with connecting to hybrid Cloud
   Services, many of which are out of the IETF scope. This section
   identifies some of the high-level problems that can be addressed by
   the IETF, especially by the Routing area. Other problems are out of
   the scope of this document. This section does not cover all
   problems of connecting to Hybrid Cloud Services; e.g., the
   difficulty of managing cloud spending is not discussed here.

3.1. Security Issues

   Cloud Services are built upon shared infrastructure and are
   therefore not secure by nature. Security has been a primary, and
   valid, concern from the start of cloud computing: users are unable
   to see the exact location where their data is stored or processed.
   Headlines highlighting data breaches, compromised credentials,
   broken authentication, hacked interfaces and APIs, and account
   hijacking have not helped alleviate these concerns. Secure user
   identity management, authentication, and access control mechanisms
   are important. Developing appropriate security measures can enhance
   the confidence needed by enterprises to fully take advantage of
   Cloud Services.

3.2. Authorization and Identity Management

   One of the more prominent challenges for Cloud Services is Identity
   Management and Authorization. Authorization includes not only user
   authorization, but also the authorization of API calls by
   applications from different Cloud DCs managed by different Cloud
   Operators. In addition, authorization is needed for Workload
   Migration, Data Migration, and Workload Management.

   There are many types of users in cloud environments, e.g., end users
   accessing applications hosted in Cloud DCs, and Cloud-resource users
   who are responsible for setting permissions for the resources based
   on roles, access lists, IP addresses, domains, etc.

   There are many types of Cloud authorization, including:

   - MAC (Mandatory Access Control), where each app owns individual
     access permissions;

   - DAC (Discretionary Access Control), where each app requests
     permissions from an external permissions app;

   - RBAC (Role-based Access Control), where the authorization service
     owns roles with different privileges on the cloud service; and

   - ABAC (Attribute-based Access Control), where access is based on
     request attributes and policies.

   The IETF has not yet developed comprehensive specifications for
   Identity Management and data models for Cloud Authorization.

3.3. API abstraction

   Different Cloud Operators have different APIs to access their Cloud
   resources, security functions, NAT, etc. It is difficult to move
   applications built with one Cloud operator's APIs to another.
   However, it is highly desirable to have a single and consistent way
   to manage the networks and respective security policies for
   interconnecting applications hosted in different Cloud DCs. The
   desired property would be a single network fabric to which
   different Cloud DCs and an enterprise's multiple sites can be
   attached or detached, with a common interface for setting desired
   policies.
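   The single network fabric with a common policy interface described
   above can be sketched as an adapter layer: one provider-neutral
   policy object, and one adapter per Cloud operator that would
   translate that policy into the operator's own API. The Python sketch
   below is a hypothetical illustration under those assumptions; the
   names ConnectivityPolicy, CloudAdapter, InMemoryAdapter, and Fabric
   are invented for this example, and no real Cloud operator API is
   called.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectivityPolicy:
    """Provider-neutral policy (hypothetical fields, for illustration only)."""
    name: str
    prefixes: tuple[str, ...]  # prefixes to be reachable over the fabric
    encrypted: bool            # whether traffic must be encrypted in transit


class CloudAdapter(ABC):
    """One adapter per Cloud operator; a real one would call that operator's API."""

    @abstractmethod
    def attach(self, policy: ConnectivityPolicy) -> str:
        """Apply the policy and return an operator-specific handle."""

    @abstractmethod
    def detach(self, policy_name: str) -> None:
        """Remove the policy from this operator's network."""


class InMemoryAdapter(CloudAdapter):
    """Stand-in adapter that only records state instead of calling an API."""

    def __init__(self, operator: str) -> None:
        self.operator = operator
        self.attached: dict[str, ConnectivityPolicy] = {}

    def attach(self, policy: ConnectivityPolicy) -> str:
        self.attached[policy.name] = policy
        return f"{self.operator}:{policy.name}"

    def detach(self, policy_name: str) -> None:
        self.attached.pop(policy_name, None)


class Fabric:
    """Single interface: clouds attach/detach, and policies are set once for all."""

    def __init__(self) -> None:
        self.adapters: dict[str, CloudAdapter] = {}

    def attach_cloud(self, name: str, adapter: CloudAdapter) -> None:
        self.adapters[name] = adapter

    def detach_cloud(self, name: str) -> None:
        self.adapters.pop(name, None)

    def apply(self, policy: ConnectivityPolicy) -> list[str]:
        # The same policy fans out to every attached cloud's adapter.
        return [adapter.attach(policy) for adapter in self.adapters.values()]

    def withdraw(self, policy_name: str) -> None:
        for adapter in self.adapters.values():
            adapter.detach(policy_name)


# Usage: two clouds attached to the same fabric receive the same policy.
fabric = Fabric()
fabric.attach_cloud("cloud-a", InMemoryAdapter("cloud-a"))
fabric.attach_cloud("cloud-b", InMemoryAdapter("cloud-b"))
print(fabric.apply(ConnectivityPolicy("web-to-db", ("10.1.0.0/16",), True)))
```

   The sketch only shows that one policy call can fan out to
   per-operator translations; where such a shim would live (e.g., in an
   SD-WAN controller or in orchestration software) is left open.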
   The difficulty of connecting applications in different Clouds might
   stem from the fact that the Cloud operators are direct competitors.
   Traffic flowing out of Cloud DCs usually incurs charges; therefore,
   direct communication between applications in different Cloud DCs
   can be more expensive than intra-Cloud communication. It is
   desirable to have a common API shim layer or abstraction for
   different Cloud providers to make it easier to move applications
   from one Cloud DC to another.

3.4. DNS for Cloud Resources

   DNS name resolution is essential for on-premises and cloud-based
   resources. For customers with hybrid workloads, which include
   on-premises and cloud-based resources, extra steps are necessary to
   configure DNS to work seamlessly across both environments.

   Cloud operators have their own DNS to resolve resources within their
   Cloud DCs and well-known public domains. A Cloud's DNS can be
   configured to forward queries to customer-managed authoritative DNS
   servers hosted on-premises, and to respond to DNS queries forwarded
   by on-premises DNS servers.

   For enterprises utilizing Cloud services from different cloud
   operators, it is necessary to establish policies and rules on how
   and where to forward DNS queries. When applications in one Cloud
   need to communicate with applications hosted in another Cloud, DNS
   queries from one Cloud DC may be forwarded to the enterprise's
   on-premises DNS, which in turn forwards them to the DNS service in
   the other Cloud. Such configuration can be complex, depending on the
   application communication patterns.

3.5. NAT for Cloud Services

   Cloud resources, such as VM instances, are usually assigned private
   IP addresses. By configuration, some private subnets can use a NAT
   function to reach external networks, while other private subnets
   are internal to the Cloud only. Different Cloud operators support
   different levels of NAT functions.
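   One practical distinction among these NAT levels is whether the NAT
   is outbound-only, as with the Google Cloud NAT behavior described
   later in this section. The minimal Python sketch below models such a
   NAT under simplifying assumptions (a single public address,
   port-based translation, and filtering on the remote address and
   port). OutboundNat and its methods are hypothetical names, not any
   operator's API, and the addresses are documentation examples.

```python
from __future__ import annotations

import itertools


class OutboundNat:
    """Hypothetical outbound-only NAT: state is created only by inside hosts."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        # (private_ip, private_port, remote_ip, remote_port) -> public_port
        self._out: dict[tuple[str, int, str, int], int] = {}
        # (public_port, remote_ip, remote_port) -> (private_ip, private_port)
        self._in: dict[tuple[int, str, int], tuple[str, int]] = {}

    def outbound(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int) -> tuple[str, int, str, int]:
        """Translate a packet leaving the VPC, creating a mapping on first use."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self._out:
            public_port = next(self._next_port)
            self._out[key] = public_port
            self._in[(public_port, dst_ip, dst_port)] = (src_ip, src_port)
        return (self.public_ip, self._out[key], dst_ip, dst_port)

    def inbound(self, src_ip: str, src_port: int,
                dst_port: int) -> tuple[str, int] | None:
        """Map a packet arriving at public_ip:dst_port back to a private host.

        Returns None (drop) unless it is a reply from the same remote
        address and port that an inside host contacted first.
        """
        return self._in.get((dst_port, src_ip, src_port))


nat = OutboundNat("203.0.113.10")
print(nat.outbound("10.0.1.5", 51000, "198.51.100.7", 443))
print(nat.inbound("198.51.100.7", 443, 40000))  # reply to an established flow
print(nat.inbound("192.0.2.99", 443, 40000))    # unsolicited: no mapping
```

   A NAT that also offered inbound (destination) translation would add
   statically configured entries to the inbound table; the sketch omits
   that deliberately to mirror outbound-only behavior.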
   As an example of these differences, the AWS NAT Gateway does not
   currently support connections towards, or from, VPC Endpoints, VPN,
   AWS Direct Connect, or VPC Peering
   (https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-other-services).
   AWS Direct Connect, VPN, and VPC Peering do not currently support
   any NAT functionality.

   Google's Cloud NAT allows Google Cloud virtual machine (VM)
   instances without external IP addresses, and private Google
   Kubernetes Engine (GKE) clusters, to connect to the Internet. Cloud
   NAT implements outbound NAT in conjunction with a default route to
   allow instances to reach the Internet. It does not implement inbound
   NAT. Hosts outside of the VPC network can only respond to
   established connections initiated by instances inside the Google
   Cloud; they cannot initiate their own new connections to Cloud
   instances via NAT.

   For enterprises with applications running in different Cloud DCs,
   NAT has to be properly configured both in the Cloud DCs and in their
   own on-premises DCs.

3.6. Cloud Discovery

   One of the concerns of using Cloud services is not being aware of
   where a resource is actually located, especially since Cloud
   operators can move application instances from one place to another.
   When applications in a Cloud communicate with on-premises
   applications, it may not be clear where the Cloud applications are
   located or to which VPCs they belong.

   It is highly desirable to have tools to discover cloud services in
   much the same way as one would discover on-premises infrastructure.
   A significant difference is that cloud discovery uses the cloud
   vendor's API to extract data on the cloud services, rather than the
   direct access used in scanning on-premises infrastructure. Standard
   data models, APIs, or tools can alleviate the concerns of
   enterprises utilizing Cloud resources, e.g., by having a Cloud
   service scan that connects to the API of the cloud provider and
   collects information directly.

4. Interconnecting Enterprise Sites with Cloud DCs
   Considering that many enterprises already have existing VPNs (e.g.,
   MPLS-based L2VPN or L3VPN) interconnecting branch offices and
   on-premises data centers, connecting to Cloud services will involve
   a mix of different types of networks. When an enterprise's existing
   VPN service providers do not have direct connections to the
   corresponding cloud DCs that the enterprise prefers to use, the
   enterprise faces additional infrastructure and operational costs to
   utilize Cloud services.

4.1. Sites to Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach its workloads hosted in the Cloud DCs.
   For example, AWS (Amazon Web Services) offers the following options
   to reach workloads in AWS Cloud DCs:

   - AWS Internet Gateway, which allows communication between instances
     in an AWS VPC and the Internet.

   - AWS Virtual Gateway (vGW), where IPsec tunnels [RFC6071] are
     established between an enterprise's own gateway and the AWS vGW,
     so that the communications between those gateways can be secured
     from the underlay (which might be the public Internet).

   - AWS Direct Connect, which allows enterprises to purchase a direct
     connection from network service providers to get a private leased
     line interconnecting the enterprise's gateway(s) and the AWS
     Direct Connect routers.

   In addition, an AWS Transit Gateway can be used to interconnect
   multiple VPCs in different Availability Zones. The AWS Transit
   Gateway acts as a hub that controls how traffic is forwarded among
   all the connected networks, which act like spokes.

   Microsoft's ExpressRoute allows extension of a private network to
   any of the Microsoft cloud services, including Azure and Office365.
   ExpressRoute is configured using Layer 3 routing.
   Customers can opt for redundancy by provisioning dual links from
   their location to two Microsoft Enterprise Edge routers (MSEEs)
   located within a third-party ExpressRoute peering location. BGP is
   then set up over the WAN links to provide redundancy to the cloud.
   This redundancy is maintained from the peering data center into
   Microsoft's cloud network.

   Google's Cloud Dedicated Interconnect offers network connectivity
   options similar to those of AWS and Microsoft. One distinct
   difference, however, is that Google's service allows customers
   access to the entire global cloud network by default. It does this
   by connecting the customer's on-premises network with the Google
   Cloud using BGP and Google Cloud Routers to provide optimal paths to
   the different regions of the global cloud infrastructure.

   Figure 1 shows an example in which some of a tenant's workloads are
   accessible via a virtual router connected to the AWS Internet
   Gateway, some are accessible via the AWS vGW, and others are
   accessible via AWS Direct Connect. Different types of access require
   different levels of security functions. Sometimes it is not visible
   to end customers which type of network access is used for a specific
   application instance. To get better visibility, separate virtual
   routers (e.g., vR1 and vR2) can be deployed to differentiate traffic
   to/from different cloud GWs. It is important for some enterprises to
   be able to observe the specific behaviors of each type of
   connection.
   In Figure 1, the customer gateway can be a customer-owned router or
   ports physically connected to the AWS Direct Connect GW.

   +------------------------+
   |    ,---.         ,---. |
   |   (TN-1 )       ( TN-2)|
   |    `-+-'  +---+  `-+-' |
   |      +----|vR1|----+   |
   |           ++--+        |
   |            |          +-+----+
   |            |         /Internet\ For External
   |            +--------+ Gateway  +----------------------
   |                      \        / to reach via Internet
   |                       +-+----+
   |                        |
   |    ,---.         ,---. |
   |   (TN-1 )       ( TN-2)|
   |    `-+-'  +---+  `-+-' |
   |      +----|vR2|----+   |
   |           ++--+        |
   |            |          +-+----+
   |            |         / virtual\ For IPsec Tunnel
   |            +--------+ Gateway  +----------------------
   |            |         \        / termination
   |            |          +-+----+
   |            |            |
   |            |          +-+----+              +------+
   |            |         /        \ For Direct /customer\
   |            +--------+ Gateway  +----------+ gateway  +
   |                      \        / Connect    \        /
   |                       +-+----+              +------+
   |                        |
   +------------------------+

           Figure 1: Examples of Multiple Cloud DC connections

   It is likely that hybrid designs will become the rule for cloud
   services, as more enterprises see the benefits of integrating public
   and private cloud infrastructures. However, enabling the growth of
   hybrid cloud deployments in the enterprise requires fast and safe
   interconnection between public and private cloud services.

   For an enterprise to connect to applications and workloads hosted in
   multiple Cloud DCs, the enterprise can use IPsec tunnels established
   over the Internet or a (virtualized) leased line service to connect
   its on-premises gateways to each of the Cloud DC's gateways, to
   virtual routers instantiated in the Cloud DCs, or to any other
   suitable design (including a combination thereof).

   Some enterprises prefer to instantiate their own virtual
   CPEs/routers inside the Cloud DC to connect the workloads within the
   Cloud DC. An overlay path is then established between the customer
   gateways and the virtual CPEs/routers for reaching the workloads
   inside the cloud DC.

   The requirements listed in Section 1.2 have further implications
   for the networks that interconnect enterprise sites with Cloud DCs:

   - High availability: many enterprises include cloud infrastructures
     in their disaster recovery strategy, e.g., by enforcing periodic
     backup policies within the cloud or by running backup applications
     in the Cloud. Therefore, the connection to the cloud DCs may not
     be permanent, but rather needs to be on-demand.

   - Elasticity: some enterprises have front-end web portals running in
     cloud DCs and database servers in their on-premises DCs. Those
     front-end web portals need to be reachable from the public
     Internet, while the backend connections to the sensitive data in
     the database servers hosted in the on-premises DCs might need to
     be secured.

   - Scalable security management: IPsec is commonly used to
     interconnect cloud gateways with CPEs deployed in the enterprise
     premises. For enterprises with a large number of branch offices,
     managing the IPsec Security Associations among many nodes can be
     very difficult.

4.2. Inter-Cloud Interconnection

   Enterprises today can instantiate their workloads or applications in
   Cloud DCs owned by different Cloud providers, e.g., AWS, Azure,
   Google Cloud, Oracle, etc. Interconnecting those workloads involves
   three parties: the Enterprise, its network service providers, and
   the Cloud providers.

   All Cloud Operators offer secure ways to connect enterprises'
   on-prem sites/DCs with their Cloud DCs.
   Some Cloud Operators allow enterprises to connect via private
   networks. For example, AWS Direct Connect allows enterprises to use
   a 3rd-party-provided private Layer 2 path from the enterprise's GW
   to the AWS Direct Connect GW.

   The connectivity options to Cloud DCs described in the previous
   section are for reaching Cloud providers' DCs, not for
   interconnecting cloud DCs. When applications in the AWS Cloud need
   to communicate with applications in Azure, today's practice requires
   a third-party gateway (physical or virtual) to interconnect the AWS
   Layer 2 Direct Connect path with Azure's Layer 3 ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task. Alternatively, by leveraging open
   source VPN software such as strongSwan, an enterprise can create an
   IPsec connection to the Azure gateway using a shared key.
   The strongSwan instance within AWS can not only connect to Azure but
   can also be used to facilitate traffic to other nodes within the AWS
   VPC by configuring forwarding and appropriate routing rules for the
   VPC.

   Most Cloud operators' virtual networks, such as AWS VPC or Azure
   VNET, use non-globally routable CIDR blocks from the private IPv4
   address ranges specified by RFC 1918. To establish an IPsec tunnel
   between two Cloud DCs, it is necessary to exchange publicly routable
   addresses for applications in different Cloud DCs. [BGP-SDWAN]
   describes one method; other methods are worth exploring.

   In summary, here are some approaches, available now (which might
   change in the future), to interconnect workloads among different
   Cloud DCs:

   a) Utilize Cloud DC provided inter/intra-cloud connectivity services
      (e.g., AWS Transit Gateway) to connect workloads instantiated in
      multiple VPCs. Such services are provided with the cloud gateway
      to connect to external networks (e.g., AWS Direct Connect
      Gateway).

   b) Hairpin all traffic through the customer gateway, meaning all
      workloads are directly connected to the customer gateway, so that
      communications among workloads within one Cloud DC must traverse
      the customer gateway.

   c) Establish direct tunnels among different VPCs (AWS Virtual
      Private Clouds) and VNETs (Azure Virtual Networks) via the
      client's own virtual routers instantiated within Cloud DCs. DMVPN
      (Dynamic Multipoint Virtual Private Network) or DSVPN (Dynamic
      Smart VPN) techniques can be used to establish direct
      multipoint-to-point or multipoint-to-multipoint tunnels among
      those client-owned virtual routers.

   Approach a) usually does not work if Cloud DCs are owned and managed
   by different Cloud providers.

   Approach b) creates additional transmission delay and incurs costs
   when traffic exits Cloud DCs.

   For approach c), DMVPN and DSVPN use NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses and WAN ports with the hub node.
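   The registration step of approach c), and why NAT makes it harder,
   can be illustrated with a simplified hub-side cache. The Python
   sketch below is hypothetical and does not reproduce NHRP message
   formats or procedures: each spoke registers its overlay (tunnel)
   address together with the WAN address it believes it has, and the
   hub also records the source address it actually observed. For a
   virtual router behind a Cloud NAT the two differ, and only the
   observed address is meaningful to peers in other Cloud DCs.
   HubCache and Registration are invented names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Registration:
    overlay_ip: str     # tunnel (overlay) address of the spoke
    reported_wan: str   # WAN address the spoke put in its own registration
    observed_wan: str   # source address the hub actually saw

    @property
    def behind_nat(self) -> bool:
        # A mismatch means some NAT rewrote the spoke's address in transit.
        return self.reported_wan != self.observed_wan


class HubCache:
    """Hypothetical hub-side map from overlay address to WAN address."""

    def __init__(self) -> None:
        self.entries: dict[str, Registration] = {}

    def register(self, overlay_ip: str, reported_wan: str,
                 observed_wan: str) -> None:
        # Re-registration overwrites the entry, which is how a dynamic
        # (changing) WAN address would be picked up.
        self.entries[overlay_ip] = Registration(overlay_ip, reported_wan,
                                                observed_wan)

    def resolve(self, overlay_ip: str) -> str | None:
        """WAN address another spoke should use to reach overlay_ip."""
        reg = self.entries.get(overlay_ip)
        return None if reg is None else reg.observed_wan


hub = HubCache()
hub.register("172.16.0.2", "198.51.100.20", "198.51.100.20")  # on-prem CPE
hub.register("172.16.0.3", "10.0.1.10", "203.0.113.5")        # cloud vRouter behind NAT
for overlay, reg in hub.entries.items():
    print(overlay, hub.resolve(overlay), "NAT" if reg.behind_nat else "no NAT")
```

   Even the observed address may accept only return traffic when the
   Cloud NAT is outbound-only (Section 3.5), so a spoke-to-spoke tunnel
   toward such a router may still fail, which illustrates why NAT has
   to be taken into consideration.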
   The IETF ION (Internetworking over NBMA (non-broadcast multiple
   access)) WG standardized NHRP for address resolution in
   connection-oriented NBMA networks (such as ATM) more than two
   decades ago. There are many differences between virtual routers in
   Public Cloud DCs and the nodes in an NBMA network. NHRP cannot be
   used for registering virtual routers in Cloud DCs unless an
   extension of the protocol is developed for that purpose, e.g., one
   that takes NAT or dynamic addresses into consideration. Therefore,
   DMVPN and/or DSVPN cannot be used directly for connecting workloads
   in hybrid Cloud DCs.

   Other protocols, such as BGP, can be used, as described in
   [BGP-SDWAN].

   As noted in Section 3.3, it is highly desirable to have a single
   network fabric to which different Cloud DCs and an enterprise's
   multiple sites can be attached or detached, with a common interface
   for setting desired policies. SD-WAN is positioned to become that
   network fabric, enabling Cloud DCs to be dynamically attached or
   detached. In reality, however, different Cloud Operators have
   different access methods, and Cloud DCs might be geographically far
   apart. More Cloud connectivity problems are described in the
   subsequent sections.

5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

   Traditional MPLS-based VPNs have been widely deployed as an
   effective way to support businesses and organizations that require
   network performance and reliability. MPLS shifted the burden of
   managing a VPN service from enterprises to service providers. The
   CPEs attached to MPLS VPNs are also simpler and less expensive,
   because they do not need to manage routes to remote sites; they
   simply pass all outbound traffic to the MPLS VPN PEs to which the
   CPEs are attached (albeit multi-homing scenarios require more
   processing logic on CPEs). MPLS has addressed the problems of scale,
   availability, and fast recovery from network faults, and
   incorporated traffic-engineering capabilities.

   However, traditional MPLS-based VPN solutions are sub-optimal for
   connecting end-users to dynamic workloads/applications in cloud DCs
   because:

   - The Provider Edge (PE) nodes of the enterprise's VPNs might not
     have direct connections to the third-party cloud DCs that are used
     for hosting workloads with the goal of providing easy access to
     enterprises' end-users.

   - It usually takes some time to deploy PE routers at new locations.
     When an enterprise's workloads are moved from one cloud DC to
     another (i.e., removed from one DC and re-instantiated at another
     location when demand changes), the enterprise branch offices need
     to be connected to the new cloud DC, but the network service
     provider might not have PEs located at the new location.

     One of the main drivers for moving workloads into the cloud is the
     wide availability of cloud DCs at geographically diverse
     locations, where apps can be instantiated as close to their
     end-users as possible. When the user base changes, the
     applications may be migrated to a new cloud DC location closest to
     the new user base.

   - Most of the cloud DCs do not expose their internal networks.
An enterprise with a hybrid cloud deployment can use an MPLS VPN to connect to a Cloud provider at multiple locations. The connection locations often correspond to gateways of different Cloud DC locations of the Cloud provider. The different Cloud DCs are interconnected by the Cloud provider's own internal network. At each connection location (gateway), the Cloud provider uses BGP to advertise all of the prefixes in the enterprise's VPC, regardless of which Cloud DC a given prefix is actually in. This can result in inefficient routing for the end-to-end data path: traffic may enter the Cloud provider's network at a gateway far from the Cloud DC that actually hosts the destination, and then has to cross the provider's internal network to reach it.

- Extensive usage of Overlay by Cloud DCs: Many cloud DCs use an overlay to connect their gateways to the workloads located inside the DC. There is currently no standard that specifies the interworking between the Cloud Overlay and the enterprise's existing underlay networks. One of the characteristics of overlay networks is that some of the WAN ports of the edge nodes connect to third party networks. There is therefore a need to propagate WAN port information to remote authorized peers in third party network domains in addition to route propagation. Such an exchange cannot happen before communication between peers is properly secured.

Another roadblock is the lack of a standard way to express and enforce consistent security policies for workloads that not only use virtual addresses, but are also very likely hosted in different locations within the Cloud DC [RFC8192]. The current VPN path computation and bandwidth allocation schemes may not be flexible enough to address the need for enterprises to rapidly connect to dynamically instantiated (or removed) workloads and applications regardless of their location/nature (i.e., third party cloud DCs).

6. Problem with using IPsec tunnels to Cloud DCs

As described in the previous section, many Cloud operators expose their gateways for external entities (which can be enterprises themselves) to directly establish IPsec tunnels.
Enterprises can also instantiate virtual routers within Cloud DCs to connect to their on-premises devices via IPsec tunnels. This section documents some of the issues associated with using IPsec tunnels to connect enterprise premises with cloud gateways.

6.1. Scaling Issues with IPsec Tunnels

If there is only one enterprise location that needs to reach the Cloud DC, an IPsec tunnel is a very convenient solution. However, many medium-to-large enterprises have multiple sites and multiple data centers. For workloads and apps hosted in cloud DCs, the Cloud DC gateways have to maintain many IPsec tunnels to those locations (e.g., in a hub & spoke topology). In addition, each of those IPsec tunnels requires pair-wise periodic key refreshment. For a company with hundreds or thousands of locations, there could be hundreds (or even thousands) of IPsec tunnels terminating at the cloud DC gateway, which is not only expensive (because Cloud Operators usually charge their customers based on connections) but also very processing intensive for the gateway. That is why many cloud operators only allow a limited number of (IPsec) tunnels & bandwidth to each customer. Alternatively, a solution like group encryption could be used, where a single IPsec SA is necessary at the gateway, but the drawback is key distribution and the maintenance of a key server, etc.

Dynamic workloads instantiated in cloud DCs need to communicate with multiple branch offices and on-premises data centers, and most enterprises need multi-point interconnection among multiple locations, which can be provided by means of MPLS L2/L3 VPNs. Using IPsec overlay paths to connect all branches & on-premises data centers to cloud DCs requires CPEs to manage routing among Cloud DC gateways and the CPEs located at other branch locations, which can dramatically increase the complexity of the design, possibly at the cost of jeopardizing CPE performance. The complexity of requiring CPEs to maintain routing among other CPEs is one of the reasons why enterprises migrated from Frame Relay based services to MPLS-based VPN services. MPLS-based VPNs have their PEs directly connected to the CPEs. Therefore, CPEs only need to forward all traffic to the directly attached PEs, which are responsible for enforcing the routing policy within the corresponding VPNs. Even multi-homed CPEs only need to forward traffic among the directly connected PEs. However, when using IPsec tunnels between CPEs and Cloud DCs, the CPEs need to compute, select, establish and maintain routes for traffic to be forwarded to Cloud DCs, to remote CPEs via VPN, or directly.

6.2. Poor performance over long distance

When enterprise CPEs or gateways are far away from cloud DC gateways or across country/continent boundaries, performance of IPsec tunnels over the public Internet can be problematic and unpredictable. Even though there are many monitoring tools available to measure delay and various performance characteristics of the network, the measurement for paths over the Internet is passive and past measurements may not represent future performance.

Many cloud providers can replicate workloads in different availability zones. An App instantiated in the cloud DC closest to clients may have to cooperate with another App (or its mirror image) in another region or with database server(s) in the on-premises DC. This kind of coordination requires predictable networking behavior/performance among those locations.

7. Problems of Using SD-WAN to connect to Cloud DCs

SD-WAN can establish parallel paths over multiple underlay networks between two locations on-demand, for example, to support the connections established between two CPEs interconnected by a traditional MPLS VPN ([RFC4364] or [RFC4664]) or by IPsec [RFC6071] tunnels. SD-WAN lets enterprises augment their current VPN network with cost-effective, readily available Broadband Internet connectivity, enabling some traffic offloading to paths over the Internet according to differentiated, possibly application-based traffic forwarding policies, or when the MPLS VPN connection between the two locations is congested, or otherwise undesirable or unavailable.

7.1. More Complexity to Edge Nodes

Augmenting the transport path is not as simple as it appears. For an enterprise with multiple sites, CPE-managed overlay paths among sites require each CPE to manage all the addresses that local hosts have the potential to reach, i.e., to map internal VPN addresses to the appropriate overlay paths. This is similar to the complexity of Frame Relay based VPNs, where each CPE needed to maintain mesh routing for all destinations if they were to avoid an extra hop through a hub router.
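The per-CPE state described above can be sketched in Python. The model is illustrative and rests on assumptions not in the text: `Cpe`, `Controller`, `overlay_path_count`, and the gateway names are hypothetical, each CPE keeps a plain prefix-to-path map with longest-prefix match, and a controller pushes every change to every CPE. It shows two points from this section: relocating a single workload touches every CPE's table, and a full mesh of overlay paths grows quadratically with the number of sites, whereas hub-and-spoke grows linearly at the cost of the extra hop through the hub.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network, ip_network


@dataclass
class Cpe:
    """An edge node holding its own destination-to-overlay-path map."""
    name: str
    overlay_map: dict[IPv4Network, str] = field(default_factory=dict)

    def lookup(self, dst: IPv4Address) -> str | None:
        # Longest-prefix match over the locally held mappings.
        matches = [prefix for prefix in self.overlay_map if dst in prefix]
        if not matches:
            return None
        return self.overlay_map[max(matches, key=lambda p: p.prefixlen)]


class Controller:
    """Hypothetical central controller that pushes mappings to every CPE."""

    def __init__(self, cpes: list[Cpe]) -> None:
        self.cpes = cpes

    def announce(self, prefix: IPv4Network, path: str) -> int:
        # Every CPE must install (or overwrite) the mapping itself;
        # the return value is the number of CPE tables that were updated.
        for cpe in self.cpes:
            cpe.overlay_map[prefix] = path
        return len(self.cpes)


def overlay_path_count(n_sites: int, full_mesh: bool) -> int:
    # Full mesh: one overlay path per pair of sites.
    # Hub-and-spoke: one path per spoke, with one of the sites acting as hub.
    return n_sites * (n_sites - 1) // 2 if full_mesh else n_sites - 1


if __name__ == "__main__":
    branches = [Cpe(f"branch-{i}") for i in range(1, 4)]
    ctl = Controller(branches)
    workload = ip_network("10.20.1.0/24")

    ctl.announce(workload, "cloud-dc-east-gw")
    # The workload is re-instantiated in another Cloud DC: every CPE is touched again.
    updates = ctl.announce(workload, "cloud-dc-west-gw")
    print(updates, branches[0].lookup(IPv4Address("10.20.1.7")))
    print(overlay_path_count(100, full_mesh=True), overlay_path_count(100, full_mesh=False))
```

Run as a script, it reports 3 table updates for the single workload move, and 4950 versus 99 overlay paths for 100 sites. The controller removes the need for a routing protocol on the CPEs, but not the per-CPE table churn.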
Even with the assistance of a central controller (instead of running a routing protocol) to resolve the mapping between destinations and SD-WAN paths, SD-WAN CPEs are still responsible for routing table maintenance as remote destinations change their attachments, e.g., when dynamic workloads in other DCs are decommissioned or added.

Even though SD-WAN was originally envisioned for interconnecting branch offices, it offers a very attractive way for enterprises to connect to Cloud DCs. However, overlay paths interconnecting branch offices are different from those connecting to Cloud DCs:

- Overlay paths interconnecting branch offices usually have two end-points (e.g., CPEs) controlled by one entity (e.g., controllers or management systems operated by the enterprise).

- Connections to a Cloud DC consist of CPEs owned or managed by the enterprise, with the remote end-points being managed or controlled by the Cloud DCs (for ease of description, such CPEs are called asymmetrically-managed CPEs).

- Cloud DCs may have different entry points (or devices), with one entry point terminating a private direct connection (based upon a leased line, for example) and other entry points being devices terminating IPsec tunnels, as shown in Figure 2. Therefore, the SD-WAN design becomes asymmetric.

[Figure 2: Different Underlays to Reach Cloud DC. Tenant workloads (TN-1 to TN-6; TN: Tenant applications/workloads) sit behind virtual routers (vR) and are reached through three gateways: an Internet Gateway (one path via the Internet, native traffic without encryption), a virtual Gateway terminating an IPsec tunnel from the enterprise CPE (encrypted traffic over a third party network), and a Gateway reached via Direct Connect from the customer gateway (native traffic without encryption over a secure network; the customer GW has a physical connection to the AWS GW).]

7.2. Edge WAN Port Management

An SD-WAN edge node can have WAN ports connected to different networks or to the public Internet, managed by different operators. There is therefore a need to propagate WAN port properties to remote authorized peers in third party network domains, in addition to route propagation. Such an exchange cannot happen before communication between peers is properly secured.

7.3. Forwarding based on Application

Forwarding based on application IDs instead of destination IP addresses is often referred to as Application-based Segmentation. If the applications have unique IP addresses, then Application-based Segmentation can be achieved by propagating different BGP UPDATE messages to different nodes, as described in [BGP-SDWAN-USAGE]. If an application cannot be uniquely identified by its IP addresses, more work is needed.

8. End-to-End Security Concerns for Data Flows

When IPsec tunnels established from enterprise on-premises CPEs are terminated at the Cloud DC gateway where the workloads or applications are hosted, some enterprises have concerns regarding traffic to/from their workloads being exposed to others behind the data center gateway (e.g., exposed to other organizations that have workloads in the same data center). To ensure that traffic to/from workloads is not exposed to unwanted entities, IPsec tunnels may go all the way to the workload (servers or VMs) within the DC.

9. Requirements for Dynamic Cloud Data Center VPNs

In order to address the aforementioned issues, any solution for enterprise VPNs that includes connectivity to dynamic workloads or applications in cloud data centers should satisfy a set of requirements:

- The solution should allow enterprises to take advantage of the current state-of-the-art in VPN technology, in both traditional MPLS-based VPNs and IPsec-based VPNs (or any combination thereof) that run over the public Internet.

- The solution should not require an enterprise to upgrade all their existing CPEs.

- The solution should support scalable IPsec key management among all nodes involved in DC interconnect schemes.

- The solution needs to support easy and fast, on-the-fly VPN connections to dynamic workloads and applications in third party data centers, and easily allow these workloads to migrate both within a data center and between data centers.

- Allow VPNs to provide bandwidth and other performance guarantees.

- Be a cost-effective solution for enterprises to incorporate dynamic cloud-based applications and workloads into their existing VPN environment.

10. Security Considerations

The draft discusses security requirements as a part of the problem space, particularly in sections 4, 5, and 8.
Solution drafts resulting from this work will address security concerns inherent to the solution(s), including both protocol aspects and the importance (for example) of securing workloads in cloud DCs and the use of secure interconnection mechanisms.

11. IANA Considerations

This document requires no IANA actions. RFC Editor: Please remove this section before publication.

12. References

12.1. Normative References

12.2. Informative References

[RFC2735] B. Fox, et al., "NHRP Support for Virtual Private Networks", RFC 2735, December 1999.

[RFC8192] S. Hares, et al., "Interface to Network Security Functions (I2NSF): Problem Statement and Use Cases", RFC 8192, July 2017.

[ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for creation, storage, distribution and enforcement of policies for network security", November 2007.

[RFC6071] S. Frankel and S. Krishnan, "IP Security (IPsec) and Internet Key Exchange (IKE) Document Roadmap", RFC 6071, February 2011.

[RFC4364] E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4664] L. Andersson and E. Rosen, "Framework for Layer 2 Virtual Private Networks (L2VPNs)", RFC 4664, September 2006.

[BGP-SDWAN] L. Dunbar, et al., "BGP Extension for SDWAN Overlay Networks", draft-dunbar-idr-bgp-sdwan-overlay-ext-03, work in progress, November 2018.

13. Acknowledgments

Many thanks to Alia Atlas, Chris Bowers, Ignas Bagdonas, Michael Huang, Liu Yuan Jiao, Katherine Zhao, and Jim Guichard for the discussion and contributions.

Authors' Addresses

Linda Dunbar
Futurewei
Email: Linda.Dunbar@futurewei.com

Andrew G. Malis
Independent
Email: agmalis@gmail.com

Christian Jacquenet
Orange
Rennes, 35000
France
Email: Christian.jacquenet@orange.com

Mehmet Toy
Verizon
One Verizon Way
Basking Ridge, NJ 07920
Email: mehmet.toy@verizon.com