Network Working Group                                         L. Dunbar
Internet Draft                                                Futurewei
Intended status: Informational                               Andy Malis
Expires: August 5, 2020                                     Independent
                                                           C. Jacquenet
                                                                 Orange
                                                                 M. Toy
                                                                Verizon
                                                       February 5, 2020

           Dynamic Networks to Hybrid Cloud DCs Problem Statement
              draft-ietf-rtgwg-net2cloud-problem-statement-06

Abstract

   This document describes the problems that enterprises face today
   when interconnecting their branch offices with dynamic workloads in
   third party data centers (a.k.a. Cloud DCs). There can be many
   problems associated with network connecting to or among Clouds, many
   of which probably are out of the IETF scope. The objective of this
   document is to identify some of the problems that need additional
   work in IETF Routing area. Other problems are out of the scope of
   this document.

   It examines some of the approaches interconnecting cloud DCs with
   enterprises' on-premises DCs & branch offices. This document also
   describes some of the network problems that many enterprises face
   when they have workloads & applications & data split among different
   data centers, especially for those enterprises with multiple sites
   that are already interconnected by VPNs (e.g., MPLS L2VPN/L3VPN).

   Current operational problems are examined to determine whether there
   is a need to improve existing protocols or whether a new protocol is
   necessary to solve them.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 5, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Key Characteristics of Cloud Services
      1.2. Connecting to Cloud Services
      1.3. The role of SD-WAN in connecting to Cloud Services
   2. Definition of terms
   3. High Level Issues of Connecting to Multi-Cloud
      3.1. Security Issues
      3.2. Authorization and Identity Management
      3.3. API abstraction
      3.4. DNS for Cloud Resources
      3.5. NAT for Cloud Services
      3.6. Cloud Discovery
   4. Interconnecting Enterprise Sites with Cloud DCs
      4.1. Sites to Cloud DC
      4.2. Inter-Cloud Interconnection
   5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs
   6. Problem with using IPsec tunnels to Cloud DCs
      6.1. Scaling Issues with IPsec Tunnels
      6.2. Poor performance over long distance
   7. Problems of Using SD-WAN to connect to Cloud DCs
      7.1. More Complexity to Edge Nodes
      7.2. Edge WAN Port Management
      7.3. Forwarding based on Application
   8. End-to-End Security Concerns for Data Flows
   9. Requirements for Dynamic Cloud Data Center VPNs
   10. Security Considerations
   11. IANA Considerations
   12. References
      12.1. Normative References
      12.2. Informative References
   13. Acknowledgments

1. Introduction

1.1. Key Characteristics of Cloud Services

   Key characteristics of Cloud Services are on-demand provisioning,
   scalability, high availability, and usage-based billing. Cloud
   Services, such as compute, storage, network functions (most likely
   virtual), and third-party managed applications, are usually hosted
   and managed by third-party Cloud Operators. Examples of Cloud
   network functions include virtual firewall services, virtual private
   network services, and virtual PBX services including voice and video
   conferencing systems. A Cloud Data Center (DC) is shared
   infrastructure that hosts the Cloud Services for many customers.

1.2. Connecting to Cloud Services

   With the advent of widely available third-party cloud DCs and
   services in diverse geographic locations and the advancement of
   tools for monitoring and predicting application behaviors, it is
   very attractive for enterprises to instantiate applications and
   workloads in locations that are geographically closest to their
   end-users. Such proximity can improve end-to-end latency and overall
   user experience. Conversely, an enterprise can easily shut down
   applications and workloads whenever end-users are in motion (thereby
   modifying the networking connection of subsequently relocated
   applications and workloads). In addition, enterprises may wish to
   take advantage of more and more business applications offered by
   cloud operators.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:
     - High availability to access all workloads in the desired cloud
        DCs.
        Many enterprises include cloud infrastructures in their
        disaster recovery strategy, such as enforcing periodic backup
        policies within the cloud, or running backup applications in
        the Cloud.

     - Global reachability from different geographical zones, thereby
        facilitating the proximity of applications as a function of the
        end users' location, to improve latency.
     - Elasticity: prompt connection to newly instantiated
        applications at Cloud DCs when usage increases, and prompt
        release of connections when applications are removed as demand
        changes.
     - Scalable security management.

1.3. The role of SD-WAN in connecting to Cloud Services

   Some of the characteristics of SD-WAN [SDWAN-BGP-USAGE], such as
   network augmentation and forwarding based on application IDs instead
   of destination IP addresses, are essential for connecting to
   on-demand Cloud services.

   Issues associated with using SD-WAN for connecting to Cloud services
   are also discussed in this document.

2. Definition of terms

   Cloud DC:   Third party Data Centers that usually host applications
               and workload owned by different organizations or
               tenants.

   Controller: Used interchangeably with SD-WAN controller to manage
               SD-WAN overlay path creation/deletion and monitoring the
               path conditions between two or more sites.

   DSVPN:      Dynamic Smart Virtual Private Network. DSVPN is a secure
               network that exchanges data between sites without
               needing to pass traffic through an organization's
               headquarters virtual private network (VPN) server or
               router.

   Heterogeneous Cloud: applications and workloads split among Cloud
               DCs owned or managed by different operators.

   Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
               on-premises DCs in addition to Cloud services provided
               by one or more cloud operators (e.g. AWS, Azure,
               Google, Salesforce, SAP, etc.).

   SD-WAN:     Software Defined Wide Area Network. In this document,
               "SD-WAN" refers to the solutions of pooling WAN
               bandwidth from multiple underlay networks to get better
               WAN bandwidth management, visibility & control. When the
               underlay networks are private networks, traffic can
               traverse without additional encryption; when the
               underlay networks are public, such as Internet, some
               traffic needs to be encrypted when traversing through
               (depending on user provided policies).

   VPC:        Virtual Private Cloud is a virtual network dedicated to
               one client account. It is logically isolated from other
               virtual networks in a Cloud DC. Each client can launch
               his/her desired resources, such as compute, storage, or
               network functions into his/her VPC. Most Cloud
               operators' VPCs only support private addresses, some
               support IPv4 only, others support IPv4/IPv6 dual stack.

3. High Level Issues of Connecting to Multi-Cloud

   There are many problems associated with connecting to hybrid Cloud
   Services, many of which are out of the IETF scope. This section is
   to identify some of the high level problems that can be addressed by
   IETF, especially by Routing area. Other problems are out of the
   scope of this document. By no means has this section covered all
   problems for connecting to Hybrid Cloud Services, e.g. difficulty in
   managing cloud spending is not discussed here.

3.1. Security Issues

   Cloud Services are built upon shared infrastructure and are
   therefore not secure by nature. Security has been a primary, and
   valid, concern from the start of cloud computing: customers are
   unable to see the exact location where their data is stored or
   processed. Headlines highlighting data breaches, compromised
   credentials, broken authentication, hacked interfaces and APIs, and
   account hijacking haven't helped alleviate concerns.

   Secure user identity management, authentication, and access control
   mechanisms are important. Developing appropriate security measures
   can enhance the confidence needed by enterprises to fully take
   advantage of Cloud Services.

3.2. Authorization and Identity Management

   One of the more prominent challenges for Cloud Services is Identity
   Management and Authorization. Authorization includes not only user
   authorization, but also the authorization of API calls by
   applications from different Cloud DCs managed by different Cloud
   Operators. In addition, authorization is needed for Workload
   Migration, Data Migration, and Workload Management.

   There are many types of users in cloud environments, e.g. end users
   for accessing applications hosted in Cloud DCs, Cloud-resource users
   who are responsible for setting permissions for the resources based
   on roles, access lists, IP addresses, domains, etc.

   There are many types of Cloud authorization, including MAC
   (Mandatory Access Control), where each app owns individual access
   permissions; DAC (Discretionary Access Control), where each app
   requests permissions from an external permissions app; RBAC
   (Role-based Access Control), where the authorization service owns
   roles with different privileges on the cloud service; and ABAC
   (Attribute-based Access Control), where access is based on request
   attributes and policies.
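
   The difference between the role-based and attribute-based models
   above can be illustrated with a minimal sketch. The role names,
   action strings, and policy layout below are hypothetical and are not
   taken from any Cloud operator's API; they only show that an RBAC
   decision depends on the caller's role, while an ABAC decision
   depends on attributes carried by the request.

```python
from dataclasses import dataclass, field

# Hypothetical role table for RBAC: the authorization service owns the roles.
ROLE_PERMISSIONS = {
    "network-admin": {"vpc:create", "vpc:delete", "nat:configure"},
    "auditor": {"vpc:read"},
}


def rbac_allows(role: str, action: str) -> bool:
    """RBAC: the decision depends only on the caller's role."""
    return action in ROLE_PERMISSIONS.get(role, set())


@dataclass
class Request:
    action: str
    attributes: dict = field(default_factory=dict)


def abac_allows(request: Request, policy: dict) -> bool:
    """ABAC: every attribute named by the policy must match the request."""
    required = policy.get(request.action)
    if required is None:
        return False  # no policy for this action: deny by default
    return all(request.attributes.get(k) == v for k, v in required.items())


# Hypothetical ABAC policy: workload migration is only allowed when the
# request originates from a given cloud and region.
POLICY = {"workload:migrate": {"source_cloud": "cloud-a", "region": "eu-west"}}
```

   In this sketch an API call issued by an application in another Cloud
   DC would be modelled as a Request whose attributes identify the
   originating cloud.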

   The IETF hasn't yet developed a comprehensive specification for
   Identity Management and data models for Cloud Authorization.

3.3. API abstraction

   Different Cloud Operators have different APIs to access their Cloud
   resources, security functions, the NAT, etc.

   It is difficult to move applications built on one Cloud operator's
   APIs to another operator's Cloud. However, it is highly desirable to
   have a single and consistent way to manage the networks and
   respective security policies for interconnecting applications hosted
   in different Cloud DCs.

   The desired property would be having a single network fabric to
   which different Cloud DCs and enterprise's multiple sites can be
   attached or detached, with a common interface for setting desired
   policies.

   The difficulty of connecting applications in different Clouds might
   stem from the fact that the Cloud operators are direct competitors.
   Usually, traffic flowing out of Cloud DCs incurs charges. Therefore,
   direct communications between applications in different Cloud DCs
   can be more expensive than intra-Cloud communications.

   It is desirable to have a common API shim layer or abstraction for
   different Cloud providers to make it easier to move applications
   from one Cloud DC to another.
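
   As an illustration only, a common shim layer could take the form of
   one abstract interface with a per-operator adapter behind it. The
   class and method names below (CloudNetworkAPI, attach_site,
   set_policy) are assumptions made for this sketch; they do not
   correspond to any existing standard or operator API, and the
   in-memory adapter stands in for code that would call an operator's
   own API.

```python
from abc import ABC, abstractmethod


class CloudNetworkAPI(ABC):
    """Hypothetical common interface; not an existing standard or vendor API."""

    @abstractmethod
    def attach_site(self, site: str) -> str:
        """Attach an enterprise site and return an attachment identifier."""

    @abstractmethod
    def set_policy(self, site: str, policy: dict) -> None:
        """Apply a security/forwarding policy to an attached site."""


class InMemoryCloud(CloudNetworkAPI):
    """Stand-in adapter; a real adapter would call the operator's own API."""

    def __init__(self, name: str):
        self.name = name
        self.sites = {}

    def attach_site(self, site: str) -> str:
        self.sites.setdefault(site, {})
        return f"{self.name}/{site}"

    def set_policy(self, site: str, policy: dict) -> None:
        if site not in self.sites:
            raise KeyError(f"{site} is not attached to {self.name}")
        self.sites[site] = dict(policy)


def apply_everywhere(clouds, site: str, policy: dict) -> list:
    """Attach one site to every cloud and set the same policy on each."""
    attachments = []
    for cloud in clouds:
        attachments.append(cloud.attach_site(site))
        cloud.set_policy(site, policy)
    return attachments
```

   The caller only sees the common interface, so the same policy can be
   pushed to several Cloud DCs without operator-specific code paths.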

3.4. DNS for Cloud Resources

   DNS name resolution is essential for on-premises and cloud-based
   resources. For customers with hybrid workloads, which include on-
   premises and cloud-based resources, extra steps are necessary to
   configure DNS to work seamlessly across both environments.

   Cloud operators have their own DNS to resolve resources within their
   Cloud DCs and to well-known public domains. Cloud's DNS can be
   configured to forward queries to customer managed authoritative DNS
   servers hosted on-premises, and to respond to DNS queries forwarded
   by on-premises DNS servers.

   For enterprises utilizing Cloud services from different cloud
   operators, it is necessary to establish policies and rules on how
   and where to forward DNS queries. When applications in one Cloud
   need to communicate with applications hosted in another Cloud, DNS
   queries from one Cloud DC could be forwarded to the enterprise's
   on-premises DNS, which in turn forwards them to the DNS service in
   another Cloud. The configuration can be complex, depending on the
   application communication patterns.
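
   The kind of forwarding policy described above can be sketched as a
   longest-suffix match over a rule table. The domain names and
   resolver labels below are made-up examples, and the matching logic
   is a simplification for illustration, not the behavior of any
   particular operator's DNS service.

```python
# Hypothetical forwarding table: DNS suffix -> resolver that should answer.
FORWARDING_RULES = {
    "corp.example.": "on-prem-dns",
    "cloud-a.example.": "cloud-a-dns",
    "db.cloud-a.example.": "cloud-a-private-dns",
}
DEFAULT_RESOLVER = "public-dns"


def pick_resolver(qname: str, rules=FORWARDING_RULES) -> str:
    """Return the resolver for the longest rule suffix matching qname."""
    name = qname.lower().rstrip(".") + "."
    best = None
    for suffix in rules:
        # Match whole labels only: "notcorp.example." must not match
        # the "corp.example." rule.
        if name == suffix or name.endswith("." + suffix):
            if best is None or len(suffix) > len(best):
                best = suffix
    return rules[best] if best is not None else DEFAULT_RESOLVER
```

   With more Clouds and more private zones the rule table grows, which
   is one way to see why the configuration becomes complex.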

3.5. NAT for Cloud Services

   Cloud resources, such as VM instances, are usually assigned with
   private IP addresses. By configuration, some private subnets can
   have the NAT function to reach out to external network and some
   private subnets are internal to Cloud only.

   Different Cloud operators support different levels of NAT functions.
   For example, AWS NAT Gateway does not currently support connections
   towards, or from, VPC Endpoints, VPN, AWS Direct Connect, or VPC
   Peering (see
   https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-other-services).
   AWS Direct Connect/VPN/VPC Peering does not currently support any
   NAT functionality.

   Google's Cloud NAT allows Google Cloud virtual machine (VM)
   instances without external IP addresses and private Google
   Kubernetes Engine (GKE) clusters to connect to the Internet. Cloud
   NAT implements outbound NAT in conjunction with a default route to
   allow instances to reach the Internet. It does not implement inbound
   NAT. Hosts outside of VPC network can only respond to established
   connections initiated by instances inside the Google Cloud; they
   cannot initiate their own, new connections to Cloud instances via
   NAT.
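
   The outbound-only behavior described above can be illustrated with a
   toy connection-tracking table. This is a sketch under simplifying
   assumptions (sequential port allocation, no timeouts, no protocol
   distinction) and does not describe how Google's Cloud NAT or any
   other operator's NAT is implemented; the class name and addresses
   are hypothetical.

```python
class OutboundOnlyNat:
    """Toy NAT: mappings are created by outbound flows only."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.next_port = 20000  # arbitrary starting port for this sketch
        self.by_flow = {}  # (private_ip, private_port, remote) -> public port
        self.by_port = {}  # public port -> (private_ip, private_port, remote)

    def outbound(self, private_ip, private_port, remote):
        """Translate an outbound flow, creating a mapping if needed."""
        key = (private_ip, private_port, remote)
        if key not in self.by_flow:
            port = self.next_port
            self.next_port += 1
            self.by_flow[key] = port
            self.by_port[port] = key
        return (self.public_ip, self.by_flow[key])

    def inbound(self, remote, public_port):
        """Return the private endpoint, or None if no flow was established."""
        entry = self.by_port.get(public_port)
        if entry is None or entry[2] != remote:
            return None  # unsolicited inbound traffic is dropped
        return (entry[0], entry[1])
```

   A reply from the remote endpoint of an established flow is mapped
   back to the instance, while a new connection attempt from any other
   host finds no mapping.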

   For enterprises with applications running in different Cloud DCs,
   proper configuration of NAT has to be performed in each Cloud DC and
   in their own on-premises DC.

3.6. Cloud Discovery

   One of the concerns of using Cloud services is not being aware of
   where the resource is actually located, especially since Cloud
   operators can move application instances from one place to another.
   When applications in the Cloud communicate with on-premises
   applications, it may not be clear where the Cloud applications are
   located or to which VPCs they belong.

   It is highly desirable to have tools to discover cloud services in
   much the same way as you would discover your on-premises
   infrastructure. A significant difference is that cloud discovery
   uses the cloud vendor's API to extract data on your cloud services,
   rather than the direct access used in scanning your on-premises
   infrastructure.

   Standard data models, APIs or tools can alleviate the concerns of
   enterprises utilizing Cloud Resources, e.g. having a Cloud service
   scan that connects to the API of the cloud provider and collects
   information directly.

4. Interconnecting Enterprise Sites with Cloud DCs

   Considering that many enterprises already have existing VPNs (e.g.
   MPLS based L2VPN or L3VPN) interconnecting branch offices &
   on-premises data centers, connecting to Cloud services will involve
   a mix of different types of networks. When an enterprise's existing
   VPN service providers do not have direct connections to the
   corresponding cloud DCs that the enterprise prefers to use, the
   enterprise has to face additional infrastructure and operational
   costs to utilize Cloud services.

4.1. Sites to Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach their workloads hosted in the Cloud
   DCs. For example, AWS (Amazon Web Services) offers the following options to reach
   workloads in AWS Cloud DCs:

     - AWS Internet gateway allows communication between instances in
        AWS VPC and the internet.
     - AWS Virtual gateway (vGW) where IPsec tunnels [RFC6071] are
        established between an enterprise's own gateway and AWS vGW, so
        that the communications between those gateways can be secured
        from the underlay (which might be the public Internet).
     - AWS Direct Connect, which allows enterprises to purchase direct
        connect from network service providers to get a private leased
        line interconnecting the enterprise's gateway(s) and the AWS
        Direct Connect routers. In addition, an AWS Transit Gateway can
        be used to interconnect multiple VPCs in different Availability
        Zones. AWS Transit Gateway acts as a hub that controls how
        traffic is forwarded among all the connected networks which act
        like spokes.

   Microsoft's ExpressRoute allows extension of a private network to
   any of the Microsoft cloud services, including Azure and Office365.
   ExpressRoute is configured using Layer 3 routing. Customers can opt
   for redundancy by provisioning dual links from their location to two
   Microsoft Enterprise edge routers (MSEEs) located within a
   third-party ExpressRoute peering location. The BGP routing protocol
   is then set up over WAN links to provide redundancy to the cloud.
   This redundancy is maintained from the peering data center into
   Microsoft's cloud network.

   Google's Cloud Dedicated Interconnect offers similar network
   connectivity options as AWS and Microsoft. One distinct difference,
   however, is that Google's service allows customers access to the
   entire global cloud network by default. It does this by connecting
   the on-premises network with the Google Cloud using BGP and Google
   Cloud Routers to provide optimal paths to the different regions of
   the global cloud infrastructure.

   The figure below shows an example in which some of a tenant's
   workloads are accessible via a virtual router connected by the AWS
   Internet Gateway, some are accessible via the AWS vGW, and others
   are accessible via AWS Direct Connect.

   Different types of access require different levels of security
   functions. Sometimes it is not visible to end customers which type
   of network access is used for a specific application instance. To
   get better visibility, separate virtual routers (e.g. vR1 & vR2) can
   be deployed to differentiate traffic to/from different cloud GWs. It
   is important for some enterprises to be able to observe the specific
   behaviors when connected by different connections.

   A Customer Gateway can be a customer-owned router or ports
   physically connected to the AWS Direct Connect GW.

     +------------------------+
     |    ,---.         ,---. |
     |   (TN-1 )       ( TN-2)|
     |    `-+-'  +---+  `-+-' |
     |      +----|vR1|----+   |
     |           ++--+        |
     |            |         +-+----+
     |            |        /Internet\ For External
     |            +-------+ Gateway  +----------------------
     |                     \        / to reach via Internet
     |                      +-+----+
     |                        |
     |    ,---.         ,---. |
     |   (TN-1 )       ( TN-2)|
     |    `-+-'  +---+  `-+-' |
     |      +----|vR2|----+   |
     |           ++--+        |
     |            |         +-+----+
     |            |        / virtual\ For IPsec Tunnel
     |            +-------+ Gateway  +----------------------
     |            |        \        /  termination
     |            |         +-+----+
     |            |           |
     |            |         +-+----+              +------+
     |            |        /        \ For Direct /customer\
     |            +-------+ Gateway  +----------+ gateway  |
     |                     \        /  Connect   \        /
     |                      +-+----+              +------+
     |                        |
     +------------------------+

     Figure 1: Examples of Multiple Cloud DC connections.

   It is likely that hybrid designs will become the rule for cloud
   services, as more enterprises see the benefits of integrating public
   and private cloud infrastructures. However, enabling the growth of
   hybrid cloud deployments in the enterprise requires fast and safe
   interconnection between public and private cloud services.

   For an enterprise to connect to applications & workloads hosted in
   multiple Cloud DCs, the enterprise can use IPsec tunnels established
   over the Internet or a (virtualized) leased line service to connect
   its on-premises gateways to each of the Cloud DC's gateways, virtual
   routers instantiated in the Cloud DCs, or any other suitable design
   (including a combination thereof).

   Some enterprises prefer to instantiate their own virtual
   CPEs/routers inside the Cloud DC to connect the workloads within the
   Cloud DC. Then an overlay path is established between the customer
   gateways and the virtual CPEs/routers for reaching the workloads
   inside the cloud DC.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:
     - High availability to access all workloads in the desired cloud
        DCs.
        Many enterprises include cloud infrastructures in their
        disaster recovery strategy, e.g., by enforcing periodic backup
        policies within the cloud, or by running backup applications in
        the Cloud, etc. Therefore, the connection to the cloud DCs may
        not be permanent, but rather needs to be on-demand.

     - Global reachability from different geographical zones, thereby
        facilitating the proximity of applications as a function of the
        end users' location, to improve latency.
     - Elasticity: prompt connection to newly instantiated
        applications at Cloud DCs when usages increase and prompt
        release of connection after applications at locations being
        removed when demands change.
        Some enterprises have front-end web portals running in cloud
        DCs and database servers in their on-premises DCs. Those Front-
        end web portals need to be reachable from the public Internet.
        The backend connection to the sensitive data in database
        servers hosted in the on-premises DCs might need secure
        connections.

     - Scalable security management. IPsec is commonly used to
        interconnect cloud gateways with CPEs deployed in the
        enterprise premises. For enterprises with a large number of
        branch offices, managing the IPsec Security Associations
        among many nodes can be very difficult.
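
   The scaling concern in the last item can be made concrete with
   simple arithmetic. The sketch below assumes a full mesh in which
   every node peers directly with every other node, and counts one
   pair of unidirectional IPsec SAs per tunnel (IKE SAs and rekeying
   are ignored). The hub-and-spoke figure is included only as a point
   of comparison and is an assumption of this example, not a design
   recommended by this document.

```python
def full_mesh_tunnels(nodes: int) -> int:
    """Number of point-to-point tunnels when every node peers with all others."""
    return nodes * (nodes - 1) // 2


def min_ipsec_sas(nodes: int) -> int:
    """Lower bound on IPsec SAs: one inbound and one outbound SA per tunnel."""
    return 2 * full_mesh_tunnels(nodes)


def hub_and_spoke_tunnels(nodes: int) -> int:
    """Tunnels needed when all nodes connect only to a single hub node."""
    return max(nodes - 1, 0)
```

   For 100 nodes, a full mesh needs 4950 tunnels, against 99 for a
   single hub, which is why the number of Security Associations to
   manage grows quickly with the number of branch offices.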

4.2. Inter-Cloud Interconnection

   Enterprises today can instantiate their workloads or applications in
   Cloud DCs owned by different Cloud providers, e.g. AWS, Azure,
   GoogleCloud, Oracle, etc. Interconnecting those workloads involves
   three parties: the Enterprise, its network service providers, and
   the Cloud providers.

   All Cloud Operators offer secure ways to connect enterprises' on-
   prem sites/DCs with their Cloud DCs.

   Some Cloud Operators allow enterprises to connect via private
   networks. For example, AWS's DirectConnect allows enterprises to use
   a third-party provided private Layer 2 path from the enterprise's GW
   to the AWS DirectConnect GW, whereas Microsoft's ExpressRoute is
   configured using Layer 3 routing.

   The connectivity options to Cloud DCs described in the previous
   section are for reaching Cloud providers' DCs, but not between cloud
   DCs. When applications in AWS Cloud need to communicate with
   applications in Azure, today's practice requires a third-party
   gateway (physical or virtual) to interconnect the AWS's Layer 2
   DirectConnect path with Azure's Layer 3 ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task. Alternatively, by leveraging open
   source VPN software such as strongSwan, an enterprise can create an
   IPsec connection to the Azure gateway using a shared key. The
   strongSwan instance within AWS can not only connect to Azure but
   can also be used to facilitate traffic to other nodes within the AWS
   VPC by configuring forwarding and using appropriate routing rules
   for the VPC.

   Most Cloud operators' virtual networks, such as AWS VPC or Azure
   VNET, use non-globally routable CIDR blocks from the private IPv4
   address ranges specified by RFC1918. To establish an IPsec tunnel
   between two Cloud DCs, it is necessary to exchange publicly routable
   addresses for applications in different Cloud DCs. [BGP-SDWAN]
   describes one method. Other methods are worth exploring.
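
   The addressing constraint above can be illustrated with a minimal
   sketch. It assumes nothing beyond the three RFC1918 blocks; the
   helper names (is_rfc1918, needs_public_endpoint, overlaps) and the
   sample CIDRs are hypothetical and are not taken from any Cloud
   operator's API.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(cidr: str) -> bool:
    """Return True if the CIDR lies entirely inside an RFC 1918 block."""
    net = ipaddress.ip_network(cidr)
    return net.version == 4 and any(net.subnet_of(b) for b in RFC1918_BLOCKS)


def needs_public_endpoint(local_cidr: str, remote_cidr: str) -> bool:
    """Hypothetical helper: a tunnel between two Cloud DCs cannot be
    addressed by the internal CIDRs when either side is private."""
    return is_rfc1918(local_cidr) or is_rfc1918(remote_cidr)


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """Private ranges chosen independently in two clouds may collide."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

   Because private ranges are assigned independently in each Cloud DC,
   the overlap check hints at why internal addresses alone cannot
   identify tunnel end-points across Cloud DCs.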

   In summary, here are some approaches, available now (which might
   change in the future), to interconnect workloads among different
   Cloud DCs:

     a) Utilize Cloud DC provided inter/intra-cloud connectivity
        services (e.g., AWS Transit Gateway) to connect workloads
        instantiated in multiple VPCs. Such services are provided with
        the cloud gateway to connect to external networks (e.g., AWS
        DirectConnect Gateway).
     b) Hairpin all traffic through the customer gateway, meaning all
        workloads are directly connected to the customer gateway, so
        that communications among workloads within one Cloud DC must
        traverse through the customer gateway.
     c) Establish direct tunnels among different VPCs (AWS's Virtual
        Private Clouds) and VNETs (Azure's Virtual Networks) via the
        client's own virtual routers instantiated within Cloud DCs.
        DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
        (Dynamic Smart VPN) techniques can be used to establish direct
        multipoint-to-point or multipoint-to-multipoint tunnels among
        those virtual routers.

   Approach a) usually does not work if Cloud DCs are owned and managed
   by different Cloud providers.

   Approach b) adds transmission delay and incurs cost when traffic
   exits Cloud DCs.

   For Approach c), DMVPN or DSVPN use NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses & WAN ports with the hub node. The IETF ION
   (Internetworking over NBMA (non-broadcast multiple access)) WG
   standardized NHRP for address resolution in connection-oriented
   NBMA networks (such as ATM) more than two decades ago.

   There are many differences between virtual routers in Public Cloud
   DCs and the nodes in an NBMA network. NHRP cannot be used for
   registering virtual routers in Cloud DCs unless an extension of such
   protocols is developed for that purpose, e.g. taking NAT or dynamic
   addresses into consideration. Therefore, DMVPN and/or DSVPN cannot
   be used directly for connecting workloads in hybrid Cloud DCs.
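
   The registration issue can be pictured with a toy model. The sketch
   below is not the NHRP wire protocol; the Hub class, its methods, and
   all addresses are hypothetical and only illustrate why a spoke
   behind NAT registers an address that its peers cannot use.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Hub:
    """Toy hub keeping an overlay-address -> underlay-address table."""
    registry: Dict[str, str] = field(default_factory=dict)

    def register(self, overlay_addr: str, claimed_underlay: str,
                 observed_source: str) -> bool:
        """Store a spoke's mapping. Return False when the address the
        spoke claims differs from the one the hub actually observes,
        as happens when the spoke sits behind NAT."""
        self.registry[overlay_addr] = claimed_underlay
        return claimed_underlay == observed_source

    def resolve(self, overlay_addr: str) -> Optional[str]:
        """Return the underlay address another spoke would tunnel to."""
        return self.registry.get(overlay_addr)


hub = Hub()
# A branch CPE with a stable public WAN address registers cleanly.
ok_branch = hub.register("172.16.0.1", "198.51.100.10", "198.51.100.10")
# A virtual router in a Cloud DC only knows its private address, while
# the hub observes a translated address: the stored mapping is unusable.
ok_cloud = hub.register("172.16.0.2", "10.0.1.5", "203.0.113.7")
```

   In this model, a peer resolving 172.16.0.2 would obtain 10.0.1.5,
   which is not reachable from outside the Cloud DC.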

   Other protocols, such as BGP, can be used, as described in
   [BGP-SDWAN].

4.2. Desired Properties for Multi-Cloud Interconnection

   Different Cloud Operators have different APIs to access their Cloud
   resources. It is difficult to move applications built with one Cloud
   operator's APIs to another. However, it is highly desirable to have
   a single and consistent way to manage the networks and respective
   security policies for interconnecting applications hosted in
   different Cloud DCs.

   The desired property would be having a single network fabric to
   which different Cloud DCs and enterprise's multiple sites can be
   attached or detached, with a common interface for setting desired
   policies. SDWAN is positioned to become that network fabric enabling
   Cloud DCs to be dynamically attached or detached. But the reality is
   that different Cloud Operators have different access methods, and
   Cloud DCs might be geographically far apart. More Cloud connectivity
   problems are described in the subsequent sections.

   The difficulty of connecting applications in different Clouds might
   stem from the fact that the Cloud Operators are direct competitors.
   Usually, traffic flowing out of Cloud DCs incurs charges. Therefore,
   direct communications between applications in different Cloud DCs
   can be more expensive than intra-Cloud communications.

5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

   Traditional MPLS-based VPNs have been widely deployed as an
   effective way to support businesses and organizations that require
   network performance and reliability. MPLS shifted the burden of
   managing a VPN service from enterprises to service providers. The
   CPEs attached to MPLS VPNs are also simpler and less expensive,
   because they do not need to manage routes to remote sites; they
   simply pass all outbound traffic to the MPLS VPN PEs to which the
   CPEs are attached (albeit multi-homing scenarios require more
   processing logic on CPEs).  MPLS has addressed the problems of
   scale, availability, and fast recovery from network faults, and
   incorporated traffic-engineering capabilities.

   However, traditional MPLS-based VPN solutions are suboptimal for
   connecting end-users to dynamic workloads/applications in cloud DCs
   because:

     - The Provider Edge (PE) nodes of the enterprise's VPNs might not
        have direct connections to third party cloud DCs that are used
        for hosting workloads with the goal of providing an easy access
        to enterprises' end-users.

     - It usually takes some time to deploy PE routers at new
        locations. When an enterprise's workloads are moved from one
        cloud DC to another (i.e., removed from one DC and
        re-instantiated at another location when demand changes), the
        enterprise branch offices need to be connected to the new cloud
        DC, but the network service provider might not have PEs located
        at the new location.

        One of the main drivers for moving workloads into the cloud is
        the widely available cloud DCs at geographically diverse
        locations, where apps can be instantiated so that they can be
        as close to their end-users as possible. When the user base
        changes, the applications may be migrated to a new cloud DC
        location closest to the new user base.

     - Most of the cloud DCs do not expose their internal networks. An
        enterprise with a hybrid cloud deployment can use an MPLS-VPN
        to connect to a Cloud provider at multiple locations.  The
        connection locations often correspond to gateways of different
        Cloud DC locations from the Cloud provider.  The different
        Cloud DCs are interconnected by the Cloud provider's own
        internal network.  At each connection location (gateway), the
        Cloud provider uses BGP to advertise all of the prefixes in the
        enterprise's VPC, regardless of which Cloud DC a given prefix
        is actually in. This can result in inefficient routing for the
        end-to-end data path.

     - Extensive usage of Overlay by Cloud DCs:

        Many cloud DCs use an overlay to connect their gateways to the
        workloads located inside the DC. There is currently no standard
        that specifies the interworking between the Cloud Overlay and
        the enterprise's existing underlay networks. One of the
        characteristics of overlay networks is that some of the WAN
        ports of the edge nodes connect to third party networks. There
        is therefore a need to propagate WAN port information to remote
        authorized peers in third party network domains in addition to
        route propagation. Such an exchange cannot happen before
        communication between peers is properly secured.

   Another roadblock is the lack of a standard way to express and
   enforce consistent security policies for workloads that not only use
   virtual addresses, but are also very likely hosted in different
   locations within the Cloud DC [RFC8192]. The current VPN path
   computation and bandwidth allocation schemes may not be flexible
   enough to address the need for enterprises to rapidly connect to
   dynamically instantiated (or removed) workloads and applications
   regardless of their location/nature (i.e., third party cloud DCs).

6. Problem with using IPsec tunnels to Cloud DCs

   As described in the previous section, many Cloud operators expose
   their gateways for external entities (which can be enterprises
   themselves) to directly establish IPsec tunnels. Enterprises can
   also instantiate virtual routers within Cloud DCs to connect to
   their on-premises devices via IPsec tunnels.

   If there is only one enterprise location that needs to reach the
   Cloud DC, an IPsec tunnel is a very convenient solution.

   However, many medium-to-large enterprises have multiple sites and
   multiple data centers. For workloads and apps hosted in cloud DCs,
   multiple sites need to communicate securely with those cloud
   workloads and apps. This section documents some of the issues
   associated with using IPsec tunnels to connect enterprise premises
   with cloud gateways.

6.1. Complexity of multi-point any-to-any interconnection

   The dynamic workload instantiated in a Cloud DC needs to communicate
   with multiple branch offices and on-premises data centers. Most
   enterprises need multi-point interconnection among multiple
   locations, which can be provided by means of MPLS L2/L3 VPNs.

   Using IPsec overlay paths to connect all branches & on-premises data
   centers to cloud DCs requires CPEs to manage routing among Cloud DC
   gateways and the CPEs located at other branch locations, which can
   dramatically increase the complexity of the design, possibly at the
   cost of jeopardizing the CPE performance.

   The complexity of requiring CPEs to maintain routing among other
   CPEs is one of the reasons why enterprises migrated from Frame Relay
   based services to MPLS-based VPN services.

   MPLS-based VPNs have their PEs directly connected to the CPEs.
   Therefore, CPEs only need to forward all traffic to the directly
   attached PEs, which are therefore responsible for enforcing the
   routing policy within the corresponding VPNs. Even for multi-homed
   CPEs, the CPEs only need to forward traffic among the directly
   connected PEs. However, when using IPsec tunnels between CPEs and
   Cloud DCs, the CPEs need to compute, select, establish and maintain
   routes for traffic to be forwarded to Cloud DCs, to remote CPEs via
   VPN, or directly.

6.2. Poor performance over long distance

   When enterprise CPEs or gateways are far away from cloud DC gateways
   or across country/continent boundaries, performance of IPsec tunnels
   over the public Internet can be problematic and unpredictable. Even
   though there are many monitoring tools available to measure delay
   and various performance characteristics of the network, the
   measurement for paths over the Internet is passive and past
   measurements may not represent future performance.

   Many cloud providers can replicate workloads in different
   availability zones. An App instantiated in a cloud DC closest to
   clients may have to cooperate with another App (or its mirror image)
   in another region or with database server(s) in the on-premises DC.
   This kind of coordination requires predictable networking
   behavior/performance among those locations.

6.3. Scaling Issues with IPsec Tunnels

   IPsec can achieve secure overlay connections between two locations
   over any underlay network, e.g., between CPEs and Cloud DC Gateways.

   If there is only one enterprise location connected to the cloud
   gateway, a small number of IPsec tunnels can be configured on-demand
   between the on-premises DC and the Cloud DC, which is an easy and
   flexible solution.

   However, for multiple enterprise locations to reach workloads hosted
   in cloud DCs, the cloud DC gateway needs to maintain multiple IPsec
   tunnels to all those locations (e.g., as a hub & spoke topology). In
   addition, each of those IPsec tunnels requires pair-wise periodic
   key refreshment. For a company with hundreds or thousands of
   locations, there could be hundreds (or even thousands) of IPsec
   tunnels terminating at the cloud DC gateway, which is not only very
   expensive (because Cloud Operators usually charge their customers
   based on connections), but can also be very processing intensive for
   the gateway. Many cloud operators only allow a limited number of
   (IPsec) tunnels & bandwidth to each customer. Alternatively, a
   solution like group encryption could be used, where a single IPsec
   SA is necessary at the GW; the drawback is the need for key
   distribution and the maintenance of a key server.
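
   The growth in tunnel count can be made concrete with a small
   arithmetic sketch. The function names and the per-customer limit
   are hypothetical; real limits vary by Cloud Operator and are not
   stated in this document. The full-mesh count is included only for
   comparison with the hub & spoke case.

```python
def hub_and_spoke_tunnels(sites: int, gateways: int = 1) -> int:
    """Tunnels terminating on cloud gateways when every site connects
    to every gateway."""
    return sites * gateways


def full_mesh_tunnels(nodes: int) -> int:
    """Pair-wise tunnels needed for any-to-any connectivity."""
    return nodes * (nodes - 1) // 2


def exceeds_quota(sites: int, per_customer_limit: int) -> bool:
    """per_customer_limit is a hypothetical operator-imposed cap."""
    return hub_and_spoke_tunnels(sites) > per_customer_limit
```

   For 1000 sites, a single gateway terminates 1000 tunnels, whereas a
   full mesh among the same sites would need 499500 pair-wise tunnels,
   each with its own periodic key refreshment.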

7. Problems of Using SD-WAN to connect to Cloud DCs

   SD-WAN can establish parallel paths over multiple underlay networks
   between two locations on-demand, for example, to support the
   connections established between two CPEs interconnected by a
   traditional MPLS VPN ([RFC4364] or [RFC4664]) or by IPsec [RFC6071]
   tunnels.

   SD-WAN lets enterprises augment their current VPN network with cost-
   effective, readily available Broadband Internet connectivity,
   enabling some traffic offloading to paths over the Internet
   according to differentiated, possibly application-based traffic
   forwarding policies, or when the MPLS VPN connection between the two
   locations is congested, or otherwise undesirable or unavailable.

7.1. More Complexity to Edge Nodes

   Augmenting the transport path is not as simple as it appears. For
   an enterprise with multiple sites, using CPE-managed overlay paths
   among sites requires each CPE to manage all the addresses that local
   hosts have the potential to reach, i.e., to map internal VPN
   addresses to appropriate overlay paths. This is similar to the
   complexity of Frame Relay based VPNs, where each CPE needed to
   maintain mesh routing for all destinations if they were to avoid an
   extra hop through a hub router. Even with assistance from a central
   controller (instead of running a routing protocol) to resolve the
   mapping between destinations and SD-WAN paths, SD-WAN CPEs are still
   responsible for routing table maintenance as remote destinations
   change their attachments, e.g., when dynamic workloads in other DCs
   are de-commissioned or added.
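
   The per-CPE bookkeeping described above can be sketched as a small
   table that maps destination prefixes to overlay paths. This is a
   simplified illustration only; the class name CpeOverlayTable, the
   path identifiers, and the prefixes are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


class CpeOverlayTable:
    """Toy per-CPE table mapping destination prefixes to overlay paths."""

    def __init__(self) -> None:
        self._paths: Dict[ipaddress.IPv4Network, str] = {}

    def add(self, prefix: str, path_id: str) -> None:
        """Install or update a mapping, e.g. when a workload appears."""
        self._paths[ipaddress.ip_network(prefix)] = path_id

    def withdraw(self, prefix: str) -> None:
        """Remove a mapping, e.g. when a workload is de-commissioned."""
        self._paths.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, dest: str) -> Optional[str]:
        """Longest-prefix match of a destination address."""
        addr = ipaddress.ip_address(dest)
        matches = [net for net in self._paths if addr in net]
        if not matches:
            return None
        return self._paths[max(matches, key=lambda net: net.prefixlen)]


table = CpeOverlayTable()
table.add("10.1.0.0/16", "tunnel-to-dc1")
table.add("10.1.2.0/24", "tunnel-to-dc2")
```

   Every CPE has to keep such a table current whenever a remote
   workload is added or withdrawn, regardless of whether the updates
   come from a routing protocol or from a controller.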

   Even though originally envisioned for interconnecting branch
   offices, SD-WAN offers a very attractive way for enterprises to
   connect to Cloud DCs.

   In addition, overlay paths for interconnecting branch offices are
   different from those for connecting to Cloud DCs:

     - Overlay paths interconnecting branch offices usually have two
        end-points (e.g., CPEs) controlled by one entity (e.g.,
        controllers or management systems operated by the enterprise).

     - Connecting to a Cloud DC may involve CPEs owned or managed by
        the enterprise, while the remote end-points are managed or
        controlled by Cloud DCs (for ease of description, let's call
        such CPEs asymmetrically-managed CPEs).

     - Cloud DCs may have different entry points (or devices), with one
        entry point that terminates a private direct connection (based
        upon a leased line, for example) and other entry points being
        devices terminating the IPsec tunnels, as shown in Figure 2.

   Therefore, the SD-WAN design becomes asymmetric.

     +------------------------+
     |    ,---.         ,---. |
     |   (TN-1 )       ( TN-2)|      TN: Tenant applications/workloads
     |    `-+-'  +---+  `-+-' |
     |      +----|vR1|----+   |
     |           ++--+        |
     |            |         +-+----+
     |            |        /Internet\ One path via
     |            +-------+ Gateway  +---------------------+
     |                     \        /   Internet            \
     |                      +-+----+                         \
     +------------------------+                               \
                                                                \
     +------------------------+                 native traffic  \
     |    ,---.         ,---. |                without encryption|
     |   (TN-3 )       ( TN-4)|                                  |
     |    `-+-'  +--+   `-+-' |                                  |    +------+
     |      +----|vR|-----+   |                                  +----+ CPE  |
     |           ++-+         |                                  |    +------+
     |            |         +-+----+                             |
     |            |        / virtual\ One path via IPsec Tunnel  |
     |            +-------+ Gateway  +-------------------------- +
     |                     \        /      Encrypted traffic over|
     |                      +-+----+   public third-party network |
     +------------------------+                                  |
                                                                 |
     +------------------------+                                  |
     |    ,---.         ,---. |                   Native traffic |
     |   (TN-5 )       ( TN-6)|               without encryption |
     |    `-+-'  +--+   `-+-' |               over secure network|
     |      +----|vR|-----+   |                                  |
     |           ++-+         |                                  |
     |            |         +-+----+              +------+       |
     |            |        /        \ Via Direct /customer\      |
     |            +-------+ Gateway  +----------+ gateway  |-----+
     |                     \        /  Connect   \        /
     |                      +-+----+              +------+
     +------------------------+
                   Customer GW has physical connection to AWS GW

     Figure 2: Different Underlays to Reach Cloud DC

7.2. Edge WAN Port Management

   An SD-WAN edge node can have WAN ports connected to different
   networks or to the public Internet managed by different operators.
   There is therefore a need to propagate WAN port properties to remote
   authorized peers in third party network domains in addition to route
   propagation. Such an exchange cannot happen before communication
   between peers is properly secured.

7.3. Forwarding based on Application

     Forwarding based on application IDs instead of based on
     destination IP addresses is often referred to as Application based
     Segmentation. If the Applications have unique IP addresses, then
     the Application Based Segmentation can be achieved by propagating
     different BGP UPDATE messages to different nodes, as described in
     [BGP-SDWAN-USAGE]. If the Application cannot be uniquely
     identified by the IP addresses, more work is needed.
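
     The case where Applications have unique IP addresses can be
     sketched as a per-node selection of prefixes to advertise. The
     sketch does not build BGP UPDATE messages and is not the mechanism
     of [BGP-SDWAN-USAGE]; the application names, node names, prefixes,
     and the function prefixes_to_advertise are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical inventory: application name -> prefixes that host it.
APP_PREFIXES: Dict[str, List[str]] = {
    "payroll": ["10.10.1.0/24"],
    "video": ["10.10.2.0/24", "10.10.3.0/24"],
}

# Hypothetical policy: edge node -> applications it may reach.
NODE_APPS: Dict[str, Set[str]] = {
    "branch-1": {"payroll", "video"},
    "branch-2": {"video"},
}


def prefixes_to_advertise(node: str) -> List[str]:
    """Return the prefixes whose routes would be sent to `node`."""
    allowed = NODE_APPS.get(node, set())
    return sorted(p for app in allowed for p in APP_PREFIXES.get(app, []))
```

     A node that receives no route for an Application's prefix has no
     forwarding entry for it, which is what yields the segmentation in
     this simplified view.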

8. End-to-End Security Concerns for Data Flows

     When IPsec tunnels established from enterprise on-premises CPEs
     are terminated at the Cloud DC gateway where the workloads or
     applications are hosted, some enterprises have concerns regarding
     traffic to/from their workload being exposed to others behind the
     data center gateway (e.g., exposed to other organizations that
     have workloads in the same data center).
     To ensure that traffic to/from workloads is not exposed to
     unwanted entities, IPsec tunnels may go all the way to the
     workload (servers, or VMs) within the DC.

9. Requirements for Dynamic Cloud Data Center VPNs

   In order to address the aforementioned issues, any solution for
   enterprise VPNs that includes connectivity to dynamic workloads or
   applications in cloud data centers should satisfy a set of
   requirements:

     - The solution should allow enterprises to take advantage of the
        current state-of-the-art in VPN technology, in both traditional
        MPLS-based VPNs and IPsec-based VPNs (or any combination
        thereof) that run over the public Internet.
     - The solution should not require an enterprise to upgrade all
        their existing CPEs.
     - The solution should support scalable IPsec key management among
        all nodes involved in DC interconnect schemes.
     - The solution needs to support easy and fast, on-the-fly, VPN
        connections to dynamic workloads and applications in third
        party data centers, and easily allow these workloads to migrate
        both within a data center and between data centers.
     - The solution should allow VPNs to provide bandwidth and other
        performance guarantees.
     - The solution should be cost-effective for enterprises to
        incorporate dynamic cloud-based applications and workloads into
        their existing VPN environment.

10. Security Considerations

   The draft discusses security requirements as a part of the problem
   space, particularly in sections 4, 5, and 8.

   Solution drafts resulting from this work will address security
   concerns inherent to the solution(s), including both protocol
   aspects and the importance (for example) of securing workloads in
   cloud DCs and the use of secure interconnection mechanisms.

11. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

12. References

12.1. Normative References

12.2. Informative References

   [RFC2735] B. Fox, et al., "NHRP Support for Virtual Private
             Networks", Dec 1999.

   [RFC8192] S. Hares, et al., "Interface to Network Security Functions
             (I2NSF) Problem Statement and Use Cases", July 2017.

   [ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for creation,
             storage, distribution and enforcement of policies for
             network security", Nov 2007.

   [RFC6071] S. Frankel and S. Krishnan, "IP Security (IPsec) and
             Internet Key Exchange (IKE) Document Roadmap", Feb 2011.

   [RFC4364] E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", Feb 2006.

   [RFC4664] L. Andersson and E. Rosen, "Framework for Layer 2 Virtual
             Private Networks (L2VPNs)", Sept 2006.

   [BGP-SDWAN] L. Dunbar, et al., "BGP Extension for SDWAN Overlay
             Networks", draft-dunbar-idr-bgp-sdwan-overlay-ext-03,
             work-in-progress, Nov 2018.

13. Acknowledgments

   Many thanks to Alia Atlas, Chris Bowers, Ignas Bagdonas, Michael
   Huang, Liu Yuan Jiao, Katherine Zhao, and Jim Guichard for the
   discussion and contributions.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: Linda.Dunbar@futurewei.com

   Andrew G. Malis
   Independent
   Email: agmalis@gmail.com

   Christian Jacquenet
   Orange
   Rennes, 35000
   France
   Email: Christian.jacquenet@orange.com

   Mehmet Toy
   Verizon
   One Verizon Way
   Basking Ridge, NJ 07920
   Email: mehmet.toy@verizon.com