Network Working Group                                         L. Dunbar
Internet Draft                                                Futurewei
Intended status: Informational                               Andy Malis
Expires: August 5, 2020                                     Independent
                                                           C. Jacquenet
                                                                 Orange
                                                                 M. Toy
                                                                Verizon
                                                       February 5, 2020

         Dynamic Networks to Hybrid Cloud DCs Problem Statement
            draft-ietf-rtgwg-net2cloud-problem-statement-06
Abstract

This document describes the problems that enterprises face today
when interconnecting their branch offices with dynamic workloads in
third party data centers (a.k.a. Cloud DCs). There can be many
problems associated with networks connecting to or among Clouds,
many of which are probably out of the IETF scope. The objective of
this document is to identify some of the problems that need
additional work in the IETF Routing area. Other problems are out of
the scope of this document.
skipping to change at page 2, line 21
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on August 5, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents

1. Introduction...................................................3
   1.1. Key Characteristics of Cloud Services.....................3
   1.2. Connecting to Cloud Services...............................3
   1.3. The role of SD-WAN in connecting to Cloud Services.........4
2. Definition of terms............................................5
3. High Level Issues of Connecting to Multi-Cloud.................6
   3.1. Security Issues...........................................6
   3.2. Authorization and Identity Management.....................6
   3.3. API abstraction...........................................7
   3.4. DNS for Cloud Resources...................................8
   3.5. NAT for Cloud Services....................................8
   3.6. Cloud Discovery...........................................9
4. Interconnecting Enterprise Sites with Cloud DCs................9
   4.1. Sites to Cloud DC........................................10
   4.2. Inter-Cloud Interconnection..............................12
5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs...13
6. Problem with using IPsec tunnels to Cloud DCs.................15
   6.1. Scaling Issues with IPsec Tunnels........................15
   6.2. Poor performance over long distance......................15
7. Problems of Using SD-WAN to connect to Cloud DCs..............16
   7.1. More Complexity to Edge Nodes............................16
   7.2. Edge WAN Port Management.................................17
   7.3. Forwarding based on Application..........................17
8. End-to-End Security Concerns for Data Flows...................17
9. Requirements for Dynamic Cloud Data Center VPNs...............17
10. Security Considerations......................................18
11. IANA Considerations..........................................18
12. References...................................................18
   12.1. Normative References....................................18
   12.2. Informative References..................................19
13. Acknowledgments..............................................19
1. Introduction

1.1. Key Characteristics of Cloud Services

Key characteristics of Cloud Services are on-demand, scalable,
highly available, and usage-based billing. Cloud Services, such as
compute, storage, network functions (most likely virtual), and
third-party managed applications, are usually hosted and managed by
third-party Cloud Operators. Examples of Cloud network functions
include virtual firewall services, virtual private network services,
and virtual PBX services including voice and video conferencing
systems. A Cloud Data Center (DC) is shared infrastructure that
hosts Cloud Services for many customers.
1.2. Connecting to Cloud Services

With the advent of widely available third-party cloud DCs and
services in diverse geographic locations and the advancement of
tools for monitoring and predicting application behaviors, it is
very attractive for enterprises to instantiate applications and
workloads in locations that are geographically closest to their
end-users. Such proximity can improve end-to-end latency and overall
user experience. Conversely, an enterprise can easily shut down
applications and workloads whenever end-users are in motion (thereby
modifying the networking connection of subsequently relocated
applications and workloads). In addition, enterprises may wish to
take advantage of more and more business applications offered by
cloud operators.

The networks that interconnect hybrid cloud DCs must address the
following requirements:

- High availability to access all workloads in the desired cloud
  DCs. Many enterprises include cloud in their disaster recovery
  strategy, such as enforcing periodic backup policies within the
  cloud, or running backup applications in the Cloud.

- Global reachability from different geographical zones, thereby
  facilitating the proximity of applications as a function of the
  end users' location, to improve latency.

- Elasticity: prompt connection to newly instantiated applications
  at Cloud DCs when usage increases, and prompt release of
  connections when applications are removed as demand changes.

- Scalable security management.

1.3. The role of SD-WAN in connecting to Cloud Services

Some of the characteristics of SD-WAN [SDWAN-BGP-USAGE], such as
network augmentation and forwarding based on application IDs instead
of destination IP addresses, are essential for connecting to
on-demand Cloud services.

Issues associated with using SD-WAN for connecting to Cloud services
are also discussed in this document.
2. Definition of terms

Cloud DC:   Third party Data Centers that usually host applications
            and workloads owned by different organizations or
            tenants.

Controller: Used interchangeably with SD-WAN controller to manage
            the SD-WAN overlay path creation/deletion and monitoring
            the path conditions between two or more sites.
skipping to change at page 6, line 5
            (depending on user provided policies).

VPC:        Virtual Private Cloud is a virtual network dedicated to
            one client account. It is logically isolated from other
            virtual networks in a Cloud DC. Each client can launch
            his/her desired resources, such as compute, storage, or
            network functions into his/her VPC. Most Cloud
            operators' VPCs only support private addresses, some
            support IPv4 only, others support IPv4/IPv6 dual stack.
3. High Level Issues of Connecting to Multi-Cloud
There are many problems associated with connecting to hybrid Cloud
Services, many of which are out of the IETF scope. This section
identifies some of the high-level problems that can be addressed by
the IETF, especially by the Routing area. Other problems are out of
the scope of this document. By no means does this section cover all
problems of connecting to hybrid Cloud Services; e.g., the
difficulty of managing cloud spending is not discussed here.
3.1. Security Issues
Cloud Services are built upon shared infrastructure and are
therefore not inherently secure. Security has been a primary, and
valid, concern from the start of cloud computing: customers cannot
see the exact location where their data is stored or processed.
Headlines highlighting data breaches, compromised credentials,
broken authentication, hacked interfaces and APIs, and account
hijacking have not helped alleviate these concerns.
Secure user identity management, authentication, and access control
mechanisms are important. Developing appropriate security measures
can enhance the confidence needed by enterprises to fully take
advantage of Cloud Services.
3.2. Authorization and Identity Management
One of the more prominent challenges for Cloud Services is identity
management and authorization. Authorization not only includes user
authorization, but also the authorization of API calls made by
applications in different Cloud DCs managed by different Cloud
Operators. In addition, there is authorization for Workload
Migration, Data Migration, and Workload Management.
There are many types of users in cloud environments, e.g., end users
accessing applications hosted in Cloud DCs, and Cloud-resource users
who are responsible for setting permissions for the resources based
on roles, access lists, IP addresses, domains, etc.
There are many types of Cloud authorization, including MAC
(Mandatory Access Control), where each app owns individual access
permissions; DAC (Discretionary Access Control), where each app
requests permissions from an external permissions app; RBAC
(Role-based Access Control), where the authorization service owns
roles with different privileges on the cloud service; and ABAC
(Attribute-based Access Control), where access is based on request
attributes and policies.
The IETF has not yet developed a comprehensive specification for
identity management or data models for Cloud authorization.
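As an illustration of the ABAC model (not any provider's actual API;
the roles, prefixes, and actions below are made-up placeholders), a
minimal attribute-based check could look like the following Python
sketch:

   import ipaddress
   from dataclasses import dataclass

   @dataclass
   class Request:
       user_role: str   # e.g. "netadmin"
       source_ip: str   # address the API call originates from
       action: str      # e.g. "route:create"

   # Hypothetical policies: attributes a request must match to be allowed.
   POLICIES = [
       {"role": "netadmin",
        "source": ipaddress.ip_network("10.0.0.0/8"),
        "actions": {"route:create", "route:delete"}},
   ]

   def is_allowed(req: Request) -> bool:
       src = ipaddress.ip_address(req.source_ip)
       return any(req.user_role == p["role"]
                  and src in p["source"]
                  and req.action in p["actions"]
                  for p in POLICIES)

   # is_allowed(Request("netadmin", "10.1.2.3", "route:create")) -> True

A standard data model for such policies, rather than the check
itself, is the piece that is missing today.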
3.3. API abstraction
Different Cloud Operators have different APIs to access their Cloud
resources, security functions, NAT, etc.
It is difficult to move applications built on one Cloud operator's
APIs to another. However, it is highly desirable to have a single
and consistent way to manage the networks and respective security
policies for interconnecting applications hosted in different Cloud
DCs.
The desired property would be a single network fabric to which
different Cloud DCs and an enterprise's multiple sites can be
attached or detached, with a common interface for setting the
desired policies.
The difficulty of connecting applications in different Clouds might
stem from the fact that the Cloud operators are direct competitors.
Traffic flowing out of Cloud DCs usually incurs charges. Therefore,
direct communication between applications in different Cloud DCs can
be more expensive than intra-Cloud communication.
It is desirable to have a common API shim layer or abstraction for
different Cloud providers to make it easier to move applications
from one Cloud DC to another.
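Such a shim layer does not exist as a standard today; the Python
sketch below only illustrates the idea, with hypothetical class and
method names. Each adapter body would call the respective provider's
own SDK.

   from abc import ABC, abstractmethod

   class CloudNetwork(ABC):
       """Hypothetical provider-neutral interface for network operations."""

       @abstractmethod
       def create_private_network(self, name: str, cidr: str) -> str:
           """Create a VPC/VNET and return its identifier."""

       @abstractmethod
       def create_vpn_gateway(self, network_id: str) -> str:
           """Attach a gateway usable for IPsec peering; return its id."""

   class AwsAdapter(CloudNetwork):
       def create_private_network(self, name, cidr):
           raise NotImplementedError  # would call the AWS VPC API
       def create_vpn_gateway(self, network_id):
           raise NotImplementedError  # would call the AWS VPN gateway API

   class AzureAdapter(CloudNetwork):
       def create_private_network(self, name, cidr):
           raise NotImplementedError  # would call the Azure VNET API
       def create_vpn_gateway(self, network_id):
           raise NotImplementedError  # would call the Azure VNET gateway API

An application written against CloudNetwork could then be pointed at
AwsAdapter or AzureAdapter without changing its own logic.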
3.4. DNS for Cloud Resources
DNS name resolution is essential for on-premises and cloud-based
resources. For customers with hybrid workloads, which include on-
premises and cloud-based resources, extra steps are necessary to
configure DNS to work seamlessly across both environments.
Cloud operators have their own DNS to resolve resources within their
Cloud DCs and names in well-known public domains. The Cloud's DNS
can be configured to forward queries to customer-managed
authoritative DNS servers hosted on-premises, and to respond to DNS
queries forwarded by on-premises DNS servers.
For enterprises utilizing Cloud services from different cloud
operators, it is necessary to establish policies and rules on how
and where to forward DNS queries. When applications in one Cloud
need to communicate with applications hosted in another Cloud, DNS
queries from one Cloud DC may be forwarded to the enterprise's
on-premises DNS, which in turn forwards them to the DNS service in
the other Cloud. Needless to say, the configuration can become
complex depending on the application communication patterns.
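For illustration only, the sketch below shows the kind of per-zone
forwarding decision an enterprise resolver ends up encoding; the
zone names and resolver addresses are placeholders, not a
recommendation.

   # Placeholder zones mapped to the resolver that should handle them.
   FORWARDERS = {
       "aws.internal.example.com.":   "10.10.0.2",   # Cloud A resolver
       "azure.internal.example.com.": "10.20.0.2",   # Cloud B resolver
       "corp.example.com.":           "192.0.2.53",  # on-premises DNS
   }

   def pick_forwarder(qname: str, default: str = "192.0.2.53") -> str:
       """Return the resolver a query for 'qname' should be forwarded to."""
       qname = qname.lower()
       for zone, resolver in FORWARDERS.items():
           if qname.endswith(zone):
               return resolver
       return default

   # pick_forwarder("db1.aws.internal.example.com.") -> "10.10.0.2"

Every additional Cloud, VPC, or private zone adds entries like
these, both in the Cloud resolvers and on-premises, which is where
the configuration complexity comes from.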
3.5. NAT for Cloud Services
Cloud resources, such as VM instances, are usually assigned private
IP addresses. By configuration, some private subnets can use a NAT
function to reach external networks, while other private subnets
remain internal to the Cloud only.
Different Cloud operators support different levels of NAT functions.
For example, the AWS NAT Gateway does not currently support
connections towards, or from, VPC Endpoints, VPN, AWS Direct
Connect, or VPC Peering
(https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-other-services),
and AWS Direct Connect/VPN/VPC Peering does not currently support
any NAT functionality.
Google's Cloud NAT allows Google Cloud virtual machine (VM)
instances without external IP addresses and private Google
Kubernetes Engine (GKE) clusters to connect to the Internet. Cloud
NAT implements outbound NAT in conjunction with a default route to
allow instances to reach the Internet. It does not implement inbound
NAT. Hosts outside of the VPC network can only respond to
established connections initiated by instances inside Google Cloud;
they cannot initiate their own new connections to Cloud instances
via NAT.
For enterprises with applications running in different Cloud DCs,
proper NAT configuration has to be performed both in the Cloud DCs
and in their own on-premises DCs.
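As one concrete example of such configuration, the sketch below uses
boto3 (the AWS SDK for Python) to create a NAT gateway and point a
private subnet's default route at it; all resource identifiers are
placeholders, and other Cloud operators need provider-specific
equivalents.

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # NAT gateway in a public subnet, using a pre-allocated Elastic IP.
   natgw = ec2.create_nat_gateway(
       SubnetId="subnet-0example",
       AllocationId="eipalloc-0example",
   )["NatGateway"]["NatGatewayId"]

   # Default route of the private subnet's route table via the NAT GW.
   ec2.create_route(
       RouteTableId="rtb-0example",
       DestinationCidrBlock="0.0.0.0/0",
       NatGatewayId=natgw,
   )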
3.6. Cloud Discovery
One of the concerns with using Cloud services is not being aware of
where a resource is actually located, especially since Cloud
operators can move application instances from one place to another.
When applications in the Cloud communicate with on-premises
applications, it may not be clear where the Cloud applications are
located or to which VPCs they belong.
It is highly desirable to have tools to discover cloud services in
much the same way as you would discover your on-premises
infrastructure. A significant difference is that cloud discovery
uses the cloud vendor's API to extract data on your cloud services,
rather than the direct access used in scanning your on-premises
infrastructure.
Standard data models, APIs, or tools can alleviate the concerns of
enterprises utilizing Cloud resources, e.g., a Cloud service scan
that connects to the API of the cloud provider and collects
information directly.
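A minimal scan of that kind, sketched here with boto3 for AWS only
(the region is illustrative), simply walks the provider's inventory
API and records where each instance actually lives:

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # List each instance's ID, private address, and the VPC it belongs to.
   for reservation in ec2.describe_instances()["Reservations"]:
       for inst in reservation["Instances"]:
           print(inst["InstanceId"],
                 inst.get("PrivateIpAddress"),
                 inst.get("VpcId"))

Each Cloud operator requires its own variant of this scan, which is
exactly the gap that standard data models or APIs would close.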
4. Interconnecting Enterprise Sites with Cloud DCs
Considering that many enterprises already have existing VPNs (e.g.,
MPLS-based L2VPN or L3VPN) interconnecting branch offices and
on-premises data centers, connecting to Cloud services will involve
a mix of different types of networks. When an enterprise's existing
VPN service providers do not have direct connections to the cloud
DCs that the enterprise prefers to use, the enterprise faces
additional infrastructure and operational costs to utilize Cloud
services.
4.1. Sites to Cloud DC
Most Cloud operators offer some type of network gateway through
which an enterprise can reach their workloads hosted in the Cloud
DCs. AWS (Amazon Web Services) offers the following options to reach
workloads in AWS Cloud DCs:

- AWS Internet Gateway, which allows communication between instances
  in an AWS VPC and the Internet.

- AWS Virtual Gateway (vGW), where IPsec tunnels [RFC6071] are
  established between an enterprise's own gateway and the AWS vGW,
  so that the communications between those gateways can be secured
  from the underlay (which might be the public Internet).

- AWS Direct Connect, which allows enterprises to purchase a direct
  connection from network service providers to get a private leased
  line interconnecting the enterprise's gateway(s) and the AWS
  Direct Connect routers. In addition, an AWS Transit Gateway can be
  used to interconnect multiple VPCs in different Availability
  Zones. The AWS Transit Gateway acts as a hub that controls how
  traffic is forwarded among all the connected networks, which act
  like spokes.

Microsoft's ExpressRoute allows extension of a private network to
any of the Microsoft cloud services, including Azure and Office365.
ExpressRoute is configured using Layer 3 routing. Customers can opt
for redundancy by provisioning dual links from their location to two
Microsoft Enterprise edge routers (MSEEs) located within a
third-party ExpressRoute peering location. The BGP routing protocol
is then set up over the WAN links to provide redundancy to the
cloud. This redundancy is maintained from the peering data center
into Microsoft's cloud network.

Google's Cloud Dedicated Interconnect offers similar network
connectivity options as AWS and Microsoft. One distinct difference,
however, is that Google's service allows customers access to the
entire global cloud network by default. It does this by connecting
the customer's on-premises network with the Google Cloud using BGP
and Google Cloud Routers to provide optimal paths to the different
regions of the global cloud infrastructure.

The figure below shows an example where some of a tenant's workloads
are accessible via a virtual router connected by the AWS Internet
Gateway, some are accessible via the AWS vGW, and others are
accessible via AWS Direct Connect.

Different types of access require different levels of security
functions. Sometimes it is not visible to end customers which type
of network access is used for a specific application instance. To
get better visibility, separate virtual routers (e.g., vR1 & vR2)
can be deployed to differentiate traffic to/from different cloud
GWs. It is important for some enterprises to be able to observe the
specific behaviors of these different connections.

A Customer Gateway can be a customer-owned router or ports
physically connected to the AWS Direct Connect GW.
+------------------------+
| ,---. ,---. |
| (TN-1 ) ( TN-2)|
| `-+-' +---+ `-+-' |
| +----|vR1|----+ |
| ++--+ |
| | +-+----+
| | /Internet\ For External
| +-------+ Gateway +----------------------
| \ / to reach via Internet

skipping to change at page 12, line 5

| | +-+----+ +------+
| | / \ For Direct /customer\
| +-------+ Gateway +----------+ gateway |
| \ / Connect \ /
| +-+----+ +------+
| |
+------------------------+

Figure 1: Examples of Multiple Cloud DC connections.
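For illustration, the sketch below uses boto3 to set up the
vGW-style access shown in Figure 1: a customer gateway object
representing the enterprise CPE, a virtual private gateway attached
to the VPC, and an IPsec VPN connection between them. The
identifiers, address, and ASN are placeholders.

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # Object representing the enterprise CPE (placeholder IP and ASN).
   cgw = ec2.create_customer_gateway(
       BgpAsn=65010, PublicIp="203.0.113.10", Type="ipsec.1"
   )["CustomerGateway"]["CustomerGatewayId"]

   # Virtual private gateway on the AWS side, attached to the VPC.
   vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]["VpnGatewayId"]
   ec2.attach_vpn_gateway(VpnGatewayId=vgw, VpcId="vpc-0example")

   # IPsec VPN connection between the two gateways, with BGP routing.
   ec2.create_vpn_connection(
       Type="ipsec.1",
       CustomerGatewayId=cgw,
       VpnGatewayId=vgw,
       Options={"StaticRoutesOnly": False},
   )

Comparable steps, with different APIs, are needed for ExpressRoute
or Dedicated Interconnect, which is part of the operational cost
noted above.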
4.2. Inter-Cloud Interconnection

The connectivity options to Cloud DCs described in the previous
section are for reaching Cloud providers' DCs, but not between cloud
DCs. When applications in the AWS Cloud need to communicate with
applications in Azure, today's practice requires a third-party
gateway (physical or virtual) to interconnect AWS's Layer 2
DirectConnect path with Azure's Layer 3 ExpressRoute.

Enterprises can also instantiate their own virtual routers in
different Cloud DCs and administer IPsec tunnels among them, which
by itself is not a trivial task. Alternatively, by leveraging open
source VPN software such as strongSwan, an IPsec connection to the
Azure gateway can be created using a shared key. The strongSwan
instance within AWS not only can connect to Azure, but can also be
used to facilitate traffic to other nodes within the AWS VPC by
configuring forwarding and using appropriate routing rules for the
VPC.

Most Cloud operators, such as AWS VPC or Azure VNET, use
non-globally routable CIDRs from the private IPv4 address ranges
specified by RFC1918. To establish an IPsec tunnel between two Cloud
DCs, it is necessary to exchange publicly routable addresses for
applications in the different Cloud DCs. [BGP-SDWAN] describes one
method. Other methods are worth exploring.
In summary, here are some approaches, available now (which might
change in the future), to interconnect workloads among different
Cloud DCs:

a) Utilize Cloud DC provided inter/intra-cloud connectivity services
   (e.g., AWS Transit Gateway) to connect workloads instantiated in
   multiple VPCs. Such services are provided with the cloud gateway
   to connect to external networks (e.g., AWS DirectConnect
   Gateway). A sketch of this approach follows below.

b) Hairpin all traffic through the customer gateway, meaning all
   workloads are directly connected to the customer gateway, so that
   communications among workloads within one Cloud DC must traverse
   through the customer gateway.

c) Establish direct tunnels among different VPCs (AWS' Virtual
   Private Clouds) and VNET (Azure's Virtual Networks) via client's
   own virtual routers instantiated within Cloud DCs. DMVPN (Dynamic
   Multipoint Virtual Private Network) or DSVPN (Dynamic Smart VPN)
   techniques can be used to establish direct multi-point-to-point
   or multi-point-to-multi-point tunnels among those client's own
   virtual routers.
Approach a) usually does not work if Cloud DCs are owned and managed
by different Cloud providers.
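As a sketch of approach a) within a single provider (it does not
help across providers), the boto3 calls below create a Transit
Gateway and attach two VPCs to it so that their workloads can reach
each other; all identifiers are placeholders.

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # Hub that interconnects the attached VPCs (the spokes).
   tgw = ec2.create_transit_gateway(
       Description="hub for inter-VPC traffic"
   )["TransitGateway"]["TransitGatewayId"]

   for vpc_id, subnet_id in [("vpc-0aaa", "subnet-0aaa"),
                             ("vpc-0bbb", "subnet-0bbb")]:
       ec2.create_transit_gateway_vpc_attachment(
           TransitGatewayId=tgw,
           VpcId=vpc_id,
           SubnetIds=[subnet_id],
       )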
skipping to change at page 13, line 30
There are many differences between virtual routers in Public Cloud
DCs and the nodes in an NBMA network. NHRP cannot be used for
registering virtual routers in Cloud DCs unless an extension of such
protocols is developed for that purpose, e.g., taking NAT or dynamic
addresses into consideration. Therefore, DMVPN and/or DSVPN cannot
be used directly for connecting workloads in hybrid Cloud DCs.

Other protocols such as BGP can be used, as described in
[BGP-SDWAN].
5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

Traditional MPLS-based VPNs have been widely deployed as an
effective way to support businesses and organizations that require
network performance and reliability. MPLS shifted the burden of
managing a VPN service from enterprises to service providers. The
CPEs attached to MPLS VPNs are also simpler and less expensive,
because they do not need to manage routes to remote sites; they
simply pass all outbound traffic to the MPLS VPN PEs to which the
CPEs are attached (albeit multi-homing scenarios require more
processing logic on CPEs). MPLS has addressed the problems of scale,
availability, and fast recovery from network faults, and
incorporated traffic-engineering capabilities.

However, traditional MPLS-based VPN solutions are sub-optimized for
connecting end-users to dynamic workloads/applications in cloud DCs
because:

- The Provider Edge (PE) nodes of the enterprise's VPNs might not
  have direct connections to third party cloud DCs that are used for
  hosting workloads with the goal of providing an easy access to
  enterprises' end-users.

- It takes some time to deploy provider edge (PE) routers at new
  locations. When enterprise's workloads are changed from one cloud
  DC to another (i.e., removed from one DC and re-instantiated to
  another location when demand changes), the enterprise branch
  offices need to be connected to the new cloud DC, but the network
  service provider might not have PEs located at the new location.

  One of the main drivers for moving workloads into the cloud is the
  widely available cloud DCs at geographically diverse locations,
  where apps can be instantiated so that they can be as close to
  their end-users as possible. When the user base changes, the
  applications may be migrated to a new cloud DC

skipping to change at page 14, line 41

  to connect to a Cloud provider at multiple locations. The
  connection locations often correspond to gateways of different
  Cloud DC locations from the Cloud provider. The different Cloud
  DCs are interconnected by the Cloud provider's own internal
  network. At each connection location (gateway), the Cloud provider
  uses BGP to advertise all of the prefixes in the enterprise's VPC,
  regardless of which Cloud DC a given prefix is actually in. This
  can result in inefficient routing for the end-to-end data path.
Another roadblock is the lack of a standard way to express and
enforce consistent security policies for workloads that not only use
virtual addresses, but are also very likely hosted in different
locations within the Cloud DC [RFC8192]. The current VPN path
computation and bandwidth allocation schemes may not be flexible
enough to address the need for enterprises to rapidly connect to
dynamically instantiated (or removed) workloads and applications
regardless of their location/nature (i.e., third party cloud DCs).
6. Problem with using IPsec tunnels to Cloud DCs

As described in the previous section, many Cloud operators expose
their gateways for external entities (which can be enterprises
themselves) to directly establish IPsec tunnels. Enterprises can
also instantiate virtual routers within Cloud DCs to connect to
their on-premises devices via IPsec tunnels.

6.1. Scaling Issues with IPsec Tunnels

If there is only one enterprise location that needs to reach the
Cloud DC, an IPsec tunnel is a very convenient solution.

However, many medium-to-large enterprises have multiple sites and
multiple data centers. For multiple sites to communicate with
workloads and apps hosted in cloud DCs, Cloud DC gateways have to
maintain many IPsec tunnels to all those locations. In addition,
each of those IPsec tunnels requires pair-wise periodic key
refreshment. For a company with hundreds or thousands of locations,
there could be hundreds (or even thousands) of IPsec tunnels
terminating at the cloud DC gateway, which is very processing
intensive. That is why many cloud operators only allow a limited
number of (IPsec) tunnels and limited bandwidth to each customer.

Alternatively, a solution like group encryption could be used, where
a single IPsec SA is necessary at the GW, but the drawback is key
distribution and the maintenance of a key server, etc.
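The scaling pressure is easy to quantify with a back-of-the-envelope
calculation (the site count below is purely illustrative):

   # Tunnel counts for an illustrative enterprise with 500 sites.
   sites = 500

   hub_and_spoke_tunnels = sites              # one tunnel per site at the GW
   full_mesh_tunnels = sites * (sites - 1) // 2

   # Each tunnel carries at least a pair of unidirectional SAs that
   # must be rekeyed periodically.
   print(hub_and_spoke_tunnels, full_mesh_tunnels)   # 500 vs 124750

Even the hub-and-spoke case leaves hundreds of SAs to negotiate,
monitor, and rekey at a single cloud DC gateway.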
6.2. Poor performance over long distance

When enterprise CPEs or gateways are far away from cloud DC gateways
or across country/continent boundaries, performance of IPsec tunnels
over the public Internet can be problematic and unpredictable. Even
though there are many monitoring tools available to measure delay
and various performance characteristics of the network, the
measurement for paths over the Internet is passive and past
measurements may not represent future performance.
Many cloud providers can replicate workloads in different
availability zones. An App instantiated in a cloud DC closest to
clients may have to cooperate with another App (or its mirror image)
in another region or with database server(s) in the on-premises DC.
This kind of coordination requires predictable networking
behavior/performance among those locations.
7. Problems of Using SD-WAN to connect to Cloud DCs
SD-WAN lets enterprises augment their current VPN network with
cost-effective, readily available Broadband Internet connectivity,
enabling some traffic offloading to paths over the Internet
according to differentiated, possibly application-based traffic
forwarding policies, or when the MPLS VPN connection between the two
locations is congested, or otherwise undesirable or unavailable.
7.1. More Complexity to Edge Nodes

Augmenting the transport path is not as simple as it appears. For an
enterprise with multiple sites, CPE-managed overlay paths among
sites require each CPE to manage all the addresses that local hosts
can potentially reach, i.e., to map internal VPN addresses to
appropriate overlay paths. This is similar to the complexity of
Frame Relay based VPNs, where each CPE needed to maintain mesh
routing for all destinations if they were to avoid an extra hop
through a hub router. Even with assistance from a central controller
(instead of running a routing protocol) to resolve the mapping
between destinations and SD-WAN paths, SD-WAN CPEs are still
responsible for routing table maintenance as remote destinations
change their attachments, e.g., when dynamic workloads in other DCs
are de-commissioned or added.

In addition, overlay paths interconnecting branch offices are
different from those connecting to Cloud DCs:

- Overlay paths interconnecting branch offices usually have both
  end-points (e.g., CPEs) controlled by one entity (e.g., controllers
  or management systems operated by the enterprise).

- Connections to Cloud DCs may consist of CPEs owned or managed by
  the enterprise, with the remote end-points managed or controlled
  by the Cloud DCs.

7.2. Edge WAN Port Management

An SD-WAN edge node can have WAN ports connected to different
networks, or to the public Internet, managed by different operators.
There is therefore a need to propagate WAN port properties to remote
authorized peers in third party network domains, in addition to
route propagation. Such an exchange cannot happen before
communication between peers is properly secured.
7.3. Forwarding based on Application

Forwarding based on application IDs instead of destination IP
addresses is often referred to as Application-based Segmentation. If
the applications have unique IP addresses, then Application-based
Segmentation can be achieved by propagating different BGP UPDATE
messages to different nodes, as described in [BGP-SDWAN-USAGE]. If
an application cannot be uniquely identified by its IP addresses,
more work is needed.
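For illustration only, the sketch below shows the forwarding
decision such segmentation implies at an edge node: steer by
application ID when one is known, otherwise fall back to
destination-based longest-prefix match. The application IDs, overlay
names, and prefixes are invented.

   import ipaddress
   from typing import Optional

   # Invented application IDs mapped to overlay/underlay choices.
   APP_POLICY = {
       "voice":  "low-latency-overlay",
       "backup": "broadband-internet",
   }
   PREFIX_POLICY = {
       ipaddress.ip_network("10.1.0.0/16"): "mpls-vpn",
       ipaddress.ip_network("0.0.0.0/0"):   "broadband-internet",
   }

   def select_path(dst_ip: str, app_id: Optional[str] = None) -> str:
       if app_id in APP_POLICY:
           return APP_POLICY[app_id]
       dst = ipaddress.ip_address(dst_ip)
       matches = [n for n in PREFIX_POLICY if dst in n]
       return PREFIX_POLICY[max(matches, key=lambda n: n.prefixlen)]

   # select_path("10.1.2.3", app_id="voice") -> "low-latency-overlay"
   # select_path("10.1.2.3")                 -> "mpls-vpn"

Identifying the application reliably when it does not have unique IP
addresses is the part that still needs more work.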
8. End-to-End Security Concerns for Data Flows

When IPsec tunnels established from enterprise on-premises CPEs are
terminated at the Cloud DC gateway where the workloads or
applications are hosted, some enterprises have concerns regarding
traffic to/from their workload being exposed to others behind the
data center gateway (e.g., exposed to other organizations that have
workloads in the same data center).
To ensure that traffic to/from workloads is not exposed to