Network Working Group                                          L. Dunbar
Internet Draft                                                 Futurewei
Intended status: Informational                                Andy Malis
Expires: September 16, 2020                                  Independent
                                                            C. Jacquenet
                                                                  Orange
                                                                  M. Toy
                                                                 Verizon
                                                          March 16, 2020

         Dynamic Networks to Hybrid Cloud DCs Problem Statement
            draft-ietf-rtgwg-net2cloud-problem-statement-09

Abstract

   This document describes the problems that enterprises face today
   when interconnecting their branch offices with dynamic workloads in
   third-party data centers (a.k.a. Cloud DCs).  There are many
   problems associated with networks connecting to or among Clouds,
   many of which are outside the scope of the IETF.  The objective of
   this document is to identify the problems that need additional work
   in the IETF Routing area; other problems are out of the scope of
   this document.

   It examines some of the approaches for interconnecting cloud DCs
   with enterprises' on-premises DCs and branch offices.
   This document also describes some of the network problems that
   many enterprises face when they have workloads, applications, and
   data split among different data centers, especially for those
   enterprises with multiple sites that are already interconnected by
   VPNs (e.g., MPLS L2VPN/L3VPN).

   Current operational problems are examined to determine whether
   there is a need to improve existing protocols or whether a new
   protocol is necessary to solve them.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 16, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   1.1. Key Characteristics of Cloud Services
   1.2. Connecting to Cloud Services
   1.3. The role of SD-WAN in connecting to Cloud Services
   2. Definition of terms
   3. High-Level Issues of Connecting to Multi-Cloud
   3.1. Security Issues
   3.2. Authorization and Identity Management
   3.3. API Abstraction
   3.4. DNS for Cloud Resources
   3.5. NAT for Cloud Services
   3.6. Cloud Discovery
   4. Interconnecting Enterprise Sites with Cloud DCs
   4.1. Sites to Cloud DC
   4.2. Inter-Cloud Interconnection
   5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs
   6. Problems with using IPsec tunnels to Cloud DCs
   6.1. Scaling Issues with IPsec Tunnels
   6.2. Poor performance over long distances
   7. Problems of Using SD-WAN to connect to Cloud DCs
   7.1. More Complexity for Edge Nodes
   7.2. Edge WAN Port Management
   7.3. Forwarding based on Application
   8. End-to-End Security Concerns for Data Flows
   9. Requirements for Dynamic Cloud Data Center VPNs
   10. Security Considerations
   11. IANA Considerations
   12. References
   12.1. Normative References
   12.2. Informative References
   13. Acknowledgments

1. Introduction

1.1. Key Characteristics of Cloud Services

   Key characteristics of Cloud Services are on-demand provisioning,
   scalability, high availability, and usage-based billing.  Cloud
   Services, such as compute, storage, network functions (most likely
   virtual), and third-party managed applications, are usually hosted
   and managed by third-party Cloud Operators.  Some examples of
   Cloud network functions are virtual firewall services, virtual
   private network services, and virtual PBX services, including
   voice and video conferencing systems.  A Cloud Data Center (DC) is
   a shared infrastructure that hosts Cloud Services for many
   customers.

1.2. Connecting to Cloud Services

   With the advent of widely available third-party cloud DCs and
   services in diverse geographic locations, and the advancement of
   tools for monitoring and predicting application behaviors, it is
   very attractive for enterprises to instantiate applications and
   workloads in locations that are geographically closest to their
   end-users.  Such proximity can improve end-to-end latency and
   overall user experience.  Conversely, an enterprise can easily
   shut down applications and workloads whenever end-users move
   (thereby modifying the network connections of the subsequently
   relocated applications and workloads).  In addition, enterprises
   may wish to take advantage of the growing number of business
   applications offered by cloud operators.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:

   - High availability of access to all workloads in the desired
     cloud DCs.  Many enterprises include the cloud in their disaster
     recovery strategy, for example by enforcing periodic backup
     policies within the cloud or by running backup applications in
     the cloud.
   - Global reachability from different geographical zones, thereby
     facilitating the proximity of applications to the end-users'
     locations in order to improve latency.
   - Elasticity: prompt connection to newly instantiated applications
     in Cloud DCs when usage increases, and prompt release of
     connections after applications are removed from a location when
     demand changes.
   - Scalable security management.

1.3. The role of SD-WAN in connecting to Cloud Services

   Some of the characteristics of SD-WAN [BGP-SDWAN], such as network
   augmentation and forwarding based on application IDs instead of
   destination IP addresses, are essential for connecting to
   on-demand Cloud services.

   Issues associated with using SD-WAN for connecting to Cloud
   services are also discussed in this document.

2. Definition of terms

   Cloud DC:   Third-party data centers that usually host
               applications and workloads owned by different
               organizations or tenants.

   Controller: Used interchangeably with SD-WAN controller; it
               manages SD-WAN overlay path creation/deletion and
               monitors path conditions between two or more sites.

   DSVPN:      Dynamic Smart Virtual Private Network.  DSVPN is a
               secure network that exchanges data between sites
               without needing to pass traffic through an
               organization's headquarters virtual private network
               (VPN) server or router.

   Heterogeneous Cloud: Applications and workloads split among Cloud
               DCs owned or managed by different operators.

   Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
               on-premises DCs in addition to Cloud services provided
               by one or more cloud operators (e.g., AWS, Azure,
               Google, Salesforce, SAP, etc.).

   SD-WAN:     Software-Defined Wide Area Network.  In this document,
               SD-WAN refers to solutions that pool WAN bandwidth
               from multiple underlay networks to get better WAN
               bandwidth management, visibility, and control.  When
               the underlay networks are private networks, traffic
               can traverse them without additional encryption; when
               the underlay networks are public, such as the
               Internet, some traffic needs to be encrypted when
               traversing them (depending on user-provided policies).

   VPC:        A Virtual Private Cloud is a virtual network dedicated
               to one client account.  It is logically isolated from
               other virtual networks in a Cloud DC.  Each client can
               launch desired resources, such as compute, storage, or
               network functions, into its VPC.  Most Cloud
               operators' VPCs only support private addresses; some
               support IPv4 only, while others support IPv4/IPv6 dual
               stack.

3. High-Level Issues of Connecting to Multi-Cloud

   There are many problems associated with connecting to hybrid Cloud
   Services, many of which are outside the scope of the IETF.  This
   section identifies some of the high-level problems that can be
   addressed by the IETF, especially by the Routing area.  Other
   problems are out of the scope of this document.  By no means does
   this section cover all the problems of connecting to hybrid Cloud
   Services; for example, the difficulty of managing cloud spending
   is not discussed here.

3.1. Security Issues

   Cloud Services are built upon shared infrastructure and are
   therefore not secure by nature.  Security has been a primary, and
   valid, concern from the start of cloud computing: you are unable
   to see the exact location where your data is stored or processed.
   Headlines highlighting data breaches, compromised credentials,
   broken authentication, hacked interfaces and APIs, and account
   hijacking have not helped alleviate these concerns.

   Secure user identity management, authentication, and access
   control mechanisms are important.  Developing appropriate security
   measures can enhance the confidence needed by enterprises to fully
   take advantage of Cloud Services.

3.2. Authorization and Identity Management

   One of the more prominent challenges for Cloud Services is
   identity management and authorization.  Authorization not only
   covers user authorization, but also the authorization of API calls
   made by applications hosted in different Cloud DCs managed by
   different Cloud Operators.  In addition, there is authorization
   for workload migration, data migration, and workload management.

   There are many types of users in cloud environments, e.g., end
   users accessing applications hosted in Cloud DCs, and
   Cloud-resource users who are responsible for setting permissions
   for the resources based on roles, access lists, IP addresses,
   domains, etc.

   There are many types of Cloud authorization, including MAC
   (Mandatory Access Control), where each app owns individual access
   permissions; DAC (Discretionary Access Control), where each app
   requests permissions from an external permissions app; RBAC
   (Role-Based Access Control), where the authorization service owns
   roles with different privileges on the cloud service; and ABAC
   (Attribute-Based Access Control), where access is based on request
   attributes and policies.
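
   As a simple illustration of the ABAC model described above, the
   following Python sketch evaluates a request's attributes against a
   small policy set.  The policy format, attribute names, and
   function names are hypothetical and purely illustrative; they do
   not correspond to any Cloud operator's API or to an IETF-defined
   data model.

     # Minimal ABAC evaluation sketch (illustrative only).
     # A "policy" is a set of attribute predicates that must all
     # match the attributes carried by a request.

     POLICIES = [
         # Hypothetical policy: developers in eu-west-1 may read
         # objects from storage.
         {"role": "developer", "region": "eu-west-1",
          "action": "storage:read"},
         # Hypothetical policy: operators may trigger workload
         # migration.
         {"role": "operator", "action": "workload:migrate"},
     ]

     def is_authorized(request_attributes: dict) -> bool:
         """Grant access if all predicates of any policy match."""
         for policy in POLICIES:
             if all(request_attributes.get(k) == v
                    for k, v in policy.items()):
                 return True
         return False

     # Example request from an application making an API call:
     request = {"role": "developer", "region": "eu-west-1",
                "action": "storage:read", "tenant": "example-corp"}
     print(is_authorized(request))  # -> True under the policies above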

   The IETF has not yet developed comprehensive specifications for
   identity management and data models for Cloud authorization.

3.3. API Abstraction

   Different Cloud Operators have different APIs to access their
   Cloud resources, security functions, NAT, etc.

   It is difficult to move applications built with one Cloud
   operator's APIs to another.  However, it is highly desirable to
   have a single and consistent way to manage the networks and the
   respective security policies for interconnecting applications
   hosted in different Cloud DCs.

   The desired property would be a single network fabric to which
   different Cloud DCs and an enterprise's multiple sites can be
   attached or detached, with a common interface for setting the
   desired policies.

   The difficulty of connecting applications in different Clouds may
   stem from the fact that the Cloud operators are direct
   competitors.  Traffic flowing out of a Cloud DC usually incurs
   charges.  Therefore, direct communication between applications in
   different Cloud DCs can be more expensive than intra-Cloud
   communication.

   It is desirable to have a common API shim layer or abstraction for
   different Cloud providers to make it easier to move applications
   from one Cloud DC to another.
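
   The following Python sketch illustrates what such a shim layer
   could look like.  The class and method names are hypothetical and
   only illustrate the idea of a provider-neutral abstraction; the
   provider-specific drivers would wrap the respective operators'
   real SDKs or REST APIs, which differ in detail.

     # Hypothetical provider-neutral abstraction layer (sketch only).
     from abc import ABC, abstractmethod

     class CloudNetworkDriver(ABC):
         """Common interface an orchestrator could program against."""

         @abstractmethod
         def create_network(self, name, cidr):
             """Create a VPC/VNET; return a provider-specific ID."""

         @abstractmethod
         def allow_inbound(self, net_id, cidr, port):
             """Install a simple inbound security policy."""

     class AwsDriver(CloudNetworkDriver):
         def create_network(self, name, cidr):
             ...   # would call the AWS SDK (e.g., CreateVpc) here
         def allow_inbound(self, net_id, cidr, port):
             ...   # would create/modify a security group here

     class AzureDriver(CloudNetworkDriver):
         def create_network(self, name, cidr):
             ...   # would call the Azure virtual-network API here
         def allow_inbound(self, net_id, cidr, port):
             ...   # would configure a network security group here

     def deploy_everywhere(drivers, cidr):
         # The same intent is applied through every driver, no matter
         # which Cloud operator is behind it.
         for d in drivers:
             net = d.create_network("app-net", cidr)
             d.allow_inbound(net, "10.0.0.0/8", 443)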

3.4. DNS for Cloud Resources

   DNS name resolution is essential for both on-premises and
   cloud-based resources.  For customers with hybrid workloads, which
   include on-premises and cloud-based resources, extra steps are
   necessary to configure DNS to work seamlessly across both
   environments.

   Cloud operators have their own DNS to resolve resources within
   their Cloud DCs and to resolve well-known public domains.  A
   Cloud's DNS can be configured to forward queries to
   customer-managed authoritative DNS servers hosted on-premises, and
   to respond to DNS queries forwarded by on-premises DNS servers.

   For enterprises utilizing Cloud services from different cloud
   operators, it is necessary to establish policies and rules on how
   and where to forward DNS queries.  When applications in one Cloud
   need to communicate with applications hosted in another Cloud, DNS
   queries from one Cloud DC may be forwarded to the enterprise's
   on-premises DNS, which in turn forwards them to the DNS service in
   the other Cloud.  Needless to say, the configuration can be
   complex depending on the application communication patterns.

   However, even with carefully managed policies and configurations,
   collisions can still occur.  If you use an internal name like
   .cloud and then want your services to be available via or within
   some other cloud provider that also uses .cloud, it can't work.
   Therefore, it is better to use a global domain name even when an
   organization does not make all of its namespace globally
   resolvable.  An organization's globally unique DNS can include
   subdomains that cannot be resolved at all outside certain
   restricted paths, zones that resolve differently based on the
   origin of the query, and zones that resolve the same way globally
   for queries from any source.

   Globally unique names do not equate to globally resolvable names,
   or even to global names that resolve the same way from every
   perspective.  Globally unique names do prevent any possibility of
   collision now or in the future, and they make DNSSEC trust
   manageable.  Consider using a registered fully qualified domain
   name (FQDN) from the global DNS as the root for enterprise and
   other internal namespaces.

3.5. NAT for Cloud Services

   Cloud resources, such as VM instances, are usually assigned
   private IP addresses.  By configuration, some private subnets can
   have a NAT function to reach external networks, while other
   private subnets are internal to the Cloud only.

   Different Cloud operators support different levels of NAT
   functionality.  For example, the AWS NAT Gateway does not
   currently support connections towards, or from, VPC Endpoints,
   VPN, AWS Direct Connect, or VPC Peering
   (https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-
   gateway.html#nat-gateway-other-services).  AWS Direct Connect,
   VPN, and VPC Peering do not currently support any NAT
   functionality.

   Google's Cloud NAT allows Google Cloud virtual machine (VM)
   instances without external IP addresses and private Google
   Kubernetes Engine (GKE) clusters to connect to the Internet.
   Cloud NAT implements outbound NAT in conjunction with a default
   route to allow instances to reach the Internet.  It does not
   implement inbound NAT.  Hosts outside of the VPC network can only
   respond to established connections initiated by instances inside
   the Google Cloud; they cannot initiate their own, new connections
   to Cloud instances via NAT.

   For enterprises with applications running in different Cloud DCs,
   proper configuration of NAT has to be performed in the Cloud DCs
   and in their own on-premises DCs.

3.6. Cloud Discovery

   One of the concerns with using Cloud services is not being aware
   of where a resource is actually located, especially since Cloud
   operators can move application instances from one place to
   another.  When applications in the Cloud communicate with
   on-premises applications, it may not be clear where the Cloud
   applications are located or to which VPCs they belong.

   It is highly desirable to have tools to discover cloud services in
   much the same way as you would discover your on-premises
   infrastructure.  A significant difference is that cloud discovery
   uses the cloud vendor's API to extract data about your cloud
   services, rather than the direct access used when scanning your
   on-premises infrastructure.

   Standard data models, APIs, or tools can alleviate the concerns of
   enterprises utilizing Cloud resources, e.g., by having a Cloud
   service scan that connects to the API of the cloud provider and
   collects information directly.
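
   As an illustration of such an API-driven scan, the sketch below
   uses the AWS SDK for Python (boto3) to list VPCs and the private
   addresses of instances in one region, assuming credentials and
   permissions are already configured.  Other Cloud operators expose
   comparable inventory APIs with different names and schemas.

     # Minimal cloud-discovery sketch using the AWS SDK (boto3).
     # Assumes AWS credentials and permissions are already set up.
     import boto3

     ec2 = boto3.client("ec2", region_name="us-east-1")

     # Enumerate VPCs visible to this account in the region.
     for vpc in ec2.describe_vpcs()["Vpcs"]:
         print("VPC", vpc["VpcId"], vpc["CidrBlock"])

     # Enumerate instances and the VPC/private address each belongs
     # to, so operators can see where workloads actually sit.
     for reservation in ec2.describe_instances()["Reservations"]:
         for inst in reservation["Instances"]:
             print(inst["InstanceId"],
                   inst.get("VpcId"),
                   inst.get("PrivateIpAddress"),
                   inst["Placement"]["AvailabilityZone"])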

4. Interconnecting Enterprise Sites with Cloud DCs

   Considering that many enterprises already have existing VPNs
   (e.g., MPLS-based L2VPN or L3VPN) interconnecting branch offices
   and on-premises data centers, connecting to Cloud services will
   involve a mix of different types of networks.  When an
   enterprise's existing VPN service providers do not have direct
   connections to the corresponding cloud DCs that the enterprise
   prefers to use, the enterprise faces additional infrastructure and
   operational costs to utilize Cloud services.

4.1. Sites to Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach its workloads hosted in the Cloud
   DCs.  AWS (Amazon Web Services) offers the following options to
   reach workloads in AWS Cloud DCs:

   - The AWS Internet Gateway allows communication between instances
     in an AWS VPC and the Internet.
   - The AWS Virtual Gateway (vGW), where IPsec tunnels [RFC6071] are
     established between an enterprise's own gateway and the AWS vGW,
     so that the communication between those gateways can be secured
     from the underlay (which might be the public Internet).
   - AWS Direct Connect, which allows enterprises to purchase a
     direct connection from network service providers to get a
     private leased line interconnecting the enterprise's gateway(s)
     and the AWS Direct Connect routers.  In addition, an AWS Transit
     Gateway can be used to interconnect multiple VPCs in different
     Availability Zones.  The AWS Transit Gateway acts as a hub that
     controls how traffic is forwarded among all the connected
     networks, which act like spokes.

   Microsoft's ExpressRoute allows extension of a private network to
   any of the Microsoft cloud services, including Azure and Office
   365.  ExpressRoute is configured using Layer 3 routing.  Customers
   can opt for redundancy by provisioning dual links from their
   location to two Microsoft Enterprise Edge routers (MSEEs) located
   within a third-party ExpressRoute peering location.  The BGP
   routing protocol is then set up over the WAN links to provide
   redundancy to the cloud.  This redundancy is maintained from the
   peering data center into Microsoft's cloud network.

   Google's Cloud Dedicated Interconnect offers similar network
   connectivity options to AWS and Microsoft.  One distinct
   difference, however, is that Google's service allows customers
   access to the entire global cloud network by default.  It does
   this by connecting your on-premises network with the Google Cloud
   using BGP and Google Cloud Routers to provide optimal paths to the
   different regions of the global cloud infrastructure.

   Figure 1 below shows an example in which some of a tenant's
   workloads are accessible via a virtual router connected to the AWS
   Internet Gateway, some are accessible via the AWS vGW, and others
   are accessible via AWS Direct Connect.

   Different types of access require different levels of security
   functions.  Sometimes it is not visible to end customers which
   type of network access is used for a specific application
   instance.  To get better visibility, separate virtual routers
   (e.g., vR1 and vR2) can be deployed to differentiate traffic
   to/from the different cloud gateways.  It is important for some
   enterprises to be able to observe the specific behaviors of the
   different connections.

   The Customer Gateway can be a customer-owned router or ports
   physically connected to the AWS Direct Connect gateway.

   +------------------------+
   |    ,---.        ,---.  |
   |   (TN-1 )      ( TN-2) |
   |    `-+-'  +---+ `-+-'  |
   |      +----|vR1|---+    |
   |           +-+-+        |
   |             |         +-+----+
   |             |        /Internet\     For external access
   |             +-------+ Gateway  +-------------------------
   |                      \        /     via the Internet
   |                       +-+----+
   |                          |
   |    ,---.        ,---.  |
   |   (TN-1 )      ( TN-2) |
   |    `-+-'  +---+ `-+-'  |
   |      +----|vR2|---+    |
   |           +-+-+        |
   |             |         +-+----+
   |             |        / Virtual\     For IPsec tunnel
   |             +-------+ Gateway  +-------------------------
   |             |        \        /     termination
   |             |         +-+----+
   |             |            |
   |             |         +-+----+             +------+
   |             |        /        \  For      /customer\
   |             +-------+ Gateway  +---------+ gateway  |
   |                      \        /  Direct   \        /
   |                       +-+----+   Connect   +------+
   |                          |
   +------------------------+

         Figure 1: Examples of Multiple Cloud DC connections

4.2. Inter-Cloud Interconnection

   The connectivity options to Cloud DCs described in the previous
   section are for reaching Cloud providers' DCs, not for connecting
   cloud DCs to each other.  When applications in the AWS Cloud need
   to communicate with applications in Azure, today's practice
   requires a third-party gateway (physical or virtual) to
   interconnect AWS's Layer 2 Direct Connect path with Azure's Layer
   3 ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task.  Alternatively, by leveraging
   open-source VPN software such as strongSwan, an enterprise can
   create an IPsec connection to the Azure gateway using a shared
   key.  The strongSwan instance within AWS can not only connect to
   Azure but can also be used to facilitate traffic to other nodes
   within the AWS VPC by configuring forwarding and using appropriate
   routing rules for the VPC.

   Most Cloud operators, such as AWS VPC or Azure VNET, use
   non-globally-routable CIDR blocks from the private IPv4 address
   ranges specified by RFC 1918.  To establish an IPsec tunnel
   between two Cloud DCs, it is necessary to exchange publicly
   routable addresses for applications in the different Cloud DCs.
   [BGP-SDWAN] describes one method.  Other methods are worth
   exploring.

   In summary, here are some approaches, available now (which might
   change in the future), to interconnect workloads among different
   Cloud DCs:

   a) Utilize the Cloud DC-provided inter-/intra-cloud connectivity
      services (e.g., AWS Transit Gateway) to connect workloads
      instantiated in multiple VPCs.  Such services are provided
      together with the cloud gateway used to connect to external
      networks (e.g., the AWS Direct Connect Gateway).

   b) Hairpin all traffic through the customer gateway, meaning that
      all workloads are directly connected to the customer gateway,
      so that communications among workloads within one Cloud DC must
      traverse the customer gateway.

   c) Establish direct tunnels among the different VPCs (AWS Virtual
      Private Clouds) and VNETs (Azure's Virtual Networks) via the
      client's own virtual routers instantiated within the Cloud DCs.
      DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
      (Dynamic Smart VPN) techniques can be used to establish direct
      multipoint-to-point or multipoint-to-multipoint tunnels among
      those client-owned virtual routers.

   Approach a) usually does not work if the Cloud DCs are owned and
   managed by different Cloud providers.

   Approach b) creates additional transmission delay and incurs costs
   when traffic exits the Cloud DCs.

   For approach c), DMVPN or DSVPN uses NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses and WAN ports with the hub node.  The IETF ION
   (Internetworking Over NBMA (non-broadcast multiple access)) WG
   standardized NHRP for network address resolution over
   connection-oriented NBMA networks (such as ATM) more than two
   decades ago.

   There are many differences between virtual routers in public Cloud
   DCs and the nodes in an NBMA network.  NHRP cannot be used for
   registering virtual routers in Cloud DCs unless an extension of
   the protocol is developed for that purpose, e.g., taking NAT or
   dynamic addresses into consideration.  Therefore, DMVPN and/or
   DSVPN cannot be used directly for connecting workloads in hybrid
   Cloud DCs.

   Other protocols, such as BGP, can be used instead, as described in
   [BGP-SDWAN].

5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

   Traditional MPLS-based VPNs have been widely deployed as an
   effective way to support businesses and organizations that require
   network performance and reliability.  MPLS shifted the burden of
   managing a VPN service from enterprises to service providers.  The
   CPEs attached to MPLS VPNs are also simpler and less expensive,
   because they do not need to manage routes to remote sites; they
   simply pass all outbound traffic to the MPLS VPN PEs to which the
   CPEs are attached (albeit multi-homing scenarios require more
   processing logic on CPEs).  MPLS has addressed the problems of
   scale, availability, and fast recovery from network faults, and
   has incorporated traffic-engineering capabilities.

   However, traditional MPLS-based VPN solutions are sub-optimal for
   connecting end-users to dynamic workloads/applications in cloud
   DCs because:

   - The Provider Edge (PE) nodes of the enterprise's VPNs might not
     have direct connections to the third-party cloud DCs that are
     used for hosting workloads with the goal of providing easy
     access to the enterprise's end-users.

   - It takes time to deploy provider edge (PE) routers at new
     locations.  When an enterprise's workloads are moved from one
     cloud DC to another (i.e., removed from one DC and
     re-instantiated at another location when demand changes), the
     enterprise branch offices need to be connected to the new cloud
     DC, but the network service provider might not have PEs located
     at the new location.

     One of the main drivers for moving workloads into the cloud is
     the wide availability of cloud DCs at geographically diverse
     locations, where applications can be instantiated so that they
     are as close to their end-users as possible.  When the user base
     changes, the applications may be migrated to the new cloud DC
     location closest to the new user base.

   - Most cloud DCs do not expose their internal networks.  An
     enterprise with a hybrid cloud deployment can use an MPLS VPN to
     connect to a Cloud provider at multiple locations.  The
     connection locations often correspond to gateways of the Cloud
     provider's different Cloud DC locations.  The different Cloud
     DCs are interconnected by the Cloud provider's own internal
     network.  At each connection location (gateway), the Cloud
     provider uses BGP to advertise all of the prefixes in the
     enterprise's VPC, regardless of which Cloud DC a given prefix is
     actually in.  This can result in inefficient routing for the
     end-to-end data path.

   Another roadblock is the lack of a standard way to express and
   enforce consistent security policies for workloads that not only
   use virtual addresses, but are also very likely hosted at
   different locations within a Cloud DC [RFC8192].  The current VPN
   path computation and bandwidth allocation schemes may not be
   flexible enough to address the need for enterprises to rapidly
   connect to dynamically instantiated (or removed) workloads and
   applications, regardless of their location/nature (i.e.,
   third-party cloud DCs).

6. Problems with using IPsec tunnels to Cloud DCs

   As described in the previous section, many Cloud operators expose
   their gateways so that external entities (which can be the
   enterprises themselves) can directly establish IPsec tunnels.
   Enterprises can also instantiate virtual routers within Cloud DCs
   to connect to their on-premises devices via IPsec tunnels.

6.1. Scaling Issues with IPsec Tunnels

   If there is only one enterprise location that needs to reach the
   Cloud DC, an IPsec tunnel is a very convenient solution.

   However, many medium-to-large enterprises have multiple sites and
   multiple data centers.  For multiple sites to communicate with
   workloads and applications hosted in cloud DCs, the Cloud DC
   gateways have to maintain many IPsec tunnels to all those
   locations.  In addition, each of those IPsec tunnels requires
   pair-wise periodic key refreshment.  For a company with hundreds
   or thousands of locations, there could be hundreds (or even
   thousands) of IPsec tunnels terminating at the cloud DC gateway,
   which is very processing intensive.  That is why many cloud
   operators only allow a limited number of (IPsec) tunnels and
   limited bandwidth to each customer.

   Alternatively, a solution like group encryption could be used,
   where a single IPsec SA is needed at the gateway, but the
   drawbacks are key distribution and the maintenance of a key
   server, etc.
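
   A back-of-the-envelope calculation illustrates the scale of this
   concern.  The number of sites, the per-site SA count, and the
   rekey interval below are assumptions chosen purely for
   illustration, not measurements of any particular gateway.

     # Rough illustration of the per-gateway IPsec burden
     # (all numbers are assumptions for illustration only).
     sites = 1000              # locations terminating at one cloud GW
     sa_pairs_per_site = 1     # at least one tunnel per site
     rekey_interval_hours = 8  # assumed periodic key refresh interval

     tunnels_at_gateway = sites * sa_pairs_per_site
     rekeys_per_hour = tunnels_at_gateway / rekey_interval_hours

     print(f"{tunnels_at_gateway} tunnels terminate at the gateway")
     print(f"about {rekeys_per_hour:.0f} rekey operations per hour, "
           "each involving a key exchange at the gateway")
     # -> 1000 tunnels and roughly 125 rekeys per hour at one gateway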

6.2. Poor performance over long distances

   When enterprise CPEs or gateways are far away from cloud DC
   gateways or across country/continent boundaries, the performance
   of IPsec tunnels over the public Internet can be problematic and
   unpredictable.  Even though there are many monitoring tools
   available to measure delay and various other performance
   characteristics of the network, the measurement of paths over the
   Internet is passive, and past measurements may not represent
   future performance.

   Many cloud providers can replicate workloads in different
   availability zones.  An application instantiated in the cloud DC
   closest to clients may have to cooperate with another application
   (or its mirror image) in another region, or with database
   server(s) in the on-premises DC.  This kind of coordination
   requires predictable networking behavior/performance among those
   locations.

7. Problems of Using SD-WAN to connect to Cloud DCs

   SD-WAN lets enterprises augment their current VPN network with
   cost-effective, readily available broadband Internet connectivity,
   enabling some traffic to be offloaded to paths over the Internet
   according to differentiated, possibly application-based traffic
   forwarding policies, or when the MPLS VPN connection between two
   locations is congested, or otherwise undesirable or unavailable.

7.1. More Complexity for Edge Nodes

   Augmenting the transport path is not as simple as it appears.  For
   an enterprise with multiple sites, CPE-managed overlay paths among
   sites require each CPE to manage all the addresses that local
   hosts can potentially reach, i.e., to map internal VPN addresses
   to the appropriate overlay paths.  This is similar to the
   complexity of Frame Relay-based VPNs, where each CPE needed to
   maintain mesh routing for all destinations if it was to avoid an
   extra hop through a hub router.  Even with the assistance of a
   central controller (instead of running a routing protocol) to
   resolve the mapping between destinations and SD-WAN paths, SD-WAN
   CPEs are still responsible for routing table maintenance as remote
   destinations change their attachments, e.g., as dynamic workloads
   in other DCs are decommissioned or added.

   In addition, overlay paths interconnecting branch offices are
   different from overlay paths connecting to Cloud DCs:

   - Overlay paths interconnecting branch offices usually have both
     end-points (e.g., CPEs) controlled by one entity (e.g.,
     controllers or management systems operated by the enterprise).
   - Connections to a Cloud DC may consist of CPEs owned or managed
     by the enterprise, with the remote end-points being managed or
     controlled by the Cloud DC.

7.2. Edge WAN Port Management

   An SD-WAN edge node can have WAN ports connected to different
   networks or to the public Internet managed by different operators.
   There is therefore a need to propagate WAN port properties to
   remote authorized peers in third-party network domains, in
   addition to route propagation.  Such an exchange cannot happen
   before communication between the peers is properly secured.

7.3. Forwarding based on Application

   Forwarding based on application IDs instead of destination IP
   addresses is often referred to as application-based segmentation.
   If the applications have unique IP addresses, then
   application-based segmentation can be achieved by propagating
   different BGP UPDATE messages to different nodes, as described in
   [BGP-SDWAN].  If an application cannot be uniquely identified by
   its IP addresses, more work is needed.
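
   The sketch below shows, in purely illustrative terms, what
   forwarding keyed on an application identifier (rather than on the
   destination IP address alone) could look like at an SD-WAN edge.
   The application identifiers, path names, and classification rule
   are hypothetical; a real deployment would derive the application
   from richer classification and would distribute the mapping via a
   controller or BGP policies rather than a static table.

     # Illustrative application-based forwarding table at an SD-WAN
     # edge (hypothetical identifiers; not a protocol definition).

     APP_TO_PATH = {
         "voice":    "mpls-vpn",          # latency-sensitive traffic
         "saas-crm": "internet-breakout",
         "backup":   "ipsec-to-cloud-dc",
     }
     DEFAULT_PATH = "mpls-vpn"

     def classify(packet: dict) -> str:
         """Toy classifier: real systems would derive the app from
         L4-L7 inspection or from controller-signaled tags."""
         if packet.get("dst_port") == 5060:
             return "voice"
         return packet.get("app_id", "unknown")

     def select_path(packet: dict) -> str:
         # Forwarding decision keyed on the application, not only on
         # the destination IP address.
         return APP_TO_PATH.get(classify(packet), DEFAULT_PATH)

     print(select_path({"dst_ip": "203.0.113.10", "dst_port": 5060}))
     # -> "mpls-vpn"; the same destination could take another path
     #    for a different application.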

8. End-to-End Security Concerns for Data Flows

   When IPsec tunnels established from enterprise on-premises CPEs
   are terminated at the Cloud DC gateway where the workloads or
   applications are hosted, some enterprises have concerns about
   traffic to/from their workloads being exposed to others behind the
   data center gateway (e.g., exposed to other organizations that
   have workloads in the same data center).

   To ensure that traffic to/from workloads is not exposed to
   unwanted entities, IPsec tunnels may go all the way to the
   workloads (servers, or VMs) within the DC.

9. Requirements for Dynamic Cloud Data Center VPNs

   In order to address the aforementioned issues, any solution for
   enterprise VPNs that includes connectivity to dynamic workloads or
   applications in cloud data centers should satisfy a set of
   requirements:

   - The solution should allow enterprises to take advantage of the
     current state-of-the-art in VPN technology, in both traditional
     MPLS-based VPNs and IPsec-based VPNs (or any combination
     thereof) that run over the public Internet.
   - The solution should not require an enterprise to upgrade all of
     its existing CPEs.
   - The solution should support scalable IPsec key management among
     all nodes involved in DC interconnect schemes.
   - The solution needs to support easy and fast, on-the-fly VPN
     connections to dynamic workloads and applications in third-party
     data centers, and easily allow these workloads to migrate both
     within a data center and between data centers.
   - The solution should allow VPNs to provide bandwidth and other
     performance guarantees.
   - The solution should be a cost-effective way for enterprises to
     incorporate dynamic cloud-based applications and workloads into
     their existing VPN environment.

10. Security Considerations

   This document discusses security requirements as a part of the
   problem space, particularly in Sections 4, 5, and 8.

   Solution drafts resulting from this work will address security
   concerns inherent to the solution(s), including both protocol
   aspects and the importance (for example) of securing workloads in
   cloud DCs and the use of secure interconnection mechanisms.

11. IANA Considerations

   This document requires no IANA actions.  RFC Editor: Please remove
   this section before publication.

12. References

12.1. Normative References

12.2. Informative References

   [RFC2735]  Fox, B. and B. Petri, "NHRP Support for Virtual Private
              Networks", RFC 2735, December 1999.

   [RFC8192]  Hares, S., et al., "Interface to Network Security
              Functions (I2NSF): Problem Statement and Use Cases",
              RFC 8192, July 2017.

   [ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for
              creation, storage, distribution and enforcement of
              policies for network security", November 2007.

   [RFC6071]  Frankel, S. and S. Krishnan, "IP Security (IPsec) and
              Internet Key Exchange (IKE) Document Roadmap",
              RFC 6071, February 2011.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4664]  Andersson, L. and E. Rosen, "Framework for Layer 2
              Virtual Private Networks (L2VPNs)", RFC 4664,
              September 2006.

   [BGP-SDWAN] Dunbar, L., et al., "BGP Extension for SDWAN Overlay
              Networks", draft-dunbar-idr-bgp-sdwan-overlay-ext-03,
              work in progress, November 2018.

13. Acknowledgments

   Many thanks to Alia Atlas, Chris Bowers, Paul Vixie, Paul
   Ebersman, Timothy Morizot, Ignas Bagdonas, Michael Huang, Liu Yuan
   Jiao, Katherine Zhao, and Jim Guichard for the discussion and
   contributions.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: Linda.Dunbar@futurewei.com

   Andrew G. Malis
   Independent
   Email: agmalis@gmail.com

   Christian Jacquenet
   Orange
   Rennes, 35000
   France
   Email: Christian.jacquenet@orange.com

   Mehmet Toy
   Verizon
   One Verizon Way
   Basking Ridge, NJ 07920
   Email: mehmet.toy@verizon.com