Network Working Group                                         L. Dunbar
Internet Draft                                                Futurewei
Intended status: Informational                                 A. Malis
Expires: August 5, 2020                                     Independent
                                                           C. Jacquenet
                                                                 Orange
                                                                 M. Toy
                                                                Verizon
                                                       February 5, 2020

        Dynamic Networks to Hybrid Cloud DCs Problem Statement
           draft-ietf-rtgwg-net2cloud-problem-statement-06

Abstract

   This document describes the problems that enterprises face today
   when interconnecting their branch offices with dynamic workloads in
   third-party data centers (a.k.a. Cloud DCs). Many problems are
   associated with connecting to or among Clouds, and many of them are
   outside the IETF's scope. The objective of this document is to
   identify the problems that need additional work in the IETF Routing
   area; other problems are out of the scope of this document.

   It examines some of the approaches to interconnecting cloud DCs
   with enterprises' on-premises DCs and branch offices. This document
   also describes some of the network problems that many enterprises
   face when they have workloads, applications, and data split among
   different data centers, especially for those enterprises with
   multiple sites that are already interconnected by VPNs (e.g., MPLS
   L2VPN/L3VPN).

   Current operational problems are examined to determine whether
   there is a need to improve existing protocols or whether a new
   protocol is necessary to solve them.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 5, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction...................................................3
      1.1. Key Characteristics of Cloud Services.....................3
      1.2. Connecting to Cloud Services..............................3
      1.3. The role of SD-WAN in connecting to Cloud Services........4
   2. Definition of terms............................................5
   3. High Level Issues of Connecting to Multi-Cloud.................6
      3.1. Security Issues...........................................6
      3.2. Authorization and Identity Management.....................6
      3.3. API abstraction...........................................7
      3.4. DNS for Cloud Resources...................................8
      3.5. NAT for Cloud Services....................................8
      3.6. Cloud Discovery...........................................9
   4. Interconnecting Enterprise Sites with Cloud DCs................9
      4.1. Sites to Cloud DC........................................10
      4.2. Inter-Cloud Interconnection..............................12
   5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs...13
   6. Problem with using IPsec tunnels to Cloud DCs.................15
      6.1. Scaling Issues with IPsec Tunnels........................15
      6.2. Poor performance over long distance......................15
   7. Problems of Using SD-WAN to connect to Cloud DCs..............16
      7.1. More Complexity to Edge Nodes............................16
      7.2. Edge WAN Port Management.................................17
      7.3. Forwarding based on Application..........................17
   8. End-to-End Security Concerns for Data Flows...................17
   9. Requirements for Dynamic Cloud Data Center VPNs...............17
   10. Security Considerations......................................18
   11. IANA Considerations..........................................18
   12. References...................................................18
      12.1. Normative References....................................18
      12.2. Informative References..................................19
   13. Acknowledgments..............................................19
1. Introduction

1.1. Key Characteristics of Cloud Services

   The key characteristics of Cloud Services are on-demand
   availability, scalability, high availability, and usage-based
   billing. Cloud Services, such as compute, storage, network
   functions (most likely virtual), and third-party-managed
   applications, are usually hosted and managed by third-party Cloud
   Operators. Examples of Cloud network functions include virtual
   firewall services, virtual private network services, and virtual
   PBX services (including voice and video conferencing systems). A
   Cloud Data Center (DC) is a shared infrastructure that hosts Cloud
   Services for many customers.

1.2. Connecting to Cloud Services

   With the advent of widely available third-party cloud DCs and
   services in diverse geographic locations, and the advancement of
   tools for monitoring and predicting application behaviors, it is
   very attractive for enterprises to instantiate applications and
   workloads in the locations that are geographically closest to
   their end-users. Such proximity can improve end-to-end latency and
   the overall user experience. Conversely, an enterprise can easily
   shut down applications and workloads whenever end-users are in
   motion (thereby modifying the network connections of the
   subsequently relocated applications and workloads). In addition,
   enterprises may wish to take advantage of the growing number of
   business applications offered by cloud operators.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:

   - High availability of access to all workloads in the desired
     cloud DCs. Many enterprises include the cloud in their disaster
     recovery strategy, for example by enforcing periodic backup
     policies within the cloud or by running backup applications in
     the cloud.
   - Global reachability from different geographical zones, thereby
     facilitating the proximity of applications as a function of the
     end-users' location, to improve latency.
   - Elasticity: prompt establishment of connectivity to newly
     instantiated applications in Cloud DCs when usage increases, and
     prompt release of that connectivity when the applications are
     removed as demand changes.
   - Scalable security management.

1.3. The role of SD-WAN in connecting to Cloud Services

   Some of the characteristics of SD-WAN [BGP-SDWAN], such as network
   augmentation and forwarding based on application IDs instead of
   destination IP addresses, are essential for connecting to
   on-demand Cloud services.

   Issues associated with using SD-WAN to connect to Cloud services
   are also discussed in this document.

2. Definition of terms

   Cloud DC:   Third-party data centers that usually host
               applications and workloads owned by different
               organizations or tenants.

   Controller: Used interchangeably with SD-WAN controller to manage
               SD-WAN overlay path creation/deletion and to monitor
               the path conditions between two or more sites.

   DSVPN:      Dynamic Smart Virtual Private Network. DSVPN is a
               secure network that exchanges data between sites
               without needing to pass traffic through an
               organization's headquarters virtual private network
               (VPN) server or router.

   Heterogeneous Cloud: Applications and workloads split among Cloud
               DCs owned or managed by different operators.

   Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
               on-premises DCs in addition to Cloud services provided
               by one or more cloud operators (e.g., AWS, Azure,
               Google, Salesforce, SAP).

   SD-WAN:     Software Defined Wide Area Network. In this document,
               "SD-WAN" refers to solutions that pool WAN bandwidth
               from multiple underlay networks to get better WAN
               bandwidth management, visibility, and control. When
               the underlay networks are private networks, traffic
               can traverse them without additional encryption; when
               the underlay networks are public, such as the
               Internet, some traffic needs to be encrypted when
               traversing them (depending on user-provided policies).

   VPC:        A Virtual Private Cloud is a virtual network dedicated
               to one client account. It is logically isolated from
               other virtual networks in a Cloud DC. Each client can
               launch its desired resources, such as compute,
               storage, or network functions, into its VPC. Most
               Cloud operators' VPCs only support private addresses;
               some support IPv4 only, while others support IPv4/IPv6
               dual stack.

3. High Level Issues of Connecting to Multi-Cloud

   There are many problems associated with connecting to hybrid Cloud
   services, many of which are out of the IETF's scope. This section
   identifies some of the high-level problems that can be addressed
   by the IETF, especially by the Routing area. Other problems are
   out of the scope of this document. This section by no means covers
   all the problems of connecting to hybrid Cloud services; for
   example, the difficulty of managing cloud spending is not
   discussed here.

3.1. Security Issues

   Cloud Services are built upon shared infrastructure and are
   therefore not inherently secure. Security has been a primary, and
   valid, concern from the start of cloud computing: users are unable
   to see the exact location where their data is stored or processed.
   Headlines highlighting data breaches, compromised credentials,
   broken authentication, hacked interfaces and APIs, and account
   hijacking have not helped alleviate these concerns.

   Secure user identity management, authentication, and access
   control mechanisms are important. Developing appropriate security
   measures can enhance the confidence needed by enterprises to fully
   take advantage of Cloud Services.

3.2. Authorization and Identity Management

   One of the more prominent challenges for Cloud Services is
   identity management and authorization. Authorization includes not
   only user authorization, but also the authorization of API calls
   made by applications in different Cloud DCs managed by different
   Cloud Operators. In addition, there are authorizations for
   workload migration, data migration, and workload management.

   There are many types of users in cloud environments, e.g., end
   users who access applications hosted in Cloud DCs, and
   Cloud-resource users who are responsible for setting permissions
   for the resources based on roles, access lists, IP addresses,
   domains, etc.

   There are many types of Cloud authorization, including MAC
   (Mandatory Access Control), where each application owns individual
   access permissions; DAC (Discretionary Access Control), where each
   application requests permissions from an external permissions
   application; RBAC (Role-Based Access Control), where the
   authorization service owns roles with different privileges on the
   cloud service; and ABAC (Attribute-Based Access Control), where
   access is based on request attributes and policies.

   The IETF has not yet developed comprehensive specifications for
   identity management or data models for Cloud authorization.
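   For illustration purposes only, the short Python sketch below
   contrasts an RBAC decision with an ABAC decision for the same
   action. The roles, attributes, and policy rules are hypothetical
   examples and do not correspond to any Cloud operator's actual API.

      # Minimal sketch contrasting RBAC and ABAC authorization.
      # All roles, attributes, and policy rules are hypothetical.

      ROLE_PERMISSIONS = {                 # RBAC: privileges per role
          "network-admin": {"vpc:create", "vpc:delete"},
          "auditor": {"vpc:read"},
      }

      def rbac_allowed(role: str, action: str) -> bool:
          # RBAC: the decision depends only on the caller's role.
          return action in ROLE_PERMISSIONS.get(role, set())

      def abac_allowed(attributes: dict, action: str) -> bool:
          # ABAC: the decision depends on request attributes and a
          # policy.  Example policy: deletions only from the
          # corporate network and only during a change window.
          if action == "vpc:delete":
              return (attributes.get("src_net") == "corporate"
                      and attributes.get("change_window") is True)
          return attributes.get("authenticated", False)

      if __name__ == "__main__":
          print(rbac_allowed("auditor", "vpc:delete"))       # False
          print(abac_allowed({"src_net": "corporate",
                              "change_window": True},
                             "vpc:delete"))                  # True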
3.3. API abstraction

   Different Cloud Operators have different APIs to access their
   Cloud resources, security functions, NAT, etc.

   It is difficult to move applications built against one Cloud
   operator's APIs to another Cloud. However, it is highly desirable
   to have a single and consistent way to manage the networks and the
   respective security policies for interconnecting applications
   hosted in different Cloud DCs.

   The desired property would be a single network fabric to which
   different Cloud DCs and an enterprise's multiple sites can be
   attached or detached, with a common interface for setting the
   desired policies.

   The difficulty of connecting applications in different Clouds
   might stem from the fact that the Cloud operators are direct
   competitors. Traffic flowing out of a Cloud DC usually incurs
   charges; therefore, direct communication between applications in
   different Cloud DCs can be more expensive than intra-Cloud
   communication.

   It is desirable to have a common API shim layer or abstraction
   over the different Cloud providers' APIs, to make it easier to
   move applications from one Cloud DC to another.
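   As a purely illustrative sketch of such a shim layer, the Python
   fragment below defines one provider-neutral interface with
   per-provider adapters. The class and method names are hypothetical
   and not an existing library; a real adapter would call the
   provider's own SDK inside each method.

      # Hypothetical provider-neutral shim; the interface and the
      # adapter names are illustrative, not an existing library.
      from abc import ABC, abstractmethod

      class CloudNetwork(ABC):
          """Provider-neutral view of a virtual network (VPC/VNET)."""

          @abstractmethod
          def create_network(self, cidr: str) -> str:
              """Create a virtual network; return its provider ID."""

          @abstractmethod
          def attach_vpn_gateway(self, network_id: str) -> str:
              """Attach a VPN gateway for IPsec termination."""

      class AwsAdapter(CloudNetwork):
          def create_network(self, cidr: str) -> str:
              # A real implementation would call the AWS SDK here.
              return "vpc-example"

          def attach_vpn_gateway(self, network_id: str) -> str:
              return "vgw-example"

      class AzureAdapter(CloudNetwork):
          def create_network(self, cidr: str) -> str:
              # A real implementation would call the Azure SDK here.
              return "vnet-example"

          def attach_vpn_gateway(self, network_id: str) -> str:
              return "azure-vpngw-example"

      def provision(provider: CloudNetwork, cidr: str) -> str:
          # Application code is written once against the abstraction.
          net = provider.create_network(cidr)
          return provider.attach_vpn_gateway(net)

      print(provision(AwsAdapter(), "10.0.0.0/16"))

   Application code written against such an abstraction is unchanged
   when a workload moves between providers; only the adapter is
   swapped.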
3.4. DNS for Cloud Resources

   DNS name resolution is essential for both on-premises and
   cloud-based resources. For customers with hybrid workloads, which
   include on-premises and cloud-based resources, extra steps are
   necessary to configure DNS to work seamlessly across both
   environments.

   Cloud operators have their own DNS to resolve resources within
   their Cloud DCs and well-known public domains. A Cloud's DNS can
   be configured to forward queries to customer-managed authoritative
   DNS servers hosted on-premises, and to respond to DNS queries
   forwarded by on-premises DNS servers.

   For enterprises utilizing Cloud services from different cloud
   operators, it is necessary to establish policies and rules on how
   and where to forward DNS queries. When applications in one Cloud
   need to communicate with applications hosted in another Cloud, DNS
   queries from one Cloud DC may be forwarded to the enterprise's
   on-premises DNS, which in turn forwards them to the DNS service in
   the other Cloud. Needless to say, the configuration can be complex
   depending on the application communication patterns.
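   The kind of forwarding rules involved can be sketched in a few
   lines of Python. The zone suffixes and resolver addresses below
   are made-up examples of an enterprise policy that sends each query
   to the DNS service responsible for that environment.

      # Sketch of suffix-based conditional DNS forwarding; the zone
      # names and resolver addresses are illustrative assumptions.

      FORWARDING_RULES = [
          (".aws.example.internal",   "10.10.0.2"),   # AWS VPC DNS
          (".azure.example.internal", "10.20.0.2"),   # Azure VNET DNS
          (".corp.example.com",       "192.168.1.53"),# on-prem DNS
      ]
      DEFAULT_RESOLVER = "9.9.9.9"                    # public domains

      def select_resolver(qname: str) -> str:
          """Pick the resolver to which a query is forwarded."""
          for suffix, resolver in FORWARDING_RULES:
              if qname.rstrip(".").endswith(suffix.strip(".")):
                  return resolver
          return DEFAULT_RESOLVER

      assert select_resolver("db1.azure.example.internal") == "10.20.0.2"
      assert select_resolver("www.ietf.org") == "9.9.9.9"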
3.5. NAT for Cloud Services

   Cloud resources, such as VM instances, are usually assigned
   private IP addresses. Depending on the configuration, some private
   subnets can use a NAT function to reach external networks, while
   other private subnets remain internal to the Cloud only.

   Different Cloud operators support different levels of NAT
   functionality. For example, the AWS NAT Gateway does not currently
   support connections towards, or from, VPC Endpoints, VPN, AWS
   Direct Connect, or VPC Peering (see
   https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-other-services).
   AWS Direct Connect, VPN, and VPC Peering do not currently support
   any NAT functionality.

   Google's Cloud NAT allows Google Cloud virtual machine (VM)
   instances without external IP addresses, and private Google
   Kubernetes Engine (GKE) clusters, to connect to the Internet.
   Cloud NAT implements outbound NAT in conjunction with a default
   route to allow instances to reach the Internet. It does not
   implement inbound NAT: hosts outside the VPC network can only
   respond to established connections initiated by instances inside
   the Google Cloud; they cannot initiate their own, new connections
   to Cloud instances via NAT.

   For enterprises with applications running in different Cloud DCs,
   NAT has to be properly configured both in the Cloud DCs and in
   their own on-premises DCs.

3.6. Cloud Discovery

   One concern with using Cloud services is not being aware of where
   a resource is actually located, especially since Cloud operators
   can move application instances from one place to another. When
   applications in the Cloud communicate with on-premises
   applications, it may not be clear where the Cloud applications are
   located or to which VPCs they belong.

   It is highly desirable to have tools to discover cloud services in
   much the same way as one would discover on-premises
   infrastructure. A significant difference is that cloud discovery
   uses the cloud vendor's API to extract data about the cloud
   services, rather than the direct access used when scanning
   on-premises infrastructure.

   Standard data models, APIs, or tools can alleviate the concerns of
   enterprises utilizing Cloud resources, e.g., a Cloud service scan
   that connects to the cloud provider's API and collects information
   directly.
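   As a minimal sketch of such a scan, the Python fragment below uses
   AWS's boto3 SDK to list instances together with the private
   address, VPC, and availability zone each one currently sits in. It
   assumes boto3 is installed and AWS credentials are configured;
   equivalent calls exist in other providers' SDKs.

      # Sketch of a cloud inventory scan via the provider's API
      # (AWS boto3 shown; assumes the SDK and valid credentials).
      import boto3

      def scan_instances(region: str = "us-east-1"):
          ec2 = boto3.client("ec2", region_name=region)
          inventory = []
          for reservation in ec2.describe_instances()["Reservations"]:
              for inst in reservation["Instances"]:
                  inventory.append({
                      "id": inst["InstanceId"],
                      "private_ip": inst.get("PrivateIpAddress"),
                      "vpc": inst.get("VpcId"),
                      "zone": inst["Placement"]["AvailabilityZone"],
                  })
          return inventory

      if __name__ == "__main__":
          for item in scan_instances():
              print(item)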
4. Interconnecting Enterprise Sites with Cloud DCs

   Considering that many enterprises already have existing VPNs
   (e.g., MPLS-based L2VPN [RFC4664] or L3VPN [RFC4364])
   interconnecting branch offices and on-premises data centers,
   connecting to Cloud services will involve a mix of different types
   of networks. When an enterprise's existing VPN service providers
   do not have direct connections to the cloud DCs that the
   enterprise prefers to use, the enterprise faces additional
   infrastructure and operational costs to utilize Cloud services.

4.1. Sites to Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach its workloads hosted in the Cloud
   DCs. AWS (Amazon Web Services) offers the following options to
   reach workloads in AWS Cloud DCs:

   - The AWS Internet Gateway allows communication between instances
     in an AWS VPC and the Internet.
   - The AWS Virtual Gateway (vGW), where IPsec tunnels [RFC6071] are
     established between an enterprise's own gateway and the AWS vGW,
     so that the communication between those gateways can be secured
     from the underlay (which might be the public Internet).
   - AWS Direct Connect, which allows enterprises to purchase direct
     connections from network service providers to get private leased
     lines interconnecting the enterprise's gateway(s) and the AWS
     Direct Connect routers. In addition, an AWS Transit Gateway can
     be used to interconnect multiple VPCs in different Availability
     Zones. The AWS Transit Gateway acts as a hub that controls how
     traffic is forwarded among all the connected networks, which act
     like spokes.

   Microsoft's ExpressRoute allows extension of a private network to
   any of the Microsoft cloud services, including Azure and
   Office 365. ExpressRoute is configured using Layer 3 routing.
   Customers can opt for redundancy by provisioning dual links from
   their location to two Microsoft Enterprise Edge routers (MSEEs)
   located within a third-party ExpressRoute peering location. BGP is
   then set up over the WAN links to provide redundancy to the cloud.
   This redundancy is maintained from the peering data center into
   Microsoft's cloud network.

   Google's Cloud Dedicated Interconnect offers similar network
   connectivity options to AWS and Microsoft. One distinct
   difference, however, is that Google's service allows customers
   access to the entire global cloud network by default. It does this
   by connecting the on-premises network with the Google Cloud using
   BGP and Google Cloud Routers to provide optimal paths to the
   different regions of the global cloud infrastructure.

   The figure below shows an example in which some of a tenant's
   workloads are accessible via a virtual router connected to the AWS
   Internet Gateway, some are accessible via the AWS vGW, and others
   are accessible via AWS Direct Connect.

   Different types of access require different levels of security
   functions. Sometimes it is not visible to end customers which type
   of network access is used for a specific application instance. To
   get better visibility, separate virtual routers (e.g., vR1 & vR2)
   can be deployed to differentiate traffic to/from the different
   cloud GWs. It is important for some enterprises to be able to
   observe the specific behaviors when connected by different
   connections.

   The Customer Gateway can be a customer-owned router or ports
   physically connected to the AWS Direct Connect GW.

    +------------------------+
    |  ,---.          ,---.  |
    | (TN-1 )        ( TN-2) |
    |  `-+-'   +---+  `-+-'  |
    |    +-----|vR1|----+    |
    |          ++--+         |
    |           |       +-+----+
    |           |      /Internet\     For External
    |           +-----+  Gateway +----------------------
    |                  \        /     to reach via Internet
    |                   +-+----+
    |                        |
    |  ,---.          ,---.  |
    | (TN-1 )        ( TN-2) |
    |  `-+-'   +---+  `-+-'  |
    |    +-----|vR2|----+    |
    |          ++--+         |
    |           |       +-+----+
    |           |      / virtual\     For IPsec Tunnel
    |           +-----+  Gateway +----------------------
    |           |      \        /     termination
    |           |       +-+----+
    |           |            |
    |           |       +-+----+             +------+
    |           |      /        \ For Direct/customer\
    |           +-----+  Gateway +---------+ gateway |
    |                  \        /  Connect  \        /
    |                   +-+----+             +------+
    |                        |
    +------------------------+

      Figure 1: Examples of Multiple Cloud DC connections.
4.2. Inter-Cloud Interconnection

   The connectivity options to Cloud DCs described in the previous
   section are for reaching Cloud providers' DCs, not for
   interconnecting cloud DCs. When applications in an AWS Cloud need
   to communicate with applications in Azure, today's practice
   requires a third-party gateway (physical or virtual) to
   interconnect AWS's Layer 2 Direct Connect path with Azure's
   Layer 3 ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task. Alternatively, by leveraging
   open-source VPN software such as strongSwan, one can create an
   IPsec connection to the Azure gateway using a shared key. The
   strongSwan instance within AWS can not only connect to Azure but
   can also be used to facilitate traffic to other nodes within the
   AWS VPC, by configuring forwarding and using appropriate routing
   rules for the VPC.

   Most Cloud operators' virtual networks, such as AWS VPCs or Azure
   VNETs, use non-globally-routable CIDR blocks from the private IPv4
   address ranges specified by [RFC1918]. To establish an IPsec
   tunnel between two Cloud DCs, it is necessary to exchange publicly
   routable addresses for the applications in the different Cloud
   DCs. [BGP-SDWAN] describes one method; other methods are worth
   exploring.

   In summary, here are some approaches, available now (which might
   change in the future), to interconnect workloads among different
   Cloud DCs:

   a) Utilize the Cloud DC provider's inter-/intra-cloud connectivity
      services (e.g., AWS Transit Gateway) to connect workloads
      instantiated in multiple VPCs. Such services are provided with
      the cloud gateway to connect to external networks (e.g., the
      AWS Direct Connect Gateway).
   b) Hairpin all traffic through the customer gateway, meaning that
      all workloads are directly connected to the customer gateway,
      so that communication among workloads within one Cloud DC must
      traverse the customer gateway.
   c) Establish direct tunnels among different VPCs (AWS Virtual
      Private Clouds) and VNETs (Azure Virtual Networks) via the
      client's own virtual routers instantiated within the Cloud DCs.
      DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
      (Dynamic Smart VPN) techniques can be used to establish direct
      point-to-point or multipoint-to-multipoint tunnels among those
      client-owned virtual routers.

   Approach a) usually does not work if the Cloud DCs are owned and
   managed by different Cloud providers.

   Approach b) introduces additional transmission delay and incurs
   costs when traffic exits the Cloud DCs.

   For approach c), DMVPN or DSVPN use NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses and WAN ports with the hub node. The IETF ION
   (Internetworking Over NBMA (non-broadcast multiple access)) WG
   standardized NHRP for address resolution in connection-oriented
   NBMA networks (such as ATM) more than two decades ago.

   There are many differences between virtual routers in public Cloud
   DCs and the nodes in an NBMA network. NHRP cannot be used for
   registering virtual routers in Cloud DCs unless an extension of
   such protocols is developed for that purpose, e.g., taking NAT or
   dynamic addresses into consideration. Therefore, DMVPN and/or
   DSVPN cannot be used directly for connecting workloads in hybrid
   Cloud DCs.

   Other protocols such as BGP can be used instead, as described in
   [BGP-SDWAN].
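   To make the needed extension concrete, the following Python sketch
   shows an NHRP-style spoke registration adapted to Cloud virtual
   routers: each spoke reports both its private address and its
   NAT-translated public address, and re-registration refreshes the
   binding as addresses change. The message fields and hold time are
   hypothetical; they are not part of NHRP or any standardized
   extension.

      # Hypothetical NHRP-like registration for virtual routers
      # behind NAT with dynamic addresses; fields are illustrative.
      import time
      from dataclasses import dataclass, field

      @dataclass
      class SpokeBinding:
          site_id: str
          private_ip: str      # address inside the VPC/VNET
          public_ip: str       # NAT-translated address seen outside
          wan_port: int
          last_seen: float = field(default_factory=time.time)

      class Hub:
          """Keeps spoke bindings fresh as NATed addresses change."""
          def __init__(self, hold_time: float = 120.0):
              self.hold_time = hold_time
              self.bindings: dict[str, SpokeBinding] = {}

          def register(self, b: SpokeBinding) -> None:
              # Re-registration overwrites a stale public address,
              # which classic NHRP does not anticipate.
              self.bindings[b.site_id] = b

          def resolve(self, site_id: str) -> SpokeBinding | None:
              b = self.bindings.get(site_id)
              if b and time.time() - b.last_seen < self.hold_time:
                  return b
              return None   # expired or unknown: fall back via hub

      hub = Hub()
      hub.register(SpokeBinding("aws-vr1", "10.1.0.5",
                                "203.0.113.7", 4500))
      print(hub.resolve("aws-vr1"))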
   The CPEs attached to MPLS VPNs are also simpler and less
   expensive, because they do not need to manage routes to remote
   sites; they simply pass all outbound traffic to the MPLS VPN PEs
   to which they are attached (albeit multi-homing scenarios require
   more processing logic on CPEs). MPLS has addressed the problems of
   scale, availability, and fast recovery from network faults, and
   has incorporated traffic-engineering capabilities.

   However, traditional MPLS-based VPN solutions are suboptimal for
   connecting end-users to dynamic workloads/applications in cloud
   DCs because:

   - The Provider Edge (PE) nodes of the enterprise's VPNs might not
     have direct connections to the third-party cloud DCs that are
     used for hosting workloads with the goal of providing easy
     access to the enterprise's end-users.

   - It takes time to deploy provider edge (PE) routers at new
     locations. When an enterprise's workloads move from one cloud DC
     to another (i.e., are removed from one DC and re-instantiated at
     another location when demand changes), the enterprise's branch
     offices need to be connected to the new cloud DC, but the
     network service provider might not have PEs located at the new
     location.

     One of the main drivers for moving workloads into the cloud is
     the wide availability of cloud DCs at geographically diverse
     locations, where applications can be instantiated so that they
     can be as close to their end-users as possible. When the user
     base changes, the applications may be migrated to the new cloud
     DC location closest to the new user base.

   - Most cloud DCs do not expose their internal networks. An
     enterprise with a hybrid cloud deployment can use an MPLS VPN to
     connect to a Cloud provider at multiple locations. The
     connection locations often correspond to the gateways of the
     Cloud provider's different Cloud DC locations. The different
     Cloud DCs are interconnected by the Cloud provider's own
     internal network. At each connection location (gateway), the
     Cloud provider uses BGP to advertise all of the prefixes in the
     enterprise's VPC, regardless of which Cloud DC a given prefix is
     actually in. This can result in inefficient routing for the
     end-to-end data path.

   Another roadblock is the lack of a standard way to express and
   enforce consistent security policies [ITU-T-X1036] for workloads
   that not only use virtual addresses but are also very likely
   hosted in different locations within the Cloud DC [RFC8192]. The
   current VPN path computation and bandwidth allocation schemes may
   not be flexible enough to address the need for enterprises to
   rapidly connect to dynamically instantiated (or removed) workloads
   and applications, regardless of their location/nature (i.e.,
   third-party cloud DCs).

6. Problem with using IPsec tunnels to Cloud DCs

   As described in the previous section, many Cloud operators expose
   their gateways for external entities (which can be the enterprises
   themselves) to directly establish IPsec tunnels. Enterprises can
   also instantiate virtual routers within Cloud DCs to connect to
   their on-premises devices via IPsec tunnels.

6.1. Scaling Issues with IPsec Tunnels

   If there is only one enterprise location that needs to reach the
   Cloud DC, an IPsec tunnel is a very convenient solution.

   However, many medium-to-large enterprises have multiple sites and
   multiple data centers. For multiple sites to communicate with
   workloads and applications hosted in cloud DCs, the Cloud DC
   gateways have to maintain many IPsec tunnels to all those
   locations. In addition, each of those IPsec tunnels requires
   pair-wise periodic key refreshment. For a company with hundreds or
   thousands of locations, there could be hundreds (or even
   thousands) of IPsec tunnels terminating at the cloud DC gateway,
   which is very processing-intensive. That is why many cloud
   operators only allow a limited number of (IPsec) tunnels, and a
   limited amount of bandwidth, per customer.

   Alternatively, a solution like group encryption could be used,
   where a single IPsec SA suffices at the gateway; the drawbacks are
   key distribution and the maintenance of a key server.
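   The scaling pressure is easy to quantify. The short Python sketch
   below counts tunnels and steady-state rekey load for a
   hub-and-spoke layout versus a full mesh; the site count and rekey
   interval are illustrative numbers, not measurements.

      # Back-of-the-envelope IPsec scaling; inputs are illustrative.

      def tunnel_counts(sites: int) -> tuple[int, int]:
          hub_spoke = sites                     # one tunnel per site
          full_mesh = sites * (sites - 1) // 2  # pair-wise tunnels
          return hub_spoke, full_mesh

      def rekeys_per_hour(tunnels: int, interval_h: float) -> float:
          # Each tunnel rekeys once per interval, on average.
          return tunnels / interval_h

      hub, mesh = tunnel_counts(1000)
      print(hub, mesh)                  # 1000 vs 499500 tunnels
      print(rekeys_per_hour(hub, 8))    # 125 rekeys/hour at the hub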
6.2. Poor performance over long distance

   When enterprise CPEs or gateways are far away from cloud DC
   gateways, or are across country/continent boundaries, the
   performance of IPsec tunnels over the public Internet can be
   problematic and unpredictable. Even though many monitoring tools
   are available to measure delay and various other performance
   characteristics of the network, measurements for paths over the
   Internet are passive, and past measurements may not represent
   future performance.

   Many cloud providers can replicate workloads in different
   availability zones. An application instantiated in the cloud DC
   closest to its clients may have to cooperate with another
   application (or its mirror image) in another region, or with
   database server(s) in the on-premises DC. This kind of
   coordination requires predictable networking behavior/performance
   among those locations.

7. Problems of Using SD-WAN to connect to Cloud DCs

   SD-WAN lets enterprises augment their current VPN network with
   cost-effective, readily available broadband Internet connectivity,
   enabling some traffic to be offloaded to paths over the Internet
   according to differentiated, possibly application-based traffic
   forwarding policies, or when the MPLS VPN connection between two
   locations is congested, or otherwise undesirable or unavailable.

7.1. More Complexity to Edge Nodes

   Augmenting transport paths is not as simple as it appears. For an
   enterprise with multiple sites, CPE-managed overlay paths among
   sites require each CPE to manage all the addresses that local
   hosts can potentially reach, i.e., to map internal VPN addresses
   to the appropriate overlay paths (a minimal sketch of this mapping
   state appears at the end of this subsection). This is similar to
   the complexity of Frame Relay based VPNs, where each CPE needed to
   maintain mesh routing for all destinations if it were to avoid an
   extra hop through a hub router. Even with assistance from a
   central controller (instead of running a routing protocol) to
   resolve the mapping between destinations and SD-WAN paths, SD-WAN
   CPEs are still responsible for routing table maintenance as remote
   destinations change their attachments, e.g., as dynamic workloads
   in other DCs are decommissioned or added.

   In addition, overlay paths interconnecting branch offices differ
   from paths connecting to Cloud DCs:

   - Overlay paths interconnecting branch offices usually have both
     end-points (e.g., CPEs) controlled by one entity (e.g.,
     controllers or management systems operated by the enterprise).
   - Connections to a Cloud DC may consist of CPEs owned or managed
     by the enterprise, with the remote end-points being managed or
     controlled by the Cloud DCs.
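   The sketch below illustrates the per-CPE state described above: a
   longest-prefix-match table from internal VPN prefixes to overlay
   paths, updated by controller messages as workloads come and go.
   The prefixes and path names are illustrative assumptions.

      # Sketch of a CPE's destination-to-overlay-path mapping, kept
      # in sync by controller updates; entries are illustrative.
      import ipaddress

      class OverlayMap:
          def __init__(self):
              self.table = {}   # ip_network -> overlay path name

          def controller_update(self, prefix: str, path: str | None):
              net = ipaddress.ip_network(prefix)
              if path is None:
                  self.table.pop(net, None)  # workload decommissioned
              else:
                  self.table[net] = path     # workload added or moved

          def lookup(self, dst: str) -> str | None:
              addr = ipaddress.ip_address(dst)
              candidates = [n for n in self.table if addr in n]
              if not candidates:
                  return None                # fall back to default VPN
              best = max(candidates, key=lambda n: n.prefixlen)
              return self.table[best]

      cpe = OverlayMap()
      cpe.controller_update("10.20.0.0/16", "mpls-vpn")
      cpe.controller_update("10.20.5.0/24", "internet-ipsec-to-cloud")
      print(cpe.lookup("10.20.5.9"))    # internet-ipsec-to-cloud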
7.2. Edge WAN Port Management

   An SD-WAN edge node can have WAN ports connected to different
   networks, or to the public Internet, managed by different
   operators. There is therefore a need to propagate WAN port
   properties to remote authorized peers in third-party network
   domains, in addition to route propagation. Such an exchange cannot
   happen until communication between the peers is properly secured.

7.3. Forwarding based on Application

   Forwarding based on application IDs, instead of destination IP
   addresses, is often referred to as application-based segmentation.
   If the applications have unique IP addresses, then application-
   based segmentation can be achieved by propagating different BGP
   UPDATE messages to different nodes, as described in [BGP-SDWAN].
   If an application cannot be uniquely identified by its IP
   addresses, more work is needed.
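   The following Python sketch shows the flavor of application-based
   forwarding for the simple case where applications do have unique
   IP/port tuples; the classification table and path names are
   illustrative assumptions.

      # Sketch of application-based segmentation when applications
      # can be identified by IP/port; table entries are illustrative.

      APP_CLASSIFIER = {       # (dst_ip, dst_port) -> application ID
          ("203.0.113.10", 443): "crm-app",
          ("203.0.113.20", 443): "video-conf",
      }

      APP_POLICY = {           # application ID -> overlay path
          "crm-app": "mpls-vpn",            # SLA sensitive
          "video-conf": "internet-direct",  # offload to broadband
      }
      DEFAULT_PATH = "mpls-vpn"

      def select_path(dst_ip: str, dst_port: int) -> str:
          app = APP_CLASSIFIER.get((dst_ip, dst_port))
          return APP_POLICY.get(app, DEFAULT_PATH)

      assert select_path("203.0.113.20", 443) == "internet-direct"
      assert select_path("198.51.100.9", 80) == "mpls-vpn"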
8. End-to-End Security Concerns for Data Flows

   When IPsec tunnels established from enterprise on-premises CPEs
   are terminated at the Cloud DC gateway where the workloads or
   applications are hosted, some enterprises have concerns about
   traffic to/from their workloads being exposed to others behind the
   data center gateway (e.g., exposed to other organizations that
   have workloads in the same data center).

   To ensure that traffic to/from workloads is not exposed to
   unwanted entities, IPsec tunnels may go all the way to the
   workloads (servers or VMs) within the DC.

9. Requirements for Dynamic Cloud Data Center VPNs

   In order to address the aforementioned issues, any solution for
   enterprise VPNs that includes connectivity to dynamic workloads or
   applications in cloud data centers should satisfy a set of
   requirements:

   - The solution should allow enterprises to take advantage of the
     current state of the art in VPN technology, in both traditional
     MPLS-based VPNs and IPsec-based VPNs (or any combination
     thereof) that run over the public Internet.
   - The solution should not require an enterprise to upgrade all of
     its existing CPEs.
   - The solution should support scalable IPsec key management among
     all nodes involved in DC interconnect schemes.
   - The solution needs to support easy and fast, on-the-fly, VPN
     connections to dynamic workloads and applications in third-party
     data centers, and to easily allow these workloads to migrate
     both within a data center and between data centers.
   - The solution should allow VPNs to provide bandwidth and other
     performance guarantees.
   - The solution should be cost-effective for enterprises when
     incorporating dynamic cloud-based applications and workloads
     into their existing VPN environments.

10. Security Considerations

   This document discusses security requirements as a part of the
   problem space, particularly in Sections 4, 5, and 8.

   Solution drafts resulting from this work will address the security
   concerns inherent to the solution(s), including both protocol
   aspects and the importance (for example) of securing workloads in
   cloud DCs and of using secure interconnection mechanisms.

11. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

12. References

12.1. Normative References

12.2. Informative References

   [RFC1918]  Y. Rekhter, et al., "Address Allocation for Private
              Internets", Feb. 1996.

   [RFC2735]  B. Fox, et al., "NHRP Support for Virtual Private
              Networks", Dec. 1999.

   [RFC4364]  E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", Feb. 2006.

   [RFC4664]  L. Andersson and E. Rosen, "Framework for Layer 2
              Virtual Private Networks (L2VPNs)", Sept. 2006.

   [RFC6071]  S. Frankel and S. Krishnan, "IP Security (IPsec) and
              Internet Key Exchange (IKE) Document Roadmap", Feb.
              2011.

   [RFC8192]  S. Hares, et al., "Interface to Network Security
              Functions (I2NSF): Problem Statement and Use Cases",
              July 2017.

   [ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for
              creation, storage, distribution and enforcement of
              policies for network security", Nov. 2007.

   [BGP-SDWAN] L. Dunbar, et al., "BGP Extension for SDWAN Overlay
              Networks", draft-dunbar-idr-bgp-sdwan-overlay-ext-03,
              work in progress, Nov. 2018.

13. Acknowledgments

   Many thanks to Alia Atlas, Chris Bowers, Ignas Bagdonas, Michael
   Huang, Liu Yuan Jiao, Katherine Zhao, and Jim Guichard for the
   discussions and contributions.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: Linda.Dunbar@futurewei.com

   Andrew G. Malis
   Independent
   Email: agmalis@gmail.com

   Christian Jacquenet
   Orange
   Rennes, 35000
   France
   Email: Christian.jacquenet@orange.com

   Mehmet Toy
   Verizon
   One Verizon Way
   Basking Ridge, NJ 07920
   Email: mehmet.toy@verizon.com