Network Working Group                                          L. Dunbar
Internet Draft                                                 Futurewei
Intended status: Informational                                Andy Malis
Expires: May 1, 2020                                         Independent
                                                            C. Jacquenet
                                                                  Orange
                                                                  M. Toy
                                                                 Verizon
                                                        November 1, 2019

        Dynamic Networks to Hybrid Cloud DCs Problem Statement
          draft-ietf-rtgwg-net2cloud-problem-statement-05

Abstract

   This document describes the problems that enterprises face today
   when interconnecting their branch offices with dynamic workloads in
   third party data centers (a.k.a. Cloud DCs). There can be many
   problems associated with networks connecting to or among Clouds,
   many of which are probably outside the scope of the IETF. The
   objective of this document is to identify some of the problems that
   need additional work in the IETF Routing area. Other problems are
   outside the scope of this document.

   It examines some of the approaches to interconnecting cloud DCs
   with enterprises' on-premises DCs & branch offices. This document
   also describes some of the network problems that many enterprises
   face when they have workloads, applications, and data split among
   different data centers, especially for those enterprises with
   multiple sites that are already interconnected by VPNs (e.g., MPLS
   L2VPN/L3VPN).

   Current operational problems are examined to determine whether
   there is a need to improve existing protocols or whether a new
   protocol is necessary to solve them.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on May 1, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction...................................................3
      1.1. On the evolution of Cloud DC connectivity.................3
      1.2. The role of SD-WAN techniques in Cloud DC connectivity....4
   2. Definition of terms............................................4
   3. Interconnecting Enterprise Sites with Cloud DCs................5
      3.1. Multiple connections to workloads in a Cloud DC...........6
      3.2. Interconnect Private and Public Cloud DCs.................7
      3.3. Desired Properties for Networks that interconnect Hybrid
           Clouds....................................................8
   4. Multiple Clouds Interconnection................................9
      4.1. Multi-Cloud Interconnection...............................9
      4.2. Desired Properties for Multi-Cloud Interconnection.......11
   5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs...11
   6. Problem with using IPsec tunnels to Cloud DCs.................13
      6.1. Complexity of multi-point any-to-any interconnection.....13
      6.2. Poor performance over long distance......................14
      6.3. Scaling Issues with IPsec Tunnels........................14
   7. Problems of Using SD-WAN to connect to Cloud DCs..............15
      7.1. SD-WAN among branch offices vs. interconnect to Cloud DCs15
   8. End-to-End Security Concerns for Data Flows...................18
   9. Requirements for Dynamic Cloud Data Center VPNs...............18
   10. Security Considerations......................................19
   11. IANA Considerations..........................................19
   12. References...................................................19
      12.1. Normative References....................................19
      12.2. Informative References..................................19
   13. Acknowledgments..............................................20

1. Introduction

1.1. On the evolution of Cloud DC connectivity

   The ever-increasing use of cloud applications for communication
   services changes the way corporate businesses work and share
   information. Such cloud applications use resources hosted in third
   party DCs that also host services for other customers.

   With the advent of widely available third-party cloud DCs in
   diverse geographic locations and the advancement of tools for
   monitoring and predicting application behaviors, it is technically
   feasible for enterprises to instantiate applications and workloads
   in locations that are geographically closest to their end-users.
   Such proximity improves end-to-end latency and overall user
   experience. Conversely, an enterprise can easily shut down
   applications and workloads whenever end-users move (thereby
   modifying the network connections of the subsequently relocated
   applications and workloads). In addition, an enterprise may wish to
   take advantage of the growing number of business applications
   offered by third party private cloud DCs.

   Most of those enterprise branch offices & on-premises data centers
   are already connected via VPNs, such as MPLS-based L2VPNs and
   L3VPNs. However, connecting to the cloud-hosted resources may not
   be straightforward if the provider of the VPN service does not have
   direct connections to the corresponding cloud DCs. Under those
   circumstances, the enterprise can upgrade the CPEs deployed in its
   various premises to utilize SD-WAN techniques to reach cloud
   resources (without any assistance from the VPN service provider),
   or wait for its VPN service provider to make new agreements with
   data center providers to connect to the cloud resources. Either way
   incurs additional infrastructure and operational costs.

   In addition, more enterprises are moving towards hybrid cloud DCs,
   i.e., DCs owned or operated by different Cloud operators, to
   maximize the benefits of geographical proximity, elasticity and
   special features offered by different cloud DCs.

1.2. The role of SD-WAN techniques in Cloud DC connectivity

   This document discusses the issues associated with connecting an
   enterprise's workloads/applications instantiated in multiple third-
   party data centers (a.k.a. Cloud DCs) with its on-premises data
   centers. Very often, the actual Cloud DCs that host the
   workloads/applications can be transient.

   SD-WAN, initially launched to maximize bandwidth between locations
   by aggregating multiple paths managed by different service
   providers, has expanded to include flexible, on-demand,
   application-based connections established over any network to
   access dynamic workloads in Cloud DCs.

   Therefore, this document discusses the use of SD-WAN techniques to
   improve enterprise-to-cloud DC and cloud DC-to-cloud DC
   connectivity.

2. Definition of terms

   Cloud DC:   Third party Data Centers that usually host applications
               and workloads owned by different organizations or
               tenants.

   Controller: Used interchangeably with SD-WAN controller to manage
               SD-WAN overlay path creation/deletion and to monitor
               path conditions between two or more sites.

   DSVPN:      Dynamic Smart Virtual Private Network. DSVPN is a
               secure network that exchanges data between sites
               without needing to pass traffic through an
               organization's headquarters virtual private network
               (VPN) server or router.

   Heterogeneous Cloud: applications and workloads split among Cloud
               DCs owned or managed by different operators.

   Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
               on-premises DCs in addition to Cloud services provided
               by one or more cloud operators (e.g., AWS, Azure,
               Google, Salesforce, SAP, etc.).

   SD-WAN:     Software Defined Wide Area Network. In this document,
               "SD-WAN" refers to solutions that pool WAN bandwidth
               from multiple underlay networks to get better WAN
               bandwidth management, visibility & control.
               When the underlay networks are private networks,
               traffic can traverse them without additional
               encryption; when the underlay networks are public, such
               as the Internet, some traffic needs to be encrypted
               when traversing them (depending on user-provided
               policies).

   VPC:        Virtual Private Cloud is a virtual network dedicated to
               one client account. It is logically isolated from other
               virtual networks in a Cloud DC. Each client can launch
               its desired resources, such as compute, storage, or
               network functions, into its VPC. Most Cloud operators'
               VPCs only support private addresses; some support IPv4
               only, others support IPv4/IPv6 dual stack.

3. Interconnecting Enterprise Sites with Cloud DCs

3.1. Multiple connections to workloads in a Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach its workloads hosted in the Cloud
   DCs. For example, AWS (Amazon Web Services) offers the following
   options to reach workloads in AWS Cloud DCs:

   - AWS Internet gateway allows communication between instances in
     an AWS VPC and the Internet.
   - AWS Virtual gateway (vGW) where IPsec tunnels [RFC6071] are
     established between an enterprise's own gateway and the AWS vGW,
     so that the communications between those gateways can be secured
     from the underlay (which might be the public Internet). A brief
     sketch of this option is given after Figure 1.
   - AWS Direct Connect, which allows enterprises to purchase direct
     connect from network service providers to get a private leased
     line interconnecting the enterprise's gateway(s) and the AWS
     Direct Connect routers. In addition, an AWS Transit Gateway can
     be used to interconnect multiple VPCs in different Availability
     Zones. AWS Transit Gateway acts as a hub that controls how
     traffic is forwarded among all the connected networks, which act
     like spokes.

   As an example, some branch offices of an enterprise can connect
   over the Internet to reach AWS's vGW via IPsec tunnels. Other
   branch offices of the same enterprise can connect to AWS
   DirectConnect via a private network (without any encryption). It is
   important for enterprises to be able to observe the specific
   behaviors when connected via these different types of connections.

   The figure below shows an example in which some tenants' workloads
   are accessible via a virtual router connected to the AWS Internet
   Gateway, some are accessible via the AWS vGW, and others are
   accessible via AWS Direct Connect. vR1 uses IPsec to establish
   secure tunnels over the Internet to avoid paying extra fees for the
   IPsec features provided by the AWS vGW. Some tenants can deploy
   separate virtual routers to handle Internet traffic and traffic
   from the secure channels via vGW and DirectConnect (e.g., vR1 &
   vR2); others may have one virtual router handling both types of
   traffic. The Customer Gateway can be a customer-owned router or
   ports physically connected to the AWS Direct Connect GW.

   +------------------------+
   | ,---. ,---. |
   | (TN-1 ) ( TN-2)|
   | `-+-' +---+ `-+-' |
   | +----|vR1|----+ |
   | ++--+ |
   | | +-+----+
   | | /Internet\ For External
   | +-------+ Gateway +----------------------
   | \ / to reach via Internet
   | +-+----+
   | |
   | ,---. ,---. |
   | (TN-1 ) ( TN-2)|
   | `-+-' +---+ `-+-' |
   | +----|vR2|----+ |
   | ++--+ |
   | | +-+----+
   | | / virtual\ For IPsec Tunnel
   | +-------+ Gateway +----------------------
   | | \ / termination
   | | +-+----+
   | | |
   | | +-+----+ +------+
   | | / \ For Direct /customer\
   | +-------+ Gateway +----------+ gateway |
   | \ / Connect \ /
   | +-+----+ +------+
   | |
   +------------------------+

   Figure 1: Examples of Multiple Cloud DC connections.
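
   As an illustration of the vGW option above (IPsec tunnels
   terminating on the AWS vGW), the sketch below uses AWS's publicly
   documented Site-to-Site VPN APIs through the boto3 SDK. The region,
   VPC ID, BGP ASN, and the on-premises gateway's public IP address
   are illustrative placeholders, and error handling is omitted; other
   Cloud operators expose comparable, but different, APIs.

      # Sketch: order an AWS Site-to-Site VPN so that an enterprise
      # CPE can reach workloads behind the vGW (identifiers are
      # illustrative).
      import boto3

      ec2 = boto3.client("ec2", region_name="us-east-1")

      # The enterprise's on-premises gateway (public IP and BGP ASN).
      cgw = ec2.create_customer_gateway(
          Type="ipsec.1", PublicIp="198.51.100.10", BgpAsn=65010)

      # Create a virtual gateway (vGW) and attach it to the tenant VPC.
      vgw = ec2.create_vpn_gateway(Type="ipsec.1")
      ec2.attach_vpn_gateway(
          VpcId="vpc-0123456789abcdef0",
          VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"])

      # The resulting VPN connection provides two IPsec tunnels whose
      # parameters must then be configured on the enterprise CPE.
      vpn = ec2.create_vpn_connection(
          Type="ipsec.1",
          CustomerGatewayId=cgw["CustomerGateway"]["CustomerGatewayId"],
          VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"],
          Options={"StaticRoutesOnly": True})
      print(vpn["VpnConnection"]["VpnConnectionId"])

   A comparable, provider-specific sequence is needed for the Direct
   Connect and Internet gateway options, which contributes to the
   operational burden discussed in this document.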

3.2. Interconnect Private and Public Cloud DCs

   It is likely that hybrid designs will become the rule for cloud
   services, as more enterprises see the benefits of integrating
   public and private cloud infrastructures. However, enabling the
   growth of hybrid cloud deployments in the enterprise requires fast
   and safe interconnection between public and private cloud services.

   For an enterprise to connect to applications & workloads hosted in
   multiple Cloud DCs, the enterprise can use IPsec tunnels
   established over the Internet or a (virtualized) leased line
   service to connect its on-premises gateways to each of the Cloud
   DC's gateways, virtual routers instantiated in the Cloud DCs, or
   any other suitable design (including a combination thereof).

   Some enterprises prefer to instantiate their own virtual
   CPEs/routers inside the Cloud DC to connect the workloads within
   the Cloud DC. Then an overlay path is established from the customer
   gateways to those virtual CPEs/routers for reaching the workloads
   inside the cloud DC.

3.3. Desired Properties for Networks that interconnect Hybrid Clouds

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:

   - High availability to access all workloads in the desired cloud
     DCs.
     Many enterprises include cloud infrastructures in their disaster
     recovery strategy, e.g., by enforcing periodic backup policies
     within the cloud, or by running backup applications in the
     Cloud, etc. Therefore, the connection to the cloud DCs may not
     be permanent, but rather needs to be on-demand.

   - Global reachability from different geographical zones, thereby
     facilitating the proximity of applications as a function of the
     end users' location, to improve latency.

   - Elasticity: prompt setup of connections to newly instantiated
     applications at Cloud DCs when usage increases, and prompt
     release of connections after those applications are removed when
     demand changes.
     Some enterprises have front-end web portals running in cloud DCs
     and database servers in their on-premises DCs. Those front-end
     web portals need to be reachable from the public Internet. The
     backend connections to the sensitive data in database servers
     hosted in the on-premises DCs might need to be secured.

   - Scalable security management. IPsec is commonly used to
     interconnect cloud gateways with CPEs deployed in the enterprise
     premises. For enterprises with a large number of branch offices,
     managing the IPsec Security Associations among many nodes can be
     very difficult.

4. Multiple Clouds Interconnection

4.1. Multi-Cloud Interconnection

   Enterprises today can instantiate their workloads or applications
   in Cloud DCs owned by different Cloud providers, e.g., AWS, Azure,
   Google Cloud, Oracle, etc.
   Interconnecting those workloads involves three parties: the
   Enterprise, its network service providers, and the Cloud providers.

   All Cloud Operators offer secure ways to connect enterprises' on-
   prem sites/DCs with their Cloud DCs.

   Some Cloud Operators allow enterprises to connect via private
   networks. For example, AWS's DirectConnect allows enterprises to
   use a third-party provided private Layer 2 path from the
   enterprise's GW to the AWS DirectConnect GW. Microsoft's
   ExpressRoute allows extension of a private network to any of the
   Microsoft cloud services, including Azure and Office365.
   ExpressRoute is configured using Layer 3 routing. Customers can opt
   for redundancy by provisioning dual links from their location to
   two Microsoft Enterprise edge routers (MSEEs) located within a
   third-party ExpressRoute peering location. The BGP routing protocol
   is then set up over WAN links to provide redundancy to the cloud.
   This redundancy is maintained from the peering data center into
   Microsoft's cloud network.

   Google's Cloud Dedicated Interconnect offers network connectivity
   options similar to those of AWS and Microsoft. One distinct
   difference, however, is that Google's service allows customers
   access to the entire global cloud network by default. It does this
   by connecting the customer's on-premises network with the Google
   Cloud using BGP and Google Cloud Routers to provide optimal paths
   to the different regions of the global cloud infrastructure.

   All those connectivity options are between Cloud providers' DCs and
   the Enterprises, but not between cloud DCs. For example, to connect
   applications in the AWS Cloud to applications in the Azure Cloud,
   there must be a third-party gateway (physical or virtual) to
   interconnect AWS's Layer 2 DirectConnect path with Azure's Layer 3
   ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task. Alternatively, by leveraging open-
   source VPN software such as strongSwan, an enterprise can create an
   IPsec connection to the Azure gateway using a shared key. The
   strongSwan instance within AWS can not only connect to Azure but
   can also be used to forward traffic to other nodes within the AWS
   VPC by configuring forwarding and using appropriate routing rules
   for the VPC. Most Cloud operators' virtual networks, such as AWS
   VPCs or Azure VNETs, use non-globally routable CIDRs from the
   private IPv4 address ranges specified by [RFC1918]. To establish an
   IPsec tunnel between two Cloud DCs, it is necessary to exchange
   publicly routable addresses for applications in different Cloud
   DCs. [BGP-SDWAN] describes one method; other methods are worth
   exploring.
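
   A minimal sketch of the bookkeeping this implies, assuming an
   enterprise-managed strongSwan-style setup with purely illustrative
   addresses: the two private CIDRs must not overlap, and each virtual
   router's publicly routable tunnel endpoint must be learned by the
   other side before the tunnel can be configured.

      # Sketch (illustrative addresses): check that the two cloud-side
      # private CIDRs do not overlap, then emit a strongSwan-style
      # connection stanza using the exchanged public endpoints.
      import ipaddress

      aws_vr   = {"public_ip": "203.0.113.7",   "cidr": "10.1.0.0/16"}
      azure_vr = {"public_ip": "198.51.100.23", "cidr": "10.2.0.0/16"}

      a = ipaddress.ip_network(aws_vr["cidr"])
      b = ipaddress.ip_network(azure_vr["cidr"])
      if a.overlaps(b):
          raise SystemExit("overlapping CIDRs: tunnel routing ambiguous")

      print(f"""conn aws-to-azure
          keyexchange=ikev2
          authby=secret
          left={aws_vr['public_ip']}
          leftsubnet={aws_vr['cidr']}
          right={azure_vr['public_ip']}
          rightsubnet={azure_vr['cidr']}
          auto=start""")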

   In summary, here are some approaches, available now (which might
   change in the future), to interconnect workloads among different
   Cloud DCs:

   a) Utilize Cloud DC provided inter/intra-cloud connectivity
      services (e.g., AWS Transit Gateway) to connect workloads
      instantiated in multiple VPCs. Such services are provided with
      the cloud gateway to connect to external networks (e.g., AWS
      DirectConnect Gateway).
   b) Hairpin all traffic through the customer gateway, meaning all
      workloads are directly connected to the customer gateway, so
      that communications among workloads within one Cloud DC must
      traverse through the customer gateway.
   c) Establish direct tunnels among different VPCs (AWS' Virtual
      Private Clouds) and VNETs (Azure's Virtual Networks) via the
      client's own virtual routers instantiated within Cloud DCs.
      DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
      (Dynamic Smart VPN) techniques can be used to establish direct
      point-to-point or multi-point-to-multi-point tunnels among those
      clients' own virtual routers.

   Approach a) usually does not work if Cloud DCs are owned and
   managed by different Cloud providers.

   Approach b) creates additional transmission delay and incurs costs
   when traffic exits the Cloud DCs.

   For Approach c), DMVPN or DSVPN uses NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses & WAN ports with the hub node. The IETF ION
   (Internetworking over NBMA (non-broadcast multiple access)) WG
   standardized NHRP for address resolution in connection-oriented
   NBMA networks (such as ATM) more than two decades ago.

   There are many differences between virtual routers in Public Cloud
   DCs and the nodes in an NBMA network. NHRP cannot be used for
   registering virtual routers in Cloud DCs unless an extension of
   such protocols is developed for that purpose, e.g., taking NAT or
   dynamic addresses into consideration. Therefore, DMVPN and/or DSVPN
   cannot be used directly for connecting workloads in hybrid Cloud
   DCs.

   Other protocols such as BGP can be used, as described in
   [BGP-SDWAN].

4.2. Desired Properties for Multi-Cloud Interconnection

   Different Cloud Operators have different APIs to access their Cloud
   resources. It is difficult to move applications built with one
   Cloud operator's APIs to another. However, it is highly desirable
   to have a single and consistent way to manage the networks and
   respective security policies for interconnecting applications
   hosted in different Cloud DCs.

   The desired property would be a single network fabric to which
   different Cloud DCs and an enterprise's multiple sites can be
   attached or detached, with a common interface for setting the
   desired policies. SD-WAN is positioned to become that network
   fabric, enabling Cloud DCs to be dynamically attached or detached.
   But the reality is that different Cloud Operators have different
   access methods, and Cloud DCs might be geographically far apart.
   More Cloud connectivity problems are described in the subsequent
   sections.

   The difficulty of connecting applications in different Clouds might
   stem from the fact that the Cloud operators are direct competitors.
   Traffic flowing out of Cloud DCs usually incurs charges. Therefore,
   direct communications between applications in different Cloud DCs
   can be more expensive than intra-Cloud communications.

5. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

   Traditional MPLS-based VPNs have been widely deployed as an
   effective way to support businesses and organizations that require
   network performance and reliability. MPLS shifted the burden of
   managing a VPN service from enterprises to service providers. The
   CPEs attached to MPLS VPNs are also simpler and less expensive,
   since they do not need to manage routes to remote sites; they
   simply pass all outbound traffic to the MPLS VPN PEs to which the
   CPEs are attached (albeit multi-homing scenarios require more
   processing logic on CPEs).
   MPLS has addressed the problems of scale, availability, and fast
   recovery from network faults, and incorporated traffic-engineering
   capabilities.

   However, traditional MPLS-based VPN solutions are sub-optimal for
   connecting end-users to dynamic workloads/applications in cloud DCs
   because:

   - The Provider Edge (PE) nodes of the enterprise's VPNs might not
     have direct connections to the third party cloud DCs that are
     used for hosting workloads with the goal of providing easy access
     to enterprises' end-users.

   - It usually takes some time to deploy provider edge (PE) routers
     at new locations. When an enterprise's workloads are moved from
     one cloud DC to another (i.e., removed from one DC and re-
     instantiated at another location when demand changes), the
     enterprise branch offices need to be connected to the new cloud
     DC, but the network service provider might not have PEs located
     at the new location.

     One of the main drivers for moving workloads into the cloud is
     the wide availability of cloud DCs at geographically diverse
     locations, where apps can be instantiated so that they can be as
     close to their end-users as possible. When the user base changes,
     the applications may be migrated to a new cloud DC location
     closest to the new user base.

   - Most of the cloud DCs do not expose their internal networks. An
     enterprise with a hybrid cloud deployment can use an MPLS-VPN to
     connect to a Cloud provider at multiple locations. The connection
     locations often correspond to gateways of different Cloud DC
     locations from the Cloud provider. The different Cloud DCs are
     interconnected by the Cloud provider's own internal network. At
     each connection location (gateway), the Cloud provider uses BGP
     to advertise all of the prefixes in the enterprise's VPC,
     regardless of which Cloud DC a given prefix is actually in. This
     can result in inefficient routing for the end-to-end data path.

   - Extensive usage of Overlay by Cloud DCs:

     Many cloud DCs use an overlay to connect their gateways to the
     workloads located inside the DC. There is currently no standard
     that specifies the interworking between the Cloud Overlay and the
     enterprise's existing underlay networks. One of the
     characteristics of overlay networks is that some of the WAN ports
     of the edge nodes connect to third party networks. There is
     therefore a need to propagate WAN port information to remote
     authorized peers in third party network domains in addition to
     route propagation. Such an exchange cannot happen before
     communication between peers is properly secured.

   Another roadblock is the lack of a standard way to express and
   enforce consistent security policies for workloads that not only
   use virtual addresses, but are also very likely hosted in different
   locations within the Cloud DC [RFC8192]. The current VPN path
   computation and bandwidth allocation schemes may not be flexible
   enough to address the need for enterprises to rapidly connect to
   dynamically instantiated (or removed) workloads and applications
   regardless of their location/nature (i.e., third party cloud DCs).

6. Problem with using IPsec tunnels to Cloud DCs

   As described in the previous section, many Cloud operators expose
   their gateways for external entities (which can be enterprises
   themselves) to directly establish IPsec tunnels.
   Enterprises can also instantiate virtual routers within Cloud DCs
   to connect to their on-premises devices via IPsec tunnels. If there
   is only one enterprise location that needs to reach the Cloud DC,
   an IPsec tunnel is a very convenient solution.

   However, many medium-to-large enterprises have multiple sites and
   multiple data centers. For workloads and apps hosted in cloud DCs,
   multiple sites need to communicate securely with those cloud
   workloads and apps. This section documents some of the issues
   associated with using IPsec tunnels to connect enterprise premises
   with cloud gateways.

6.1. Complexity of multi-point any-to-any interconnection

   The dynamic workloads instantiated in cloud DCs need to communicate
   with multiple branch offices and on-premises data centers. Most
   enterprises need multi-point interconnection among multiple
   locations, which can be provided by means of MPLS L2/L3 VPNs.

   Using IPsec overlay paths to connect all branches & on-premises
   data centers to cloud DCs requires CPEs to manage routing among
   Cloud DC gateways and the CPEs located at other branch locations,
   which can dramatically increase the complexity of the design,
   possibly at the cost of jeopardizing the CPE performance.

   The complexity of requiring CPEs to maintain routing among other
   CPEs is one of the reasons why enterprises migrated from Frame
   Relay based services to MPLS-based VPN services.

   MPLS-based VPNs have their PEs directly connected to the CPEs.
   Therefore, CPEs only need to forward all traffic to the directly
   attached PEs, which are then responsible for enforcing the routing
   policy within the corresponding VPNs. Even for multi-homed CPEs,
   the CPEs only need to forward traffic among the directly connected
   PEs. However, when using IPsec tunnels between CPEs and Cloud DCs,
   the CPEs need to compute, select, establish and maintain routes for
   traffic to be forwarded to Cloud DCs, to remote CPEs via VPN, or
   directly.

6.2. Poor performance over long distance

   When enterprise CPEs or gateways are far away from cloud DC
   gateways or across country/continent boundaries, the performance of
   IPsec tunnels over the public Internet can be problematic and
   unpredictable. Even though there are many monitoring tools
   available to measure delay and various performance characteristics
   of the network, the measurement for paths over the Internet is
   passive and past measurements may not represent future performance.

   Many cloud providers can replicate workloads in different
   availability zones. An App instantiated in a cloud DC closest to
   clients may have to cooperate with another App (or its mirror
   image) in another region or database server(s) in the on-premises
   DC. This kind of coordination requires predictable networking
   behavior/performance among those locations.

6.3. Scaling Issues with IPsec Tunnels

   IPsec can achieve secure overlay connections between two locations
   over any underlay network, e.g., between CPEs and Cloud DC
   Gateways.

   If there is only one enterprise location connected to the cloud
   gateway, a small number of IPsec tunnels can be configured on-
   demand between the on-premises DC and the Cloud DC, which is an
   easy and flexible solution.

   However, for multiple enterprise locations to reach workloads
   hosted in cloud DCs, the cloud DC gateway needs to maintain
   multiple IPsec tunnels to all those locations (e.g., as a hub &
   spoke topology). For a company with hundreds or thousands of
   locations, there could be hundreds (or even thousands) of IPsec
   tunnels terminating at the cloud DC gateway, which is not only very
   expensive (because Cloud Operators usually charge their customers
   based on connections), but can also be very processing intensive
   for the gateway. Many cloud operators only allow a limited number
   of (IPsec) tunnels & limited bandwidth to each customer.
   Alternatively, a solution like group encryption could be used,
   where a single IPsec SA is sufficient at the GW, but the drawbacks
   are key distribution and the maintenance of a key server, etc.
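
   To put the scale in perspective, here is a back-of-the-envelope
   sketch with purely illustrative numbers; each tunnel also implies
   IKE/IPsec Security Association state and periodic rekeying on the
   gateway.

      # Illustrative only: IPsec tunnel counts for an enterprise with
      # many sites reaching workloads behind a few cloud DC gateways.
      sites, cloud_gateways = 1000, 3

      # Hub & spoke: every site terminates a tunnel on every gateway.
      tunnels_per_cloud_gw = sites                   # 1000
      total_hub_and_spoke = sites * cloud_gateways   # 3000

      # Full mesh among all endpoints, for comparison.
      n = sites + cloud_gateways
      total_full_mesh = n * (n - 1) // 2             # ~500,000

      print(tunnels_per_cloud_gw, total_hub_and_spoke, total_full_mesh)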

7. Problems of Using SD-WAN to connect to Cloud DCs

   SD-WAN can establish parallel paths over multiple underlay networks
   between two locations on-demand, for example, to support the
   connections established between two CPEs interconnected by a
   traditional MPLS VPN ([RFC4364] or [RFC4664]) or by IPsec [RFC6071]
   tunnels.

   SD-WAN lets enterprises augment their current VPN network with
   cost-effective, readily available Broadband Internet connectivity,
   enabling some traffic offloading to paths over the Internet
   according to differentiated, possibly application-based traffic
   forwarding policies, or when the MPLS VPN connection between the
   two locations is congested, or otherwise undesirable or
   unavailable.

7.1. SD-WAN among branch offices vs. interconnect to Cloud DCs

   SD-WAN interconnection of branch offices is not as simple as it
   appears. For an enterprise with multiple sites, using SD-WAN
   overlay paths among sites requires each CPE to manage all the
   addresses that local hosts have the potential to reach, i.e., map
   internal VPN addresses to appropriate SD-WAN paths. This is similar
   to the complexity of Frame Relay based VPNs, where each CPE needed
   to maintain mesh routing for all destinations if they were to avoid
   an extra hop through a hub router. Even though SD-WAN CPEs can get
   assistance from a central controller (instead of running a routing
   protocol) to resolve the mapping between destinations and SD-WAN
   paths, SD-WAN CPEs are still responsible for routing table
   maintenance as remote destinations change their attachments, e.g.,
   as dynamic workloads in other DCs are decommissioned or added.
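
   The following minimal sketch (with purely illustrative prefixes and
   path names) shows the kind of per-CPE state this implies: a mapping
   from remote destination prefixes to SD-WAN overlay paths that must
   be refreshed whenever the controller reports that a remote workload
   has moved or been decommissioned.

      # Sketch of per-CPE SD-WAN state: destination prefix -> overlay
      # path, refreshed on controller updates (values illustrative).
      import ipaddress

      forwarding_map = {
          "10.10.0.0/16": "ipsec-to-branch-ny",
          "10.20.0.0/16": "ipsec-to-cloud-dc-east",
      }

      def controller_update(prefix, path):
          """Apply a controller-pushed change; None withdraws."""
          if path is None:
              forwarding_map.pop(prefix, None)  # workload decommissioned
          else:
              forwarding_map[prefix] = path     # workload added or moved

      def select_path(dst_ip):
          """Longest-prefix match against the current mapping."""
          dst = ipaddress.ip_address(dst_ip)
          hits = [(ipaddress.ip_network(p), path)
                  for p, path in forwarding_map.items()
                  if dst in ipaddress.ip_network(p)]
          if not hits:
              return None
          return max(hits, key=lambda h: h[0].prefixlen)[1]

      # A workload moves from the east-coast cloud DC to a west one:
      controller_update("10.20.0.0/16", "ipsec-to-cloud-dc-west")
      print(select_path("10.20.1.5"))   # -> ipsec-to-cloud-dc-west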

   Even though originally envisioned for interconnecting branch
   offices, SD-WAN offers a very attractive way for enterprises to
   connect to Cloud DCs.

   The SD-WAN for interconnecting branch offices and the SD-WAN for
   interconnecting to Cloud DCs have some differences:

   - SD-WAN for interconnecting branch offices usually has two end-
     points (e.g., CPEs) controlled by one entity (e.g., a controller
     or management system operated by the enterprise).

   - SD-WAN for Cloud DC interconnects may involve CPEs owned or
     managed by the enterprise, while the remote end-points are
     managed or controlled by Cloud DCs (for ease of description,
     let's call such CPEs asymmetrically-managed CPEs).

   - Cloud DCs may have different entry points (or devices), with one
     entry point terminating a private direct connection (based upon a
     leased line, for example) and other entry points being devices
     terminating IPsec tunnels, as shown in Figure 2.

   Therefore, the SD-WAN design becomes asymmetric.

   +------------------------+
   | ,---. ,---. |
   | (TN-1 ) ( TN-2)| TN: Tenant applications/workloads
   | `-+-' +---+ `-+-' |
   | +----|vR1|----+ |
   | ++--+ |
   | | +-+----+
   | | /Internet\ One path via
   | +-------+ Gateway +---------------------+
   | \ / Internet \
   | +-+----+ \
   +------------------------+ \
   \
   +------------------------+ native traffic \
   | ,---. ,---. | without encryption|
   | (TN-3 ) ( TN-4)| |
   | `-+-' +--+ `-+-' | | +------+
   | +----|vR|-----+ | +----+ CPE |
   | ++-+ | | +------+
   | | +-+----+ |
   | | / virtual\ One path via IPsec Tunnel |
   | +-------+ Gateway +-------------------------- +
   | \ / Encrypted traffic over|
   | +-+----+ public network |
   +------------------------+ |
   |
   +------------------------+ |
   | ,---. ,---. | Native traffic |
   | (TN-5 ) ( TN-6)| without encryption |
   | `-+-' +--+ `-+-' | over secure network|
   | +----|vR|-----+ | |
   | ++-+ | |
   | | +-+----+ +------+ |
   | | / \ Via Direct /customer\ |
   | +-------+ Gateway +----------+ gateway |-----+
   | \ / Connect \ /
   | +-+----+ +------+
   +------------------------+Customer GW has physical connection to AWS GW

   Figure 2: Different Underlays to Reach Cloud DC

8. End-to-End Security Concerns for Data Flows

   When IPsec tunnels established from enterprise on-premises CPEs are
   terminated at the Cloud DC gateway where the workloads or
   applications are hosted, some enterprises have concerns regarding
   traffic to/from their workloads being exposed to others behind the
   data center gateway (e.g., exposed to other organizations that have
   workloads in the same data center).

   To ensure that traffic to/from workloads is not exposed to unwanted
   entities, IPsec tunnels may go all the way to the workloads
   (servers, or VMs) within the DC.

9. Requirements for Dynamic Cloud Data Center VPNs

   In order to address the aforementioned issues, any solution for
   enterprise VPNs that includes connectivity to dynamic workloads or
   applications in cloud data centers should satisfy a set of
   requirements:

   - The solution should allow enterprises to take advantage of the
     current state-of-the-art in VPN technology, in both traditional
     MPLS-based VPNs and IPsec-based VPNs (or any combination thereof)
     that run over the public Internet.

   - The solution should not require an enterprise to upgrade all its
     existing CPEs.

   - The solution should support scalable IPsec key management among
     all nodes involved in DC interconnect schemes.

   - The solution needs to support easy and fast, on-the-fly, VPN
     connections to dynamic workloads and applications in third party
     data centers, and easily allow these workloads to migrate both
     within a data center and between data centers.

   - Allow VPNs to provide bandwidth and other performance guarantees.

   - Be a cost-effective solution for enterprises to incorporate
     dynamic cloud-based applications and workloads into their
     existing VPN environment.

10. Security Considerations

   This draft discusses security requirements as a part of the problem
   space, particularly in Sections 4, 5, and 8.

   Solution drafts resulting from this work will address security
   concerns inherent to the solution(s), including both protocol
   aspects and the importance (for example) of securing workloads in
   cloud DCs and the use of secure interconnection mechanisms.

11. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

12. References

12.1. Normative References

12.2. Informative References

   [RFC1918]  Y. Rekhter, et al., "Address Allocation for Private
              Internets", Feb 1996.

   [RFC2735]  B. Fox, et al., "NHRP Support for Virtual Private
              Networks", Dec. 1999.

   [RFC8192]  S. Hares, et al., "Interface to Network Security
              Functions (I2NSF) Problem Statement and Use Cases", July
              2017.

   [RFC6071]  S. Frankel and S. Krishnan, "IP Security (IPsec) and
              Internet Key Exchange (IKE) Document Roadmap", Feb 2011.

   [RFC4364]  E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", Feb 2006.

   [RFC4664]  L. Andersson and E. Rosen, "Framework for Layer 2
              Virtual Private Networks (L2VPNs)", Sept 2006.

   [BGP-SDWAN] L. Dunbar, et al., "BGP Extension for SDWAN Overlay
              Networks", draft-dunbar-idr-bgp-sdwan-overlay-ext-03,
              work-in-progress, Nov 2018.

13. Acknowledgments

   Many thanks to Alia Atlas, Chris Bowers, Ignas Bagdonas, Michael
   Huang, Liu Yuan Jiao, Katherine Zhao, and Jim Guichard for the
   discussion and contributions.

Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: Linda.Dunbar@futurewei.com

   Andrew G. Malis
   Independent
   Email: agmalis@gmail.com

   Christian Jacquenet
   Orange
   Rennes, 35000
   France
   Email: Christian.jacquenet@orange.com

   Mehmet Toy
   Verizon
   One Verizon Way
   Basking Ridge, NJ 07920
   Email: mehmet.toy@verizon.com