Network Working Group                                          YJ. Stein
Internet-Draft                                                 Y. Gittik
Intended status: Informational                   RAD Data Communications
Expires: January 05, 2014                                      D. Kofman
                                                             K. Katsaros
                                                                   LINCS
                                                               M. Morrow
                                                                 L. Fang
                                                           Cisco Systems
                                                           W. Henderickx
                                                          Alcatel-Lucent
                                                           July 04, 2013


                        Accessing Cloud Services
                    draft-stein-cloud-access-03.txt

Abstract

   Cloud services are revolutionizing the way computational resources
   are provided, but at the expense of requiring an even more
   revolutionary overhaul of the networking infrastructure needed to
   deliver them.  Much recent work has focused on intra- and inter-
   datacenter connectivity requirements and architectures, while the
   "access segment" connecting the cloud services user to the
   datacenter still needs to be addressed.  In this draft we consider
   tighter integration between the network and the datacenter, in
   order to improve end-to-end Quality of Experience while minimizing
   both networking and computational resource costs.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on January 05, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Model of Existing Cloud Services
   3.  Optimized Cloud Access
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Informative References
   Authors' Addresses

1.  Introduction

   Cloud services replace the computational power and storage
   resources traditionally located under the user's desk or on the
   user's in-house servers with resources located in remote
   datacenters.  The cloud resources may be raw computing power and
   storage (Infrastructure as a Service - IaaS), or computer systems
   along with supported operating systems and tools (Platform as a
   Service - PaaS), or even fully developed applications (Software as
   a Service - SaaS).  Processing power required for the operation of
   network devices can also be provided (e.g., Routing as a Service -
   RaaS).  The inter- and intra-datacenter networking architectures
   needed to support cloud services are described in
   [I-D.bitar-datacenter-vpn-applicability].

   The advantages of cloud services over conventional IT services
   include elasticity (the ability to increase or decrease resources
   on demand rather than having to purchase enough resources for
   worst-case scenarios), scalability (allocating multiple resources
   and load-balancing them), high availability (resources may be
   backed up by similar resources at other datacenters), and
   offloading of IT tasks (such as application upgrades, firewalling,
   load balancing, storage backup, and disaster recovery).  These
   translate to economic efficiencies if actually delivered.  The
   disadvantages of cloud services are lack of direct control by the
   customer, insecurity regarding remote storage of sensitive data,
   and communications costs (both direct monetary costs and technical
   ones, such as reduced availability and additional transaction
   latency).

   The cloud services user connects to cloud resources over a
   networking infrastructure.  Today this infrastructure is often the
   public Internet, but (for reasons to be explained below) is
   preferably a network maintained by a Network Service Provider
   (NSP).  The datacenter(s) may belong to the NSP (which is the case
   considered by [I-D.masum-chari-shc]), or may belong to a separate
   Cloud Service Provider (CSP) and be accessible from the NSP's
   network.  In the latter case there may or may not be a business
   relationship between the NSP and the CSP, the strongest such
   relationship being when either the NSP or the CSP offers a unified
   "bundled" service to the customer.

   In order to obtain the advantages of cloud service without many of
   the disadvantages, the cloud services customer enters into a
   Service Level Agreement (SLA) with the CSP.
   However, such an SLA by
   itself will be unable to guarantee end-to-end service goals, since
   it does not cover degradations introduced by the intervening
   network.  Indeed, if the datacenter is accessed over the public
   Internet, end-to-end service goals may be unattainable.  Thus an
   additional SLA with the NSP (one that may already be in effect for
   pre-cloud services) is typically required.  When the CSP and the
   NSP are the same entity but not offering a bundled service, these
   SLAs may still be separate documents.

   Cloud services require a fundamental rethinking of the Information
   Technology (IT) infrastructure, due to the requirement for dynamic
   changes in IT resource configuration.  Physical IT resources are
   replaced by virtualized ones packaged in Virtual Machines (VMs).
   VMs can be created, relocated while running (VM migration), and
   destroyed on demand.  Since VMs need to interconnect, connect to
   physical resources, and connect to the cloud services user, they
   need to be allocated appropriate IP and layer 2 addresses.  Since
   these addresses need to be allocated, moved, and released on the
   fly, the cloud IT revolution directly impacts the networking
   infrastructure.  Recent work, such as
   [I-D.bitar-datacenter-vpn-applicability], has focused on
   requirements and architectures for connectivity inside and between
   datacenters.  However, the "access segment", that is, the
   networking infrastructure connecting the cloud services user to
   the datacenter, has not been fully addressed.

   The allocation, management, manipulation, and release of cloud
   resources is called "orchestration" (see
   [I-D.dalela-orchestration]).  Orchestrators need to respond to
   user demands and uphold user SLAs (perhaps exploiting
   virtualization techniques such as VM migration) while taking into
   account the location and availability of IT resources, and
   optimizing the CSP's operational objectives.  These objectives
   include, for example, decreasing costs by consolidating resources,
   balancing use of resources by reallocating computational and
   storage resources, and enforcing engineering, business, and
   security policies.  Orchestrators of the present generation do not
   attempt to optimize the CSP's networking resources, although this
   generalization is being studied [I-D.ietf-nvo3-framework].
   Furthermore, these orchestrators are completely oblivious to the
   NSP's resources and objectives.  Hence, there is no mechanism for
   maintaining end-to-end SLAs, or for optimizing end-to-end
   networking.
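
   To make the gap concrete, the following Python sketch (purely
   illustrative; all names are hypothetical and do not describe any
   deployed orchestrator) shows placement logic of the kind described
   above: the decision is a function of datacenter compute state
   alone, and no term representing NSP path quality or cost appears
   anywhere in the computation.

      # Hypothetical sketch of present-generation VM placement.
      # Note the absence of any NSP input (path delay, loss, cost).

      from dataclasses import dataclass

      @dataclass
      class Datacenter:
          name: str
          free_cpus: int
          free_storage_gb: int

      def place_vm(datacenters, cpus_needed, storage_gb_needed):
          """Return a datacenter able to host the VM, considering
          only CSP-side compute state."""
          candidates = [dc for dc in datacenters
                        if dc.free_cpus >= cpus_needed
                        and dc.free_storage_gb >= storage_gb_needed]
          if not candidates:
              raise RuntimeError("no datacenter can host the VM")
          # Consolidation-style policy: pick the fullest feasible
          # datacenter, decreasing costs as described above.
          return min(candidates, key=lambda dc: dc.free_cpus)

   Whatever policy such an orchestrator applies, it cannot uphold an
   end-to-end SLA, since the terms that would have to be constrained
   are simply not among its inputs.
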
   The goal of this Internet-Draft is to kick off discussion of
   requirements and possible mechanisms for improving end-to-end
   Quality of Experience while minimizing both networking and
   computational costs.

2.  Model of Existing Cloud Services

                                                     -----
               /- - - - - - - - - - - - - - - - - - I  O  I
                                                     -----
         /            -------                      I
                     I  OSS  I                          ----------
       /              -------                          I          I
                         I                             I   DC A   I
     /           --------         -----------         /I          I
    /           I        I       I           I       /  ----------
     --------   I        I       I    NSP    I--->---
    I  user I->-I   CE   I--->---I           I
     --------   I        I       I  network  I--->---
                I        I       I           I       \  ----------
                 --------         -----------         \I          I
                                                       I   DC B   I
                                                       I          I
                                                        ----------

     Figure 1: Simplified model of cloud service provided over
     Service Provider network to an enterprise customer behind a CE
     device

   For concreteness, we will assume the scenario of Figure 1.  On the
   left we see a cloud services user attached to a customer site
   network.  This network connects to the outside world via a
   Customer Edge (CE), which may be a branch-site router or switch, a
   special-purpose cloud demarcation device, or in degenerate cases
   the user's computer itself.  The NSP network is assumed to be a
   well-engineered network providing VPN and other SLA-based services
   to the customer site.  The NSP network is managed from an
   Operations Support System (OSS), which may include a Business
   Support System (BSS), the latter being needed for interfacing with
   the customer for approval of service reconfiguration, billing
   issues, etc.  In some cases, the functionality needed here may be
   obtained by interfacing with a Looking Glass server or a Policy
   and Charging Rules Function (PCRF).  Connected to this network are
   datacenters (two are shown - datacenter A and datacenter B), which
   may belong to the NSP or to a separate CSP.  The orchestrator of
   datacenter A is depicted as "O".  Additionally, Internet access
   may be available directly from the CE (not shown) or from the NSP
   network.

   In the usual cloud services orchestration model the user requests
   a well-defined resource, for example over the telephone, via a
   web-based portal, or via a function call.  The orchestrator, after
   checking the request's validity and resource availability, and
   updating the billing system, allocates the resource, e.g., a VM on
   a particular CPU located in a particular rack in datacenter A.  In
   addition, the required networking resources are allocated to the
   VM, e.g., an IP address, an Ethernet MAC address, and a VLAN tag.
   The VM is now started and consumes CPU power, memory, and disk
   space, as well as communications bandwidth between itself and
   other VMs on the same CPU, within the same rack, on other racks in
   the same datacenter, between datacenters, and between itself and
   the user.  If it becomes necessary to move the VM from its
   allocated position to somewhere else (VM migration), the
   orchestrator needs to reallocate the required computational and
   communications resources.  An example case is "cloudbursting",
   where a customer with temporarily insufficient local resources
   reaches out to the cloud for supplementary ones
   [I-D.mcdysan-sdnp-cloudbursting-usecase].  A priori this requires
   allocating new addresses and rerouting all of the aforementioned
   traffic types, while maintaining continuous operation of the VM.
   When the user informs the CSP that it no longer requires the VM,
   the orchestrator needs to clear the routing entries, withdraw the
   communications resources, release storage and computational
   resources, and update the billing system.

   The operations of the previous paragraph are all performed by the
   orchestrator, possibly in cooperation with orchestrators of other
   datacenters.  The needed routing information is advertised to the
   NSP via standard routing protocols, without taking into account
   possible effects on the NSP network.  If, for example, the path in
   the NSP network to datacenter A degrades while the path to
   datacenter B is performing well, the orchestrator neither knows
   this nor has a method for taking it into account.  Instead, the
   NSP must find a way to reach datacenter A, even if this path is
   expensive, or of high latency, or problematic in some other way.
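
   The lifecycle just described can be summarized in the following
   sketch (all classes and calls are hypothetical, shown only to make
   the sequence of orchestrator actions explicit): allocation binds
   compute and network identifiers together, migration must
   reallocate both, and release must tear both down and update
   billing.

      # Hypothetical sketch of the conventional orchestration
      # lifecycle described above.

      class Orchestrator:
          def __init__(self, compute, network, billing):
              self.compute = compute    # CSP compute controller
              self.network = network    # datacenter network control
              self.billing = billing    # BSS-facing billing system

          def allocate(self, request):
              vm = self.compute.create_vm(request)
              # Networking resources are bound to the VM at
              # creation time.
              vm.ip = self.network.assign_ip(vm)
              vm.mac = self.network.assign_mac(vm)
              vm.vlan = self.network.assign_vlan(vm)
              self.billing.start(vm)
              return vm

          def migrate(self, vm, new_host):
              # Addresses must move with the running VM; routes
              # toward the VM are re-advertised with no regard for
              # the effect on the NSP network.
              self.compute.live_migrate(vm, new_host)
              self.network.readvertise_routes(vm)

          def release(self, vm):
              self.network.withdraw_routes(vm)
              self.network.release(vm.ip, vm.mac, vm.vlan)
              self.compute.destroy_vm(vm)
              self.billing.stop(vm)

   Note that migrate() consults only CSP-side state; this is the
   behavior whose consequences are discussed next.
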
   This predicament arises because the orchestrator communicates
   (indirectly) with the user, but not with the NSP's OSS.  In
   addition, although the CE may be capable of OAM functionality,
   fault and performance monitoring of the communications path
   through the NSP network is not employed.  Finally, while the user
   can (indirectly) communicate with the orchestrator, there is no
   coordinated path to the NSP's OSS/BSS.

3.  Optimized Cloud Access

                                                     -----
           /- - - - - - - - - / - - - - - - - - I  O  I
          /                  /                   -----
         /            -------                      I
        /- - - - - - I  OSS  I                          ----------
       /              -------                          I          I
      /                  I                             I   DC A   I
     /           --------         -----------       / /I          I
    /           I        I       I           I--->--'/  ----------
     --------   I        I--->---I    NSP    I--->---
    I  user I->-I   CE   I       I           I
     --------   I        I--->---I  network  I--->---
                I        I       I           I       \  ----------
                 --------         -----------         \I          I
                                                       I   DC B   I
                                                       I          I
                                                        ----------

     Figure 2: Cloud service with dual homing between a cloud-aware
     CE and NSP network, and coordination between CE, NSP OSS/BSS,
     and orchestrator

   Figure 2 depicts two enhancements to the previous scenario.  The
   simpler enhancement is the provision of dual homing between the CE
   and the NSP network.  This is a well-known and widely deployed
   feature, which may be implemented regardless of the cloud
   services.  We shall see that it acquires additional meaning in the
   context of the solution described below.

   More significantly, Figure 2 depicts three new control
   communications channels.  The CE device is now assumed to be
   cloud-aware, and may communicate directly with the NSP OSS/BSS and
   with the CSP orchestrator.  In addition, the latter two may
   communicate with each other.  These control channels facilitate
   new capabilities that may improve end-to-end QoE while optimizing
   operational cost.  An alternative to a combined cloud/network CE
   is a separate "cloud demarcation device" placed behind the network
   CE.

   Consider the provisioning of a new cloud service.  With this new
   architecture the user's request is proxied by the cloud-aware CE
   to both the OSS/BSS and the orchestrator.  Before commissioning
   the service, the orchestrator initiates network testing between
   the datacenter and the CE, and with the NSP's assistance QoS
   parameters are determined for alternative paths to various
   relevant datacenters.  The NSP and CSP (whether a single SP or
   two) can now jointly decide on placement of the VM in order to
   optimize the user's end-to-end Quality of Experience (QoE) while
   minimizing costs to both SPs.  Finding the best placement
   necessitates solving a joint CSP + NSP optimization problem,
   although the cost minimization may be fully reliable only when a
   single SP provides both networking and cloud resources.  The joint
   optimization takes as input the status of computational and
   storage resources at all relevant datacenters, as well as the
   network delay, throughput, and packet loss to each datacenter.  In
   some cases re-allocation of existing computational and networking
   resources may be needed.
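
   One possible shape of this joint optimization is sketched below in
   Python (an illustration only; the cost model, weights, and SLA
   thresholds are invented for the example): candidates whose
   NSP-measured path metrics would violate the end-to-end SLA are
   excluded, and the cheapest remaining combination of computational
   and networking cost is selected.

      # Hypothetical joint CSP + NSP placement optimization.

      from dataclasses import dataclass

      @dataclass
      class Candidate:
          dc_name: str
          compute_cost: float    # CSP-side cost of hosting the VM
          path_cost: float       # NSP-side cost of the path
          path_delay_ms: float   # NSP measurement, CE <-> datacenter
          path_loss_pct: float   # NSP measurement

      def feasible(c, sla_delay_ms=50.0, sla_loss_pct=0.5):
          """Predicted QoE must respect the end-to-end SLA."""
          return (c.path_delay_ms <= sla_delay_ms
                  and c.path_loss_pct <= sla_loss_pct)

      def place(candidates, w_compute=1.0, w_network=1.0):
          ok = [c for c in candidates if feasible(c)]
          if not ok:
              return None   # e.g., re-allocate existing resources
          return min(ok, key=lambda c: w_compute * c.compute_cost
                                       + w_network * c.path_cost)

   The same scoring can be rerun whenever the CE, OSS, or
   orchestrator reports a change, which is the basis of the migration
   triggers discussed next.
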
   Similarly, the NSP OSS may trigger VM migration if network
   conditions degrade to the point where user QoE is no longer at the
   desired level, or may veto a CSP-initiated VM migration when its
   effect would be too onerous on the NSP network.

   The cloud-aware CE may be configured to periodically test path
   continuity and measure QoS parameters.  The CE can then report
   when the estimated QoE drops below that specified in the SLA (or
   dangerously approaches it), in order to promote SLA assurance even
   when neither the OSS nor the orchestrator would otherwise know of
   the problem.  Additionally, the cloud-aware CE may report workload
   changes detected by monitoring the number of active sessions
   (e.g., the number of "flows" identified by n-tuples).  The OSS and
   orchestrator can jointly perform root cause analysis and decide to
   trigger VM migration, network allocation changes, or both.
   Finally, over-extended network segments may be identified, and
   proactive VM migration and/or rerouting performed to better
   distribute the load.

   When the CE is dual-homed to the NSP network, the secondary link
   may be utilized in the conventional manner when the primary link
   fails, or may be selected as part of the overall optimization of
   QoE vs. cost.  Load balancing over both links may also be
   employed.  The datacenters may also be connected to the network
   with multiple links (as depicted for DC A in Figure 2), enabling
   further connectivity optimization.

   In addition, popular yet stationary content may be cached in the
   NSP network, and optimization may lead to the NSP network
   providing this content without the need to access the datacenter
   at all.  In certain cases (e.g., catastrophic failure in the NSP
   network or of the connectivity between that network and the
   datacenter), the cloud-aware CE may choose to bypass the NSP
   network altogether and reach the datacenter over the public
   Internet (with consequent QoE reduction).  In other cases, it may
   make sense to locally provide standalone resources at the cloud
   demarcation device itself.

4.  Security Considerations

   Perceived insecurity of the customer's data sent to the cloud or
   stored in a datacenter is perhaps the single most important factor
   impeding the wide adoption of cloud services.  At present, the
   only solutions have been end-to-end authentication and
   confidentiality, with the high cost these place on user equipment.
   The cloud-aware CE may assume responsibility for securing the
   cloud services from the edge of the customer's walled garden all
   the way to the datacenter.

   Isolation of CSP customers is addressed in [I-D.masum-chari-shc].
   Security measures such as hiding of network topology, as well as
   on-the-fly inspection and modification of transactions, are listed
   as requirements in [I-D.dalela-orchestration], while
   [I-D.dalela-sop] specifies encryption and authentication of
   orchestration protocol messages.

   A further extension to the model is to explicitly include security
   levels as parameters of the QoE optimization process.  This
   parameter may be relatively coarse-grained (for example, 1 for
   services which must be provided only over secure links, 0.5 for
   those for which access paths under the direct control of the NSP
   are sufficient, and 0 for general services that may run over
   out-of-footprint connections).  Security may also take regulatory
   restrictions into account, such as limitations on database
   migration across national boundaries.  Thus, the placement and
   movement of a VM will be accomplished based on full optimization
   of computational and storage resources; network delay, throughput,
   and packet loss; and security levels.  For example, for an
   application for which the user cannot afford denial of service,
   the joint optimization would need to find the needed resources as
   close as possible to the end user.
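
   Folding such a level into the placement sketch of Section 3 might
   look as follows (again purely illustrative; the encoding of levels
   as 1, 0.5, and 0 follows the example above and is not a
   specification).  Security acts as a hard constraint, and the cost
   optimization runs over whatever candidates remain.

      # Hypothetical extension of the earlier placement sketch with
      # coarse security levels (1 = secure links only, 0.5 = a path
      # under direct NSP control suffices, 0 = any path).

      def place_with_security(candidates, required_level, cost_fn):
          """candidates: iterable of (candidate, path_security_level)
          pairs.  Regulatory limits, such as a ban on moving a
          database across a national boundary, can be expressed as
          additional hard constraints of the same form."""
          allowed = [c for c, level in candidates
                     if level >= required_level]
          if not allowed:
              return None   # no placement satisfies the policy
          return min(allowed, key=cost_fn)
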
5.  IANA Considerations

   This document requires no IANA actions.

6.  Acknowledgements

   The work of Y(J)S, YG, DK, and KK was conducted under the aegis of
   ETICS (Economics and Technologies for Inter-Carrier Services), a
   European collaborative research project within the ICT theme of
   the 7th Framework Programme of the European Union that contributes
   to the objective "Network of the Future".

7.  Informative References

   [I-D.bitar-datacenter-vpn-applicability]
              Bitar, N., Balus, F., Lasserre, M., Henderickx, W.,
              Sajassi, A., Fang, L., Ikejiri, Y., and M. Pisica,
              "Cloud Networking: Framework and VPN Applicability",
              draft-bitar-datacenter-vpn-applicability-02 (work in
              progress), May 2012.

   [I-D.dalela-orchestration]
              Dalela, A. and M. Hammer, "Service Orchestration
              Protocol (SOP) Requirements",
              draft-dalela-orchestration-00 (work in progress),
              January 2012.

   [I-D.dalela-sop]
              Dalela, A. and M. Hammer, "Service Orchestration
              Protocol", draft-dalela-sop-00 (work in progress),
              January 2012.

   [I-D.ietf-nvo3-framework]
              Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for DC Network Virtualization",
              draft-ietf-nvo3-framework-02 (work in progress),
              February 2013.

   [I-D.masum-chari-shc]
              Hasan, M., Chari, A., Fahed, D., Tucker, L., Morrow,
              M., and M. Malyon, "A framework for controlling
              Multitenant Isolation, Connectivity and Reachability in
              a Hybrid Cloud Environment", draft-masum-chari-shc-00
              (work in progress), February 2012.

   [I-D.mcdysan-sdnp-cloudbursting-usecase]
              McDysan, D., "Cloud Bursting Use Case",
              draft-mcdysan-sdnp-cloudbursting-usecase-00 (work in
              progress), October 2011.

Authors' Addresses

   Yaakov (Jonathan) Stein
   RAD Data Communications
   24 Raoul Wallenberg St., Bldg C
   Tel Aviv  69719
   Israel

   Email: yaakov_s@rad.com


   Yuri Gittik
   RAD Data Communications
   24 Raoul Wallenberg St., Bldg C
   Tel Aviv  69719
   Israel

   Email: yuri_g@rad.com


   Daniel Kofman
   LINCS
   23 Avenue d'Italie
   Paris  75013
   France

   Email: daniel.kofman@telecom-paristech.fr


   Konstantinos Katsaros
   LINCS
   23 Avenue d'Italie
   Paris  75013
   France

   Email: katsaros@telecom-paristech.fr


   Monique Morrow
   Cisco Systems
   Richtistrase 7
   CH-8304 Wallisellen
   Switzerland

   Email: mmorrow@cisco.com


   Luyuan Fang
   Cisco Systems
   300 Beaver Brook Road
   Boxborough, MA  01719
   US

   Email: lufang@cisco.com


   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   2018 Antwerp
   Belgium

   Email: wim.henderickx@alcatel-lucent.com