Network Working Group Y. Stein Internet-Draft Y. Gittik Intended status: Informational RAD Data Communications Expires: September 5, 2012 M. Morrow L. Fang Cisco Systems W. Henderickx Alcatel-Lucent March 4, 2012 Accessing Cloud Services draft-stein-cloud-access-00.txt Abstract Cloud services are revolutionizing the way computational resources are provided, but at the expense of requiring an even more revolutionary overhaul of the networking infrastructure needed to deliver them. Much recent work has focused on intra- and inter- datacenter connectivity requirements and architectures, while the "access segment" connecting the cloud services user to the datacenter still needs to be addressed. In this draft we consider tighter integration between the network and the datacenter, in order to improve end-to-end Quality of Experience, while minimizing both networking and computational resource costs. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 5, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. Stein, et al. Expires September 5, 2012 [Page 1] Internet-Draft Accessing Cloud Services March 2012 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Model of Existing Cloud Services . . . . . . . . . . . . . . . 5 3. Optimized Cloud Access . . . . . . . . . . . . . . . . . . . . 7 4. Security Considerations . . . . . . . . . . . . . . . . . . . . 8 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 Stein, et al. Expires September 5, 2012 [Page 2] Internet-Draft Accessing Cloud Services March 2012 1. Introduction Cloud services replace computational power and storage resources traditionally located under the user's table or on the user's in- house servers, with resources located in remote datacenters. The cloud resources may be raw computing power and storage (Infrastructure as a Service - IaaS), or computer systems along with supported operating systems and tools (Platform as a Service - PaaS), or even fully developed applications (Software as a Service - SaaS). The inter- and intra-datacenter networking architectures needed to support cloud services are described in [I-D.bitar-datacenter-vpn-applicability]. The cloud service user connects to cloud resources over a networking infrastructure. Today this infrastructure is often the public Internet, but (for reasons to be explained below) is preferably a network maintained by a Network Service Provider (N-SP). The datacenter(s) may belong to the N-SP (which is the case considered by [I-D.masum-chari-shc]), or may belong to a separate Cloud Service Provider (C-SP), and accessible from the N-SP's network. The advantages of cloud services over conventional IT services include elasticity (the ability to increase or decrease resources on demand rather than having to purchase enough resources for worst case scenarios), scalability (allocating multiple resources and load- balancing them), high-availability (resources may be backed up by similar resources at other datacenters belonging to the C-SP), and offloading of IT tasks (such as applications upgrading, firewalling, storage backup, and disaster recovery). These translate to economic efficiencies if actually delivered. The disadvantages of cloud service are lack of direct control by the customer, additional transaction latency and communications costs, and insecurity regarding remote storage of sensitive data. These translate to economic inefficiencies unless actually controlled. In order to obtain the advantages of cloud service without the disadvantages, the cloud services customer enters into a Service Level Agreement (SLA) with the C-SP. However, such an SLA by itself will be unable to guarantee end-to-end service goals, since it does not cover degradations introduced by the intervening network. Indeed, if the datacenter is accessed over the public Internet, end- to-end service goals may be unattainable. Thus an additional SLA with the N-SP (that may already be in effect for pre-cloud services) is typically required. Even when the C-SP and the N-SP are the same entity, these SLAs may be separate documents. The lack of a single end-to-end service level guarantee leads to finger pointing with the resulting decrease in service level. Stein, et al. Expires September 5, 2012 [Page 3] Internet-Draft Accessing Cloud Services March 2012 Cloud services require a fundamental rethinking of the Information Technology (IT) infrastructure, due to the requirement for dynamic changes in IT resource configuration. Physical IT resources are replaced by virtualized ones packaged in Virtual Machines (VMs). VMs can be created, relocated while running (VMotion), and destroyed on- demand. Since VMs need to interconnect, connect to physical resources, and connect to the cloud services user, they need to be allocated appropriate IP and layer 2 addresses. Since these addresses need to be allocated, moved, and destroyed on-the-fly, the cloud IT revolution directly impacts the networking infrastructure. Recent work, such as [I-D.bitar-datacenter-vpn-applicability], has focused on requirements and architectures for connectivity inside and between datacenters. However, the "access segment", that is, the networking infrastructure connecting the cloud services user to the datacenter, has not been fully addressed. The creation, management, manipulation, and removal of cloud resources is called "orchestration" (see [I-D.dalela-orchestration]). Orchestrators need to respond to user demands and uphold user SLAs, while taking into account the location and availability of IT resources, and optimizing the C-SP's operational objectives. These objectives include, for example, decreasing costs by consolidating resources, balancing use of resources by reallocating computational and storage resources, and enforcing engineering, business, and security policies. Orchestrators of the present generation do not attempt optimization of C-SP's networking resources, but this generalization is being studied. Furthermore, these orchestrators are completely oblivious to the N-SP's resources and objectives. Hence, there is no mechanism for maintaining end-to-end SLAs, or for optimizing end-to-end networking. This goal of this Internet Draft is to kick off discussions on requirements and possible mechanisms for improving end-to-end Quality of Experience while minimizing both networking and computational costs. Stein, et al. Expires September 5, 2012 [Page 4] Internet-Draft Accessing Cloud Services March 2012 2. Model of Existing Cloud Services ----- /- - - - - - - - - - - - - - - - - -I O I ----- / I ---------- / I I I DC A I / -------- ----------- /I I / I I I I / ---------- -------- I I I N-SP I--->--- I user I->-I CE I--->---I I -------- I I I network I--->--- I I I I \ ---------- -------- ----------- \I I I DC B I I I ---------- Figure 1: Simplified model of cloud service provided over Service Provider network to an enterprise customer behind a CE device For concreteness, we will assume the scenario of Figure 1. On the left we see a cloud services user attached to a customer site network. This network connects to the outside world via a Customer Edge (CE), which may be a branch-site router or switch, a special purpose cloud demarcation device, or in degenerate cases the user's computer itself. The N-SP network is assumed to be a well-engineered network providing VPN and other SLA-based services to the customer site. The N-SP network is managed from an Operations Support System (OSS), which may include a Business Support System (BSS). Connected to this network are datacenters (two are shown - datacenter A and datacenter B), which may belong to the N-SP, or to a separate C-SP. The orchestrator of datacenter A is depicted as "O". Additionally, Internet access may be available directly from the CE (not shown) or from the N-SP network. In the usual cloud services orchestration model the user requests a well-defined resource, for example over the telephone, via a web- based portal, or via a function call. The orchestrator, after checking correctness, availability, and updating the billing system, allocates the resource, e.g., a VM on a particular CPU located in a particular rack in datacenter A. In addition, the required networking resources are allocated to the VM, e.g., an IP address, an Ethernet MAC address, and a VLAN tag. The VM is now started and consumes CPU power, memory, and disk space, as well as communications bandwidth between itself and other VMs on the same CPU, within the same rack, Stein, et al. Expires September 5, 2012 [Page 5] Internet-Draft Accessing Cloud Services March 2012 on other racks in the same datacenter, between datacenters, and between itself and the user. If it becomes necessary to move the VM from its allocated position to somewhere else (VMotion), the orchestrator needs to reallocate the required computational and communications resources. In general this requires allocating new addresses and rerouting all of the aforementioned traffic types, while maintaining continuous operation of the VM. When the user informs the C-SP that it no longer requires the VM, the orchestrator needs to clear the routing entries, withdraw the communications resources, release storage and computational resources, and update the billing system. The operations of the previous paragraph are all performed by the orchestrator, with possible cooperation with orchestrators from other datacenters. The needed routing information is advertised to the N-SP via standard routing protocols, without taking into account possible effects on the N-SP network. If, for example, the path in the N-SP network to datacenter A degrades, while the path to datacenter B is performing well, this information is neither known by the orchestrator, nor is there a method for the orchestrator to take it into account. Instead, the N-SP must find a way to reach datacenter A, even if this path is expensive, or of high latency, or problematic in some other way. This predicament arises due to the orchestrator communicating (indirectly) with the user, but not with the N-SP's OSS. In addition, although the CE may be capable of OAM functionality, fault and performance monitoring of the communications path through the N-SP network are not employed. Finally, while the user can (indirectly) communicate with the orchestrator, there is no coordinated path to the N-SP's OSS/BSS. Stein, et al. Expires September 5, 2012 [Page 6] Internet-Draft Accessing Cloud Services March 2012 3. Optimized Cloud Access ----- /------/--------------I O I / / ----- / ------- I -----I OSS I ---------- / ------- I I / I I DC A I -------- / ----------- /I I I I/ I I / ---------- -------- I I--->---I N-SP I--->--- I user I->-I CE I I I -------- I I--->---I network I--->--- I I I I \ ---------- -------- ----------- \I I I DC B I I I ---------- Figure 2: Cloud service with dual homing between a cloud-aware CE and N-SP network, and coordination between CE, N-SP OSS/BSS, and orchestrator Figure 2. depicts two enhancements to the previous scenario. The trivial enhancement is the providing of dual-homing between the CE and the N-SP network. This is a well-known and widely deployed feature, which may be implemented regardless of the cloud services. We shall see that it acquires additional meaning in the context of the solution described below. More significantly, Figure 2 depicts three new control communications channels. The CE device is now assumed to be cloud-aware, and may communicate directly with the N-SP OSS/BSS, and with the C-SP orchestrator. In addition, the latter two may communicate with each other. These control channels facilitate new capabilities, that may improve end-to-end QoE while optimizing operational cost. Consider the provisioning of a new cloud service. With this new architecture the user's request is proxied by the cloud-aware CE to both the OSS/BSS and to the orchestrator. Before commissioning the service, the orchestrator initiates network testing between the datacenter and the CE, and with the N-SP's assistance QoS parameters are determined for alternative paths to various relevant datacenters. The N-SP and C-SP (whether a single SP or two) can now jointly decide on placement of the VM in order to optimize the user's end-to-end Quality of Experience (QoE) while minimizing costs to both SPs. The placement will necessitate the development of new combined C-SP + Stein, et al. Expires September 5, 2012 [Page 7] Internet-Draft Accessing Cloud Services March 2012 N-SP QoE metrics, while the latter minimization may only be reliable when a single SP provides networking and cloud resources. Similarly, the N-SP OSS may now trigger VMotion when network conditions warrant this, or may veto VMotion when its effect would be too onerous on the N-SP network. The cloud-aware CE may be configured to periodically test path continuity and measure QoS parameters. The CE may trigger VMotion when the estimated QoE drops under that to be guaranteed by the SLA, thus promoting SLA assurance even when neither OSS nor orchestrator know of the problem. Furthermore, the dual homing link from the CE to the N-SP network may be chosen not only when the primary link fails, but as part of the overall optimization of QoE vs. cost. In addition, popular yet stationary content may be cached in the N-SP network, and optimization may lead to the N-SP network providing this content without the need to access the datacenter at all. In certain cases (e.g., catastrophic failure in the N-SP network or of the connectivity between that network and the datacenter), the cloud- aware CE may choose to bypass the N-SP network altogether and reach the datacenter over the public Internet (with consequent QoE reduction). 4. Security Considerations Perceived insecurity of the customer's data sent to the cloud or stored in a datacenter is perhaps the single most important factor impeding the wide adoption of cloud services. At present, the only solutions have been end-to-end authentication and confidentiality, with the high cost these place on user equipment. The cloud-aware CE may assume the responsibility for securing the cloud services from the edge of the customer's walled garden, all the way to the VM. Isolation of C-SP customers is addressed in [I-D.masum-chari-shc]. Security measures such as hiding of network topology, as well as on- the-fly inspection and modification of transactions are listed as requirements in [I-D.dalela-orchestration], while [I-D.dalela-sop] specifies encryption and authentication of orchestration protocol messages. 5. IANA Considerations This document requires no IANA actions. Stein, et al. Expires September 5, 2012 [Page 8] Internet-Draft Accessing Cloud Services March 2012 6. References [I-D.bitar-datacenter-vpn-applicability] Bitar, N., Balus, F., Lasserre, M., Henderickx, W., Sajassi, A., Fang, L., Ikejiri, Y., and M. Pisica, "Cloud Networking: Framework and VPN Applicability", draft-bitar-datacenter-vpn-applicability-01 (work in progress), October 2011. [I-D.dalela-orchestration] Dalela, A. and M. Hammer, "Service Orchestration Protocol (SOP) Requirements", draft-dalela-orchestration-00 (work in progress), January 2012. [I-D.dalela-sop] Dalela, A. and M. Hammer, "Service Orchestration Protocol", draft-dalela-sop-00 (work in progress), January 2012. [I-D.masum-chari-shc] Hasan, M., Chari, A., Fahed, D., Tucker, L., Morrow, M., and M. Malyon, "A framework for controlling Multitenant Isolation, Connectivity and Reachability in a Hybrid Cloud Environment", draft-masum-chari-shc-00 (work in progress), February 2012. Authors' Addresses Yaakov (Jonathan) Stein RAD Data Communications 24 Raoul Wallenberg St., Bldg C Tel Aviv 69719 Israel Email: yaakov_s@rad.com Yuri Gittik RAD Data Communications 24 Raoul Wallenberg St., Bldg C Tel Aviv 69719 Israel Email: yuri_g@rad.com Stein, et al. Expires September 5, 2012 [Page 9] Internet-Draft Accessing Cloud Services March 2012 Monique Morrow Cisco Systems Richtistrase 7 CH-8304 Wallisellen Switzerland Email: mmorrow@cisco.com Luyuan Fang Cisco Systems 300 Beaver Brook Road Boxborough, MA 01719 US Email: lufang@cisco.com Wim Henderickx Alcatel-Lucent Copernicuslaan 50 2018 Antwerp Belgium Email: wim.henderickx@alcatel-lucent.com Stein, et al. Expires September 5, 2012 [Page 10]