idnits 2.17.1 draft-natarajan-containers-for-nfv-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (October 1, 2015) is 3123 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'ETSI-NFV-WHITE' is defined on line 429, but no explicit reference was found in the text == Unused Reference: 'ETSI-NFV-REQ' is defined on line 436, but no explicit reference was found in the text == Unused Reference: 'ETSI-NFV-ARCH' is defined on line 440, but no explicit reference was found in the text == Unused Reference: 'ETSI-NFV-TERM' is defined on line 444, but no explicit reference was found in the text Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Research Task Force (IRTF) S. Natarajan 2 Internet Draft Deutsche Telekom Inc. 3 Category: Informational R. Krishnan 4 A. Ghanwani 5 Dell 6 D. Krishnaswamy 7 IBM Research 8 P. 
Willis 9 BT 11 Expires: March 2016 October 1, 2015 13 An Analysis of Container-based Platforms for NFV 15 draft-natarajan-containers-for-nfv-00 17 Abstract 19 With recent technology advancements in the field, containers 20 are considered a potential alternative to virtual machine based 21 implementations. In the area of cloud applications, there are 22 comprehensive studies and early implementations of container-based 23 platforms. This draft describes some of the challenges of using 24 virtual machines for NFV workloads and how containers can 25 potentially address these challenges. 27 Status of this Memo 29 This Internet-Draft is submitted to IETF in full conformance with 30 the provisions of BCP 78 and BCP 79. 32 Internet-Drafts are working documents of the Internet Engineering 33 Task Force (IETF), its areas, and its working groups. Note that 34 other groups may also distribute working documents as Internet- 35 Drafts. 37 Internet-Drafts are draft documents valid for a maximum of six 38 months and may be updated, replaced, or obsoleted by other documents 39 at any time. It is inappropriate to use Internet-Drafts as reference 40 material or to cite them other than as "work in progress." 42 The list of current Internet-Drafts can be accessed at 43 http://www.ietf.org/ietf/1id-abstracts.txt. 45 The list of Internet-Draft Shadow Directories can be accessed at 46 http://www.ietf.org/shadow.html. 48 This Internet-Draft will expire in March 2016. 50 Copyright Notice 52 Copyright (c) 2015 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with 60 respect to this document. 
62 Conventions used in this document 64 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 65 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 66 document are to be interpreted as described in RFC 2119. 68 Table of Contents 70 1. Introduction 71 2. Challenges in Virtual Machine Implementations 72 2.1. Performance (SLA) 73 2.1.1. Challenges 74 2.2. Continuity/ Elasticity/ Portability 75 2.2.1. Challenges 76 2.3. Security 77 2.3.1. Challenges 78 2.4. Management 79 2.4.1. Challenges 80 3. Benefits of Containers 81 4. Challenges with Containers and potential solutions 82 5. Conclusion 83 6. Future Work 84 7. IANA Considerations 85 8. Security Considerations 86 9. Contributors 87 10. Acknowledgements 88 11. References 89 11.1. Normative References 90 11.2. Informative References 91 Authors' Addresses 93 1. Introduction 95 This draft describes some of the challenges of using virtual 96 machines for NFV workloads and how container-based platforms can 97 potentially address these challenges. 
It also suggests future work 98 in the area of containers. 100 2. Challenges in Virtual Machine Implementations 102 In this section, we provide our assessment of using virtual machines 103 to host VNFs. We list the advantages and limitations of VMs and 104 then discuss some open issues that can potentially be addressed by 105 containers. 107 2.1. Performance (SLA) 108 Performance requirements vary with each VNF type and configuration. 109 The platform should support the specification, realization and 110 runtime adaptation of different performance metrics. Achievable 111 performance can vary depending on several factors such as the 112 workload type, the size of the workload, the set of virtual machines 113 sharing the underlying infrastructure, etc. Here we highlight some 114 of the challenges based on potential deployment considerations. 116 2.1.1. Challenges 118 . VNF provisioning time (including up/down/update) comprises the 119 time it takes to spin up the VNF process, its application-specific 120 dependencies, and additional system dependencies. The resource 121 choices such as the hypervisor type, the guest and host OS flavor 122 and the need for hardware and software accelerators, etc., 123 account for a significant portion of this processing time 124 (instantiation or down time) when compared to just bringing up the 125 actual VNF process. As a result, the provisioning latency is 126 heavily dependent on the choice of infrastructure 127 resources. 129 . The runtime performance (achievable throughput, line rate speed, 130 maximum concurrent sessions that can be maintained, number of new 131 sessions that can be added per second) for each VNF is directly 132 dependent on the amount of resources (e.g., virtual CPUs, RAM) 133 allocated to individual VMs. Choosing the right resource setting 134 is a difficult task. If VM resources are over-provisioned, we end up 135 under-utilizing the physical resources. 
Conversely, if we 136 under-provision the VM resources, then upgrading the resource to 137 an advanced system setting might require scaling out or scaling up 138 the resources and redirecting traffic to the new VM; scaling 139 up/down operations consume time and add to the latency. This 140 overhead stems from the need to account for the resources of components 141 other than the actual VNF process (e.g., guest OS requirements). 143 . If each network function is hosted in an individual VM, then an 144 efficient inter-VM networking solution is required for 145 performance. 147 . Deploying VNFs inside a virtual machine can pose several 148 challenges in meeting Service Level Agreements (SLAs). As an 149 example, SLAs demand dynamic fine-tuning (e.g., changing base 150 memory, allocating additional vCPUs) and instantiation of additive 151 features (e.g., integration with hardware and software 152 accelerators) during runtime. In most cases, achieving this task 153 with VMs requires snapshotting the current VM state, halting the 154 VM, upgrading the VM with improved features, and re-spinning the 155 VM, all of which have performance implications. 157 2.2. Continuity/ Elasticity/ Portability 159 VNF service continuity can be interrupted due to several factors: 160 undesired state of the VNF (e.g., VNF upgrade in progress), underlying 161 hardware failure, unavailability of virtualized resources, VNF 162 software failure, etc. Some of the requirements that need consideration 163 are: 165 2.2.1. Challenges 167 o VM-based VNFs are not completely decoupled from the underlying 168 infrastructure. As discussed in the previous section, most VNFs 169 have a dependency on the guest OS, hypervisor type, accelerator 170 used, and the host OS. 
Therefore, porting VNFs to a new platform 171 might require identifying equivalent resources (e.g., hypervisor 172 support, new hardware model, understanding resource capabilities) 173 and repeating the provisioning steps to bring back the VNF to a 174 working state. 176 o Service continuity requirements can be classified as follows: 177 seamless (with zero impact) or non-seamless continuity (accepts 178 measurable impacts to offered services). Achieving seamless 179 service continuity requires an efficient high availability 180 solution or a quick restoration mechanism that can bring back the 181 VNF to an operational state. (Note that the need for an efficient 182 high availability solution or quick restoration mechanism is not 183 unique to VM based implementations.) For example, an anomaly 184 caused by a hardware failure can impact all VNFs hosted on that 185 infrastructure resource. To restore the VNF to a working state, 186 the user should first provision the VM (process + guest OS + 187 hypervisor info), spin up and configure the VNF process inside the 188 VM, set up the interconnects to forward network traffic, manage the 189 VNF-related state, and update any dependent runtime agents. 191 o Addressing the service elasticity challenges requires a holistic 192 view of the underlying resources. The challenges for presenting a 193 holistic view include the following: 195 o Performing Scalable Monitoring: Scalable continuous 196 monitoring of each individual resource's current state is 197 needed to spin up additional resources (auto-scale or auto- 198 heal) when the system encounters performance degradation or 199 to spin down idle resources to optimize resource usage. 201 o Handling CPU-intensive vs Memory-intensive VNFs: For CPU- 202 intensive VNFs the degradation can primarily depend on the 203 VNF processing functionality. 
On the other hand, for I/O-intensive 204 workloads, the overhead is significantly impacted by 205 the hypervisor features, its type, the number of VMs it 206 manages, the modules loaded in the guest OS, etc. 208 2.3. Security 210 Broadly speaking, security can be classified into: 212 o Security features provided by the VNFs to manage the state, and 214 o Security of the VNFs and their resources. 216 Some considerations on the security of the VNF infrastructure are 217 listed here. 219 2.3.1. Challenges 221 o The adoption of virtualization techniques (e.g., para- 222 virtualization, OS-level) for hosting network functions and the 223 deployment need to support multi-tenancy require secure slicing 224 of the infrastructure resources. In this regard, it is critical to 225 provide a solution that can ensure the following: 227 o Provision the network functions by guaranteeing complete 228 isolation across resource entities (hardware units, 229 hypervisor, virtual networks, etc.). This includes secure 230 access between VM and host interface, VM-VM communication, 231 etc. For maximizing overall resource utilization and 232 improving service agility/elasticity, sharing of resources 233 across network functions must be possible. 235 o When a resource component is compromised, quarantine the 236 compromised entity but ensure service continuity for other 237 resources. 239 o Securely recover from runtime vulnerabilities or attacks and 240 restore the network functions to an operational state. 241 Achieving this with minimal or no downtime is important. 243 Realizing the above requirements is a complex task with any type of 244 virtualization option (virtual machines, containers, etc.). 246 o Resource starvation / Availability: Applications hosted in VMs 247 can starve the underlying physical resources such that co-hosted 248 entities become unavailable. (Note that the resource starvation 249 challenge is not unique to VM based implementations.) 
Ideally, 250 countermeasures are required to monitor the usage patterns of 251 individual VMs and ensure fair use of individual VM resources. 253 2.4. Management 255 The management and operational aspects are primarily focused on the 256 VNF lifecycle management and its related functionalities. In 257 addition, the solution is required to handle the management of 258 failures, resource usage, state processing, smooth rollouts, and 259 security as discussed in the previous sections. Some features of 260 VM-based management solutions include: 262 o Centralized control and visibility: Support for web client, 263 multi-hypervisor management, single sign-on, inventory search, 264 alerts & notifications. 266 o Proactive Management: Creating host profiles, resource management 267 of VMs, dynamic resource allocation, auto-restart in HA model, 268 audit trails, patch management. 270 o Extensible platform: Define roles, permissions and licenses 271 across resources and use of APIs to integrate with other 272 solutions. 274 Thus, the key requirements for a management solution are that it: 276 o Is simple to operate and makes VNFs simple to deploy. 278 o Uses well-defined standard interfaces to integrate seamlessly 279 with different vendor implementations. 281 o Creates functional automation to handle VNF lifecycle 282 requirements. 284 o Provides APIs that abstract the complex low-level information 285 from external components. 287 o Is secure. 289 2.4.1. Challenges 291 The key challenge is addressing the aforementioned requirements for 292 a management solution while dealing with the multi-dimensional 293 complexity introduced by the hypervisor and guest OS. 295 3. Benefits of Containers 297 . Containers (when compared to VMs) can provide better service 298 agility as they allow us to run the VNF process directly in the 299 host environment. 
This eliminates the provisioning and processing 300 delay associated with spinning up (or down, or updating) a guest OS, 301 kernel driver association, and hypervisor processing time. This 302 facilitates meeting the SLA requirements of different VNFs. The 303 placement problem for finding a container that is running on 304 hardware of a certain type, e.g., hardware with certain offloads, 305 remains to be addressed. 307 . Containers share the host OS and only require resource 308 allocation for the individual VNF process, which usually results in 309 better runtime performance when compared to VMs. 311 . With containers, the inter-VNF communication latency depends on 312 the inter-process communication option (when hosted on the same 313 host) such as bridge mode, sharing the host's network stack, 314 sharing network namespace between containers, etc. or the 315 networking solution (e.g., network overlays, virtualization, etc.) 316 used between clusters of nodes (when VNFs are hosted across 317 multiple nodes). This eliminates the overhead introduced by the 318 guest OS's network stack. 320 . Auto-scaling VNFs or achieving service elasticity in runtime can 321 be simplified by the use of container-based VNFs due to the 322 lightweight resource usage of containers. Using containers can 323 simplify the allocation of additional resources to existing 324 containers or the quick spin-up of alternate containers, as it only 325 requires booting the VNF process and handling the state transition 326 associated with it. This can significantly reduce the downtime or 327 upgrade time. 329 . Some container management solutions (e.g., Kubernetes 330 [KUBERNETES-SELF-HEALING]) provide self-healing features such as 331 auto-placement, restart, and replacement by using a service 332 discovery mechanism and continuously monitoring the health of 333 individual containers or groups of containers. 
When a container process 334 encounters a failure, the platform automatically detects the issue and 335 seamlessly recovers from the failure. This can address some of the 336 service continuity requirements needed in VNF deployments. 338 4. Challenges with Containers and potential solutions 340 . Resource Management/Isolation/Security: Containers create a slice 341 of the underlying host using techniques like namespaces, cgroups, 342 chroot, etc. However, there are several other kernel features that 343 are not completely isolated from the processes running inside 344 containers. This can allow a vulnerable container to compromise 345 the host or containers belonging to other users (e.g., through resource 346 starvation). 348 o Potential Solution: Guaranteeing complete isolation across 349 entities requires an efficient access control mechanism and 350 resource quota mechanism. Usage of kernel security modules 351 like SELinux [SELINUX] or AppArmor [APPARMOR] along with 352 containers can provide the required features for a secure VNF 353 deployment. Usage of resource quota techniques such as those 354 in Kubernetes [KUBERNETES-RESOURCE-QUOTA] can provide the 355 typical resource guarantees for a VNF deployment. 356 Additionally, a hybrid deployment with VMs and containers can 357 be envisioned depending on the degree of isolation needed 358 between VNFs. 360 . Cross-VNF compatibility and Operating System dependency: As of 361 today, containers are supported in select operating systems 362 such as Linux, Windows and Solaris. On the other hand, in the 363 current range of VNFs, many don't support Linux OS or other OSes 364 such as Windows and Solaris. Depending on the nature of the 365 software associated with VNFs, the libraries installed inside 366 a container, and the underlying OS version that a container 367 utilizes, some VNFs may not be compatible with other VNFs. 
369 o Potential Solution: A hybrid deployment with VMs and 370 containers can be envisioned to address this problem. The 371 VNFs which don't run on container-supported OSes can be run 372 in VMs. Additionally, one could envision each set of 373 compatible VNFs running within a specific VM, with different 374 sets of VNFs running on different VMs, where the VMs run on a 375 hypervisor. A notable additional challenge in this solution 376 is state transfer between containers and virtual machines. 378 . Overall Performance: Unlike VMs, containers can run directly on 379 the host OS and thus exhibit significant performance benefits. As 380 an example, the whitepaper [VCPE-CONTAINER-PERF] demonstrates ~25% 381 throughput improvement for TCP traffic for a Virtual Enterprise 382 Customer Premises Equipment (vE-CPE) use case as described in 383 [ETSI-NFV-USE-CASES]; the environments which were compared were 384 containers using LXC and VMs using KVM. 386 5. Conclusion 388 The use of containers for VNFs appears to have significant 389 advantages compared to using VMs and hypervisors, especially for 390 efficiency and performance. With this background, the authors urge 391 the industry to address the future work areas, especially solutions 392 for the challenges, as described in Section 4 and consider 393 container-based VNFs in real deployments beyond proof-of-concepts. 395 6. Future Work 397 Opportunistic areas for future work include, but are not limited to, 398 developing solutions to address the challenges in VNF 399 containerization described in Section 4, distributed micro-service 400 network functions, etc. 402 7. IANA Considerations 404 This draft does not have any IANA considerations. 406 8. Security Considerations 408 VM-based VNFs can offer a greater degree of isolation and security. 409 Since container-based VNFs provide abstraction at the OS level, they 410 can introduce potential vulnerabilities in the system when deployed 411 without proper OS-level security features. 
This is one of the key 412 implementation/deployment challenges that needs to be further 413 investigated. 415 9. Contributors 417 10. Acknowledgements 419 The authors would like to thank Vineed Konkoth for the Virtual 420 Customer CPE Container Performance white paper, and Ashay Chaudhary 421 for the detailed comments and suggestions on this document. 423 11. References 425 11.1. Normative References 427 11.2. Informative References 429 [ETSI-NFV-WHITE] "ETSI NFV White Paper," 430 http://portal.etsi.org/NFV/NFV_White_Paper.pdf 432 [ETSI-NFV-USE-CASES] "ETSI NFV Use Cases," 433 http://www.etsi.org/deliver/etsi_gs/NFV/001_099/001/01.01.01_60/gs_N 434 FV001v010101p.pdf 436 [ETSI-NFV-REQ] "ETSI NFV Virtualization Requirements," 437 http://www.etsi.org/deliver/etsi_gs/NFV/001_099/004/01.01.01_60/gs_N 438 FV004v010101p.pdf 440 [ETSI-NFV-ARCH] "ETSI NFV Architectural Framework," 441 http://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.01.01_60/gs_N 442 FV002v010101p.pdf 444 [ETSI-NFV-TERM] "Terminology for Main Concepts in NFV," 445 http://www.etsi.org/deliver/etsi_gs/NFV/001_099/003/01.01.01_60/gs_n 446 fv003v010101p.pdf 448 [KUBERNETES-RESOURCE-QUOTA] "Kubernetes Resource Quota," 449 http://kubernetes.io/v1.0/docs/admin/resource-quota.html 451 [KUBERNETES-SELF-HEALING] "Kubernetes Design Overview," 452 http://kubernetes.io/v1.0/docs/design/README.html 454 [SELINUX] "Security Enhanced Linux (SELinux) project," 455 http://selinuxproject.org/ 457 [APPARMOR] "Mandatory Access Control Framework," 458 https://wiki.debian.org/AppArmor 460 [VCPE-CONTAINER-PERF] "Virtual Customer CPE Container Performance 461 White Paper," http://info.ixiacom.com/rs/098-FRB-840/images/Calsoft- 462 Labs-CaseStudy2015.pdf 464 Authors' Addresses 466 Sriram Natarajan 467 Deutsche Telekom Inc. 
468 sriram.natarajan@telekom.com 470 Ram (Ramki) Krishnan 471 Dell 472 ramki_krishnan@dell.com 474 Anoop Ghanwani 475 Dell 476 anoop@alumni.duke.edu 478 Dilip Krishnaswamy 479 IBM Research 480 dilikris@in.ibm.com 482 Peter Willis 483 BT 484 peter.j.willis@bt.com
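
Illustrative note on resource quotas (Section 4): the resource quota mechanism mentioned as a potential solution to resource starvation can be pictured as an admission check that refuses a new VNF container once a tenant's budget would be exceeded. The following is a minimal sketch under stated assumptions, not the behavior of Kubernetes or any other platform. The names Resources, TenantQuota, QuotaExceeded, admit and release are hypothetical, and the units (millicores, MiB) are chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in MiB (illustrative units)."""
    cpu_millicores: int
    memory_mib: int


class QuotaExceeded(Exception):
    """Raised when admitting a container would exceed the tenant quota."""


class TenantQuota:
    """Hypothetical per-tenant quota tracker; not an API of any real platform."""

    def __init__(self, limit: Resources) -> None:
        self.limit = limit
        self.used = Resources(0, 0)

    def admit(self, request: Resources) -> None:
        """Reserve resources for a new VNF container, or raise QuotaExceeded."""
        cpu = self.used.cpu_millicores + request.cpu_millicores
        mem = self.used.memory_mib + request.memory_mib
        if cpu > self.limit.cpu_millicores or mem > self.limit.memory_mib:
            # Reject without changing state, so co-hosted VNFs keep their share.
            raise QuotaExceeded(f"request {request} exceeds quota {self.limit}")
        self.used = Resources(cpu, mem)

    def release(self, request: Resources) -> None:
        """Return resources when a VNF container is torn down."""
        self.used = Resources(
            max(0, self.used.cpu_millicores - request.cpu_millicores),
            max(0, self.used.memory_mib - request.memory_mib),
        )
```

With a quota of Resources(2000, 4096), a first call to admit(Resources(1500, 1024)) succeeds, while a second call to admit(Resources(1000, 512)) raises QuotaExceeded because the CPU budget would be exceeded. A real deployment would enforce the admitted limits in the kernel (e.g., through cgroups) rather than only tracking them.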