NFVRG
Internet Draft
Category: Informational
Expires: March 2016

Authors: S. Natarajan (Deutsche Telekom Inc.), R. Krishnan (Dell),
A. Ghanwani (Dell), D. Krishnaswamy (IBM Research), P. Willis (BT), A.
Chaudhary (Verizon)

Expires: March 2016                                  October 5, 2015

           An Analysis of Container-based Platforms for NFV

             draft-natarajan-nfvrg-containers-for-nfv-01

Abstract

   With recent technology advancements, containers are considered a
   potential alternative to virtual machine based implementations. In
   the area of cloud applications, there are comprehensive studies and
   early implementations of container-based platforms. This draft
   describes some of the challenges of using virtual machines for NFV
   workloads and how containers can potentially address these
   challenges.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire in March 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Table of Contents

   1. Introduction
   2. Challenges in Virtual Machine Implementations
      2.1. Performance (SLA)
         2.1.1. Challenges
      2.2. Continuity/Elasticity/Portability
         2.2.1. Challenges
      2.3. Security
         2.3.1. Challenges
      2.4. Management
         2.4.1. Challenges
   3. Benefits of Containers
   4. Challenges with Containers and potential solutions
   5. Conclusion
   6. Future Work
   7. IANA Considerations
   8. Security Considerations
   9. Contributors
   10. Acknowledgements
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses

1. Introduction

   This draft describes some of the challenges of using virtual
   machines for NFV workloads and how container-based platforms can
   potentially address these challenges.
   It also suggests future work in the area of containers.

2. Challenges in Virtual Machine Implementations

   In this section, we provide our assessment of using virtual machines
   to host VNFs. We list the advantages and limitations of VMs and
   then discuss some open issues that can potentially be addressed by
   containers.

2.1. Performance (SLA)

   Performance requirements vary with each VNF type and configuration.
   The platform should support the specification, realization and
   runtime adaptation of different performance metrics. Achievable
   performance can vary depending on several factors such as the
   workload type, the size of the workload, and the set of virtual
   machines sharing the underlying infrastructure. Here we highlight
   some of the challenges based on potential deployment considerations.

2.1.1. Challenges

   o VNF provisioning time (including up/down/update) is the time it
     takes to spin up the VNF process, its application-specific
     dependencies, and additional system dependencies. Resource
     choices such as the hypervisor type, the guest and host OS flavor,
     and the need for hardware and software accelerators account for a
     significant portion of this time (instantiation or down time)
     when compared to just bringing up the actual VNF process. As a
     result, the provisioning latency is heavily dependent on the
     choice of infrastructure resources.

   o The runtime performance (achievable throughput, line rate speed,
     maximum concurrent sessions that can be maintained, number of new
     sessions that can be added per second) of each VNF is directly
     dependent on the amount of resources (e.g., virtual CPUs, RAM)
     allocated to individual VMs. Choosing the right resource setting
     is difficult. If VM resources are over-provisioned, we end up
     under-utilizing the physical resources.
     Conversely, if we under-provision the VM resources, then
     upgrading to a larger resource setting might require scaling out
     or scaling up the resources and redirecting traffic to the new
     VM; scaling up/down operations consume time and add to the
     latency. This overhead stems from the need to account for the
     resources of components other than the actual VNF process (e.g.,
     guest OS requirements).

   o If each network function is hosted in an individual VM, then an
     efficient inter-VM networking solution is required for
     performance.

   o Deploying VNFs inside a virtual machine can pose several
     challenges in meeting Service Level Agreements (SLAs). For
     example, SLAs demand dynamic fine-tuning (e.g., changing base
     memory, allocating additional vCPUs) and instantiation of
     additive features (e.g., integration with hardware and software
     accelerators) during runtime. In most cases, achieving this with
     VMs requires snapshotting the current VM state, halting the VM,
     upgrading the VM with improved features, and re-spinning the VM,
     all of which have performance implications.

2.2. Continuity/Elasticity/Portability

   VNF service continuity can be interrupted due to several factors:
   an undesired state of the VNF (e.g., a VNF upgrade in progress),
   underlying hardware failure, unavailability of virtualized
   resources, VNF software failure, etc. Some of the requirements that
   need consideration are:

2.2.1. Challenges

   o VM-based VNFs are not completely decoupled from the underlying
     infrastructure. As discussed in the previous section, most VNFs
     have a dependency on the guest OS, hypervisor type, accelerator
     used, and the host OS.
     Therefore, porting VNFs to a new platform might require
     identifying equivalent resources (e.g., hypervisor support, new
     hardware model, understanding resource capabilities) and
     repeating the provisioning steps to bring the VNF back to a
     working state.

   o Service continuity requirements can be classified as seamless
     (zero impact) or non-seamless (measurable impact on offered
     services is accepted). Achieving seamless service continuity is
     harder when VNFs are hosted in VMs, since this requires an
     efficient high availability solution or a quick restoration
     mechanism that can bring the VNF back to an operational state.
     (Note that the need for an efficient high availability solution
     or quick restoration mechanism is not unique to VM-based
     implementations.) For example, an anomaly caused by a hardware
     failure can impact all VNFs hosted on that infrastructure
     resource. To restore the VNF to a working state, the user must
     first provision the VM (process + guest OS + hypervisor info),
     spin up and configure the VNF process inside the VM, set up the
     interconnects to forward network traffic, manage the VNF-related
     state, and update any dependent runtime agents.

   o Addressing the service elasticity challenges requires a holistic
     view of the underlying resources. The challenges in presenting a
     holistic view include the following:

     o Performing Scalable Monitoring: Scalable continuous
       monitoring of each resource's current state is needed to spin
       up additional resources (auto-scale or auto-heal) when the
       system encounters performance degradation, or to spin down
       idle resources to optimize resource usage.

     o Handling CPU-intensive vs I/O-intensive VNFs: For
       CPU-intensive VNFs the degradation can primarily depend on the
       VNF processing functionality.
       On the other hand, for I/O-intensive workloads, the overhead
       is significantly affected by the hypervisor features, its
       type, the number of VMs it manages, the modules loaded in the
       guest OS, etc.

2.3. Security

   Broadly speaking, security can be classified into:

   o Security features provided by the VNFs to manage the state, and

   o Security of the VNFs and their resources.

   Some considerations on the security of the VNF infrastructure are
   listed here.

2.3.1. Challenges

   o The adoption of virtualization techniques (e.g.,
     para-virtualization, OS-level) for hosting network functions and
     the need for deployments to support multi-tenancy require secure
     slicing of the infrastructure resources. In this regard, it is
     critical to provide a solution that can ensure the following:

     o Provision the network functions by guaranteeing complete
       isolation across resource entities (hardware units,
       hypervisor, virtual networks, etc.). This includes secure
       access between VM and host interface, VM-VM communication,
       etc. For maximizing overall resource utilization and
       improving service agility/elasticity, sharing of resources
       across network functions must be possible.

     o When a resource component is compromised, quarantine the
       compromised entity but ensure service continuity for other
       resources.

     o Securely recover from runtime vulnerabilities or attacks and
       restore the network functions to an operational state.
       Achieving this with minimal or no downtime is important.

     Realizing the above requirements is a complex task with any type
     of virtualization option (virtual machines, containers, etc.).

   o Resource starvation / Availability: Applications hosted in VMs
     can starve the underlying physical resources such that co-hosted
     entities become unavailable. (Note that the resource starvation
     challenge is not unique to VM-based implementations.)
     Ideally, countermeasures are required to monitor the usage
     patterns of individual VMs and ensure fair use of individual VM
     resources.

2.4. Management

   The management and operational aspects are primarily focused on
   VNF lifecycle management and its related functionalities. In
   addition, the solution is required to handle the management of
   failures, resource usage, state processing, smooth rollouts, and
   security as discussed in the previous sections. Some features of
   VM-based management solutions include:

   o Centralized control and visibility: Support for web client,
     multi-hypervisor management, single sign-on, inventory search,
     alerts & notifications.

   o Proactive Management: Creating host profiles, resource
     management of VMs, dynamic resource allocation, auto-restart in
     HA model, audit trails, patch management.

   o Extensible platform: Define roles, permissions and licenses
     across resources and use of APIs to integrate with other
     solutions.

   Thus, the key requirements for a management solution are that it:

   o Makes it simple to operate and deploy VNFs.

   o Uses well-defined standard interfaces to integrate seamlessly
     with different vendor implementations.

   o Creates functional automation to handle VNF lifecycle
     requirements.

   o Provides APIs that abstract the complex low-level information
     from external components.

   o Is secure.

2.4.1. Challenges

   The key challenge is addressing the aforementioned requirements for
   a management solution while dealing with the multi-dimensional
   complexity introduced by the hypervisor, guest OS, VNF
   functionality, and the state of the network.

3. Benefits of Containers

   o Containers (when compared to VMs) can provide better service
     agility, as they allow us to run the VNF process directly in the
     host environment.
     This eliminates the provisioning and processing delay associated
     with spinning up (or down, or updating) the guest OS, kernel
     driver association, and hypervisor processing time. This
     facilitates meeting the SLA requirements of different VNFs. The
     placement problem of finding a container that is running on
     hardware of a certain type, e.g., hardware with certain offloads,
     remains to be addressed.

   o Containers share the host OS and only require resource
     allocation for the individual VNF process, which usually results
     in better runtime performance when compared to VMs.

   o With containers, the inter-VNF communication latency depends on
     the inter-process communication option used when VNFs are hosted
     on the same host (such as bridge mode, sharing the host's network
     stack, or sharing a network namespace between containers), or on
     the networking solution (e.g., network overlays, virtualization)
     used between clusters of nodes when VNFs are hosted across
     multiple nodes. This eliminates the overhead introduced by the
     guest OS's network stack, as long as the containerization
     technology provides sufficient isolation between containers.

   o Auto-scaling VNFs or achieving service elasticity at runtime can
     be simplified by the use of container-based VNFs due to the
     lightweight resource usage of containers. Using containers can
     simplify the allocation of additional resources to existing
     containers or the quick spin-up of alternate containers, as it
     only requires booting the VNF process and handling the state
     transition associated with it. This can significantly reduce the
     downtime or upgrade time.

   o Some container management solutions (e.g., Kubernetes
     [KUBERNETES-SELF-HEALING]) provide self-healing features such as
     auto-placement, restart, and replacement by using a service
     discovery mechanism and continuously monitoring the health of
     individual containers or groups of containers.
     When a container process encounters a failure, the platform
     automatically detects the issue and seamlessly recovers from the
     failure. This can address some of the service continuity
     requirements of VNF deployments.

4. Challenges with Containers and potential solutions

   o Resource Management/Isolation/Security: Containers create a
     slice of the underlying host using techniques like namespaces,
     cgroups, chroot, etc. However, there are several other kernel
     features that are not completely isolated from the processes
     running inside containers. This can allow a vulnerable container
     to compromise the host or containers belonging to other users
     (e.g., through resource starvation).

     o Potential Solution: Guaranteeing complete isolation across
       entities requires an efficient access control mechanism and
       resource quota mechanism. Using kernel security modules such
       as SELinux [SELINUX] or AppArmor [APPARMOR] along with
       containers can provide the required features for a secure VNF
       deployment. Using resource quota techniques such as those in
       Kubernetes [KUBERNETES-RESOURCE-QUOTA] can provide the typical
       resource guarantees for a VNF deployment. Additionally, a
       hybrid deployment with VMs and containers can be envisioned
       depending on the degree of isolation needed between VNFs.

   o Cross-VNF compatibility and Operating System dependency: As of
     today, containers are supported on selected operating systems
     such as Linux, Windows and Solaris. On the other hand, among the
     current range of VNFs, many do not support Linux or other OSes
     such as Windows and Solaris. Depending on the nature of the
     software associated with VNFs, the libraries installed inside a
     container, and the underlying OS version that a container
     utilizes, some VNFs may not be compatible with other VNFs.

     o Potential Solution: A hybrid deployment with VMs and
       containers can be envisioned to address this problem. The
       VNFs that do not run on container-supported OSes can be run
       in VMs. Additionally, one could envision each set of
       compatible VNFs running within a specific VM, with different
       sets of VNFs running on different VMs, where the VMs run on a
       hypervisor. A notable additional challenge in this solution
       is state transfer between containers and virtual machines,
       including, but not limited to, latency and interoperability
       concerns.

   o Overall Performance: Unlike VMs, containers can run directly on
     the host OS and thus exhibit significant performance benefits.
     As an example, the white paper [VCPE-CONTAINER-PERF] demonstrates
     a ~25% throughput improvement for TCP traffic for a Virtual
     Enterprise Customer Premises Equipment (vE-CPE) use case as
     described in [ETSI-NFV-USE-CASES]; the environments compared
     were containers using LXC and VMs using KVM.

5. Conclusion

   The use of containers for VNFs appears to have significant
   advantages compared to using VMs and hypervisors, especially for
   efficiency and performance. With this background, the authors urge
   the industry to address the future work areas, especially solutions
   for the challenges described in Section 4, and to consider
   container-based VNFs in real deployments beyond proofs of concept.

6. Future Work

   Areas for future work include, but are not limited to, developing
   solutions to address the challenges in VNF containerization
   described in Section 4 and distributed micro-service network
   functions.

7. IANA Considerations

   This draft does not have any IANA considerations.

8. Security Considerations

   VM-based VNFs can offer a greater degree of isolation and security
   due to technology maturity as well as hardware support.
   Since container-based VNFs provide abstraction at the OS level,
   they can introduce potential vulnerabilities in the system when
   deployed without proper OS-level security features. This is one of
   the key implementation/deployment challenges that needs to be
   further investigated.

   In addition, as containerization technologies evolve to leverage the
   virtualization capabilities provided by hardware, they can provide
   isolation and security assurances similar to VMs.

9. Contributors

10. Acknowledgements

   The authors would like to thank Vineed Konkoth for the Virtual
   Customer CPE Container Performance white paper.

11. References

11.1. Normative References

11.2. Informative References

   [ETSI-NFV-WHITE] "ETSI NFV White Paper,"
      http://portal.etsi.org/NFV/NFV_White_Paper.pdf

   [ETSI-NFV-USE-CASES] "ETSI NFV Use Cases,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/001/01.01.01_60/gs_NFV001v010101p.pdf

   [ETSI-NFV-REQ] "ETSI NFV Virtualization Requirements,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/004/01.01.01_60/gs_NFV004v010101p.pdf

   [ETSI-NFV-ARCH] "ETSI NFV Architectural Framework,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.01.01_60/gs_NFV002v010101p.pdf

   [ETSI-NFV-TERM] "Terminology for Main Concepts in NFV,"
      http://www.etsi.org/deliver/etsi_gs/NFV/001_099/003/01.01.01_60/gs_nfv003v010101p.pdf

   [KUBERNETES-RESOURCE-QUOTA] "Kubernetes Resource Quota,"
      http://kubernetes.io/v1.0/docs/admin/resource-quota.html

   [KUBERNETES-SELF-HEALING] "Kubernetes Design Overview,"
      http://kubernetes.io/v1.0/docs/design/README.html

   [SELINUX] "Security Enhanced Linux (SELinux) project,"
      http://selinuxproject.org/

   [APPARMOR] "Mandatory Access Control Framework,"
      https://wiki.debian.org/AppArmor

   [VCPE-CONTAINER-PERF] "Virtual Customer CPE Container Performance
      White Paper,"
      http://info.ixiacom.com/rs/098-FRB-840/images/Calsoft-Labs-CaseStudy2015.pdf

Authors' Addresses

   Sriram Natarajan
   Deutsche Telekom Inc.
   sriram.natarajan@telekom.com

   Ram (Ramki) Krishnan
   Dell
   ramki_krishnan@dell.com

   Anoop Ghanwani
   Dell
   anoop@alumni.duke.edu

   Dilip Krishnaswamy
   IBM Research
   dilikris@in.ibm.com

   Peter Willis
   BT
   peter.j.willis@bt.com

   Ashay Chaudhary
   Verizon
   ashay.chaudhary@verizon.com
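
Illustrative sketch

   Sections 2.1.1 and 3 argue that a VM has to account for resources
   of components other than the VNF process (e.g., the guest OS),
   whereas a container only needs an allocation for the VNF process
   itself. The following Python sketch is one possible way to make
   that accounting concrete. It is not taken from the draft: the
   function name max_instances, the host size, the per-VNF demand,
   and the per-VM overhead are all hypothetical values chosen for
   illustration, not measurements, and treating container overhead as
   zero is a simplifying assumption.

```python
def max_instances(host_cpus, host_ram_gb, vnf_cpus, vnf_ram_gb,
                  overhead_cpus=0, overhead_ram_gb=0):
    """Return how many VNF instances fit on one host.

    Each instance needs the VNF's own CPUs and RAM plus a fixed
    per-instance overhead (for a VM, e.g., the guest OS; for a
    container sharing the host OS, assumed to be zero here).
    """
    cpus_per_instance = vnf_cpus + overhead_cpus
    ram_per_instance = vnf_ram_gb + overhead_ram_gb
    # The scarcer of the two resources limits the instance count.
    return min(host_cpus // cpus_per_instance,
               host_ram_gb // ram_per_instance)


# Hypothetical host and VNF sizes (illustrative only).
HOST_CPUS, HOST_RAM_GB = 32, 128
VNF_CPUS, VNF_RAM_GB = 2, 4

# Assumed per-VM overhead of 1 vCPU and 2 GB for the guest OS.
vm_count = max_instances(HOST_CPUS, HOST_RAM_GB, VNF_CPUS, VNF_RAM_GB,
                         overhead_cpus=1, overhead_ram_gb=2)
# Container case: only the VNF process itself is accounted for.
container_count = max_instances(HOST_CPUS, HOST_RAM_GB,
                                VNF_CPUS, VNF_RAM_GB)

print("VM-hosted instances:", vm_count)
print("Container-hosted instances:", container_count)
```

   With these assumed numbers the host carries fewer VM-hosted
   instances than container-hosted ones, because every VM pays the
   per-instance overhead. Real overheads depend on the guest OS, the
   hypervisor, and the container runtime, so the sketch only shows
   the shape of the trade-off.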