NFVRG                                               P. Martinez-Julia, Ed.
Internet-Draft                                                        NICT
Intended status: Informational                           February 20, 2018
Expires: August 24, 2018

 Exploiting External Event Detectors to Anticipate Resource Requirements
            for the Elastic Adaptation of SDN/NFV Systems
                  draft-pedro-anticipated-adaptation-00

Abstract

   The adoption of SDN/NFV technologies by current computer and network
   system infrastructures is constantly increasing, becoming essential
   for the particular case of edge/branch network systems.  The systems
   supported by these infrastructures must be adapted to environment
   changes within a short period of time, so the complexity of new
   systems and the speed at which management and control operations
   must be performed go beyond human limits.  Management systems must
   therefore be automated.  However, in several situations current
   automation techniques are not enough to respond to requirement
   changes.  Here we propose to anticipate changes in the operation
   environments of SDN/NFV systems in response to external events, and
   to reflect them in the anticipation of the amount of resources
   required by those systems for their subsequent adaptation.  The
   final objective is to avoid service degradation or disruption while
   keeping close-to-optimum resource allocation to reduce monetary and
   operative cost as much as possible.  We discuss how to achieve such
   capabilities by integrating the Autonomic Resource Control
   Architecture (ARCA) into the management and orchestration (MANO) of
   NFV systems.  We showcase this by building a multi-domain SDN/NFV
   infrastructure based on OpenStack and deploying ARCA to adapt a
   virtual system, based on the edge/branch network concept, to the
   operational conditions of an emergency support service, which is
   rarely used but cannot leave any user unattended.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 24, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
     3.1.  Virtual Computer and Network Systems
     3.2.  SDN and NFV
     3.3.  Management and Control
     3.4.  The Autonomic Resource Control Architecture (ARCA)
   4.  External Event Detectors
   5.  Anticipating Requirements
   6.  ARCA Integration With ETSI-NFV-MANO
     6.1.  Functional Integration
     6.2.  Target Experiment and Scenario
     6.3.  OpenStack Platform
     6.4.  Initial Results
   7.  Relation to Other IETF/IRTF Initiatives
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Author's Address

1.  Introduction

   The incorporation of Software Defined Networking (SDN) and Network
   Function Virtualization (NFV) into current infrastructures to build
   virtual computer and network systems is constantly increasing.  The
   need to automate the management and control of such systems has
   motivated us to design the Autonomic Resource Control Architecture
   (ARCA), as presented in ICIN 2018 [ICIN-2018].  Automation
   requirements are sufficiently justified by the increasing size and
   complexity of systems, which in turn are essential in the current
   digital world.  Moreover, the particular requirements and market
   benefits of network virtualization have crystallized in the rise of
   SDN/NFV infrastructures.
   Nowadays, the broad reception of the combined SDN/NFV technology
   represents a huge leap towards the empowerment and homogenization of
   virtualization technologies.  Therefore, we have modeled ARCA to fit
   within the reference architecture for the management and
   orchestration of NFV elements, the Virtual Network Functions (VNFs).

   Behind the scenes, NFV is based on a highly distributed and network-
   empowered version of the well-known Cloud infrastructures and
   platforms, also complemented by their centralized counterparts.
   This brings to virtual networks the high degree of flexibility
   already found in computer systems.  It is highly desirable at a time
   when NFV is being exploited by many organizations to build their
   private infrastructures, as well as by network service providers to
   build the services they later commercialize.  However, to actually
   realize the potential monetary and operative cost reduction
   associated with such infrastructures, the amount of resources used
   by production services must be kept close to the optimum, so that
   the physical resources are exploited as much as possible.

   The fast detection of changes in the requirements of the virtual
   systems deployed on the aforementioned SDN/NFV infrastructures, and
   the consequent adaptation of allocated resources to the new
   situations, becomes essential to actually exploit their cost and
   operative benefits, while also avoiding service unresponsiveness due
   to underlying resource overloading.  It is widely accepted that the
   size and complexity of systems and services make it difficult for
   humans to accomplish such tasks within their objective time
   boundaries.  Therefore, they must be automated.  Luckily, the
   architecture and underlying platforms supporting the SDN/NFV
   technologies enable the required automation.  In fact, some
   solutions already exist to perform several batched or scripted tasks
   without human intervention.  However, those solutions still depend
   heavily on low-level human involvement.  This highlights the
   challenge found in control and management automation, which is
   continuously revised and enlarged.

   ARCA provides a small step towards the resolution of the
   aforementioned problem.  It advances the state of the art in the
   automation of resource control and management by providing a
   supervised but autonomous mechanism that reduces the time required
   to perform corrective and/or adaptive changes in virtual computer
   and network systems from hours/minutes to seconds/milliseconds.
   Moreover, it is able to take advantage of the event notifications
   provided by external detectors to anticipate the amount of resources
   that the controlled SDN/NFV system will require in response to such
   events.  We propose to bring such benefits to the reference
   architecture promoted by ETSI for the management and orchestration
   of NFV services (see ETSI-NFV-MANO [ETSI-NFV-MANO]) by integrating
   ARCA as the Virtual Infrastructure Manager (VIM).  We showcase this
   proposal by discussing the evaluation results obtained by ARCA when
   running on a real and physical experimentation infrastructure based
   on OpenStack [OPENSTACK].  We thus justify the need to adapt the
   interfaces supported by the NFV-MANO to include real-world event
   detectors, which are external to the virtualization platform and
   virtual resources.
2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Background

3.1.  Virtual Computer and Network Systems

   The continuous search for efficiency and cost reduction to achieve
   the optimal exploitation of available resources (e.g., CPU power and
   electricity) has led current physical infrastructures to move
   towards virtualization infrastructures.  This trend also enables end
   systems to be centralized and/or distributed, so that they are
   deployed to best accomplish customer requirements in terms of
   resources and qualities.

   One of the key functional requirements imposed on computer and
   network virtualization is a high degree of flexibility and
   reliability.  Both qualities are subject to the underlying
   technologies but, while the latter has always been enforced on
   computer and network systems, flexibility is a relatively new
   requirement, which would not have been imposed without the backing
   of virtualization and cloud technologies.

3.2.  SDN and NFV

   SDN and NFV are conceived to bring a high degree of flexibility and
   conceptual centralization qualities to the network.  On the one
   hand, with SDN the network can be programmed to implement a dynamic
   behavior that changes its topology and overall qualities.  On the
   other hand, with NFV the functions that are typically provided by
   physical network equipment are now implemented as virtual appliances
   that can be deployed and linked together to provide customized
   network services.  SDN and NFV complement each other to actually
   implement the network aspect of the aforementioned virtual computer
   and network systems.

   Although centralization can lead us to think of the single-point-of-
   failure concept, this is not the case for these technologies.
   Conceptual centralization differs greatly from centralized
   deployment: it brings all the benefits of having a single point of
   decision while retaining the benefits of distributed systems.  For
   instance, control decisions in SDN can be centralized while the
   mechanisms that enforce such decisions in the network (SDN
   controllers) can be implemented as highly distributed systems.  The
   same approach can be applied to NFV.  Although network functions can
   be implemented in a central computing facility, they can take
   advantage of several replication and distribution techniques to
   achieve the properties of distributed systems.  Nevertheless, NFV
   also allows the deployment of functions on top of distributed
   systems, so they benefit from both distribution alternatives at the
   same time.

3.3.  Management and Control

   The introduction of virtualization into the computer and network
   system landscape has increased the complexity of both underlying and
   overlying systems.  On the one hand, virtualizing underlying systems
   adds extra functions that must be managed properly to ensure the
   correct operation of the whole system, which encompasses not just
   the underlying elements but also the virtual elements running on top
   of them.  Such functions are used to actually host the overlying
   virtual elements, so there is an indirect management operation that
   involves virtual systems.  Moreover, such complexities are inherited
   by the final systems that get virtualized and deployed on top of
   those virtualization infrastructures.

   In parallel, virtual systems are empowered with additional, and
   widely exploited, functionality that must be managed correctly.
   This is the case for the dynamic adaptation of virtual resources to
   the specific needs of their operation environments, or even the
   composition of distributed elements across heterogeneous underlying
   infrastructures, and probably providers.

   Taking both complex functions into account, either separately or
   jointly, makes clear that management requirements have greatly
   surpassed the limits of humans, so automation has become essential
   to accomplish the most common tasks.
3.4.  The Autonomic Resource Control Architecture (ARCA)

   As deeply discussed in ICIN 2018 [ICIN-2018], ARCA leverages the
   elastic adaptation of the resources assigned to virtual computer and
   network systems by calculating or estimating their requirements from
   the analysis of load measurements and the detection of external
   events.  These events can be notified by physical elements (things,
   sensors) that detect changes in the environment, as well as by
   software elements that analyze digital information, such as
   connectors to sources or analyzers of Big Data.  For instance, ARCA
   is able to consider the detection of an earthquake or a heavy
   rainfall to overcome the damage it can cause to the controlled
   system.

   The policies that ARCA must enforce are specified by administrators
   during the configuration of the control/management engine.  Then,
   ARCA continues running autonomously, with no further human
   involvement unless some parameter must be changed.  ARCA adopts the
   required control and management operations to adapt the controlled
   system to the new situation or requirements.  The main goal of ARCA
   is thus to reduce the time required for resource adaptation from
   hours/minutes to seconds/milliseconds.  With the aforementioned
   statements, system administrators are able to specify the general
   operational boundaries in terms of lower and upper system load
   thresholds, as well as the minimum and maximum amount of resources
   that can be allocated to the controlled system to overcome any
   eventual situation, including the natural crossing of such
   thresholds.

   ARCA's functional goal is to run autonomously, while its performance
   goal is to keep the resources assigned to the controlled system as
   close as possible to the optimum (e.g., within 5 % of the optimum)
   while avoiding service disruption as much as possible, keeping the
   client request discard rate as low as possible (e.g., below 1 %).
   To achieve both goals, ARCA relies on the Autonomic Computing (AC)
   paradigm, in the form of interconnected micro-services.  Therefore,
   ARCA includes the four main elements and activities defined by AC,
   incarnated as follows:

   Collector  Is responsible for gathering and formatting the
              heterogeneous observations that will be used in the
              control cycle.

   Analyzer   Correlates the observations with each other in order to
              determine the situation of the controlled system,
              especially the current load of the resources allocated
              to the system and the occurrence of an incident that can
              affect the normal operation of the system, such as an
              earthquake that increases the traffic in an emergency
              support system, which is the main target scenario
              studied in this document.

   Decider    Determines the necessary actions to adjust the resources
              to the load of the controlled system.

   Enforcer   Requests the underlying and overlying infrastructure,
              such as OpenStack, to make the necessary changes to
              reflect the effects of the decided actions in the
              system.
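   The following minimal sketch, written in Python (the language of the
   SDK used later in this document), illustrates how these four micro-
   services could close the control cycle.  It is only an illustration
   of the AC pattern described above, not ARCA's actual implementation:
   all class names, thresholds, and observation sources are invented
   for the example.

      import random, time

      class Collector:
          """Gathers and formats heterogeneous observations."""
          def observe(self):
              # Stub: a real collector would query telemetry APIs and
              # listen for external event notifications.
              return {"cpu_load": random.uniform(0.0, 1.0),
                      "event_severity": 0}

      class Analyzer:
          """Correlates observations to determine the situation."""
          def analyze(self, obs):
              return {"overloaded": obs["cpu_load"] > 0.8,
                      "underloaded": obs["cpu_load"] < 0.2,
                      "severity": obs["event_severity"]}

      class Decider:
          """Chooses actions to adjust resources to the load."""
          def decide(self, situation):
              if situation["overloaded"]:
                  return "add_servant"
              if situation["underloaded"]:
                  return "remove_servant"
              return None

      class Enforcer:
          """Asks the infrastructure to apply the decided actions."""
          def enforce(self, action):
              if action:
                  print("requesting infrastructure action:", action)

      collector, analyzer = Collector(), Analyzer()
      decider, enforcer = Decider(), Enforcer()
      for _ in range(3):  # the real loop runs indefinitely
          obs = collector.observe()
          enforcer.enforce(decider.decide(analyzer.analyze(obs)))
          time.sleep(1)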
   Being a micro-service architecture means that the different
   components are executed in parallel.  This allows such components to
   operate in two ways.  First, their operation can be dispatched by
   receiving a message from the previous service or an external
   service.  Second, the services can be self-dispatched, so they can
   activate some action or send some message without being previously
   stimulated by any message.  The overall control process loops
   indefinitely and is closed by checking that the expected effects of
   an action are actually taking place.  The coherence among the
   distributed services involved in the ARCA control process is ensured
   by enforcing a common semantic representation and ontology on the
   messages they exchange.

   ARCA semantics are built with the Resource Description Framework
   (RDF) and the Web Ontology Language (OWL), which are well-known and
   widely used standards for the semantic representation and management
   of knowledge.  They provide the ability to represent new concepts
   without requiring changes to the software, just plugin extensions to
   the ontology.  ARCA stores all its knowledge in the Knowledge Base
   (KB), which is queried and kept up to date by the analyzer and
   decider micro-services.  It is implemented with Apache Jena Fuseki,
   which is a high-performance RDF data store that supports SPARQL
   through an HTTP/REST interface.  Being de-facto standards, both
   technologies enable ARCA to be easily integrated with virtualization
   platforms like OpenStack.
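   For illustration, the following sketch shows how a micro-service
   might query such a KB through Fuseki's standard SPARQL HTTP
   endpoint, using the Python "requests" library.  The endpoint URL,
   dataset name, and vocabulary are hypothetical examples, not ARCA's
   actual ontology.

      import requests

      # Hypothetical Fuseki dataset holding the ARCA knowledge base.
      FUSEKI_QUERY_URL = "http://localhost:3030/arca-kb/query"

      # Example query: retrieve the load recorded for each resource
      # (the arca: vocabulary is illustrative only).
      QUERY = """
      PREFIX arca: <http://example.org/arca#>
      SELECT ?resource ?load
      WHERE { ?resource arca:hasLoad ?load }
      """

      resp = requests.post(
          FUSEKI_QUERY_URL,
          data={"query": QUERY},
          headers={"Accept": "application/sparql-results+json"},
      )
      resp.raise_for_status()
      for row in resp.json()["results"]["bindings"]:
          print(row["resource"]["value"], row["load"]["value"])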
4.  External Event Detectors

   As mentioned above, current mechanisms used to achieve automated
   management and control rely only on the continuous monitoring of the
   resources they control or of the underlying infrastructure that
   hosts them.  However, there are several other sources of information
   that can be exploited to make the systems more robust and efficient.
   This is the case for the notifications that can be provided by
   physical or virtual elements or devices that watch for specific
   events, hence called external event detectors.

   More specifically, although the notifications provided by these
   external event detectors relate to occurrences outside the
   boundaries of the controlled system, such occurrences can affect the
   typical operation of controlled systems.  For instance, a heavy
   rainfall or snowfall can be detected and correlated with a huge
   increase in the amount of requests experienced by some emergency
   support service.

5.  Anticipating Requirements

   One of the main goals of the MANO mechanisms is to ensure that the
   virtual computer and network system they manage meets the
   requirements established by its owners and administrators.  This is
   currently achieved by observing and analyzing the performance
   measurements obtained either by directly asking the resources
   forming the managed system or by asking the controllers of the
   underlying infrastructure that hosts such resources.  Thus, under
   changing or eventual situations, the managed system must be adapted
   to cope with the new requirements, increasing the amount of
   resources assigned to it, or to make efficient use of available
   infrastructures, reducing the amount of resources assigned to it.

   However, the time required by the infrastructure to make effective
   the adaptations requested by the MANO mechanisms is longer than the
   time required by client requests to overload the system and make it
   discard further client requests.  This situation is generally
   undesired but particularly dangerous for some systems, such as the
   emergency support system mentioned above.  Therefore, in order to
   avoid the disruption of the service, the change in requirements must
   be anticipated, to ensure that any adaptation has finished as soon
   as possible, preferably before the target system gets overloaded or
   underloaded.

   Here we propose to integrate ARCA with the NFV-MANO to take
   advantage of the notifications provided by the aforementioned
   external event detectors, by correlating them with the target amount
   of resources required by the managed system and enforcing the
   necessary adaptations beforehand, particularly before the system
   performance metrics have actually changed.
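   To make the idea concrete, the sketch below shows a minimal
   collector endpoint that receives detector notifications and
   immediately raises the minimum resource floor, before any
   performance metric has moved.  The HTTP endpoint, message format,
   and severity-to-servants table are placeholders invented for this
   example; in ARCA the correlation is learned, as described in
   Section 6.4.

      import json
      from http.server import BaseHTTPRequestHandler, HTTPServer

      # Hypothetical mapping from event severity to the minimum number
      # of servants to pre-allocate; real values would be learned.
      SEVERITY_FLOOR = {0: 1, 1: 2, 2: 4, 3: 6, 4: 9}

      class CollectorHandler(BaseHTTPRequestHandler):
          """Receives detector notifications before the load changes."""
          def do_POST(self):
              body = self.rfile.read(int(self.headers["Content-Length"]))
              event = json.loads(body)          # e.g. {"severity": 3}
              floor = SEVERITY_FLOOR.get(event.get("severity", 0), 1)
              # A real VIM would now request the infrastructure to
              # scale up to 'floor' servants ahead of the demand.
              print("anticipated minimum servants:", floor)
              self.send_response(204)
              self.end_headers()

      if __name__ == "__main__":
          HTTPServer(("", 8080), CollectorHandler).serve_forever()

   A detector would simply POST a small JSON document such as
   {"severity": 3} to this endpoint when it senses an incident.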
6.  ARCA Integration With ETSI-NFV-MANO

   In this section we describe how to fit ARCA into a general SDN/NFV
   underlying infrastructure and introduce a showcase experiment that
   demonstrates its operation on an OpenStack-based experimentation
   platform.  We first describe the integration of ARCA with the NFV-
   MANO reference architecture.  We contextualize the significance of
   this integration by describing an emergency support scenario that
   clearly benefits from it.  Then we proceed to detail the elements
   forming the OpenStack platform, and finally we discuss some initial
   results obtained from them.

6.1.  Functional Integration

   The most important functional blocks of the NFV reference
   architecture promoted by ETSI (see ETSI-NFV-MANO [ETSI-NFV-MANO])
   are the system support functions for operations and business
   (OSS/BSS), the element management (EM) and, obviously, the Virtual
   Network Functions (VNFs).  But these functions cannot exist without
   being instantiated on a specific infrastructure, the NFV
   infrastructure (NFVI), and all of them must be coordinated,
   orchestrated, and managed by the general NFV-MANO functions.

   Both the NFVI and the NFV-MANO elements are subdivided into several
   sub-components.  The NFVI comprises the underlying physical
   computing, storage, and network resources, which are sliced (see
   draft-qiang-coms-netslicing-information-model-02
   [draft-qiang-coms-netslicing-information-model-02] and draft-geng-
   coms-architecture-01 [draft-geng-coms-architecture-01]) and
   virtualized to form the virtual computing, storage, and network
   resources that will host the VNFs.  In addition, the NFV-MANO is
   subdivided into the NFV Orchestrator (NFVO), the VNF Manager (VNFM),
   and the Virtual Infrastructure Manager (VIM).  As their names
   indicate, all high-level elements and sub-components have their own
   and very specific objective in the NFV architecture.

   During the design of ARCA we enforced both operational and
   interfacing aspects as its main objectives.  From the operational
   point of view, ARCA processes observations to manage virtual
   resources, so it plays the role of the VIM mentioned above.
   Therefore, ARCA has been designed with appropriate interfaces to fit
   in the place of the VIM.  This way, ARCA provides the NFV reference
   architecture with the ability to react to external events to adapt
   virtual computer and network systems, even anticipating such
   adaptations, as performed by ARCA itself.  However, some interfaces
   must be extended to fully enable ARCA to perform its work within the
   NFV architecture.

   Once ARCA is placed in the position of the VIM, it enhances the
   general NFV architecture with its autonomic management capabilities.
   In particular, it discharges some responsibilities from the VNFM and
   NFVO, so they can focus on their own business while the virtual
   resources behave as they expect (and request).  Moreover, ARCA
   improves the scalability and reliability of the managed system in
   case of disconnection from the orchestration layer due to some
   failure, network split, etc.  This is also achieved by the autonomic
   capabilities, which, as described above, are guided by the rules and
   policies specified by the administrators and, here, communicated to
   ARCA through the NFVO.  However, ARCA will not be limited to such
   operation; more generally, it will accomplish the requirements
   established by the Virtual Network Operators (VNOs), which are the
   owners of the slice of virtual resources that is managed by a
   particular instance of the NFV-MANO, and therefore by ARCA.

   In addition to the operational functions, ARCA incorporates the
   necessary mechanisms to engage the interfaces that enable it to
   interact with other elements of the NFV-MANO reference architecture.
   More specifically, ARCA is bound to the Or-Vi (see ETSI-NFV-IFA-005
   [ETSI-NFV-IFA-005]) and the Nf-Vi (see ETSI-NFV-IFA-004
   [ETSI-NFV-IFA-004] and ETSI-NFV-IFA-019 [ETSI-NFV-IFA-019]).  The
   former is the point of attachment between the NFVO and the VIM,
   while the latter is the point of attachment between the NFVI and the
   VIM.  In our current design we decided not to support the point of
   attachment between the VNFM and the VIM, called Vi-Vnfm (see ETSI-
   NFV-IFA-006 [ETSI-NFV-IFA-006]).  We leave it for future evolutions
   of the proposed integration, which will be enabled by a possible
   solution that provides the functions of the VNFM required by ARCA.

   Through the Or-Vi, ARCA receives the instructions it will enforce on
   the virtual computer and network system it is controlling.  As
   mentioned above, these are specified in the form of rules and
   policies, which are in turn formatted as several statements and
   embedded into the Or-Vi messages.  In general, these will be high-
   level objectives, so ARCA will use its reasoning capabilities to
   translate them into more specific, low-level objectives.  For
   instance, the Or-Vi can specify some high-level statement to avoid
   CPU overloading, and ARCA will use its innate and acquired knowledge
   to translate it into specific statements that specify which
   parameters it has to measure (CPU load from assigned servers) and
   which are their desired boundaries, in the form of a high threshold
   and a low threshold.  Moreover, the Or-Vi will be used by the NFVO
   to specify which actions can be used by ARCA to overcome the
   violation of the mentioned policies.
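   A minimal sketch of such a translation step is shown below.  The
   objective name, vocabulary, and threshold values are invented for
   illustration and do not reflect ARCA's actual ontology or knowledge.

      # Hypothetical translation of a high-level Or-Vi objective into
      # low-level measurable statements.
      KNOWLEDGE = {
          "avoid-cpu-overload": {
              "measure": "cpu_load",        # parameter to observe
              "scope": "assigned-servers",
              "high_threshold": 0.80,       # upper boundary
              "low_threshold": 0.20,        # lower boundary
              "allowed_actions": ["add_servant", "remove_servant"],
          },
      }

      def translate(objective: str) -> dict:
          """Map a high-level objective to low-level statements."""
          try:
              return KNOWLEDGE[objective]
          except KeyError:
              raise ValueError("no knowledge for objective: " + objective)

      print(translate("avoid-cpu-overload"))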
   All information flowing through the Or-Vi interface is encoded and
   formatted by following a simple but highly extensible ontology and
   exploiting the aforementioned semantic formats.  This ensures that
   the interconnected system is able to evolve, including replacing
   components, updating (adding or removing) the supported concepts to
   understand new scenarios, and connecting external tools to further
   enhance the management process.  The only requirement to ensure this
   feature is that all elements support the mentioned ontology and
   semantic formats.  Although it is not a finished task, the
   development of semantic technologies allows the easy adaptation and
   translation of existing information formats, so it is expected that
   more and more software pieces will become easily integrable with the
   ETSI-NFV-MANO [ETSI-NFV-MANO] architecture.

   In contrast to the Or-Vi interface, the Nf-Vi interface exposes more
   precise and low-level operations.  Although this makes it easier to
   integrate with ARCA, it also ties it to specific implementations.
   In other words, building a proxy that enforces the aforementioned
   ontology on different interface instances to homogenize them adds
   undesirable complexity.  Therefore, new components have been
   specifically developed for ARCA to be able to interact with
   different NFVIs.  Nevertheless, this specialization is limited to
   the collector and enforcer.  Moreover, it allows ARCA to have
   optimized low-level operations, with a high improvement of the
   overall performance.  This is the case for the specific
   implementations of the collector and enforcer used with Mininet and
   Docker, which were used as underlying infrastructures in previous
   experiments described in ICIN 2017 [ICIN-2017].  Moreover, as
   discussed in the following section, this is also the case for the
   implementations of the collector and enforcer tied to the OpenStack
   telemetry and compute interfaces, respectively.

   Although OpenStack still lacks some functionality regarding the
   construction of specific virtual networks, we use it as the NFVI
   functional block in the integrated approach.  Therefore, OpenStack
   is the provider of the underlying SDN/NFV infrastructure, and we
   exploited its APIs and SDK to achieve the integration.  More
   specifically, in our showcase we use the APIs provided by the
   Ceilometer, Gnocchi, and Compute services, as well as the SDK
   provided for Python.  All of them are gathered within the Nf-Vi
   interface.  Moreover, we have extended the Or-Vi interface to
   connect external elements, such as the physical or environmental
   event detectors and Big Data connectors, which is becoming a
   mandatory requirement of the current virtualization ecosystem and
   constitutes our main extension to the NFV architecture.
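   The following sketch outlines how a collector and an enforcer could
   engage these interfaces, using the OpenStack SDK for Python and
   Gnocchi's REST API.  It is an illustration under assumptions: the
   cloud profile name, identifiers, and the "cpu_util" metric name are
   examples, and error handling is omitted.

      import openstack
      import requests

      # Credentials are assumed to come from a clouds.yaml profile;
      # the profile name is hypothetical.
      conn = openstack.connect(cloud="arca-domain-1")

      def cpu_measures(gnocchi_url, token, server_id):
          """Collector side: poll Gnocchi for a server's CPU metric."""
          r = requests.get(
              gnocchi_url + "/v1/resource/instance/" + server_id
              + "/metric/cpu_util/measures",
              headers={"X-Auth-Token": token})
          r.raise_for_status()
          return r.json()   # list of [timestamp, granularity, value]

      def add_servant(image_id, flavor_id, network_id):
          """Enforcer side: ask Compute to boot one more servant."""
          return conn.compute.create_server(
              name="servant",
              image_id=image_id,
              flavor_id=flavor_id,
              networks=[{"uuid": network_id}])

      def remove_servant(server):
          """Enforcer side: release a servant that is not needed."""
          conn.compute.delete_server(server)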
6.2.  Target Experiment and Scenario

   From the beginning of our work on the design of ARCA we have been
   targeting real-world scenarios, so we get better-suited
   requirements.  In particular, we work with a scenario that
   represents an emergency support service that is hosted on a virtual
   computer and network system, which is in turn hosted on the
   distributed virtualization infrastructure of a medium-sized
   organization.  The objective is to clearly represent an application
   that requires high dynamicity and a high degree of reliability.  The
   emergency support service accomplishes this by being barely used
   when there is no incident but heavily loaded when there is an
   incident.

   Both the underlying infrastructure and the virtual network share the
   same topology.  They have four independent but interconnected
   network domains that form part of the same administrative domain
   (organization).  The first domain hosts the systems of the
   headquarters (HQ) of the owner organization, so the VNFs it hosts
   (servants) implement the emergency support service.  We call them
   "servants" because they are Virtual Machine (VM) instances that work
   together to provide a single service by backing the Load Balancer
   (LB) instances deployed in the separate domains.  The amount of
   resources (servants) assigned to the service is adjusted by ARCA,
   attaching or detaching servants to meet the load boundaries
   specified by administrators.

   The other domains represent different buildings of the organization
   and host the clients that access the service when an incident
   occurs.  They also host the necessary LB instances, which are also
   VNFs controlled by ARCA to regulate the access of clients to
   servants.  All domains have physical detectors that provide external
   information that can (and will) be correlated with the load of the
   controlled virtual computer and network system, and thus will affect
   the amount of servants assigned to it.  Although the underlying
   infrastructure, the servants, and the ARCA instance are the same as
   those used in the real world, both clients and detectors are
   emulated.  This does not reduce the transferability of the results
   obtained from our experiments, as it allows us to expand the amount
   of clients beyond the limits of most physical infrastructures.

   Each underlying OpenStack domain is able to host a maximum of 100
   clients, as they are deployed on a low-profile virtual machine (a
   flavor in OpenStack).  In general, clients perform requests at a
   rate of one request every ten seconds, so there would be a maximum
   of 30 requests per second.  However, under the simulated incident,
   the clients raise their load to reach a common maximum of 1200
   requests per second.  This mimics the shape and size of a real
   medium-sized organization of about 300 users that perform a maximum
   of four requests per second when they need some support.
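   One possible emulation of this client behavior is sketched below;
   the service URL is a placeholder, and the request rates follow the
   figures above (one request every ten seconds in normal operation,
   up to four requests per second per client under an incident).

      import time
      import threading
      import urllib.request

      SERVICE_URL = "http://lb.example.org/emergency"  # hypothetical LB

      def client(incident: threading.Event):
          """One emulated client of the emergency support service."""
          while True:
              try:
                  urllib.request.urlopen(SERVICE_URL, timeout=5)
              except OSError:
                  pass                     # discarded/failed request
              time.sleep(0.25 if incident.is_set() else 10.0)

      incident = threading.Event()
      for _ in range(300):                 # 100 clients in each of the
          threading.Thread(target=client,  # three client domains
                           args=(incident,), daemon=True).start()
      time.sleep(60)
      incident.set()   # the detectors would fire at this same instant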
   The topology of the underlying network is simplified by connecting
   the four domains to the same high-performance switch.  However, the
   topology of the virtual network is built by using direct links
   between the HQ domain and the other three domains.  These are
   complemented by links between domains 2 and 3, and between domains 3
   and 4.  This way, the three domains have three paths to reach the HQ
   domain: a direct path with just one hop, and two indirect paths with
   two and three hops, respectively.

   During the execution of the experiment, the detectors notify the
   incident to the controller as soon as it happens.  However, although
   the clients are stimulated at the same time, there is some delay
   between the occurrence of the incident and the moment the network
   service receives the increase in load.  One of the main targets of
   our experiment is to study such delay and take advantage of it to
   anticipate the amount of servants required by the system.  We
   discuss it below.

   In summary, this scenario highlights the main benefits of ARCA
   playing the role of the VIM and interacting with the underlying
   OpenStack platform.  This means advancing towards an efficient use
   of resources and thus reducing the CAPEX of the system.  Moreover,
   as the operation of the system is autonomic, the involvement of
   human administrators is reduced and, therefore, the OPEX is also
   reduced.

6.3.  OpenStack Platform

   The implementation of the scenario described above reflects the
   requirements of any edge/branch networking infrastructure, which is
   composed of several distributed micro-data-centers deployed in the
   wiring centers of buildings and/or storeys.  We chose OpenStack to
   meet such requirements because it is widely used in production
   infrastructures, so the resulting infrastructure has the necessary
   robustness to accomplish our objectives, while it reflects the
   typical underlying platform found in any SDN/NFV environment.

   We have deployed four separate network domains, each one with its
   own OpenStack instantiation.  All domains are totally capable of
   running regular OpenStack workloads, i.e., executing VMs and
   networks, but, as mentioned above, we designate domain 1 to be the
   headquarters of the organization.  The different underlying networks
   required by this (quite complex) deployment are provided by several
   VLANs within a high-end L2 switch.  This switch represents the
   distributed network of the organization.  Four separate VLANs are
   used to isolate the traffic within each domain, by connecting an
   interface of OpenStack's controller and compute nodes.  These VLANs
   therefore form the distributed data plane.  Moreover, another VLAN
   is used to carry the control plane as well as the management plane,
   which are used by the NFV-MANO, and thus by ARCA.  It is
   instantiated in the physical machine called the ARCA Node, to
   exchange control and management operations in relation to the
   collector and enforcer defined in ARCA.  This VLAN is shared among
   all OpenStack domains to implement the global control of the
   virtualization environment pertaining to the organization.  Finally,
   another VLAN is used by the infrastructure to interconnect the data
   planes of the separated domains and also to allow all elements of
   the infrastructure to access the Internet to perform software
   installation and updates.

   The installation of OpenStack is provided by the Red Hat OpenStack
   Platform, which is tightly dependent on the Linux operating system
   and closely related to the software developed by the OpenStack Open
   Source project.
   It provides a comprehensive way to install the whole platform while
   being easily customized to meet our specific requirements, and it is
   also backed by operational-quality support.

   The ARCA Node is also based on Linux but, since it is not directly
   related to the OpenStack deployment, it is not based on the same
   distribution.  It is just configured to be able to access the
   control and management interfaces offered by OpenStack, and
   therefore it is connected to the VLAN that hosts the control and
   management planes.  On this node we deploy the NFV-MANO components,
   including the micro-services that form an ARCA instance.

   In summary, we dedicate nine physical computers to the OpenStack
   deployment, all of them Dell PowerEdge R610 with 2 x Xeon 5670
   2.96 GHz (6 cores / 12 threads) CPUs, 48 GiB RAM, 6 x 146 GiB HD at
   10 kRPM, and 4 x 1 GE NICs.  Moreover, we dedicate an additional
   computer with the same specification to the ARCA Node.  We dedicate
   a less powerful computer to implement the physical router because it
   is not involved in the general execution of OpenStack nor in the
   specific experiments carried out with it.  Finally, as detailed
   above, we dedicate a high-end physical switch, an HP ProCurve
   1810G-24, to build the interconnection networks.

6.4.  Initial Results

   Using the platform described above, we execute an initial but long-
   lasting experiment based on the target scenario introduced at the
   beginning of this section.  The objective of this experiment is
   twofold.  First, we aim to demonstrate how ARCA behaves in a real
   environment.  Second, we aim to stress the coupling points between
   ARCA and OpenStack, which will expose the limitations of the
   existing interfaces.

   With such objectives in mind, we define a timeline that is followed
   by both clients and external event detectors.  It forces the
   virtualized system to experience different situations, including
   incidents of many severities.  When an incident is found in the
   timeline, the detectors notify the ARCA-based VIM and the clients
   change their request rates, which depend on the severity of the
   incident.  This behavior is widely discussed in ICIN 2018
   [ICIN-2018], remarking how users behave after a disaster or another
   similar incident occurs.

   The ARCA-based VIM knows of the occurrence of the incident from two
   sources.  First, it receives the notification from the event
   detectors.  Second, it notices the change in the CPU load of the
   servants assigned to the target service.  In this situation, ARCA
   has different opportunities to overcome the possible overload (or
   underload) of the system.  We explore the anticipation approach
   deeply discussed in ICIN 2018 [ICIN-2018].  Its operation is
   enclosed in the analyzer and decider, and it is based on an
   algorithm that is divided into two sub-algorithms.

   The first sub-algorithm reacts to the detection of the incident and
   the ulterior correlation of its severity with the amount of servants
   required by the system.  This sub-algorithm hosts the regression of
   the learner, which is based on the SVM/SVR technique, and predicts
   the necessary resources from two features: the severity of the
   incident and the time elapsed from the moment it happened.  The
   resulting amount of servants is established as the minimum amount
   that the VIM can use.
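   A rough sketch of this first sub-algorithm, using the SVR
   implementation from scikit-learn, is shown below.  The training
   vectors here are synthetic placeholders; in the real system the
   vectors are accumulated online from the corrections made by the
   second sub-algorithm, described next.

      import numpy as np
      from sklearn.svm import SVR

      # Synthetic training vectors: (severity, seconds elapsed) ->
      # servants needed.  Illustrative values only.
      X = np.array([[0, 0], [0, 20], [1, 15], [2, 15],
                    [3, 10], [3, 25], [4, 15], [4, 30]])
      y = np.array([1, 1, 2, 4, 6, 3, 8, 5])

      model = SVR(kernel="rbf", C=10.0)
      model.fit(X, y)

      def minimum_servant_assignation(severity, elapsed):
          """Predict the MSA floor handed to the reactive loop."""
          predicted = model.predict([[severity, elapsed]])[0]
          return max(1, int(round(predicted)))

      # e.g., an incident of severity 3 that happened 10 seconds ago
      print(minimum_servant_assignation(3, 10))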
   The second sub-algorithm is fed with the CPU load measurements of
   the servants assigned to the service, as reported by the OpenStack
   platform.  With this information it checks whether the system is
   within the operating parameters established by the NFVO.  If not, it
   adjusts the resources assigned to the system.  It also uses the
   minimum amount established by the other sub-algorithm as the basis
   for the assignation.  After every correction, this algorithm learns
   the behavior by adding new correlation vectors to the SVM/SVR
   structure.

   When the experiment is running, the collector component of the ARCA-
   based VIM is attached to the Telemetry interface of OpenStack, using
   the SDK to access the measurement data generated by Ceilometer and
   stored by Gnocchi.  In addition, it is attached to the external
   event detectors in order to receive their notifications.  On the
   other hand, the enforcer component is attached to the Compute
   interface of OpenStack, also using its SDK, to request the
   infrastructure to create, destroy, query, or change the status of a
   VM that hosts a servant of the controlled system.  Finally, the
   enforcer also updates the lists of servers used by the load
   balancers to distribute the clients among the available resources.

   During the execution of the experiment we make the ARCA-based VIM
   report the severity of the last incident, if any, the time elapsed
   since it occurred, the amount of servants assigned to the controlled
   system, the minimum amount of servants to be assigned, as determined
   by the anticipation algorithm, and the average load of all servants.
   In this instance, the severities are spread between 0 (no incident)
   and 4 (strongest incident), the elapsed times are less than 35
   seconds, and the minimum server assignation (MSA) is below 10,
   although the hard maximum is 15.

   With such measurements we illustrate how the learned correlation of
   the three features (dimensions) mentioned above is achieved.  Thus,
   when there is no incident (severity = 0), the MSA is kept to the
   minimum.  In parallel, regardless of the severity level, the
   algorithm learned that there is no need to increase the MSA during
   the first 5 or 10 seconds.  This shows the behavior discussed in
   this document, namely that there is a delay between the occurrence
   of an event and the actual need for an updated amount of resources,
   and it forms a fundamental aspect of our research.

   By inspecting the results, we know that there is a burst of client
   demands whose peak is centered around 15 seconds after the
   occurrence of an incident or any other change in the accounted
   severity.  We also know that the burst lasts longer for higher
   severities, and that it fluctuates a bit for the highest severities.
   Finally, we can also notice that, for the majority of severities,
   the increased MSA is no longer required after 25 seconds from the
   time the severity change was notified.

   All that information becomes part of the knowledge of ARCA, and it
   is stored both in the internal structures of the SVM/SVR and, once
   represented semantically, in the semantic database that manages the
   knowledge base of ARCA.  Thus, it is used to predict any future
   behavior.  For instance, if an incident of severity 3 occurred 10
   seconds ago, ARCA knows that it will need to set the MSA to 6
   servants.
   In fact, this information has been used during the experiment, so we
   can also determine the accuracy of the algorithm by comparing the
   anticipated MSA value with the required value (or even the best
   value).  However, the analysis of such information is left for
   future work.

   While preparing and executing the experiment we found several
   limitations intrinsic to the current OpenStack platform.  First,
   regardless of the CPU and memory resources assigned to the
   underlying controller nodes, the platform is unable to record and
   deliver performance measurements at an interval shorter than 10
   seconds, so it is currently not suitable for real-time operations,
   which are important for our long-term research objectives.
   Moreover, we found that the time required by the infrastructure to
   create a server that hosts a somewhat heavy servant is around 10
   seconds, which is too far from our targets.  Although these
   limitations may be improved in the future, they clearly justify that
   our anticipation approach is essential for the proper working of a
   virtual system and, thus, that the integration of external
   information becomes mandatory for future system management
   technologies, especially considering virtualization environments.

   Finally, we found it difficult for the required measurements to be
   pushed to external components, so we had to poll for them.
   Otherwise, some component of ARCA would have to be instantiated
   alongside the main OpenStack components and services, so that it has
   first-hand and prompt access to such features.  This way, ARCA could
   receive push notifications with the measurements, as it does for the
   external detectors.  This is a key aspect that affects the placement
   of the NFV-VIM, or some subpart of it, in the general architecture.
   Therefore, for future iterations of the NFV reference architecture,
   an integrated view of the VIM and the NFVI could be required to
   reflect the future reality.

7.  Relation to Other IETF/IRTF Initiatives

   TBD

8.  IANA Considerations

   This memo includes no request to IANA.

9.  Security Considerations

   The major security concern of the integration of external event
   detectors and ARCA to manage SDN/NFV systems is that the boundaries
   of the control and management planes are crossed to introduce
   information from outside.  Such communications must be highly and
   heavily secured, since some malfunction or explicit attack might
   compromise the integrity and execution of the controlled system.
   However, it is up to implementers to deploy the necessary
   countermeasures to avoid such situations.  From the design point of
   view, since all operations are performed within the control and/or
   management planes, the security level of the current solution is
   inherited and thus determined by the security measures established
   by the systems conforming such planes.

10.  Acknowledgements

   TBD

11.  References

11.1.  Normative References

   [draft-geng-coms-architecture-01]
              "Technology Independent Information Model for Network
              Slicing", 2018.

   [draft-qiang-coms-netslicing-information-model-02]
              "Technology Independent Information Model for Network
              Slicing", 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.
11.2.  Informative References

   [ETSI-NFV-IFA-004]
              ETSI, "Network Functions Virtualisation (NFV);
              Acceleration Technologies; Management Aspects
              Specification", ETSI GS NFV-IFA 004, 2016.

   [ETSI-NFV-IFA-005]
              ETSI, "Network Functions Virtualisation (NFV); Management
              and Orchestration; Or-Vi reference point - Interface and
              Information Model Specification", ETSI GS NFV-IFA 005,
              2016.

   [ETSI-NFV-IFA-006]
              ETSI, "Network Functions Virtualisation (NFV); Management
              and Orchestration; Vi-Vnfm reference point - Interface
              and Information Model Specification", ETSI GS NFV-IFA
              006, 2016.

   [ETSI-NFV-IFA-019]
              ETSI, "Network Functions Virtualisation (NFV);
              Acceleration Technologies; Management Aspects
              Specification; Release 3", ETSI GS NFV-IFA 019, 2017.

   [ETSI-NFV-MANO]
              ETSI, "Network Functions Virtualisation (NFV); Management
              and Orchestration", ETSI GS NFV-MAN 001, 2014.

   [ICIN-2017]
              Martinez-Julia, P., Kafle, V. P., and H. Harai,
              "Achieving the Autonomic Adaptation of Resources in
              Virtualized Network Environments", in Proceedings of the
              20th ICIN Conference (Innovations in Clouds, Internet and
              Networks, ICIN 2017), IEEE, pp. 1-8, 2017.

   [ICIN-2018]
              Martinez-Julia, P., Kafle, V. P., and H. Harai,
              "Anticipating Minimum Resources Needed to Avoid Service
              Disruption of Emergency Support Systems", in Proceedings
              of the 21st ICIN Conference (Innovations in Clouds,
              Internet and Networks, ICIN 2018), IEEE, pp. 1-8, 2018.

   [OPENSTACK]
              The OpenStack Project, <http://www.openstack.org/>, 2018.

Author's Address

   Pedro Martinez-Julia (editor)
   NICT
   4-2-1, Nukui-Kitamachi
   Koganei, Tokyo  184-8795
   Japan

   Phone: +81 42 327 7293
   Email: pedro@nict.go.jp