NFVRG                                               P. Martinez-Julia, Ed.
Internet-Draft                                                        NICT
Intended status: Informational                           February 20, 2018
Expires: August 24, 2018

 Exploiting External Event Detectors to Anticipate Resource Requirements
            for the Elastic Adaptation of SDN/NFV Systems
                  draft-pedro-anticipated-adaptation-00

Abstract

   The adoption of SDN/NFV technologies by current computer and network
   system infrastructures is constantly increasing, becoming essential
   for the particular case of edge/branch network systems.  The systems
   supported by these infrastructures must be adapted to environment
   changes within a short period of time, so the complexity of new
   systems and the speed at which management and control operations
   must be performed go beyond human limits.  Management systems must
   therefore be automated.  However, in several situations current
   automation techniques are not enough to respond to requirement
   changes.  Here we propose to anticipate changes in the operation
   environments of SDN/NFV systems in response to external events, and
   to reflect them in the anticipation of the amount of resources
   required by those systems for their subsequent adaptation.  The
   final objective is to avoid service degradation or disruption while
   keeping close-to-optimum resource allocation to reduce monetary and
   operative cost as much as possible.  We discuss how to achieve such
   capabilities by integrating the Autonomic Resource Control
   Architecture (ARCA) into the management and orchestration (MANO) of
   NFV systems.  We showcase this by building a multi-domain SDN/NFV
   infrastructure based on OpenStack and deploying ARCA to adapt a
   virtual system, based on the edge/branch network concept, to the
   operational conditions of an emergency support service, which is
   rarely used but cannot leave any user unattended.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 24, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
     3.1.  Virtual Computer and Network Systems
     3.2.  SDN and NFV
     3.3.  Management and Control
     3.4.  The Autonomic Resource Control Architecture (ARCA)
   4.  External Event Detectors
   5.  Anticipating Requirements
   6.  ARCA Integration With ETSI-NFV-MANO
     6.1.  Functional Integration
     6.2.  Target Experiment and Scenario
     6.3.  OpenStack Platform
     6.4.  Initial Results
   7.  Relation to Other IETF/IRTF Initiatives
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Author's Address

1.  Introduction

   The incorporation of Software Defined Networking (SDN) and Network
   Function Virtualization (NFV) into current infrastructures to build
   virtual computer and network systems is constantly increasing.  The
   need to automate the management and control of such systems has
   motivated us to design the Autonomic Resource Control Architecture
   (ARCA), as presented in ICIN 2018 [ICIN-2018].  Automation
   requirements are sufficiently justified by the increasing size and
   complexity of systems, which in turn are essential in the current
   digital world.  Moreover, the particular requirements and market
   benefits of network virtualization have crystallized in the rise of
   SDN/NFV infrastructures.
   Nowadays, the broad reception of the combined SDN/NFV technology
   represents a huge leap towards the empowerment and homogenization of
   virtualization technologies.  Therefore, we have modeled ARCA to fit
   within the reference architecture for the management and
   orchestration of NFV elements, the Virtual Network Functions (VNFs).

   Behind the scenes, NFV is based on a highly distributed and network-
   empowered version of the well-known Cloud infrastructures and
   platforms, also complemented by their centralized counterparts.
   This brings to virtual networks the high degree of flexibility
   already found in computer systems.  It is highly desirable at a time
   when NFV is being exploited by many organizations to build their
   private infrastructures, as well as by network service providers to
   build the services they later commercialize.  However, to actually
   realize the potential monetary and operative cost reduction
   associated with such infrastructures, the amount of resources used
   by production services must be kept close to the optimum, so that
   the physical resources are exploited as much as possible.

   The fast detection of changes in the requirements of the virtual
   systems deployed on the aforementioned SDN/NFV infrastructures, and
   the consequent adaptation of allocated resources to the new
   situations, becomes essential to actually exploit their cost and
   operative benefits, while also avoiding service unresponsiveness due
   to underlying resource overloading.  It is widely accepted that the
   size and complexity of systems and services make it difficult for
   humans to accomplish such tasks within their objective time
   boundaries.  Therefore, they must be automated.  Luckily, the
   architecture and underlying platforms supporting the SDN/NFV
   technologies enable the required automation.  In fact, some
   solutions already exist to perform several batched or scripted tasks
   without human intervention.  However, those solutions still depend
   heavily on low-level human involvement.  This highlights the
   challenge found in control and management automation, which is
   continuously revised and enlarged.

   ARCA provides a small step towards the resolution of the
   aforementioned problem.  It advances the state of the art in the
   automation of resource control and management by providing a
   supervised but autonomous mechanism that reduces the time required
   to perform corrective and/or adaptive changes in virtual computer
   and network systems from hours/minutes to seconds/milliseconds.
   Moreover, it is able to take advantage of the event notifications
   provided by external detectors to anticipate the amount of resources
   that the controlled SDN/NFV system will require in response to such
   events.  We propose to bring such benefits to the reference
   architecture promoted by ETSI for the management and orchestration
   of NFV services (see ETSI-NFV-MANO [ETSI-NFV-MANO]) by integrating
   ARCA as the Virtual Infrastructure Manager (VIM).  We showcase this
   proposal by discussing the evaluation results obtained by ARCA when
   running on a real and physical experimentation infrastructure based
   on OpenStack [OPENSTACK].  We thus justify the need to adapt the
   interfaces supported by the NFV-MANO to include real-world event
   detectors, which are external to the virtualization platform and
   virtual resources.
2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Background

3.1.  Virtual Computer and Network Systems

   The continuous search for efficiency and cost reduction to achieve
   the optimal exploitation of available resources (e.g., CPU power and
   electricity) has led current physical infrastructures to move
   towards virtualization infrastructures.  This trend also enables end
   systems to be centralized and/or distributed, so that they are
   deployed to best accomplish customer requirements in terms of
   resources and qualities.

   One of the key functional requirements imposed on computer and
   network virtualization is a high degree of flexibility and
   reliability.  Both qualities are subject to the underlying
   technologies but, while the latter has always been enforced on
   computer and network systems, flexibility is a relatively new
   requirement, which would not have been imposed without the backing
   of virtualization and cloud technologies.

3.2.  SDN and NFV

   SDN and NFV are conceived to bring a high degree of flexibility and
   conceptual centralization qualities to the network.  On the one
   hand, with SDN the network can be programmed to implement a dynamic
   behavior that changes its topology and overall qualities.  On the
   other hand, with NFV the functions that are typically provided by
   physical network equipment are now implemented as virtual appliances
   that can be deployed and linked together to provide customized
   network services.  SDN and NFV complement each other to actually
   implement the network aspect of the aforementioned virtual computer
   and network systems.

   Although centralization can lead us to think of the single-point-of-
   failure concept, this is not the case for these technologies.
   Conceptual centralization differs greatly from centralized
   deployment: it brings all the benefits of having a single point of
   decision while retaining the benefits of distributed systems.  For
   instance, control decisions in SDN can be centralized while the
   mechanisms that enforce such decisions in the network (SDN
   controllers) can be implemented as highly distributed systems.  The
   same approach can be applied to NFV.  Although network functions can
   be implemented in a central computing facility, they can take
   advantage of several replication and distribution techniques to
   achieve the properties of distributed systems.  Nevertheless, NFV
   also allows the deployment of functions on top of distributed
   systems, so they benefit from both distribution alternatives at the
   same time.

3.3.  Management and Control

   The introduction of virtualization into the computer and network
   system landscape has increased the complexity of both underlying and
   overlying systems.  On the one hand, virtualizing underlying systems
   adds extra functions that must be managed properly to ensure the
   correct operation of the whole system, which encompasses not just
   the underlying elements but also the virtual elements running on top
   of them.  Such functions are used to actually host the overlying
   virtual elements, so there is an indirect management operation that
   involves virtual systems.  Moreover, such complexities are inherited
   by the final systems that get virtualized and deployed on top of
   those virtualization infrastructures.

   In parallel, virtual systems are empowered with additional, and
   widely exploited, functionality that must be managed correctly.
   This is the case for the dynamic adaptation of virtual resources to
   the specific needs of their operation environments, or even the
   composition of distributed elements across heterogeneous underlying
   infrastructures, and probably providers.

   Taking both complex functions into account, either separately or
   jointly, makes clear that management requirements have greatly
   surpassed the limits of humans, so automation has become essential
   to accomplish the most common tasks.
3.4.  The Autonomic Resource Control Architecture (ARCA)

   As deeply discussed in ICIN 2018 [ICIN-2018], ARCA leverages the
   elastic adaptation of the resources assigned to virtual computer and
   network systems by calculating or estimating their requirements from
   the analysis of load measurements and the detection of external
   events.  These events can be notified by physical elements (things,
   sensors) that detect changes in the environment, as well as by
   software elements that analyze digital information, such as
   connectors to sources or analyzers of Big Data.  For instance, ARCA
   is able to consider the detection of an earthquake or a heavy
   rainfall to overcome the damage it can cause to the controlled
   system.

   The policies that ARCA must enforce are specified by administrators
   during the configuration of the control/management engine.  Then,
   ARCA continues running autonomously, with no further human
   involvement unless some parameter must be changed.  ARCA adopts the
   required control and management operations to adapt the controlled
   system to the new situation or requirements.  The main goal of ARCA
   is thus to reduce the time required for resource adaptation from
   hours/minutes to seconds/milliseconds.  With the aforementioned
   statements, system administrators are able to specify the general
   operational boundaries in terms of lower and upper system load
   thresholds, as well as the minimum and maximum amount of resources
   that can be allocated to the controlled system to overcome any
   eventual situation, including the natural crossing of such
   thresholds.

   ARCA's functional goal is to run autonomously, while its performance
   goal is to keep the resources assigned to the controlled system as
   close as possible to the optimum (e.g., within 5 % of the optimum)
   while avoiding service disruption as much as possible, keeping the
   client request discard rate as low as possible (e.g., below 1 %).
   To achieve both goals, ARCA relies on the Autonomic Computing (AC)
   paradigm, in the form of interconnected micro-services.  Therefore,
   ARCA includes the four main elements and activities defined by AC,
   incarnated as follows:

   Collector  Is responsible for gathering and formatting the
              heterogeneous observations that will be used in the
              control cycle.

   Analyzer   Correlates the observations with each other in order to
              determine the situation of the controlled system,
              especially the current load of the resources allocated
              to the system and the occurrence of an incident that can
              affect the normal operation of the system, such as an
              earthquake that increases the traffic in an emergency
              support system, which is the main target scenario
              studied in this document.

   Decider    Determines the necessary actions to adjust the resources
              to the load of the controlled system.

   Enforcer   Requests the underlying and overlying infrastructure,
              such as OpenStack, to make the necessary changes to
              reflect the effects of the decided actions in the
              system.
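   The following minimal sketch, written in Python (the language of the
   SDK used later in this document), illustrates how these four micro-
   services could close the control cycle.  It is only an illustration
   of the AC pattern described above, not ARCA's actual implementation:
   all class names, thresholds, and observation sources are invented
   for the example.

      import random, time

      class Collector:
          """Gathers and formats heterogeneous observations."""
          def observe(self):
              # Stub: a real collector would query telemetry APIs and
              # listen for external event notifications.
              return {"cpu_load": random.uniform(0.0, 1.0),
                      "event_severity": 0}

      class Analyzer:
          """Correlates observations to determine the situation."""
          def analyze(self, obs):
              return {"overloaded": obs["cpu_load"] > 0.8,
                      "underloaded": obs["cpu_load"] < 0.2,
                      "severity": obs["event_severity"]}

      class Decider:
          """Chooses actions to adjust resources to the load."""
          def decide(self, situation):
              if situation["overloaded"]:
                  return "add_servant"
              if situation["underloaded"]:
                  return "remove_servant"
              return None

      class Enforcer:
          """Asks the infrastructure to apply the decided actions."""
          def enforce(self, action):
              if action:
                  print("requesting infrastructure action:", action)

      collector, analyzer = Collector(), Analyzer()
      decider, enforcer = Decider(), Enforcer()
      for _ in range(3):  # the real loop runs indefinitely
          obs = collector.observe()
          enforcer.enforce(decider.decide(analyzer.analyze(obs)))
          time.sleep(1)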
   Being a micro-service architecture means that the different
   components are executed in parallel.  This allows such components to
   operate in two ways.  First, their operation can be dispatched by
   receiving a message from the previous service or an external
   service.  Second, the services can be self-dispatched, so they can
   activate some action or send some message without being previously
   stimulated by any message.  The overall control process loops
   indefinitely and is closed by checking that the expected effects of
   an action are actually taking place.  The coherence among the
   distributed services involved in the ARCA control process is ensured
   by enforcing a common semantic representation and ontology on the
   messages they exchange.

   ARCA semantics are built with the Resource Description Framework
   (RDF) and the Web Ontology Language (OWL), which are well-known and
   widely used standards for the semantic representation and management
   of knowledge.  They provide the ability to represent new concepts
   without requiring changes to the software, just plugin extensions to
   the ontology.  ARCA stores all its knowledge in the Knowledge Base
   (KB), which is queried and kept up to date by the analyzer and
   decider micro-services.  It is implemented with Apache Jena Fuseki,
   which is a high-performance RDF data store that supports SPARQL
   through an HTTP/REST interface.  Being de-facto standards, both
   technologies enable ARCA to be easily integrated with virtualization
   platforms like OpenStack.
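   For illustration, the following sketch shows how a micro-service
   might query such a KB through Fuseki's standard SPARQL HTTP
   endpoint, using the Python "requests" library.  The endpoint URL,
   dataset name, and vocabulary are hypothetical examples, not ARCA's
   actual ontology.

      import requests

      # Hypothetical Fuseki dataset holding the ARCA knowledge base.
      FUSEKI_QUERY_URL = "http://localhost:3030/arca-kb/query"

      # Example query: retrieve the load recorded for each resource
      # (the arca: vocabulary is illustrative only).
      QUERY = """
      PREFIX arca: <http://example.org/arca#>
      SELECT ?resource ?load
      WHERE { ?resource arca:hasLoad ?load }
      """

      resp = requests.post(
          FUSEKI_QUERY_URL,
          data={"query": QUERY},
          headers={"Accept": "application/sparql-results+json"},
      )
      resp.raise_for_status()
      for row in resp.json()["results"]["bindings"]:
          print(row["resource"]["value"], row["load"]["value"])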
4.  External Event Detectors

   As mentioned above, current mechanisms used to achieve automated
   management and control rely only on the continuous monitoring of the
   resources they control or of the underlying infrastructure that
   hosts them.  However, there are several other sources of information
   that can be exploited to make the systems more robust and efficient.
   This is the case for the notifications that can be provided by
   physical or virtual elements or devices that watch for specific
   events, hence called external event detectors.

   More specifically, although the notifications provided by these
   external event detectors relate to occurrences outside the
   boundaries of the controlled system, such occurrences can affect the
   typical operation of controlled systems.  For instance, a heavy
   rainfall or snowfall can be detected and correlated with a huge
   increase in the amount of requests experienced by some emergency
   support service.

5.  Anticipating Requirements

   One of the main goals of the MANO mechanisms is to ensure that the
   virtual computer and network system they manage meets the
   requirements established by its owners and administrators.  This is
   currently achieved by observing and analyzing the performance
   measurements obtained either by directly asking the resources
   forming the managed system or by asking the controllers of the
   underlying infrastructure that hosts such resources.  Thus, under
   changing or eventual situations, the managed system must be adapted
   to cope with the new requirements, increasing the amount of
   resources assigned to it, or to make efficient use of available
   infrastructures, reducing the amount of resources assigned to it.

   However, the time required by the infrastructure to make effective
   the adaptations requested by the MANO mechanisms is longer than the
   time required by client requests to overload the system and make it
   discard further client requests.  This situation is generally
   undesired but particularly dangerous for some systems, such as the
   emergency support system mentioned above.  Therefore, in order to
   avoid the disruption of the service, the change in requirements must
   be anticipated, to ensure that any adaptation has finished as soon
   as possible, preferably before the target system gets overloaded or
   underloaded.

   Here we propose to integrate ARCA with the NFV-MANO to take
   advantage of the notifications provided by the aforementioned
   external event detectors, by correlating them with the target amount
   of resources required by the managed system and enforcing the
   necessary adaptations beforehand, particularly before the system
   performance metrics have actually changed.
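   To make the idea concrete, the sketch below shows a minimal
   collector endpoint that receives detector notifications and
   immediately raises the minimum resource floor, before any
   performance metric has moved.  The HTTP endpoint, message format,
   and severity-to-servants table are placeholders invented for this
   example; in ARCA the correlation is learned, as described in
   Section 6.4.

      import json
      from http.server import BaseHTTPRequestHandler, HTTPServer

      # Hypothetical mapping from event severity to the minimum number
      # of servants to pre-allocate; real values would be learned.
      SEVERITY_FLOOR = {0: 1, 1: 2, 2: 4, 3: 6, 4: 9}

      class CollectorHandler(BaseHTTPRequestHandler):
          """Receives detector notifications before the load changes."""
          def do_POST(self):
              body = self.rfile.read(int(self.headers["Content-Length"]))
              event = json.loads(body)          # e.g. {"severity": 3}
              floor = SEVERITY_FLOOR.get(event.get("severity", 0), 1)
              # A real VIM would now request the infrastructure to
              # scale up to 'floor' servants ahead of the demand.
              print("anticipated minimum servants:", floor)
              self.send_response(204)
              self.end_headers()

      if __name__ == "__main__":
          HTTPServer(("", 8080), CollectorHandler).serve_forever()

   A detector would simply POST a small JSON document such as
   {"severity": 3} to this endpoint when it senses an incident.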
6.  ARCA Integration With ETSI-NFV-MANO

   In this section we describe how to fit ARCA into a general SDN/NFV
   underlying infrastructure and introduce a showcase experiment that
   demonstrates its operation on an OpenStack-based experimentation
   platform.  We first describe the integration of ARCA with the NFV-
   MANO reference architecture.  We contextualize the significance of
   this integration by describing an emergency support scenario that
   clearly benefits from it.  Then we proceed to detail the elements
   forming the OpenStack platform, and finally we discuss some initial
   results obtained from them.

6.1.  Functional Integration

   The most important functional blocks of the NFV reference
   architecture promoted by ETSI (see ETSI-NFV-MANO [ETSI-NFV-MANO])
   are the system support functions for operations and business
   (OSS/BSS), the element management (EM) and, obviously, the Virtual
   Network Functions (VNFs).  But these functions cannot exist without
   being instantiated on a specific infrastructure, the NFV
   infrastructure (NFVI), and all of them must be coordinated,
   orchestrated, and managed by the general NFV-MANO functions.

   Both the NFVI and the NFV-MANO elements are subdivided into several
   sub-components.  The NFVI comprises the underlying physical
   computing, storage, and network resources, which are sliced (see
   draft-qiang-coms-netslicing-information-model-02
   [draft-qiang-coms-netslicing-information-model-02] and draft-geng-
   coms-architecture-01 [draft-geng-coms-architecture-01]) and
   virtualized to form the virtual computing, storage, and network
   resources that will host the VNFs.  In addition, the NFV-MANO is
   subdivided into the NFV Orchestrator (NFVO), the VNF Manager (VNFM),
   and the Virtual Infrastructure Manager (VIM).  As their names
   indicate, all high-level elements and sub-components have their own
   and very specific objective in the NFV architecture.

   During the design of ARCA we enforced both operational and
   interfacing aspects as its main objectives.  From the operational
   point of view, ARCA processes observations to manage virtual
   resources, so it plays the role of the VIM mentioned above.
   Therefore, ARCA has been designed with appropriate interfaces to fit
   in the place of the VIM.  This way, ARCA provides the NFV reference
   architecture with the ability to react to external events to adapt
   virtual computer and network systems, even anticipating such
   adaptations, as performed by ARCA itself.  However, some interfaces
   must be extended to fully enable ARCA to perform its work within the
   NFV architecture.

   Once ARCA is placed in the position of the VIM, it enhances the
   general NFV architecture with its autonomic management capabilities.
   In particular, it discharges some responsibilities from the VNFM and
   NFVO, so they can focus on their own business while the virtual
   resources behave as they expect (and request).  Moreover, ARCA
   improves the scalability and reliability of the managed system in
   case of disconnection from the orchestration layer due to some
   failure, network split, etc.  This is also achieved by the autonomic
   capabilities, which, as described above, are guided by the rules and
   policies specified by the administrators and, here, communicated to
   ARCA through the NFVO.  However, ARCA will not be limited to such
   operation; more generally, it will accomplish the requirements
   established by the Virtual Network Operators (VNOs), which are the
   owners of the slice of virtual resources that is managed by a
   particular instance of the NFV-MANO, and therefore by ARCA.

   In addition to the operational functions, ARCA incorporates the
   necessary mechanisms to engage the interfaces that enable it to
   interact with other elements of the NFV-MANO reference architecture.
   More specifically, ARCA is bound to the Or-Vi (see ETSI-NFV-IFA-005
   [ETSI-NFV-IFA-005]) and the Nf-Vi (see ETSI-NFV-IFA-004
   [ETSI-NFV-IFA-004] and ETSI-NFV-IFA-019 [ETSI-NFV-IFA-019]).  The
   former is the point of attachment between the NFVO and the VIM,
   while the latter is the point of attachment between the NFVI and the
   VIM.  In our current design we decided not to support the point of
   attachment between the VNFM and the VIM, called Vi-Vnfm (see ETSI-
   NFV-IFA-006 [ETSI-NFV-IFA-006]).  We leave it for future evolutions
   of the proposed integration, which will be enabled by a possible
   solution that provides the functions of the VNFM required by ARCA.

   Through the Or-Vi, ARCA receives the instructions it will enforce on
   the virtual computer and network system it is controlling.  As
   mentioned above, these are specified in the form of rules and
   policies, which are in turn formatted as several statements and
   embedded into the Or-Vi messages.  In general, these will be high-
   level objectives, so ARCA will use its reasoning capabilities to
   translate them into more specific, low-level objectives.  For
   instance, the Or-Vi can specify some high-level statement to avoid
   CPU overloading, and ARCA will use its innate and acquired knowledge
   to translate it into specific statements that specify which
   parameters it has to measure (CPU load from assigned servers) and
   which are their desired boundaries, in the form of a high threshold
   and a low threshold.  Moreover, the Or-Vi will be used by the NFVO
   to specify which actions can be used by ARCA to overcome the
   violation of the mentioned policies.
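   A minimal sketch of such a translation step is shown below.  The
   objective name, vocabulary, and threshold values are invented for
   illustration and do not reflect ARCA's actual ontology or knowledge.

      # Hypothetical translation of a high-level Or-Vi objective into
      # low-level measurable statements.
      KNOWLEDGE = {
          "avoid-cpu-overload": {
              "measure": "cpu_load",        # parameter to observe
              "scope": "assigned-servers",
              "high_threshold": 0.80,       # upper boundary
              "low_threshold": 0.20,        # lower boundary
              "allowed_actions": ["add_servant", "remove_servant"],
          },
      }

      def translate(objective: str) -> dict:
          """Map a high-level objective to low-level statements."""
          try:
              return KNOWLEDGE[objective]
          except KeyError:
              raise ValueError("no knowledge for objective: " + objective)

      print(translate("avoid-cpu-overload"))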
   All information flowing through the Or-Vi interface is encoded and
   formatted by following a simple but highly extensible ontology and
   exploiting the aforementioned semantic formats.  This ensures that
   the interconnected system is able to evolve, including replacing
   components, updating (adding or removing) the supported concepts to
   understand new scenarios, and connecting external tools to further
   enhance the management process.  The only requirement to ensure this
   feature is that all elements support the mentioned ontology and
   semantic formats.  Although it is not a finished task, the
   development of semantic technologies allows the easy adaptation and
   translation of existing information formats, so it is expected that
   more and more software pieces will become easily integrable with the
   ETSI-NFV-MANO [ETSI-NFV-MANO] architecture.

   In contrast to the Or-Vi interface, the Nf-Vi interface exposes more
   precise and low-level operations.  Although this makes it easier to
   integrate with ARCA, it also ties it to specific implementations.
   In other words, building a proxy that enforces the aforementioned
   ontology on different interface instances to homogenize them adds
   undesirable complexity.  Therefore, new components have been
   specifically developed for ARCA to be able to interact with
   different NFVIs.  Nevertheless, this specialization is limited to
   the collector and enforcer.  Moreover, it allows ARCA to have
   optimized low-level operations, with a high improvement of the
   overall performance.  This is the case for the specific
   implementations of the collector and enforcer used with Mininet and
   Docker, which were used as underlying infrastructures in previous
   experiments described in ICIN 2017 [ICIN-2017].  Moreover, as
   discussed in the following section, this is also the case for the
   implementations of the collector and enforcer tied to the OpenStack
   telemetry and compute interfaces, respectively.

   Although OpenStack still lacks some functionality regarding the
   construction of specific virtual networks, we use it as the NFVI
   functional block in the integrated approach.  Therefore, OpenStack
   is the provider of the underlying SDN/NFV infrastructure, and we
   exploited its APIs and SDK to achieve the integration.  More
   specifically, in our showcase we use the APIs provided by the
   Ceilometer, Gnocchi, and Compute services, as well as the SDK
   provided for Python.  All of them are gathered within the Nf-Vi
   interface.  Moreover, we have extended the Or-Vi interface to
   connect external elements, such as the physical or environmental
   event detectors and Big Data connectors, which is becoming a
   mandatory requirement of the current virtualization ecosystem and
   constitutes our main extension to the NFV architecture.
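   The following sketch outlines how a collector and an enforcer could
   engage these interfaces, using the OpenStack SDK for Python and
   Gnocchi's REST API.  It is an illustration under assumptions: the
   cloud profile name, identifiers, and the "cpu_util" metric name are
   examples, and error handling is omitted.

      import openstack
      import requests

      # Credentials are assumed to come from a clouds.yaml profile;
      # the profile name is hypothetical.
      conn = openstack.connect(cloud="arca-domain-1")

      def cpu_measures(gnocchi_url, token, server_id):
          """Collector side: poll Gnocchi for a server's CPU metric."""
          r = requests.get(
              gnocchi_url + "/v1/resource/instance/" + server_id
              + "/metric/cpu_util/measures",
              headers={"X-Auth-Token": token})
          r.raise_for_status()
          return r.json()   # list of [timestamp, granularity, value]

      def add_servant(image_id, flavor_id, network_id):
          """Enforcer side: ask Compute to boot one more servant."""
          return conn.compute.create_server(
              name="servant",
              image_id=image_id,
              flavor_id=flavor_id,
              networks=[{"uuid": network_id}])

      def remove_servant(server):
          """Enforcer side: release a servant that is not needed."""
          conn.compute.delete_server(server)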
6.2.  Target Experiment and Scenario

   From the beginning of our work on the design of ARCA we have been
   targeting real-world scenarios, so we get better-suited
   requirements.  In particular, we work with a scenario that
   represents an emergency support service that is hosted on a virtual
   computer and network system, which is in turn hosted on the
   distributed virtualization infrastructure of a medium-sized
   organization.  The objective is to clearly represent an application
   that requires high dynamicity and a high degree of reliability.  The
   emergency support service accomplishes this by being barely used
   when there is no incident but heavily loaded when there is an
   incident.

   Both the underlying infrastructure and the virtual network share the
   same topology.  They have four independent but interconnected
   network domains that form part of the same administrative domain
   (organization).  The first domain hosts the systems of the
   headquarters (HQ) of the owner organization, so the VNFs it hosts
   (servants) implement the emergency support service.  We call them
   "servants" because they are Virtual Machine (VM) instances that work
   together to provide a single service by backing the Load Balancer
   (LB) instances deployed in the separate domains.  The amount of
   resources (servants) assigned to the service is adjusted by ARCA,
   attaching or detaching servants to meet the load boundaries
   specified by administrators.

   The other domains represent different buildings of the organization
   and host the clients that access the service when an incident
   occurs.  They also host the necessary LB instances, which are also
   VNFs controlled by ARCA to regulate the access of clients to
   servants.  All domains have physical detectors that provide external
   information that can (and will) be correlated with the load of the
   controlled virtual computer and network system, and thus will affect
   the amount of servants assigned to it.  Although the underlying
   infrastructure, the servants, and the ARCA instance are the same as
   those used in the real world, both clients and detectors are
   emulated.  This does not reduce the transferability of the results
   obtained from our experiments, as it allows us to expand the amount
   of clients beyond the limits of most physical infrastructures.

   Each underlying OpenStack domain is able to host a maximum of 100
   clients, as they are deployed on a low-profile virtual machine (a
   flavor in OpenStack).  In general, clients perform requests at a
   rate of one request every ten seconds, so there would be a maximum
   of 30 requests per second.  However, under the simulated incident,
   the clients raise their load to reach a common maximum of 1200
   requests per second.  This mimics the shape and size of a real
   medium-sized organization of about 300 users that perform a maximum
   of four requests per second when they need some support.
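   One possible emulation of this client behavior is sketched below;
   the service URL is a placeholder, and the request rates follow the
   figures above (one request every ten seconds in normal operation,
   up to four requests per second per client under an incident).

      import time
      import threading
      import urllib.request

      SERVICE_URL = "http://lb.example.org/emergency"  # hypothetical LB

      def client(incident: threading.Event):
          """One emulated client of the emergency support service."""
          while True:
              try:
                  urllib.request.urlopen(SERVICE_URL, timeout=5)
              except OSError:
                  pass                     # discarded/failed request
              time.sleep(0.25 if incident.is_set() else 10.0)

      incident = threading.Event()
      for _ in range(300):                 # 100 clients in each of the
          threading.Thread(target=client,  # three client domains
                           args=(incident,), daemon=True).start()
      time.sleep(60)
      incident.set()   # the detectors would fire at this same instant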
   The topology of the underlying network is simplified by connecting
   the four domains to the same high-performance switch.  However, the
   topology of the virtual network is built by using direct links
   between the HQ domain and the other three domains.  These are
   complemented by links between domains 2 and 3, and between domains 3
   and 4.  This way, the three domains have three paths to reach the HQ
   domain: a direct path with just one hop, and two indirect paths with
   two and three hops, respectively.

   During the execution of the experiment, the detectors notify the
   incident to the controller as soon as it happens.  However, although
   the clients are stimulated at the same time, there is some delay
   between the occurrence of the incident and the moment the network
   service receives the increase in load.  One of the main targets of
   our experiment is to study such delay and take advantage of it to
   anticipate the amount of servants required by the system.  We
   discuss it below.

   In summary, this scenario highlights the main benefits of ARCA
   playing the role of the VIM and interacting with the underlying
   OpenStack platform.  This means advancing towards an efficient use
   of resources and thus reducing the CAPEX of the system.  Moreover,
   as the operation of the system is autonomic, the involvement of
   human administrators is reduced and, therefore, the OPEX is also
   reduced.

6.3.  OpenStack Platform

   The implementation of the scenario described above reflects the
   requirements of any edge/branch networking infrastructure, which is
   composed of several distributed micro-data-centers deployed in the
   wiring centers of buildings and/or storeys.  We chose OpenStack to
   meet such requirements because it is widely used in production
   infrastructures, so the resulting infrastructure has the necessary
   robustness to accomplish our objectives, while it reflects the
   typical underlying platform found in any SDN/NFV environment.

   We have deployed four separate network domains, each one with its
   own OpenStack instantiation.  All domains are totally capable of
   running regular OpenStack workloads, i.e., executing VMs and
   networks, but, as mentioned above, we designate domain 1 to be the
   headquarters of the organization.  The different underlying networks
   required by this (quite complex) deployment are provided by several
   VLANs within a high-end L2 switch.  This switch represents the
   distributed network of the organization.  Four separate VLANs are
   used to isolate the traffic within each domain, by connecting an
   interface of OpenStack's controller and compute nodes.  These VLANs
   therefore form the distributed data plane.  Moreover, another VLAN
   is used to carry the control plane as well as the management plane,
   which are used by the NFV-MANO, and thus by ARCA.  It is
   instantiated in the physical machine called the ARCA Node, to
   exchange control and management operations in relation to the
   collector and enforcer defined in ARCA.  This VLAN is shared among
   all OpenStack domains to implement the global control of the
   virtualization environment pertaining to the organization.  Finally,
   another VLAN is used by the infrastructure to interconnect the data
   planes of the separated domains and also to allow all elements of
   the infrastructure to access the Internet to perform software
   installation and updates.

   The installation of OpenStack is provided by the Red Hat OpenStack
   Platform, which is tightly dependent on the Linux operating system
   and closely related to the software developed by the OpenStack Open
   Source project.
   It provides a comprehensive way to install the whole platform while
   being easily customized to meet our specific requirements, and it is
   also backed by operational-quality support.

   The ARCA Node is also based on Linux but, since it is not directly
   related to the OpenStack deployment, it is not based on the same
   distribution.  It is just configured to be able to access the
   control and management interfaces offered by OpenStack, and
   therefore it is connected to the VLAN that hosts the control and
   management planes.  On this node we deploy the NFV-MANO components,
   including the micro-services that form an ARCA instance.

   In summary, we dedicate nine physical computers to the OpenStack
   deployment, all of them Dell PowerEdge R610 with 2 x Xeon 5670
   2.96 GHz (6 cores / 12 threads) CPUs, 48 GiB RAM, 6 x 146 GiB HD at
   10 kRPM, and 4 x 1 GE NICs.  Moreover, we dedicate an additional
   computer with the same specification to the ARCA Node.  We dedicate
   a less powerful computer to implement the physical router because it
   is not involved in the general execution of OpenStack nor in the
   specific experiments carried out with it.  Finally, as detailed
   above, we dedicate a high-end physical switch, an HP ProCurve
   1810G-24, to build the interconnection networks.

6.4.  Initial Results

   Using the platform described above, we execute an initial but long-
   lasting experiment based on the target scenario introduced at the
   beginning of this section.  The objective of this experiment is
   twofold.  First, we aim to demonstrate how ARCA behaves in a real
   environment.  Second, we aim to stress the coupling points between
   ARCA and OpenStack, which will expose the limitations of the
   existing interfaces.

   With such objectives in mind, we define a timeline that is followed
   by both clients and external event detectors.  It forces the
   virtualized system to experience different situations, including
   incidents of many severities.  When an incident is found in the
   timeline, the detectors notify the ARCA-based VIM and the clients
   change their request rates, which depend on the severity of the
   incident.  This behavior is widely discussed in ICIN 2018
   [ICIN-2018], remarking how users behave after a disaster or another
   similar incident occurs.

   The ARCA-based VIM knows of the occurrence of the incident from two
   sources.  First, it receives the notification from the event
   detectors.  Second, it notices the change in the CPU load of the
   servants assigned to the target service.  In this situation, ARCA
   has different opportunities to overcome the possible overload (or
   underload) of the system.  We explore the anticipation approach
   deeply discussed in ICIN 2018 [ICIN-2018].  Its operation is
   enclosed in the analyzer and decider, and it is based on an
   algorithm that is divided into two sub-algorithms.

   The first sub-algorithm reacts to the detection of the incident and
   the ulterior correlation of its severity with the amount of servants
   required by the system.  This sub-algorithm hosts the regression of
   the learner, which is based on the SVM/SVR technique, and predicts
   the necessary resources from two features: the severity of the
   incident and the time elapsed from the moment it happened.  The
   resulting amount of servants is established as the minimum amount
   that the VIM can use.
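   A rough sketch of this first sub-algorithm, using the SVR
   implementation from scikit-learn, is shown below.  The training
   vectors here are synthetic placeholders; in the real system the
   vectors are accumulated online from the corrections made by the
   second sub-algorithm, described next.

      import numpy as np
      from sklearn.svm import SVR

      # Synthetic training vectors: (severity, seconds elapsed) ->
      # servants needed.  Illustrative values only.
      X = np.array([[0, 0], [0, 20], [1, 15], [2, 15],
                    [3, 10], [3, 25], [4, 15], [4, 30]])
      y = np.array([1, 1, 2, 4, 6, 3, 8, 5])

      model = SVR(kernel="rbf", C=10.0)
      model.fit(X, y)

      def minimum_servant_assignation(severity, elapsed):
          """Predict the MSA floor handed to the reactive loop."""
          predicted = model.predict([[severity, elapsed]])[0]
          return max(1, int(round(predicted)))

      # e.g., an incident of severity 3 that happened 10 seconds ago
      print(minimum_servant_assignation(3, 10))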
   The second sub-algorithm is fed with the CPU load measurements of
   the servants assigned to the service, as reported by the OpenStack
   platform.  With this information it checks whether the system is
   within the operating parameters established by the NFVO.  If not, it
   adjusts the resources assigned to the system.  It also uses the
   minimum amount established by the other sub-algorithm as the basis
   for the assignation.  After every correction, this algorithm learns
   the behavior by adding new correlation vectors to the SVM/SVR
   structure.

   When the experiment is running, the collector component of the ARCA-
   based VIM is attached to the Telemetry interface of OpenStack, using
   the SDK to access the measurement data generated by Ceilometer and
   stored by Gnocchi.  In addition, it is attached to the external
   event detectors in order to receive their notifications.  On the
   other hand, the enforcer component is attached to the Compute
   interface of OpenStack, also using its SDK, to request the
   infrastructure to create, destroy, query, or change the status of a
   VM that hosts a servant of the controlled system.  Finally, the
   enforcer also updates the lists of servers used by the load
   balancers to distribute the clients among the available resources.

   During the execution of the experiment we make the ARCA-based VIM
   report the severity of the last incident, if any, the time elapsed
   since it occurred, the amount of servants assigned to the controlled
   system, the minimum amount of servants to be assigned, as determined
   by the anticipation algorithm, and the average load of all servants.
   In this instance, the severities are spread between 0 (no incident)
   and 4 (strongest incident), the elapsed times are less than 35
   seconds, and the minimum server assignation (MSA) is below 10,
   although the hard maximum is 15.

   With such measurements we illustrate how the learned correlation of
   the three features (dimensions) mentioned above is achieved.  Thus,
   when there is no incident (severity = 0), the MSA is kept to the
   minimum.  In parallel, regardless of the severity level, the
   algorithm learned that there is no need to increase the MSA during
   the first 5 or 10 seconds.  This shows the behavior discussed in
   this document, namely that there is a delay between the occurrence
   of an event and the actual need for an updated amount of resources,
   and it forms a fundamental aspect of our research.

   By inspecting the results, we know that there is a burst of client
   demands whose peak is centered around 15 seconds after the
   occurrence of an incident or any other change in the accounted
   severity.  We also know that the burst lasts longer for higher
   severities, and that it fluctuates a bit for the highest severities.
   Finally, we can also notice that, for the majority of severities,
   the increased MSA is no longer required after 25 seconds from the
   time the severity change was notified.

   All that information becomes part of the knowledge of ARCA, and it
   is stored both in the internal structures of the SVM/SVR and, once
   represented semantically, in the semantic database that manages the
   knowledge base of ARCA.  Thus, it is used to predict any future
   behavior.  For instance, if an incident of severity 3 occurred 10
   seconds ago, ARCA knows that it will need to set the MSA to 6
   servants.
   In fact, this information has been used during the experiment, so we
   can also determine the accuracy of the algorithm by comparing the
   anticipated MSA value with the required value (or even the best
   value).  However, the analysis of such information is left for
   future work.

   While preparing and executing the experiment we found several
   limitations intrinsic to the current OpenStack platform.  First,
   regardless of the CPU and memory resources assigned to the
   underlying controller nodes, the platform is unable to record and
   deliver performance measurements at an interval shorter than 10
   seconds, so it is currently not suitable for real-time operations,
   which are important for our long-term research objectives.
   Moreover, we found that the time required by the infrastructure to
   create a server that hosts a somewhat heavy servant is around 10
   seconds, which is too far from our targets.  Although these
   limitations may be improved in the future, they clearly justify that
   our anticipation approach is essential for the proper working of a
   virtual system and, thus, that the integration of external
   information becomes mandatory for future system management
   technologies, especially considering virtualization environments.

   Finally, we found it difficult for the required measurements to be
   pushed to external components, so we had to poll for them.
   Otherwise, some component of ARCA would have to be instantiated
   alongside the main OpenStack components and services, so that it has
   first-hand and prompt access to such features.  This way, ARCA could
   receive push notifications with the measurements, as it does for the
   external detectors.  This is a key aspect that affects the placement
   of the NFV-VIM, or some subpart of it, in the general architecture.
   Therefore, for future iterations of the NFV reference architecture,
   an integrated view of the VIM and the NFVI could be required to
   reflect the future reality.

7.  Relation to Other IETF/IRTF Initiatives

   TBD

8.  IANA Considerations

   This memo includes no request to IANA.

9.  Security Considerations

   The major security concern of the integration of external event
   detectors and ARCA to manage SDN/NFV systems is that the boundaries
   of the control and management planes are crossed to introduce
   information from outside.  Such communications must be highly and
   heavily secured, since some malfunction or explicit attack might
   compromise the integrity and execution of the controlled system.
   However, it is up to implementers to deploy the necessary
   countermeasures to avoid such situations.  From the design point of
   view, since all operations are performed within the control and/or
   management planes, the security level of the current solution is
   inherited and thus determined by the security measures established
   by the systems conforming such planes.

10.  Acknowledgements

   TBD

11.  References

11.1.  Normative References

   [draft-geng-coms-architecture-01]
              "Technology Independent Information Model for Network
              Slicing", 2018.

   [draft-qiang-coms-netslicing-information-model-02]
              "Technology Independent Information Model for Network
              Slicing", 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.
11.2.  Informative References

   [ETSI-NFV-IFA-004]
              ETSI, "Network Functions Virtualisation (NFV);
              Acceleration Technologies; Management Aspects
              Specification", ETSI GS NFV-IFA 004, 2016.

   [ETSI-NFV-IFA-005]
              ETSI, "Network Functions Virtualisation (NFV); Management
              and Orchestration; Or-Vi reference point - Interface and
              Information Model Specification", ETSI GS NFV-IFA 005,
              2016.

   [ETSI-NFV-IFA-006]
              ETSI, "Network Functions Virtualisation (NFV); Management
              and Orchestration; Vi-Vnfm reference point - Interface
              and Information Model Specification", ETSI GS NFV-IFA
              006, 2016.

   [ETSI-NFV-IFA-019]
              ETSI, "Network Functions Virtualisation (NFV);
              Acceleration Technologies; Management Aspects
              Specification; Release 3", ETSI GS NFV-IFA 019, 2017.

   [ETSI-NFV-MANO]
              ETSI, "Network Functions Virtualisation (NFV); Management
              and Orchestration", ETSI GS NFV-MAN 001, 2014.

   [ICIN-2017]
              Martinez-Julia, P., Kafle, V. P., and H. Harai,
              "Achieving the Autonomic Adaptation of Resources in
              Virtualized Network Environments", in Proceedings of the
              20th ICIN Conference (Innovations in Clouds, Internet and
              Networks, ICIN 2017), IEEE, pp. 1-8, 2017.

   [ICIN-2018]
              Martinez-Julia, P., Kafle, V. P., and H. Harai,
              "Anticipating Minimum Resources Needed to Avoid Service
              Disruption of Emergency Support Systems", in Proceedings
              of the 21st ICIN Conference (Innovations in Clouds,
              Internet and Networks, ICIN 2018), IEEE, pp. 1-8, 2018.

   [OPENSTACK]
              The OpenStack Project, <http://www.openstack.org/>, 2018.

Author's Address

   Pedro Martinez-Julia (editor)
   NICT
   4-2-1, Nukui-Kitamachi
   Koganei, Tokyo  184-8795
   Japan

   Phone: +81 42 327 7293
   Email: pedro@nict.go.jp