ANIMA WG                                              CJ. Bernardos, Ed.
Internet-Draft                                                      UC3M
Intended status: Experimental                                  A. Mourad
Expires: 25 November 2022                                   InterDigital
                                                       P. Martinez-Julia
                                                                    NICT
                                                             24 May 2022


                Autonomic setup of fog monitoring agents
                draft-bernardos-anima-fog-monitoring-06

Abstract

   The concept of fog computing has emerged, driven by the Internet of
   Things (IoT), from the need to handle the data generated by end-user
   devices.  The term fog refers to any networked computational
   resource in the continuum between things and cloud.  In fog
   computing, functions can be stitched together to compose a service
   function chain.
   These functions might be hosted on resources that are inherently
   heterogeneous, volatile and mobile.  This means that resources might
   appear and disappear, and the connectivity characteristics between
   these resources may also change dynamically.  This calls for new
   orchestration solutions able to cope with dynamic changes to the
   resources at runtime or ahead of time (in anticipation through
   prediction), as opposed to today's solutions, which are inherently
   reactive and static or semi-static.

   A fog monitoring solution can be used to help predict events so that
   an action can be taken before an event actually takes place.  This
   solution is composed of agents running on the fog nodes plus a
   controller hosted at another device (running in the infrastructure
   or in another fog node).  Since fog environments are inherently
   volatile and extremely dynamic, it is convenient to enable the use
   of autonomic technologies to autonomously set up the fog monitoring
   platform.  This document presents this use case and specifies how to
   use GRASP as needed in this scenario.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 November 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Problem statement
     1.2.  Fog monitoring framework
     1.3.  Supporting simple and complex monitoring metrics
   2.  Terminology
   3.  Autonomic setup of fog monitoring framework
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgments
   7.  Informative References
   Authors' Addresses

1.  Introduction

   The concept of fog computing has emerged, driven by the Internet of
   Things (IoT), from the need to handle the data generated by end-user
   devices.  The term fog refers to any networked computational
   resource in the continuum between things and cloud.  A fog node may
   therefore be an infrastructure network node such as an eNodeB or
   gNodeB, an edge server, a customer premises equipment (CPE), or even
   a user equipment (UE) terminal node such as a laptop, a smartphone,
   or a computing unit on board a vehicle, robot or drone.

   In fog computing, functions might be organized in service function
   chains (SFCs), hosted on resources that are inherently heterogeneous,
   volatile and mobile.  This means that resources might appear and
   disappear, and the connectivity characteristics between these
   resources may also change dynamically.  This calls for new
   orchestration solutions able to cope with dynamic changes to the
   resources at runtime or ahead of time (in anticipation through
   prediction), as opposed to today's solutions, which are inherently
   reactive and static or semi-static.

1.1.  Problem statement

   Figure 1 shows an example scenario of a (robot) network service.  A
   robot device has its (navigation) control application running in the
   fog, away from the robot, as a network service in the form of an SFC
   "F1-F2" (e.g., F1 might be in charge of identifying obstacles and F2
   takes decisions on the robot navigation).  Initially, function F1 is
   assumed to be hosted at fog node A and F2 at fog node B.  At a given
   point in time, fog node A becomes unavailable (e.g., due to low
   battery or because fog node A moves out of the coverage of the
   robot).  It is therefore necessary to predict the need to migrate
   function F1 to another node (e.g., fog node C in the figure), and
   the migration has to be completed before fog node A stops being
   capable or available.  Such dynamic migration cannot be handled by
   today's orchestration solutions, which are rather reactive and
   static or semi-static (e.g., resources may fail, but this is an
   exceptional event, happening with low frequency, and only scaling
   actions are supported to react to SLA-related events).

                 --------------
                 |    ====    |
                ------+F1+----------
               / |  | ==== |  |     \
              /  |  +------+  |      \
              |  | fog node C |       \
              |  --------------        \
              |                         \
              |       --------------  ---\----------
              |       |    ====    |  |   \====    |
              | -----------+F1+------------+F2|    |
              |/      |  | ==== |  |  |  | ==== |  |
              o       |  +------+  |  |  +------+  |
              |       | fog node A |  | fog node B |
      --------+-      --------------  --------------
      |        |
      --0----0--

                       Figure 1: Example scenario

   Existing frameworks rely on monitoring platforms that react to
   resource failure events and ensure that negotiated SLAs are met.
   However, these are not designed to predict events likely to happen
   in a volatile fog environment, such as resources moving away,
   resources becoming unavailable due to battery issues, or changes in
   the availability of resources caused by variations in the use of the
   local resources on the nodes.  Besides, in this kind of volatile and
   extremely mobile environment it is not feasible to continuously
   monitor and report every possible variable or parameter from all the
   nodes hosting resources, as this would not scale: it would consume
   many resources and generate extra overhead.

   In volatile and mobile environments, prediction (make-before-break)
   is needed, as pure reaction (break-before-make) is not enough.  This
   prediction is not generic, and depends on the nature of the network
   service/SFC: the functions of the SFC, the connectivity between
   them, the service-specific requirements, etc.  Monitoring has to be
   set up differently on the nodes, depending on the specifics of the
   network service.
   Besides, in order to act proactively and predict what might need to
   be done, monitoring in such volatile and mobile environments does
   not only involve the nodes currently hosting the resources running
   the network service/service function chain (i.e., hosting a
   function), but also other nodes that are potential candidates to
   join, either in addition to or in substitution of current nodes, for
   running the network service in accordance with the orchestration
   decisions.

   In the example of Figure 1, the fog node initially hosting function
   F1 (fog node A) might be running out of battery, and this should be
   detected before node A actually becomes unavailable, so that
   function F1 can be migrated in time to a different fog node C
   capable of meeting the requirements of F1 (compute, networking,
   location, expected availability, etc.).  In order to predict the
   need for such a migration and to have already identified a target
   fog node to which the function can be moved, a monitoring solution
   needs to be in place that instructs each node involved in the
   service (A and B), and also the neighboring node (C) that is a
   candidate to host function F1, to monitor and report on metrics that
   are relevant for the specific network service "F1-F2" that is
   currently running.

1.2.  Fog monitoring framework

   Fog environments differ from data-center ones in three key aspects:
   heterogeneity, volatility and mobility.  The fog monitoring
   framework is used to predict events that trigger an orchestration
   event (e.g., migrating a function to a different resource).

   The monitoring framework we propose for fog environments is composed
   of two logical components:

   *  Fog agents running on each fog node.  An agent is responsible for
      sending the value of a variable or parameter to a fog monitoring
      controller and to other fog agents.
      Which variable or parameter is monitored and what data is sent
      (including how frequently) is configured per agent, considering
      the specifics of the network service or SFC.  A fog agent might
      also take some autonomous actions (such as requesting the
      migration of a function to a neighbor node) in situations where
      connectivity with the fog monitoring controller is temporarily
      unavailable.

   *  A fog monitoring controller (e.g., running at the edge or at a
      fog node).  This node obtains input from the orchestration logic
      (MANO stack) and autonomously decides which variables or
      parameters will be monitored, where the data will be collected,
      and how this will be done, based on the requirements provided by
      the orchestration logic managing the network services
      instantiated in the fog.  This configuration is specific to a
      network service, a function, or an SFC as a whole.

      -  It interacts with the orchestration logic to coordinate and
         trigger orchestration events, such as function migration,
         connectivity updates, etc.  In some deployments, this entity
         might be co-located with the orchestration logic (e.g., the
         NFVO).

      -  It interacts with the fog agents to instruct them which
         variables and/or parameters need to be monitored, and to
         obtain the resulting monitoring data.  This interaction is not
         limited to fog agents at nodes currently involved in a given
         network service or SFC, but also includes other nodes that are
         suitable for hosting a function that needs to be migrated.
         This allows the orchestration logic to be provided with
         candidate nodes proactively.

      -  It is capable of autonomously discovering and setting up fog
         agents.

1.3.  Supporting simple and complex monitoring metrics

   Fog monitoring nodes will be capable of providing raw monitoring
   data as well as processed data.  The former are obtained directly
   from the measured variables or parameters.
   The latter are obtained by applying some processing function to
   several monitoring data items.  The fog monitoring controllers will
   specify the function to be executed, which data will be collected
   and processed by the functions, and the additional parameters that
   control the processing and determine the particularities of the
   output of each function.

   The complexity of the functions that can be executed is arbitrary.
   They can be either pre-instructed in the fog agents or dynamically
   instructed by the requester (the fog monitoring controller), which
   provides the sequence in which to execute the functions and their
   input parameters.

   Complex monitoring metrics (the processed data) can also be used as
   part of the condition that determines the distributed and autonomic
   actions.  Thus, the logic that defines those actions is simplified
   and the actuation components can concentrate on their task without
   requiring extra effort to process the raw monitoring data.

   Adding support for complex monitoring metrics enables the fog
   monitoring framework to avoid the transmission of unneeded data and
   thus optimize its overall operation.  For example, if the controller
   is interested in the average CPU load of a fog agent over the last 5
   minutes, it can just request it, providing the averaging period as
   an input parameter and specifying the source from which the CPU load
   variable is measured.

2.  Terminology

   The following terms are used in this document:

   fog:            Fog goes to the Extreme Edge, that is, the closest
                   possible to the user, including on the user device
                   itself.

   fog node:       Any device that is capable of participating in the
                   Fog.  A Fog node might be volatile, mobile and
                   constrained (in terms of computing resources).  Fog
                   nodes may be heterogeneous and may belong to
                   different owners.

   orchestrator:   In this document we use the terms orchestrator and
                   NFVO interchangeably.

3.  Autonomic setup of fog monitoring framework

   Fog nodes autonomously start fog agents at bootstrap time, and then
   start looking for other agents and for the fog monitoring
   controller.  This autonomic setup can be performed using GRASP.  The
   procedure is represented in Figure 2.  The different steps are
   described next:

   +--------+    +--------+    +--------+
   |  fog   |    |  fog   |    |  fog   |
   | node C |    | node A |    | node B |                       +------+
   |        |    |        |    |        |                       | fog  |
   | |    | |    | |    | |    | |    | |           +------+    | mon. |
   | +----+ |    | +----+ |    | +----+ |           | NFVO |    | ctrl |
   +--------+    +--------+    +--------+           +------+    +------+
                     |             |                   |           |
               (fog nodes A & B bootstrap)             |           |
                     |             |                   |           |
                     |             |   periodic mcast advertisement|
                     |             |         (ID, fog_scope)       |
                     |             |  <----------------------------+
                     |   Mcast discovery (fog_node_ID, scope)      |
                     +-------------------------------------------->|
                     +------------>|                   |           |
                     |   Mcast discovery (fog_node_ID, scope)      |
                     |             +------------------------------>|
                     |<------------+                   |           |
                     |             |                   |           |
                     |   Unicast advertisement (ID, fog_scope)     |
                     |             |<------------------------------+
                     |<--------------------------------------------+
                     |             |                   |           |
                     |   Unicast registration (ID, fog_node_ID,    |
                     |             |    fog_scope, capab.)         |
                     |             +------------------------------>|
                     +-------------------------------------------->|
                     |             |                   |           |
              (fog nodes A & B registered)             |           |
                     |             |                   |           |
   (fog node C bootstraps)         |                   |           |
       |             |             |                   |           |
       |   Mcast discovery (fog_node_ID, scope)        |           |
       +---------------------------------------------------------->|
       +-------------------------->|                   |           |
       +------------>|   Unicast advertisement (ID, fog_scope)     |
       |<----------------------------------------------------------+
       |<--------------------------+                   |           |
       |<------------+   Unicast registration (ID, fog_node_ID,    |
       |             |             |    fog_scope, capab.)         |
       +---------------------------------------------------------->|
   (fog node C registered)         |                   |           |
       |             |             |                   |           |

                Figure 2: Autonomic setup of fog agents

   *  The fog monitoring controller periodically sends multicast
      advertisement messages, which include its ID as well as the scope
      of the advertisement messages (i.e., the scope within which the
      messages have to be flooded).

      M_DISCOVERY messages are used, with new objectives and objective
      options.  GRASP specifies that "an objective option is used to
      identify objectives for the purposes of discovery, negotiation or
      synchronization".  New objective options are defined for the
      purpose of discovering potential fog agents with certain
      characteristics.  Non-limiting examples of these options are
      listed below (note that the names are just examples, and the ones
      actually used have to be registered with IANA):

      -  FOGNODERADIO: used to specify a given type of radio
         technology, e.g., WiFi (version), D2D, LTE, 5G, Bluetooth
         (version), etc.

      -  FOGNODECONNECTIVITY: used to specify a given type of
         connectivity, e.g., layer-2, IPv4, IPv6.

      -  FOGNODEVIRTUALIZATION: used to specify a given type of
         virtualization supported by the node where the agent runs.
         Examples are: hypervisor (type), container, micro-kernel,
         bare-metal, etc.

      -  FOGNODEDOMAIN: used to specify the domain/owner of the node.
         This is useful to support the operation of multiple
         domains/operators simultaneously on the same fog network.

      An example of a discovery message using GRASP would be the
      following (in this example, the fog monitoring controller is
      identified by its IPv6 address:
      2001:DB8:1111:2222:3333:4444:5555:6666):

         [M_DISCOVERY, 13948745, h'20010db8111122223333444455556666',
          ["FOGNODEDOMAIN", F_SYNCH_bits, 2, "operator1"]]

      GRASP is used to allow the fog agents and the controller to
      discover each other in an autonomic way.
      The extensions defined above, together with the use of properly
      scoped multicast addresses (as explained below), make it possible
      to define precisely which nodes participate in the monitoring and
      to gather their principal characteristics.

   *  When fog nodes bootstrap, such as nodes A and B in the figure,
      they start sending multicast discovery messages within a given
      scope, that is, the intended area that composes the fog.  The
      definition of the scope depends on the scenario; examples of
      possible scopes are:

      -  All resources of a given manufacturer.

      -  All resources of a given type.

      -  All resources of a given administrative domain.

      -  All resources of a given user.

      -  All resources within a topological network distance (e.g.,
         number of hops).

      -  All resources within a geographical location.

      -  Etc.

      Combinations of the previous scopes are also possible.

      The discovery messages are multicast within the scope, reaching
      all the nodes that compose the specified fog resources.  This can
      be done, for example, using well-defined IPv6 multicast
      addresses, specified for each of the different scopes.  This
      signaling is based on GRASP.  Different IPv6 multicast addresses
      need to be defined to reach each different scope, using scopes
      equal to or larger than Admin-Local according to [RFC7346].

   *  In response to multicast fog discovery messages, the fog
      monitoring controller replies with unicast messages providing its
      information.

   *  Fog agents can then register with a controller.  The registration
      message is unicast, and includes information on the capabilities
      of the fog node, such as:

      -  Type of node.

      -  Vendor.

      -  Energy source: battery-powered or not.

      -  Connectivity (number of network interfaces and information
         associated with them, such as radio technology type, layer-2
         and layer-3 addresses, etc.).

      -  Etc.
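
      As an illustration only, the capability information listed above
      could be carried in a structure like the following Python sketch.
      The Registration and Interface types, all field names and the
      JSON encoding are assumptions made for this example: this
      document does not define a registration message format, and
      since GRASP messages are CBOR-encoded, a real deployment would
      more likely carry the same data as a CBOR objective value.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Interface:
    """One network interface of a fog node (field names are hypothetical)."""
    radio: str        # radio technology type, e.g. "WiFi", "LTE", "5G"
    l2_address: str   # layer-2 address
    l3_address: str   # layer-3 address


@dataclass
class Registration:
    """Hypothetical unicast registration payload sent by a fog agent."""
    controller_id: str     # ID learned from the controller advertisement
    fog_node_id: str
    fog_scope: str
    node_type: str         # "Type of node"
    vendor: str
    battery_powered: bool  # "Energy source: battery-powered or not"
    interfaces: list = field(default_factory=list)

    def encode(self) -> bytes:
        # JSON is used here only for readability (assumption, see lead-in).
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> Registration:
    """Rebuild a Registration from bytes produced by Registration.encode()."""
    data = json.loads(payload.decode("utf-8"))
    data["interfaces"] = [Interface(**i) for i in data["interfaces"]]
    return Registration(**data)
```

      A controller receiving such a payload could call decode() and
      index the result by fog_node_id and fog_scope, which is one way
      to keep registrations from several fog domains apart.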

      Note that registration with multiple fog monitoring controller
      instances could also be possible if a fog node wants to belong to
      several fog domains at the same time (but note that how the
      orchestration of the same resource is done by multiple
      orchestrators is not covered by this document).  The defined
      mechanisms support this via the use of fog IDs and FOGNODEDOMAIN
      options.

   *  A fog node C bootstraps after nodes A and B are already
      registered.  The same discovery process is followed by fog node
      C, but in addition to the regular advertisement and registration
      procedures described before, existing neighboring fog agents
      (such as A and B in this example) might also respond to discovery
      messages sent by bootstrapping nodes to provide the required
      information.  This makes the procedure faster, more efficient and
      more reliable.  In addition to helping the fog monitoring
      controller in the fog agent discovery process, fog agents
      themselves learn about the existence and associated capabilities
      of other fog agents.  This can be used to allow autonomous
      monitoring by the fog agents without the involvement of the
      central controller.

4.  IANA Considerations

   TBD.

5.  Security Considerations

   TBD.

6.  Acknowledgments

   The work in this draft will be further developed and explored under
   the framework of the H2020 5G-DIVE project (Grant 859881).

7.  Informative References

   [RFC7346]  Droms, R., "IPv6 Multicast Address Scopes", RFC 7346,
              DOI 10.17487/RFC7346, August 2014.

Authors' Addresses

   Carlos J. Bernardos (editor)
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   28911 Leganes, Madrid
   Spain
   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/

   Alain Mourad
   InterDigital Europe
   Email: Alain.Mourad@InterDigital.com
   URI:   http://www.InterDigital.com/

   Pedro Martinez-Julia
   NICT
   4-2-1, Nukui-Kitamachi, Koganei, Tokyo
   184-8795
   Japan
   Phone: +81 42 327 7293
   Email: pedro@nict.go.jp