Network Working Group                                    B. E. Carpenter
Internet-Draft                                         Univ. of Auckland
Intended status: Informational                              L. Ciavaglia
Expires: 18 May 2021                                               Nokia
                                                                S. Jiang
                                            Huawei Technologies Co., Ltd
                                                               P. Peloso
                                                                   Nokia
                                                        14 November 2020

                Guidelines for Autonomic Service Agents
                   draft-ietf-anima-asa-guidelines-00

Abstract

   This document proposes guidelines for the design of Autonomic Service
   Agents for autonomic networks, as a contribution to describing an
   autonomic ecosystem. It is based on the Autonomic Network
   Infrastructure outlined in the ANIMA reference model, using the
   Autonomic Control Plane and the Generic Autonomic Signaling Protocol.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 18 May 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Logical Structure of an Autonomic Service Agent
   3. Interaction with the Autonomic Networking Infrastructure
     3.1. Interaction with the security mechanisms
     3.2. Interaction with the Autonomic Control Plane
     3.3. Interaction with GRASP and its API
     3.4. Interaction with policy mechanism
   4. Interaction with Non-Autonomic Components
   5. Design of GRASP Objectives
   6. Life Cycle
     6.1. Installation phase
       6.1.1. Installation phase inputs and outputs
     6.2. Instantiation phase
       6.2.1. Operator's goal
       6.2.2. Instantiation phase inputs and outputs
       6.2.3. Instantiation phase requirements
     6.3. Operation phase
   7. Coordination between Autonomic Functions
   8. Coordination with Traditional Management Functions
   9. Robustness
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgements
   13. References
     13.1. Normative References
     13.2. Informative References
   Appendix A. Change log [RFC Editor: Please remove]
   Appendix B. Example Logic Flows
   Authors' Addresses

1. Introduction

   This document proposes guidelines for the design of Autonomic Service
   Agents (ASAs) in the context of an Autonomic Network (AN) based on
   the Autonomic Network Infrastructure (ANI) outlined in the ANIMA
   reference model [I-D.ietf-anima-reference-model]. This
   infrastructure makes use of the Autonomic Control Plane (ACP)
   [I-D.ietf-anima-autonomic-control-plane] and the Generic Autonomic
   Signaling Protocol (GRASP) [I-D.ietf-anima-grasp]. This document is
   a contribution to the description of an autonomic ecosystem,
   recognizing that a deployable autonomic network needs more than just
   ACP and GRASP implementations. It must achieve management goals that
   a Network Operations Center (NOC) cannot achieve manually, which
   requires at least a library of ASAs and corresponding GRASP objective
   definitions. There must also be tools to deploy and oversee ASAs,
   and integration with existing operational mechanisms [RFC8368].
   However, this document focuses on the design of ASAs, with some
   reference to implementation and operational aspects.

   There is a considerable literature about autonomic agents with a
   variety of proposals about how they should be characterized. Some
   examples are [DeMola06], [Huebscher08], [Movahedi12] and [GANA13].
   However, for the present document, the basic definitions and goals
   for autonomic networking given in [RFC7575] apply. According to RFC
   7575, an Autonomic Service Agent is "An agent implemented on an
   autonomic node that implements an autonomic function, either in part
   (in the case of a distributed function) or whole."

   ASAs must be distinguished from other forms of software component.
   They are components of network or service management; they do not in
   themselves provide services.
   For example, the services envisaged for network function
   virtualisation [RFC8568] or for service function chaining [RFC7665]
   might be managed by an ASA rather than by traditional configuration
   tools.

   The reference model [I-D.ietf-anima-reference-model] expands this by
   adding that an ASA is "a process that makes use of the features
   provided by the ANI to achieve its own goals, usually including
   interaction with other ASAs via the GRASP protocol
   [I-D.ietf-anima-grasp] or otherwise. Of course it also interacts
   with the specific targets of its function, using any suitable
   mechanism. Unless its function is very simple, the ASA will need to
   handle overlapping asynchronous operations. This will require either
   a multi-threaded implementation, or a logically equivalent event loop
   structure. It may therefore be a quite complex piece of software in
   its own right, forming part of the application layer above the ANI."

   There will certainly be very simple ASAs that manage a single
   objective in a straightforward way and do not need asynchronous
   operations. In such a case, many aspects of the current document do
   not apply. However, in general a basic property of an ASA is that it
   is a relatively complex software component that will in many cases
   control and monitor simpler entities in the same host or elsewhere.
   For example, a device controller that manages tens or hundreds of
   simple devices might contain a single ASA.

   The remainder of this document offers guidance on the design of such
   ASAs.

2. Logical Structure of an Autonomic Service Agent

   As mentioned above, all but the simplest ASAs will need to support
   asynchronous operations. Not all programming environments explicitly
   support multi-threading.
   Where they do not, an 'event loop' style of implementation should be
   adopted, in which each thread would be implemented as an event
   handler called in turn by the main loop. For this, the GRASP API
   (Section 3.3) must provide non-blocking calls. If necessary, the
   GRASP session identifier will be used to distinguish simultaneous
   operations.

   A typical ASA will have a main thread that performs various initial
   housekeeping actions such as:

   *  Obtain authorization credentials.

   *  Register the ASA with GRASP.

   *  Acquire relevant policy parameters.

   *  Define data structures for relevant GRASP objectives.

   *  Register with GRASP those objectives that it will actively manage.

   *  Launch a self-monitoring thread.

   *  Enter its main loop.

   The logic of the main loop will depend on the details of the
   autonomic function concerned. Whenever asynchronous operations are
   required, extra threads will be launched, or events added to the
   event loop. Examples include:

   *  Repeatedly flood an objective to the AN, so that any ASA can
      receive the objective's latest value.

   *  Accept incoming synchronization requests for an objective managed
      by this ASA.

   *  Accept incoming negotiation requests for an objective managed by
      this ASA, and then conduct the resulting negotiation with the
      counterpart ASA.

   *  Manage subsidiary non-autonomic devices directly.

   These threads or events should all either exit after their job is
   done, or enter a wait state for new work, to avoid blocking others
   unnecessarily.

   According to the degree of parallelism needed by the application,
   some of these threads or events might be launched in multiple
   instances.
   In particular, if negotiation sessions with other ASAs are expected
   to be long or to involve wait states, the ASA designer might allow
   for multiple simultaneous negotiating threads, with appropriate use
   of queues and locks to maintain consistency.

   The main loop itself could act as the initiator of synchronization
   requests or negotiation requests, when the ASA needs data or
   resources from other ASAs. In particular, the main loop should watch
   for changes in policy parameters that affect its operation. It
   should also do whatever is required to avoid unnecessary resource
   consumption, such as including an arbitrary wait time in each cycle
   of the main loop.

   The self-monitoring thread is of considerable importance. Autonomic
   service agents must never fail. To a large extent this depends on
   careful coding and testing, with no unhandled error returns or
   exceptions, but if there is nevertheless some sort of failure, the
   self-monitoring thread should detect it, fix it if possible, and in
   the worst case restart the entire ASA.

   Appendix B presents some example logic flows in informal pseudocode.

3. Interaction with the Autonomic Networking Infrastructure

3.1. Interaction with the security mechanisms

   An ASA by definition runs in an autonomic node. Before any normal
   ASAs are started, such nodes must be bootstrapped into the autonomic
   network's secure key infrastructure in accordance with
   [I-D.ietf-anima-bootstrapping-keyinfra]. This key infrastructure
   will be used to secure the ACP (next section) and may be used by ASAs
   to set up additional secure interactions with their peers, if needed.

   Note that the secure bootstrap process itself may include
   special-purpose ASAs that run in a constrained insecure mode.

3.2. Interaction with the Autonomic Control Plane

   In a normal autonomic network, ASAs will run as users of the ACP,
   which will provide a fully secured network environment for all
   communication with other ASAs, in most cases mediated by GRASP (next
   section).

   Note that the ACP formation process itself may include
   special-purpose ASAs that run in a constrained insecure mode.

3.3. Interaction with GRASP and its API

   GRASP [I-D.ietf-anima-grasp] is expected to run as a separate process
   with its API [I-D.ietf-anima-grasp-api] available in user space.
   Thus ASAs may operate without special privilege, unless they need it
   for other reasons. The ASA's view of GRASP is built around GRASP
   objectives (Section 5), defined as data structures containing
   administrative information such as the objective's unique name, and
   its current value. The format and size of the value are not
   restricted by the protocol, except that it must be possible to
   serialise it for transmission in CBOR [RFC7049], which is no
   restriction at all in practice.

   The GRASP API should offer the following features:

   *  Registration functions, so that an ASA can register itself and the
      objectives that it manages.

   *  A discovery function, by which an ASA can discover other ASAs
      supporting a given objective.

   *  A negotiation request function, by which an ASA can start
      negotiation of an objective with a counterpart ASA. With this,
      there is a corresponding listening function for an ASA that wishes
      to respond to negotiation requests, and a set of functions to
      support negotiating steps.

   *  A synchronization function, by which an ASA can request the
      current value of an objective from a counterpart ASA. With this,
      there is a corresponding listening function for an ASA that wishes
      to respond to synchronization requests.

   *  A flood function, by which an ASA can cause the current value of
      an objective to be flooded throughout the AN so that any ASA can
      receive it.

   For further details and some additional housekeeping functions, see
   [I-D.ietf-anima-grasp-api].

   This API is intended to support the various interactions expected
   between most ASAs, such as the interactions outlined in Section 2.
   However, if ASAs require additional communication between themselves,
   they can do so using any desired protocol. One option is to use
   GRASP discovery and synchronization as a rendez-vous mechanism
   between two ASAs, passing communication parameters such as a TCP port
   number via GRASP. As noted above, either the ACP or in special cases
   the autonomic key infrastructure will be used to secure such
   communications.

3.4. Interaction with policy mechanism

   At the time of writing, the policy (or "Intent") mechanism for the
   ANI is undefined and is regarded as a research topic. It is expected
   to operate by an information distribution mechanism (e.g.
   [I-D.ietf-anima-grasp-distribution]) that can reach all autonomic
   nodes, and therefore every ASA. However, each ASA must be capable of
   operating "out of the box" in the absence of locally defined policy,
   so every ASA implementation must include carefully chosen default
   values and settings for all policy parameters.

4. Interaction with Non-Autonomic Components

   An ASA, to have any external effects, must also interact with
   non-autonomic components of the node where it is installed. For
   example, an ASA whose purpose is to manage a resource must interact
   with that resource. An ASA whose purpose is to manage an entity that
   is already managed by local software must interact with that
   software. For example, if such management is performed by NETCONF
   [RFC6241], the ASA must interact directly with the NETCONF server in
   the same node.
   This is stating the obvious, and the details are specific to each
   case, but it has an important security implication. The ASA might
   act as a loophole by which the managed entity could penetrate the
   security boundary of the ANI. The ASA must be designed to avoid such
   loopholes, and should if possible operate in an unprivileged mode.

   In an environment where systems are virtualized and specialized using
   techniques such as network function virtualization or network
   slicing, there will be a design choice whether ASAs are deployed once
   per physical node or once per virtual context. A related issue is
   whether the ANI as a whole is deployed once on a physical network, or
   whether several virtual ANIs are deployed. This aspect needs to be
   considered by the ASA designer.

5. Design of GRASP Objectives

   The general rules for the format of GRASP Objective options, their
   names, and IANA registration are given in [I-D.ietf-anima-grasp].
   Additionally that document discusses various general considerations
   for the design of objectives, which are not repeated here. However,
   we emphasize that the GRASP protocol does not provide transactional
   integrity. In other words, if an ASA is capable of overlapping
   several negotiations for a given objective, then the ASA itself must
   use suitable locking techniques to avoid interference between these
   negotiations. For example, if an ASA is allocating part of a shared
   resource to other ASAs, it needs to ensure that the same part of the
   resource is not allocated twice. This might impact the design of the
   objective as well as the logic flow of the ASA.

   In particular, if 'dry run' mode is defined for the objective, its
   specification, and every implementation, must consider what state
   needs to be saved following a dry run negotiation, such that a
   subsequent live negotiation can be expected to succeed.
   It must be clear how long this state is kept, and what happens if the
   live negotiation occurs after this state is deleted. An ASA that
   requests a dry run negotiation must take account of the possibility
   that a successful dry run is followed by a failed live negotiation.
   Because of these complexities, the dry run mechanism should only be
   supported by objectives and ASAs where there is a significant benefit
   from it.

   The actual value field of an objective is limited by the GRASP
   protocol definition to any data structure that can be expressed in
   Concise Binary Object Representation (CBOR) [RFC7049]. For some
   objectives, a single data item will suffice; for example an integer,
   a floating point number or a UTF-8 string. For more complex cases, a
   simple tuple structure such as [item1, item2, item3] could be used.
   Nothing prevents using other formats such as JSON, but this requires
   the ASA to be capable of parsing and generating JSON. The formats
   accepted by the GRASP API will limit the options in practice. A
   fallback solution is for the API to accept and deliver the value
   field in raw CBOR, with the ASA itself encoding and decoding it via a
   CBOR library.

   Note that a mapping from YANG to CBOR is defined by
   [I-D.ietf-core-yang-cbor]. Subject to the size limit defined for
   GRASP messages, nothing prevents objectives using YANG in this way.

6. Life Cycle

   Autonomic functions could be permanent, in the sense that ASAs are
   shipped as part of a product and persist throughout the product's
   life. However, a more likely situation is that ASAs need to be
   installed or updated dynamically, because of new requirements or
   bugs. Because continuity of service is fundamental to autonomic
   networking, the process of seamlessly replacing a running instance of
   an ASA with a new version needs to be part of the ASA's design.

   The implication of service continuity on the design of ASAs can be
   illustrated along the three main phases of the ASA life-cycle, namely
   Installation, Instantiation and Operation.

                     +--------------+
   Undeployed ------>|              |------> Undeployed
                     |  Installed   |
                 +-->|              |---+
        Mandate  |   +--------------+   | Receives a
      is revoked |   +--------------+   |  Mandate
                 +---|              |<--+
                     | Instantiated |
                 +-->|              |---+
             set |   +--------------+   | set
            down |   +--------------+   | up
                 +---|              |<--+
                     | Operational  |
                     |              |
                     +--------------+

           Figure 1: Life cycle of an Autonomic Service Agent

6.1. Installation phase

   Before being able to instantiate and run ASAs, the operator must
   first provision the infrastructure with the sets of ASA software
   corresponding to its needs and objectives. The provisioning of the
   infrastructure is realized in the installation phase and consists of
   installing (or checking the availability of) the pieces of software
   of the different ASA classes in a set of Installation Hosts.

   There are three properties applicable to the installation of ASAs:

   The dynamic installation property allows installing an ASA on
      demand, on any hosts compatible with the ASA.

   The decoupling property allows controlling resources of a NE from a
      remote ASA, i.e. an ASA installed on a host machine different from
      the resources' NE.

   The multiplicity property allows controlling multiple sets of
      resources from a single ASA.

   These three properties are very important in the context of the
   installation phase as their variations condition how the ASA class
   could be installed on the infrastructure.

6.1.1. Installation phase inputs and outputs

   Inputs are:

   [ASA class of type_x] that specifies which classes of ASAs to
      install,

   [Installation_target_Infrastructure] that specifies the candidate
      Installation Hosts,

   [ASA class placement function, e.g. under which criteria/constraints
      as defined by the operator] that specifies how the installation
      phase shall meet the operator's needs and objectives for the
      provision of the infrastructure. In the coupled mode, the
      placement function is not necessary, whereas in the decoupled
      mode, the placement function is mandatory, even though it can be
      as simple as an explicit list of Installation hosts.

   The main output of the installation phase is an up-to-date directory
   of installed ASAs which corresponds to [list of ASA classes]
   installed on [list of installation Hosts]. This output is also
   useful for the coordination function and corresponds to the static
   interaction map (see next section).

   The condition to validate in order to pass to the next phase is that
   [list of ASA classes] are correctly installed on [list of
   installation Hosts]. The state of the ASA at the end of the
   installation phase is: installed (not instantiated). The following
   commands or messages are foreseen: install(list of ASA classes,
   Installation_target_Infrastructure, ASA class placement function),
   and un-install(list of ASA classes).

6.2. Instantiation phase

   Once the ASAs are installed on the appropriate hosts in the network,
   these ASAs may start to operate. From the operator viewpoint, an
   operating ASA means the ASA manages the network resources as per the
   objectives given. At the ASA local level, operating means executing
   their control loop/algorithm.

   But right before that, there are two things to take into
   consideration. First, there is a difference between (1) having a
   piece of code available to run on a host and (2) having an agent
   based on this piece of code running inside the host.
   Second, in a coupled case, determining which resources are controlled
   by an ASA is straightforward (the determination is embedded); in a
   decoupled mode, determining this is more complex (hence a starting
   agent will have to either discover it or be taught it).

   The instantiation phase of an ASA covers both these aspects: starting
   the agent piece of code (when this does not start automatically) and
   determining which resources have to be controlled (when this is not
   obvious).

6.2.1. Operator's goal

   Through this phase, the operator wants to control its autonomic
   network in two ways:

   1. determine the scope of autonomic functions by instructing which of
      the network resources have to be managed by which autonomic
      function (and more precisely which class, e.g. (1) version X or
      version Y, or (2) provider A or provider B),

   2. determine how the autonomic functions are organized by instructing
      which ASAs have to interact with which other ASAs (or more
      precisely which set of network resources have to be handled as an
      autonomous group by their managing ASAs).

   Additionally in this phase, the operator may want to set objectives
   for autonomic functions, by configuring the ASAs' technical
   objectives.

   The operator's goal can be summarized in an instruction to the ANIMA
   ecosystem matching the following pattern:

      [ASA of type_x instances] ready to control
      [Instantiation_target_Infrastructure] with
      [Instantiation_target_parameters]

6.2.2. Instantiation phase inputs and outputs

   Inputs are:

   [ASA of type_x instances] that specifies which ASAs are to be
      targeted (and more precisely which class, e.g. (1) version X or
      version Y, or (2) provider A or provider B),

   [Instantiation_target_Infrastructure] that specifies which resources
      are to be managed by the autonomic function; this can be the whole
      network or a subset of it such as a domain, a technology segment
      or even a specific list of resources,

   [Instantiation_target_parameters] that specifies which technical
      objectives are to be set for ASAs (e.g. an optimization target).

   Outputs are:

   [Set of ASAs - Resources relations] describing which resources are
      managed by which ASA instances; this is not a formal message, but
      a resulting configuration of a set of ASAs.

6.2.3. Instantiation phase requirements

   The instructions described in Section 6.2.1 could be either:

   sent to a targeted ASA, in which case the receiving Agent will have
      to manage the specified list of
      [Instantiation_target_Infrastructure], with the
      [Instantiation_target_parameters].

   broadcast to all ASAs, in which case the ASAs would collectively
      determine from the list which Agent(s) would handle which
      [Instantiation_target_Infrastructure], with the
      [Instantiation_target_parameters].

   This set of instructions can be materialized through a message that
   is named an Instance Mandate (description TBD).

   The conclusion of this instantiation phase is a ready to operate ASA
   (or interacting set of ASAs). This ASA (or these ASAs) can then
   describe themselves by depicting which resources they manage and what
   this means in terms of metrics being monitored and in terms of
   actions that can be executed (like modifying parameter values). A
   message conveying such a self description is named an Instance
   Manifest (description TBD).
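
   As a purely illustrative sketch, the Python fragment below models the
   information that an Instance Mandate and an Instance Manifest might
   carry. Neither message is specified in this document (both are
   marked TBD), so every class, field and function name here is
   hypothetical. The mandate fields simply mirror the three inputs
   listed in Section 6.2.2, and the manifest fields mirror the
   self-description outlined above (managed resources, monitored
   metrics, executable actions).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InstanceMandate:
    """Hypothetical container for the instantiation inputs (Section 6.2.2)."""
    asa_class: str                      # [ASA of type_x instances]
    target_infrastructure: List[str]    # [Instantiation_target_Infrastructure]
    target_parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class InstanceManifest:
    """Hypothetical self-description of a ready-to-operate ASA."""
    asa_class: str
    managed_resources: List[str]
    monitored_metrics: List[str]
    available_actions: List[str]


def instantiate(mandate: InstanceMandate,
                metrics: List[str],
                actions: List[str]) -> InstanceManifest:
    """Accept a mandate and return the manifest describing the result."""
    if not mandate.target_infrastructure:
        raise ValueError("mandate names no resources to manage")
    return InstanceManifest(
        asa_class=mandate.asa_class,
        # De-duplicate and order the resources named in the mandate.
        managed_resources=sorted(set(mandate.target_infrastructure)),
        monitored_metrics=list(metrics),
        available_actions=list(actions),
    )


# Example values are invented for illustration only.
mandate = InstanceMandate(
    asa_class="load-balancer/version-X",
    target_infrastructure=["link-2", "link-1", "link-2"],
    target_parameters={"max_utilization": 0.8},
)
manifest = instantiate(mandate, ["link_load"], ["set_link_metric"])
```

   In this sketch the manifest is derived directly from the mandate. A
   real ASA in decoupled mode might instead discover part of its managed
   resources, in which case the manifest could differ from what the
   mandate requested.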

   Though the operator may well use such a self-description "per se",
   the final goal of such a description is to be shared with other ANIMA
   entities like:

   *  the coordination entities (see [I-D.ciavaglia-anima-coordination])

   *  collaborative entities, for the purpose of establishing knowledge
      exchanges (some ASAs may produce knowledge or monitor metrics that
      other ASAs cannot obtain by themselves but that would be useful
      for their execution)

6.3. Operation phase

   Note: This section is to be further developed in future revisions of
   the document, especially the implications on the design of ASAs.

   During the Operation phase, the operator can:

   Activate/Deactivate ASAs: enable or disable execution of their
      autonomic loop.

   Modify ASAs' targets: set them different objectives.

   Modify ASAs' managed resources: update the instance mandate to
      specify a different set of resources to manage (only applicable to
      decoupled ASAs).

   During the Operation phase, running ASAs can interact with each
   other:

   in order to exchange knowledge (e.g. an ASA providing traffic
      predictions to a load balancing ASA)

   in order to collaboratively reach an objective (e.g. ASAs pertaining
      to the same autonomic function and targeted to manage a network
      domain will collaborate; in the case of a load balancing function,
      by modifying the link metrics according to the loads of
      neighboring resources)

   During the Operation phase, running ASAs are expected to apply
   coordination schemes and then execute their control loop under
   coordination supervision/instructions.

   The ASA life-cycle is discussed in more detail in "A Day in the Life
   of an Autonomic Function" [I-D.peloso-anima-autonomic-function].

7. Coordination between Autonomic Functions

   Some autonomic functions will be completely independent of each
   other. However, others are at risk of interfering with each other;
   for example, two different optimization functions might both attempt
   to modify the same underlying parameter in different ways. In a
   complete system, a method is needed of identifying ASAs that might
   interfere with each other and coordinating their actions when
   necessary. This issue is considered in "Autonomic Functions
   Coordination" [I-D.ciavaglia-anima-coordination].

8. Coordination with Traditional Management Functions

   Some ASAs will have functions that overlap with existing
   configuration tools and network management mechanisms such as command
   line interfaces, DHCP, DHCPv6, SNMP, NETCONF, RESTCONF and YANG-based
   solutions. Each ASA designer will need to consider this issue and
   how to avoid clashes and inconsistencies. Some specific
   considerations for interaction with OAM tools are given in [RFC8368].
   As another example, [I-D.ietf-anima-prefix-management] describes how
   autonomic management of IPv6 prefixes can interact with prefix
   delegation via DHCPv6. The description of a GRASP objective and of
   an ASA using it should include a discussion of any such interactions.

   A related aspect is that management functions often include a data
   model, quite likely to be expressed in a formal notation such as
   YANG. This aspect should not be an afterthought in the design of an
   ASA. On the contrary, the design of the ASA and of its GRASP
   objectives should match the data model; as noted above, YANG
   serialized as CBOR may be used directly as the value of a GRASP
   objective.

9. Robustness

   It is of great importance that all components of an autonomic system
   are highly robust. In principle they must never fail. This section
   lists various aspects of robustness that ASA designers should
   consider.

   1. If despite all precautions, an ASA does encounter a fatal error,
      it should in any case restart automatically and try again. To
      mitigate a hard loop in case of persistent failure, a suitable
      pause should be inserted before such a restart. The length of
      the pause depends on the use case.

   2. If a newly received or calculated value for a parameter falls out
      of bounds, the corresponding parameter should be either left
      unchanged or restored to a safe value.

   3. If a GRASP synchronization or negotiation session fails for any
      reason, it may be repeated after a suitable pause. The length of
      the pause depends on the use case.

   4. If a session fails repeatedly, the ASA should consider that its
      peer has failed, and cause GRASP to flush its discovery cache and
      repeat peer discovery.

   5. In any case, it may be prudent to repeat discovery periodically,
      depending on the use case.

   6. Any received GRASP message should be checked. If it is wrongly
      formatted, it should be ignored. Within a unicast session, an
      Invalid message (M_INVALID) may be sent. This function may be
      provided by the GRASP implementation itself.

   7. Any received GRASP objective should be checked. If it is wrongly
      formatted, it should be ignored. Within a negotiation session, a
      Negotiation End message (M_END) with a Decline option (O_DECLINE)
      should be sent. An ASA may log such events for diagnostic
      purposes.

   8. If an ASA receives either an Invalid message (M_INVALID) or a
      Negotiation End message (M_END) with a Decline option
      (O_DECLINE), one possible reason is that the peer ASA does not
      support a new feature of either GRASP or of the objective in
      question. In such a case the ASA may choose to repeat the
      operation concerned without using that new feature.

   9. All other possible exceptions should be handled in an orderly
      way.
There should be no such thing as an unhandled exception 680 (but see point 1 above). 682 10. Security Considerations 684 ASAs are intended to run in an environment that is protected by the 685 Autonomic Control Plane [I-D.ietf-anima-autonomic-control-plane], 686 admission to which depends on an initial secure bootstrap process 687 [I-D.ietf-anima-bootstrapping-keyinfra]. In some deployments, a 688 secure partition of the link layer might be used instead 689 [I-D.carpenter-anima-l2acp-scenarios]. However, this does not 690 relieve ASAs of responsibility for security. In particular, when 691 ASAs configure or manage network elements outside the ACP, they must 692 use secure techniques and carefully validate any incoming 693 information. As noted above, this will apply in particular when an 694 ASA interacts with a management component such as a NETCONF server. 696 As appropriate to their specific functions, ASAs should take account 697 of relevant privacy considerations [RFC6973]. 699 Authorization of ASAs is a subject for future study. At present, 700 ASAs are trusted by virtue of being installed on a node that has 701 successfully joined the ACP. In the general case, a node may have 702 mutltiple roles and a role may use multiple ASAs, each using multiple 703 GRASP objectives. Additional mechanisms for the authorization of 704 nodes and ASAs to manipulate specific GRASP objectives could be 705 designed. 707 11. IANA Considerations 709 This document makes no request of the IANA. 711 12. Acknowledgements 713 Useful comments were received from Michael Behringer Toerless Eckert, 714 Alex Galis, Bing Liu, Michael Richardson, and other members of the 715 ANIMA WG. 717 13. References 719 13.1. Normative References 721 [I-D.ietf-anima-autonomic-control-plane] 722 Eckert, T., Behringer, M., and S. Bjarnason, "An Autonomic 723 Control Plane (ACP)", Work in Progress, Internet-Draft, 724 draft-ietf-anima-autonomic-control-plane-30, 30 October 725 2020, . 

   [I-D.ietf-anima-bootstrapping-keyinfra]
              Pritikin, M., Richardson, M., Eckert, T., Behringer, M.,
              and K. Watsen, "Bootstrapping Remote Secure Key
              Infrastructures (BRSKI)", Work in Progress, Internet-
              Draft, draft-ietf-anima-bootstrapping-keyinfra-45, 11
              November 2020.

   [I-D.ietf-anima-grasp]
              Bormann, C., Carpenter, B., and B. Liu, "A Generic
              Autonomic Signaling Protocol (GRASP)", Work in Progress,
              Internet-Draft, draft-ietf-anima-grasp-15, 13 July 2017.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

13.2.  Informative References

   [DeMola06] De Mola, F. and R. Quitadamo, "An Agent Model for Future
              Autonomic Communications", Proceedings of the 7th WOA 2006
              Workshop From Objects to Agents, pp. 51-59, September
              2006.

   [GANA13]   "Autonomic network engineering for the self-managing
              Future Internet (AFI): GANA Architectural Reference Model
              for Autonomic Networking, Cognitive Networking and Self-
              Management", April 2013.

   [Huebscher08]
              Huebscher, M. C. and J. A. McCann, "A survey of autonomic
              computing--degrees, models, and applications", ACM
              Computing Surveys (CSUR), Volume 40, Issue 3,
              DOI 10.1145/1380584.1380585, August 2008.

   [I-D.carpenter-anima-l2acp-scenarios]
              Carpenter, B. and B. Liu, "Scenarios and Requirements for
              Layer 2 Autonomic Control Planes", Work in Progress,
              Internet-Draft, draft-carpenter-anima-l2acp-scenarios-02,
              8 April 2020.

   [I-D.ciavaglia-anima-coordination]
              Ciavaglia, L. and P. Peloso, "Autonomic Functions
              Coordination", Work in Progress, Internet-Draft, draft-
              ciavaglia-anima-coordination-01, 21 March 2016.

   [I-D.ietf-anima-grasp-api]
              Carpenter, B., Liu, B., Wang, W., and X. Gong, "Generic
              Autonomic Signaling Protocol Application Program Interface
              (GRASP API)", Work in Progress, Internet-Draft, draft-
              ietf-anima-grasp-api-07, 13 October 2020.

   [I-D.ietf-anima-grasp-distribution]
              Liu, B., Xiao, X., Hecker, A., Jiang, S., Despotovic, Z.,
              and B. Carpenter, "Information Distribution over GRASP",
              Work in Progress, Internet-Draft, draft-ietf-anima-grasp-
              distribution-01, 1 September 2020.

   [I-D.ietf-anima-prefix-management]
              Jiang, S., Du, Z., Carpenter, B., and Q. Sun, "Autonomic
              IPv6 Edge Prefix Management in Large-scale Networks", Work
              in Progress, Internet-Draft, draft-ietf-anima-prefix-
              management-07, 18 December 2017.

   [I-D.ietf-anima-reference-model]
              Behringer, M., Carpenter, B., Eckert, T., Ciavaglia, L.,
              and J. Nobre, "A Reference Model for Autonomic
              Networking", Work in Progress, Internet-Draft, draft-ietf-
              anima-reference-model-10, 22 November 2018.

   [I-D.ietf-core-yang-cbor]
              Veillette, M., Petrov, I., and A. Pelov, "CBOR Encoding of
              Data Modeled with YANG", Work in Progress, Internet-Draft,
              draft-ietf-core-yang-cbor-13, 4 July 2020.

   [I-D.peloso-anima-autonomic-function]
              Peloso, P. and L. Ciavaglia, "A Day in the Life of an
              Autonomic Function", Work in Progress, Internet-Draft,
              draft-peloso-anima-autonomic-function-01, 21 March 2016.

   [Movahedi12]
              Movahedi, Z., Ayari, M., Langar, R., and G. Pujolle, "A
              Survey of Autonomic Network Architectures and Evaluation
              Criteria", IEEE Communications Surveys & Tutorials,
              Volume 14, Issue 2, DOI 10.1109/SURV.2011.042711.00078,
              pp. 464-490, 2012.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013.

   [RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
              Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
              Networking: Definitions and Design Goals", RFC 7575,
              DOI 10.17487/RFC7575, June 2015.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
              Chaining (SFC) Architecture", RFC 7665,
              DOI 10.17487/RFC7665, October 2015.

   [RFC8368]  Eckert, T., Ed. and M. Behringer, "Using an Autonomic
              Control Plane for Stable Connectivity of Network
              Operations, Administration, and Maintenance (OAM)",
              RFC 8368, DOI 10.17487/RFC8368, May 2018.

   [RFC8568]  Bernardos, CJ., Rahman, A., Zuniga, JC., Contreras, LM.,
              Aranda, P., and P. Lynch, "Network Virtualization Research
              Challenges", RFC 8568, DOI 10.17487/RFC8568, April 2019.

Appendix A.  Change log [RFC Editor: Please remove]

   draft-ietf-anima-asa-guidelines-00, 2020-11:

   *  Adopted by WG
   *  Editorial fixes

   draft-carpenter-anima-asa-guidelines-09, 2020-07-25:

   *  Additional text on future authorization.
   *  Editorial fixes

   draft-carpenter-anima-asa-guidelines-08, 2020-01-10:

   *  Introduced notion of autonomic ecosystem.
   *  Minor technical clarifications.
   *  Converted to v3 format.

   draft-carpenter-anima-asa-guidelines-07, 2019-07-17:

   *  Improved explanation of threading vs event-loop
   *  Other editorial improvements.

   draft-carpenter-anima-asa-guidelines-06, 2018-01-07:

   *  Expanded and improved example logic flow.
   *  Editorial corrections.

   draft-carpenter-anima-asa-guidelines-05, 2018-06-30:

   *  Added section on relationship with non-autonomic components.
   *  Editorial corrections.

   draft-carpenter-anima-asa-guidelines-04, 2018-03-03:

   *  Added note about simple ASAs.
   *  Added note about NFV/SFC services.
   *  Improved text about threading v event loop model
   *  Added section about coordination with traditional tools.
   *  Added appendix with example logic flow.

   draft-carpenter-anima-asa-guidelines-03, 2017-10-25:

   *  Added details on life cycle.
   *  Added details on robustness.
   *  Added co-authors.

   draft-carpenter-anima-asa-guidelines-02, 2017-07-01:

   *  Expanded description of event-loop case.
   *  Added note about 'dry run' mode.

   draft-carpenter-anima-asa-guidelines-01, 2017-01-06:

   *  More sections filled in.

   draft-carpenter-anima-asa-guidelines-00, 2016-09-30:

   *  Initial version

Appendix B.  Example Logic Flows

   This appendix describes generic logic flows for an Autonomic Service
   Agent (ASA) for resource management.  Note that these are
   illustrative examples, and in no sense requirements.  As long as the
   rules of GRASP are followed, a real implementation could be
   different.  The reader is assumed to be familiar with GRASP
   [I-D.ietf-anima-grasp] and its conceptual API
   [I-D.ietf-anima-grasp-api].

   A complete autonomic function for a resource would consist of a
   number of instances of the ASA placed at relevant points in a
   network.  Specific details will of course depend on the resource
   concerned.  One example is IP address prefix management, as specified
   in [I-D.ietf-anima-prefix-management].  In this case, an instance of
   the ASA would exist in each delegating router.

   An underlying assumption is that there is an initial source of the
   resource in question, referred to here as an origin ASA.  The other
   ASAs, known as delegators, obtain supplies of the resource from the
   origin, and then delegate quantities of the resource to consumers
   that request it, and recover it when no longer needed.

   Another assumption is that there is a set of network-wide policy
   parameters, which the origin will provide to the delegators.  These
   parameters will control how the delegators decide how much resource
   to provide to consumers.  Thus the ASA logic has two operating modes:
   origin and delegator.  When running as an origin, it starts by
   obtaining a quantity of the resource from the NOC, and it acts as a
   source of policy parameters, via both GRASP flooding and GRASP
   synchronization.  (In some scenarios, flooding or synchronization
   alone might be sufficient, but this example includes both.)

   When running as a delegator, it starts with an empty resource pool,
   it acquires the policy parameters by GRASP synchronization, and it
   delegates quantities of the resource to consumers that request it.
   Both as an origin and as a delegator, when its pool is low it seeks
   quantities of the resource by requesting GRASP negotiation with peer
   ASAs.  When its pool is sufficient, it hands out resource to peer
   ASAs in response to negotiation requests.  Thus, over time, the
   initial resource pool held by the origin will be shared among all the
   delegators according to demand.

   In theory a network could include any number of origins and any
   number of delegators, with the only condition being that each
   origin's initial resource pool is unique.  A realistic scenario is to
   have exactly one origin and as many delegators as needed.  A
   scenario with no origin is useless.

   An implementation requirement is that resource pools are kept in
   stable storage.  Otherwise, if a delegator exits for any reason, all
   the resources it has obtained or delegated are lost.  If an origin
   exits, its entire spare pool is lost.  The logic for using stable
   storage and for crash recovery is not included in the pseudocode
   below.

   The description below does not implement GRASP's 'dry run' function.
   That would require temporarily marking any resource handed out in a
   dry run negotiation as reserved, until either the peer obtains it in
   a live run, or a suitable timeout expires.

   The main data structures used in each instance of the ASA are:

   *  The resource_pool, for example an ordered list of available
      resources.  Depending on the nature of the resource, units of
      resource are split when appropriate, and a background garbage
      collector recombines split resources if they are returned to the
      pool.

   *  The delegated_list, where a delegator stores the resources it has
      given to consumer routers.

   Possible main logic flows are below, using a threaded implementation
   model.  The transformation to an event loop model should be apparent:
   each thread would correspond to one event in the event loop.

   The GRASP objectives are as follows:

   *  ["EX1.Resource", flags, loop_count, value] where the value depends
      on the resource concerned, but will typically include its size and
      identification.

   *  ["EX1.Params", flags, loop_count, value] where the value will be,
      for example, a JSON object defining the applicable parameters.

   In the outline logic flows below, these objectives are represented
   simply by their names.
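
   As a purely illustrative aside, the two data structures described
   above might be sketched in Python as follows.  This is a minimal
   sketch under stated assumptions, not part of the logic flows: it
   assumes the resource can be modelled as contiguous integer ranges
   written as (start, size), and the names ResourcePool, allocate,
   release, collect_garbage and free_blocks are hypothetical, chosen
   only for this example.

```python
import threading


class ResourcePool:
    """Ordered list of free (start, size) blocks, guarded by a lock.

    Assumption: the resource is a contiguous integer range; a real
    resource (e.g. an address prefix) needs its own split/merge rules.
    """

    def __init__(self, blocks=()):
        self._lock = threading.Lock()
        self._free = sorted(blocks)

    def allocate(self, amount):
        """Return a (start, amount) block, splitting a larger one if needed.

        Returns None when no single free block is large enough.
        """
        with self._lock:
            for i, (start, size) in enumerate(self._free):
                if size >= amount:
                    if size == amount:
                        del self._free[i]
                    else:
                        # Split: keep the unused tail in the pool.
                        self._free[i] = (start + amount, size - amount)
                    return (start, amount)
            return None

    def release(self, block):
        """Return a block to the pool, keeping the list ordered."""
        with self._lock:
            self._free.append(block)
            self._free.sort()

    def collect_garbage(self):
        """Recombine adjacent free blocks (the garbage collector's job)."""
        with self._lock:
            merged = []
            for start, size in self._free:
                if merged and merged[-1][0] + merged[-1][1] == start:
                    prev_start, prev_size = merged[-1]
                    merged[-1] = (prev_start, prev_size + size)
                else:
                    merged.append((start, size))
            self._free = merged

    def free_blocks(self):
        """Return a snapshot of the free list."""
        with self._lock:
            return list(self._free)


pool = ResourcePool([(0, 16)])
delegated_list = {}  # hypothetical layout: consumer name -> delegated block

delegated_list["consumer-a"] = pool.allocate(4)  # splits the 16-unit block
delegated_list["consumer-b"] = pool.allocate(4)

pool.release(delegated_list.pop("consumer-a"))   # consumer-a hands it back
pool.collect_garbage()                           # nothing adjacent yet
```

   The lock corresponds to the "associated lock" created with the pool
   in the main program, since several threads touch the pool.  Keeping
   the free list ordered makes the merge step a single pass.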

   MAIN PROGRAM:

   Create empty resource_pool (and an associated lock)
   Create empty delegated_list
   Determine whether to act as origin
   if origin:
       Obtain initial resource_pool contents from NOC
       Obtain value of EX1.Params from NOC
   Register ASA with GRASP
   Register GRASP objectives EX1.Resource and EX1.Params
   if origin:
       Start FLOODER thread to flood EX1.Params
       Start SYNCHRONIZER listener for EX1.Params
   Start MAIN_NEGOTIATOR thread for EX1.Resource
   if not origin:
       Obtain value of EX1.Params from GRASP flood or synchronization
       Start DELEGATOR thread
   Start GARBAGE_COLLECTOR thread
   good_peer = none  #set once, so a good peer is remembered across loops
   do forever:
       if resource_pool is low:
           Calculate amount A of resource needed
           Discover peers using GRASP M_DISCOVER / M_RESPONSE
           if good_peer in peers:
               peer = good_peer
           else:
               peer = #any choice among peers
           grasp.request_negotiate("EX1.Resource", peer)
           #i.e., send M_REQ_NEG
           Wait for response (M_NEGOTIATE, M_END or M_WAIT)
           if OK:
               if offered amount of resource sufficient:
                   Send M_END + O_ACCEPT #negotiation succeeded
                   Add resource to pool
                   good_peer = peer
               else:
                   Send M_END + O_DECLINE #negotiation failed
       sleep() #sleep time depends on application scenario

   MAIN_NEGOTIATOR thread:

   do forever:
       grasp.listen_negotiate("EX1.Resource")
       #i.e., wait for M_REQ_NEG
       Start a separate new NEGOTIATOR thread for requested amount A

   NEGOTIATOR thread:

   Request resource amount A from resource_pool
   if not OK:
       while not OK and A > Amin:
           A = A-1
           Request resource amount A from resource_pool
   if OK:
       Offer resource amount A to peer by GRASP M_NEGOTIATE
       if received M_END + O_ACCEPT:
           #negotiation succeeded
       elif received M_END + O_DECLINE or other error:
           #negotiation failed
   else:
       Send M_END + O_DECLINE #negotiation failed

   DELEGATOR thread:

   do forever:
       Wait for request or release for resource amount A
       if request:
           Get resource amount A from resource_pool
           if OK:
               Delegate resource to consumer
               Record in delegated_list
           else:
               Signal failure to consumer
               Signal main thread that resource_pool is low
       else:
           Delete resource from delegated_list
           Return resource amount A to resource_pool

   SYNCHRONIZER thread:

   do forever:
       Wait for M_REQ_SYN message for EX1.Params
       Reply with M_SYNCH message for EX1.Params

   FLOODER thread:

   do forever:
       Send M_FLOOD message for EX1.Params
       sleep() #sleep time depends on application scenario

   GARBAGE_COLLECTOR thread:

   do forever:
       Search resource_pool for adjacent resources
       Merge adjacent resources
       sleep() #sleep time depends on application scenario

Authors' Addresses

   Brian Carpenter
   School of Computer Science
   University of Auckland
   PB 92019
   Auckland 1142
   New Zealand

   Email: brian.e.carpenter@gmail.com


   Laurent Ciavaglia
   Nokia
   Villarceaux
   91460 Nozay
   France

   Email: laurent.ciavaglia@nokia.com


   Sheng Jiang
   Huawei Technologies Co., Ltd
   Q14 Huawei Campus
   156 Beiqing Road
   Hai-Dian District
   Beijing
   100095
   China

   Email: jiangsheng@huawei.com


   Pierre Peloso
   Nokia
   Villarceaux
   91460 Nozay
   France

   Email: pierre.peloso@nokia.com