Network Management Research Group                               J. Nobre
Internet-Draft                       University of Vale do Rio dos Sinos
Intended status: Informational                              L. Granville
Expires: April 24, 2018          Federal University of Rio Grande do Sul
                                                                A. Clemm
                                                                  Huawei
                                                      A. Gonzalez Prieto
                                                                  VMware
                                                        October 21, 2017


     Autonomic Networking Use Case for Distributed Detection of SLA
                               Violations
          draft-irtf-nmrg-autonomic-sla-violation-detection-12

Abstract

   This document describes an experimental use case for autonomic networking concerning monitoring of Service Level Agreements (SLAs). The use case aims to detect violations of SLAs in a distributed fashion, striving to optimize and dynamically adapt the autonomic deployment of active measurement probes in a way that maximizes the likelihood of detecting service level violations with a given resource budget to perform active measurements, and is able to do so without any outside guidance or intervention.

   This document is a product of the IRTF Network Management Research Group (NMRG). It is published for informational purposes.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 24, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Acronyms
   3.  Current Approaches
   4.  Use Case Description
   5.  A Distributed Autonomic Solution
   6.  Intended User Experience
   7.  Implementation Considerations
     7.1.  Device Based Self-Knowledge and Decisions
     7.2.  Interaction with other devices
   8.  Comparison with current solutions
   9.  Related IETF Work
   10. Acknowledgements
   11. IANA Considerations
   12. Security Considerations
   13. Informative References
   Authors' Addresses

1.  Introduction

   The Internet has grown dramatically in size, capacity, and accessibility in recent years. Communication requirements of distributed services and applications running on top of the Internet have become increasingly demanding. Examples include real-time interactive video and financial trading. Providing such services involves stringent requirements in terms of acceptable latency, loss, or jitter.
   Performance requirements lead to the articulation of Service Level Objectives (SLOs) which must be met. Those SLOs are part of Service Level Agreements (SLAs) that define a contract between the provider and the consumer of a service. SLOs, in effect, constitute a service level guarantee that the consumer of the service can expect to receive (and often has to pay for). Likewise, the provider of a service needs to ensure that the service level guarantee and associated SLOs are met. Some examples of clauses that relate to service level objectives can be found in [RFC7297].

   Violations of SLOs can be associated with significant financial loss, which can be divided into two categories. First, there is the loss that can be incurred by the user of a service when the agreed service levels are not provided. For example, a financial brokerage might suffer losses on its stock orders when it is unable to execute stock transactions in a timely manner. An electronic retailer may lose customers when its online presence is perceived by customers as sluggish. An online gaming provider may not be able to provide fair access to online players, resulting in frustrated players who are lost as customers. In each case, the failure of a service provider to meet promised service level guarantees can have a substantial financial impact on users of the service. Second, there is the loss that is incurred by the provider of a service who is unable to meet promised service level objectives. Those losses can take several forms, such as penalties for not meeting the agreed service levels and, in many cases more importantly, loss of revenue due to reduced customer satisfaction. Hence, service level objectives are a key concern for the service provider.
   In order to ensure that SLOs are not being violated, service levels need to be continuously monitored at the network infrastructure layer, for example to know when mitigating actions need to be taken. To that end, service level measurements must take place.

   Network measurements can be performed using active or passive measurement techniques. In passive measurements, production traffic is observed and no monitoring traffic is created by the measurement process itself. That is, network conditions are checked in a non-intrusive way. In the context of IP Flow Information Export (IPFIX), several documents were produced that define how to export data associated with flow records, i.e., data that is collected as part of passive measurement mechanisms, generally applied against flows of production traffic (e.g., [RFC7011]). In addition, it would be possible to collect real data traffic (not just summarized flow records) with time-stamped packets, possibly sampled (e.g., per [RFC5474]), as a means of measuring and inferring service levels. Active measurements, on the other hand, are more intrusive to the network in the sense that they involve injecting synthetic test traffic into the network to measure network service levels, as opposed to simply observing production traffic. The IP Performance Metrics (IPPM) WG produced documents that describe active measurement mechanisms, such as the One-Way Active Measurement Protocol (OWAMP) [RFC4656] and the Two-Way Active Measurement Protocol (TWAMP) [RFC5357]. Another example is the Cisco Service-Level Assurance Protocol (Cisco SLA) [RFC6812]. In addition, there are some mechanisms that do not cleanly fit into either the active or the passive category, such as Performance and Diagnostic Metrics Destination Option (PDM) techniques [draft-ietf-ippm-6man-pdm-option].
   Active measurement mechanisms offer a high level of control over what and how to measure. They do not require inspecting production traffic. Because of this, active measurements usually offer better accuracy and privacy than passive measurement mechanisms. Traffic encryption and regulations that limit the amount of payload inspection that can occur are non-issues. Furthermore, active measurement mechanisms are able to detect end-to-end network performance problems in a fine-grained way (e.g., by simulating the traffic that must be handled under specific SLOs). As a result, active measurements are often preferred over passive measurements for SLA monitoring. Measurement probes must be hosted in network devices, and measurement sessions must be activated to compute the current network metrics (e.g., considering those described in [RFC4148]). This activation should be dynamic in order to follow changes in network conditions, such as those related to routes being added or new customer demands.

   While offering many advantages, active measurements are expensive in terms of network resource consumption. Active measurements generally involve measurement probes that generate synthetic test traffic that is directed at a responder. The responder needs to timestamp the test traffic it receives and reflect it back to the originating measurement probe. The measurement probe subsequently processes the returned packets along with the timestamping information in order to compute service levels. Accordingly, active measurements consume substantial CPU cycles as well as memory of network devices to generate and process test traffic. In addition, synthetic traffic increases network load. Active measurements thus compete for resources with other functions, including routing and switching.
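   As an illustration of the probe-side processing described above, the following Python sketch derives round-trip delay, jitter, and loss from reflected test packets. It assumes a hypothetical per-packet record carrying four timestamps in the style of TWAMP [RFC5357]: probe transmit (t1), responder receive (t2), responder transmit (t3), and probe receive (t4). Subtracting the responder's residence time (t3 - t2) from the round trip (t4 - t1) means the probe and responder clocks need not be synchronized. The jitter figure is a simple mean of absolute differences between consecutive delays, chosen for illustration; it is not the estimator defined by any particular protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reflected:
    """One reflected test packet (hypothetical record); times in seconds."""

    seq: int    # sequence number assigned by the probe
    t1: float   # probe transmit time (probe clock)
    t2: float   # responder receive time (responder clock)
    t3: float   # responder transmit time (responder clock)
    t4: float   # probe receive time (probe clock)


def summarize(sent: int, replies: List[Reflected]) -> Dict[str, Optional[float]]:
    """Summarize one measurement round in which `sent` test packets were sent."""
    # Keep one reply per valid sequence number; a later duplicate overwrites.
    by_seq = {r.seq: r for r in replies if 0 <= r.seq < sent}
    ordered = [by_seq[s] for s in sorted(by_seq)]
    # Round trip minus responder residence time; no clock sync is required.
    delays = [(r.t4 - r.t1) - (r.t3 - r.t2) for r in ordered]
    # Simple jitter: mean absolute change between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "avg_delay": sum(delays) / len(delays) if delays else None,
        "jitter": sum(diffs) / len(diffs) if diffs else None,
        "loss": 1 - len(ordered) / sent if sent > 0 else None,
    }


if __name__ == "__main__":
    # The responder clock is deliberately offset by several seconds.
    replies = [
        Reflected(0, 0.000, 5.010, 5.011, 0.021),
        Reflected(1, 1.000, 6.012, 6.013, 1.025),
        Reflected(3, 3.000, 8.010, 8.011, 3.021),
    ]
    print(summarize(4, replies))
```

   In this example, four test packets were sent and replies came back for sequence numbers 0, 1, and 3, so the reported loss is 0.25. The offset responder clock does not affect the delay values.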
   The resources required and the traffic generated by active measurement sessions are largely a function of the number of measured network destinations. (The amount of traffic generated for each measurement also plays a role, which in turn influences the accuracy of the measurement.) The more destinations are measured, the more resources are consumed and the more traffic is needed to perform the measurements. Thus, better monitoring coverage requires deploying more sessions, which consequently increases resource consumption. Conversely, observing just a small subset of all network flows can lead to insufficient coverage.

   Furthermore, while some end-to-end service levels can be determined by combining the service levels observed across different path segments, the same is not true for all service levels. For example, the end-to-end delay from a node A to a node C routed via a node B can often be computed simply by adding the delays from A to B and from B to C; end-to-end packet loss can be derived in a similar way (for low loss rates, approximately by adding the segment loss rates). This allows a large set of end-to-end measurements to be decomposed into a much smaller set of segment measurements. However, end-to-end jitter and (for example) Mean Opinion Scores cannot be decomposed as easily and, for higher accuracy, must be measured end-to-end.

   Hence, the decision of how to place measurement probes becomes an important management activity. The goal is to obtain the maximum benefit from service level monitoring with a limited amount of measurement overhead. Specifically, the goal is to maximize the number of service level violations that are detected with a limited amount of resources.

   The use case and the solution approach described in this document address an important practical issue. They are intended to provide a basis for further experimentation leading to solutions for wider deployment.
   This document represents the consensus of the IRTF's Network Management Research Group (NMRG). It was discussed extensively and received three separate in-depth reviews.

2.  Definitions and Acronyms

   Active Measurements: Techniques to measure service levels that involve generating and observing synthetic test traffic.

   Passive Measurements: Techniques used to measure service levels based on observation of production traffic.

   AN: Autonomic Network; a network containing exclusively autonomic nodes, requiring no configuration and deriving all required information through self-knowledge, discovery, or intent.

   Autonomic Service Agent (ASA): An agent implemented on an autonomic node that implements an autonomic function, either in part (in the case of a distributed function, as in the context of this document) or in whole.

   Measurement Session: A communications association between a Probe and a Responder used to send and reflect synthetic test traffic for active measurements.

   Probe: The source of synthetic test traffic in an active measurement.

   Responder: The destination for synthetic test traffic in an active measurement.

   SLA: Service Level Agreement

   SLO: Service Level Objective

   P2P: Peer-to-Peer

   (Note: definitions of AN and ASA are borrowed from [RFC7575].)

3.  Current Approaches

   The current best practice in feasible deployments of active measurement solutions for distributing the available measurement sessions across the network consists of relying entirely on the expertise of human administrators to infer the best locations at which to activate such sessions. This is done in several steps. First, it is necessary to collect traffic information in order to grasp the traffic matrix. Then, the administrator uses this information to infer which destinations are best suited for measurement sessions.
   After that, the administrator activates sessions on the chosen subset of destinations, taking the available resources into account. This practice, however, does not scale well, because it is labor intensive and error-prone for the administrator to determine which sessions should be activated given the set of critical flows that need to be measured. Even worse, this practice fails completely in networks whose critical flows are short-lived and dynamic in terms of the network paths they traverse, as in modern cloud environments. This is because fast reactions are necessary to reconfigure the sessions, and administrators are simply not fast enough at computing and activating the new set of required sessions every time the network traffic pattern changes. Finally, current active measurement practice usually covers only a fraction of the network flows that should be observed, which invariably leads to the damaging consequence of undetected SLA violations.

4.  Use Case Description

   The use case involves a service provider who needs to monitor the network to detect service level violations using active service level measurements, and wants to be able to do so with minimal human intervention. The goal is to conduct the measurements in an effective manner, maximizing the percentage of detected service level violations. The service provider has a bounded resource budget with regard to the measurements that can be performed, specifically with regard to the number of measurements that can be conducted concurrently from any one network device, and possibly with regard to the total amount of measurement traffic on the network. However, while at any one point in time the number of measurements conducted is limited, it is possible for a device to change which destinations it measures over time.
   This can be exploited to eventually cover all possible destinations, using a reasonable amount of "sampling" where continuous measurement coverage of a destination is not possible. The solution needs to be dynamic and able to cope with network conditions that may change over time. The solution should also be embeddable inside network devices that control the deployment of active measurement mechanisms.

   The goal is to conduct the measurements in a smart manner that ensures that the network is broadly covered and the likelihood of detecting service level violations is maximized. In order to maximize that likelihood, it is reasonable to focus measurement resources on destinations that are more likely to incur a violation, while spending fewer resources on destinations that are more likely to be in compliance. In order to do so, various aspects can be exploited, including past measurements (destinations close to a service level threshold require more focus than destinations further from it), complementary passive measurements such as flow data (to identify network destinations that are currently popular and critical), and observations from other parts of the network. In addition, measurements can be coordinated among different network devices to avoid hitting the same destination at the same time and to share results that may be useful for future probe placement.

   Clearly, static solutions will have severe limitations. At the same time, human administrators cannot be in the loop for continuous dynamic reconfiguration of measurement probes. Accordingly, an automated or, ideally, autonomic solution is needed in which network measurements are automatically orchestrated and dynamically reconfigured from within the network.
   This can be accomplished using a distributed autonomic solution, with Autonomic Service Agents implemented on nodes in the network.

5.  A Distributed Autonomic Solution

   The use of Autonomic Networking (AN) [RFC7575] can help such detection through an efficient activation of measurement sessions. Such an approach, along with a detailed assessment confirming its viability, has been described in [P2PBNM-Nobre-2012]. The problem to be solved by AN in the present use case is how to steer the process of Measurement Session activation with a complete solution that sets all necessary parameters for this activation to operate efficiently, reliably, and securely, with no required human intervention other than setting overall policy.

   When a node first comes online, it has no information about which measurements are more critical than others. In the absence of information about past measurements and information from measurement peers, it may start with an initial set of measurement sessions, possibly seeding a set of starter measurements randomly, and perhaps taking a round-robin approach for subsequent measurement rounds. However, as measurements are collected, a node will gain increasing information that it can use to refine its strategy for selecting measurement targets going forward. First, it may take note of which targets returned measurement results very close to service level thresholds and may therefore require closer scrutiny than others. Second, it may use observations made by its measurement peers in order to conclude which measurement targets may be more critical than others, and to ensure that proper overall measurement coverage is obtained (so that not every node happens to measure the same targets while other targets are not measured at all).
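   The target-selection strategy outlined above could be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm evaluated in [P2PBNM-Nobre-2012]: a single delay-based SLO threshold, a per-node budget of concurrent Measurement Sessions, a hypothetical map of the most recent delay per target (which may merge the node's own results with results shared by peers), and a hypothetical set of targets that peers have announced they will measure in the next round. Targets never measured are ranked first to build coverage, followed by targets whose last result was closest to, or beyond, the threshold.

```python
import random
from typing import Dict, List, Optional, Set


def select_targets(
    targets: List[str],
    budget: int,
    last_delay: Dict[str, float],
    threshold: float,
    peer_claimed: Set[str],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose up to `budget` targets for the next round (hypothetical policy)."""
    rng = rng or random.Random()
    # Leave targets that a peer has claimed for this round to that peer.
    candidates = [t for t in targets if t not in peer_claimed]
    if not last_delay:
        # No history from self or peers yet: seed a random starter set.
        return rng.sample(candidates, min(budget, len(candidates)))

    def priority(t: str) -> float:
        if t not in last_delay:
            return float("inf")  # never measured: cover it first
        # Ratio to the SLO threshold; values near or above 1.0 need scrutiny.
        return last_delay[t] / threshold

    # Highest priority first; sorted() is stable, so ties keep list order.
    return sorted(candidates, key=priority, reverse=True)[:budget]
```

   For example, with targets a to d, a budget of 2, a threshold of 50 ms, and last delays of 10, 48, and 20 ms for a, b, and c, a node whose peer has claimed d would select b and c. Without that claim it would select d (never measured) and b. A real Autonomic Service Agent would likely also weigh per-target measurement cost and local resource availability.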
   We advocate embedding Peer-to-Peer (P2P) technology in network devices in order to make Measurement Session activation decisions using autonomic control loops. Specifically, we advocate that network devices implement an autonomic function to monitor service levels for violations of service level objectives, determining which Measurement Sessions to set up at any given point in time based on current and past observations of the node and of other peer nodes.

   By performing these functions locally and autonomically on the device itself, the choice of which measurements to conduct can be modified quickly based on local observations while taking local resource availability into account. This allows a solution to be more robust and to react more dynamically to rapidly changing service levels than a solution that has to rely on central coordination. However, in order to optimize decisions about which measurements to conduct, a node will need to communicate with other nodes. This allows a node to take other nodes' observations into account in its decisions, in addition to its own.

   For example, remote destinations whose observed service levels are on the verge of violating stated objectives may require closer monitoring than remote destinations that are comfortably within a range of tolerance. Communication also allows nodes to coordinate their probing decisions to collectively achieve the best possible measurement coverage. As the resources available for monitoring, for exchanging measurement data, and for coordinating with other nodes are limited, a node may further be interested in identifying other nodes whose observations are most similar to and correlated with its own. This helps a node decide which other nodes to prioritize for coordination and data exchange. All of this requires the use of a P2P overlay.
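   One possible way for a node to identify the peers whose observations are most correlated with its own is sketched below in Python. It assumes, hypothetically, that nodes exchange a time series of a single metric (for example, average delay per measurement round) aligned round by round, and it ranks peers by Pearson correlation. The choice of similarity measure is an assumption made for illustration; [P2PBNM-Nobre-2012] may define correlated peers differently.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def most_correlated_peers(
    local: List[float], peers: Dict[str, List[float]], k: int
) -> List[str]:
    """Return the k peers whose observation series best track the local one."""
    scored = [(pearson(local, series), name) for name, series in peers.items()]
    # Highest correlation first; ties broken by peer name for determinism.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [name for _, name in scored[:k]]
```

   Negatively correlated peers are simply ranked last here. A node might still exchange occasional results with such peers so that overall coverage does not suffer.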
   A P2P overlay is essential for several reasons:

   o  It makes it possible for nodes (or rather, the Autonomic Service Agents deployed on those nodes) in the network to set up Measurement Sessions autonomically, without having to rely on a central management system or controller to perform the operations associated with configuring measurement probes and responders.

   o  It facilitates the exchange of data between different nodes to share measurement results, so that each node can refine its measurement strategy based not just on its own observations, but also on observations from its peers.

   o  It allows nodes to coordinate their measurements to obtain the best possible test coverage and to avoid measurements that have a very low likelihood of detecting service level violations.

   The provisioning of the P2P overlay should be transparent to the network administrator. An Autonomic Control Plane such as the one defined in [I-D.anima-autonomic-control-plane] is an ideal candidate for the P2P overlay to run on.

   An autonomic solution for the distributed detection of SLA violations provides several benefits. First, efficiency: this solution should optimize resource consumption and avoid resource starvation on the network devices. A device that is "self-aware" of its available resources will be able to adjust measurement activities rapidly as needed, without requiring a separate control loop involving resource monitoring by an external system. Second, placing the logic that decides where to conduct measurements in the node enables rapid control loops in which devices are able to react instantly to observations and adjust their measurement strategy. For example, a device could decide to adjust the amount of synthetic test traffic being sent during the measurement itself, depending on the results observed so far on this and on other concurrent measurement sessions.
   As a result, the solution could decrease the time necessary to detect SLA violations. The adaptivity of an autonomic loop could capture network dynamics faster than a human administrator, or even a central controller, can. Finally, the solution could help to reduce the workload of human administrators or, at least, relieve them of the need to perform operational tasks.

   In practice, these factors combine to maximize the likelihood of SLA violations being detected while operating within a given resource budget. They allow a continuous measurement strategy to be conducted that takes into account past measurement results, observations of other measures such as link utilization or flow data, sharing of measurement results between network devices, and coordination of future measurement activities among nodes. Combined, this can result in efficient measurement decisions that strike the right balance between broad network coverage and homing in on service level "hot spots".

6.  Intended User Experience

   The autonomic solution should not require any human intervention in the distributed detection of SLA violations. By virtue of the solution being autonomic, human users will not have to plan which measurements to conduct in a network, which today is often a very labor-intensive task that requires detailed analysis of traffic matrices and network topologies and does not lend itself to easy dynamic adjustment. Likewise, they will not have to configure measurement probes and responders.

   There are some ways in which a human administrator may still interact with the solution. First, the human administrator will of course be notified and obtain reports about service level violations that are observed. Second, a human administrator may set policies regarding how closely to monitor the network for service level violations and how many resources to spend.
   For example, an administrator may set a resource budget that is assigned to network devices for measurement operations. With that given budget, the number of SLO violations that are detected will be maximized. Alternatively, an administrator may set a target for the percentage of SLO violations that must be detected, i.e., a target for the ratio between the number of detected SLO violations and the total number of SLO violations that are actually occurring (some of which might go undetected). In that case, the solution will aim to minimize the resources (i.e., the amount of test traffic and the number of Measurement Sessions) required to achieve that target.

7.  Implementation Considerations

   The active measurement model assumes that a typical infrastructure will have multiple network segments and Autonomous Systems (ASs), and a reasonably large number of routers. It also considers that multiple SLOs can be in place at a given time. Since interoperability in a heterogeneous network is a goal, features found in different active measurement mechanisms (e.g., OWAMP, TWAMP, and IPSLA) and device programmability interfaces (such as Juniper's Junos API or Cisco's Embedded Event Manager) could be used for the implementation. The autonomic solution should include and/or reference specific algorithms, protocols, metrics, and technologies for the implementation of distributed detection of SLA violations as a whole.

   Finally, it should be noted that there are multiple deployment scenarios, including scenarios that involve physical devices hosting autonomic functions, or virtualized infrastructure hosting the same. Co-deployment in conjunction with Virtual Network Functions (VNFs) is a possibility for further study.

7.1.  Device Based Self-Knowledge and Decisions

   Each device has self-knowledge about the local SLA monitoring.
   This could be in the form of historical measurement data and SLOs. Besides that, the devices would have algorithms that could decide which probes should be activated at a given time. The choice of which algorithm is better for a specific situation would also be autonomic.

7.2.  Interaction with other devices

   Network devices should share information about service level measurement results. This information can speed up the detection of SLA violations and increase the number of detected SLA violations. For example, if one device detects that a remote destination is in danger of violating an SLO, other devices may conduct additional measurements to the same destination or to other destinations in its proximity. For any given network device, the exchange of data may be more important with some devices (for example, devices in the same network neighborhood, or devices that are "correlated" by some other means) than with others. The definition of the network devices that exchange measurement data, i.e., management peers, creates a new topology. Different approaches could be used to define this topology (e.g., correlated peers [P2PBNM-Nobre-2012]). To bootstrap peer selection, each device should use its known endpoint neighbors (e.g., from its FIB and RIB tables) as the initial seed for possible peers. It should be noted that a solution will benefit if topology information and network discovery functions are provided by the underlying autonomic framework. A solution will need to be able to discover measurement peers as well as measurement targets, specifically measurement targets that support active measurement responders, which will be able to respond to measurement requests and reflect measurement traffic as needed.

8.  Comparison with current solutions

   There is no standardized solution for distributed autonomic detection of SLA violations.
   Current solutions are restricted to ad hoc scripts running on a per-node basis to automate some of the administrator's actions. There are some proposals for passive probe activation (e.g., DECON and CSAMP), but without a focus on autonomic features.

9.  Related IETF Work

   The following paragraphs discuss related IETF work and are provided for reference. This section is not exhaustive; rather, it provides an overview of the various initiatives and how they relate to autonomic distributed detection of SLA violations.

   1.  [LMAP]: The Large-Scale Measurement of Broadband Performance Working Group aims at standards for performance management. Since its mechanisms also consist of deploying measurement probes, the autonomic solution could be relevant for LMAP, especially considering SLA violation screening. Besides that, a solution to decrease the workload of human administrators in service providers is probably highly desirable.

   2.  [IPFIX]: IP Flow Information Export (IPFIX) aims at standardizing the export of IP flow information (i.e., NetFlow-like flow records). IPFIX uses measurement probes (i.e., Metering and Exporting Processes) to gather flow data. In this context, the autonomic solution for the activation of active measurement probes could possibly be extended to also address passive measurement probes. Besides that, flow information could be used in decision making for probe activation.

   3.  [ALTO]: The Application Layer Traffic Optimization Working Group aims to provide topological information at a higher abstraction layer, which can be based upon network policy, and with application-relevant service functions located in it. Its work could be leveraged for defining the topology of the network devices that exchange measurement data.

10.  Acknowledgements

   We wish to acknowledge the helpful contributions, comments, and suggestions that were received from Mohamed Boucadair, Brian Carpenter, Hanlin Fang, Bruno Klauser, Diego Lopez, Vincent Roca, and Eric Voit. In addition, we thank Diego Lopez, Vincent Roca, and Brian Carpenter for their detailed reviews.

11.  IANA Considerations

   This memo includes no request to IANA.

12.  Security Considerations

   Security of the solution hinges on the security of the network underlay, i.e., the Autonomic Control Plane. If the Autonomic Control Plane were to be compromised, an attacker could undermine the effectiveness of measurement coordination by reporting fraudulent measurement results to peers. This would cause measurement probes to be deployed in an ineffective manner that would increase the likelihood that violations of service level objectives go undetected.

   Likewise, security of the solution hinges on the security of the deployment mechanism for autonomic functions, in this case, the autonomic function that conducts the service level measurements. If an attacker were able to hijack an autonomic function, it could try to exhaust or exceed the resources that should be spent on autonomic measurements in order to deplete network resources, including network bandwidth, through higher-than-necessary volumes of synthetic test traffic generated by measurement probes. It could also lead to the reporting of misleading results, among other things resulting in non-optimal selection of measurement targets and, in turn, an increase in the likelihood that service level violations go undetected.

13.  Informative References

   [draft-anima-boot]
              Pritikin, M., Richardson, M., Behringer, M., Bjarnason, S., and K. Watsen, "draft-ietf-anima-bootstrapping-keyinfra", draft-ietf-anima-bootstrapping-keyinfra-06 (work in progress), May 2017.
[draft-ietf-ippm-6man-pdm-option]
Elkins, N., Hamilton, R., and M. Ackermann, "draft-ietf-ippm-6man-pdm-option", draft-ietf-ippm-6man-pdm-option-11 (work in progress), June 2017.

[I-D.anima-autonomic-control-plane]
Behringer, M., Eckert, T., and S. Bjarnason, "An Autonomic Control Plane", draft-ietf-anima-autonomic-control-plane-09 (work in progress), August 2017.

[P2PBNM-Nobre-2012]
Nobre, J., Granville, L., Clemm, A., and A. Gonzalez Prieto, "Decentralized Detection of SLA Violations Using P2P Technology", 8th International Conference on Network and Service Management (CNSM), 2012.

[RFC4148]  Stephan, E., "IP Performance Metrics (IPPM) Metrics Registry", BCP 108, RFC 4148, DOI 10.17487/RFC4148, August 2005.

[RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. Zekauskas, "A One-way Active Measurement Protocol (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006.

[RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, DOI 10.17487/RFC5357, October 2008.

[RFC5474]  Duffield, N., Ed., Chiou, D., Claise, B., Greenberg, A., Grossglauser, M., and J. Rexford, "A Framework for Packet Selection and Reporting", RFC 5474, DOI 10.17487/RFC5474, March 2009.

[RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare, S., and E. Yedavalli, "Cisco Service-Level Assurance Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013.

[RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013.

[RFC7297]  Boucadair, M., Jacquenet, C., and N. Wang, "IP Connectivity Provisioning Profile (CPP)", RFC 7297, DOI 10.17487/RFC7297, July 2014.
[RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, June 2015.

Authors' Addresses

Jeferson Campos Nobre
University of Vale do Rio dos Sinos
Porto Alegre
Brazil

Email: jcnobre@unisinos.br

Lisandro Zambenedetti Granville
Federal University of Rio Grande do Sul
Porto Alegre
Brazil

Email: granville@inf.ufrgs.br

Alexander Clemm
Huawei
Santa Clara, California
USA

Email: ludwig@clemm.org

Alberto Gonzalez Prieto
VMware
Palo Alto, California
USA

Email: agonzalezpri@vmware.com