idnits 2.17.1 draft-irtf-nmrg-autonomic-sla-violation-detection-09.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (June 26, 2017) is 2495 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'LMAP' is mentioned on line 518, but not defined == Missing Reference: 'IPFIX' is mentioned on line 526, but not defined == Missing Reference: 'ALTO' is mentioned on line 535, but not defined == Outdated reference: A later version (-30) exists of draft-ietf-anima-autonomic-control-plane-06 -- Obsolete informational reference (is this intentional?): RFC 4148 (Obsoleted by RFC 6248) Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Management Research Group J. Nobre 3 Internet-Draft University of Vale do Rio dos Sinos 4 Intended status: Informational L. Granville 5 Expires: December 28, 2017 Federal University of Rio Grande do Sul 6 A. Clemm 7 Huawei 8 A. 
Gonzalez Prieto 9 June 26, 2017 11 Autonomic Networking Use Case for Distributed Detection of SLA 12 Violations 13 draft-irtf-nmrg-autonomic-sla-violation-detection-09 15 Abstract 17 This document describes a use case for autonomic networking 18 concerning monitoring of Service Level Agreements (SLAs). The use 19 case aims to detect violations of SLAs in a distributed fashion, 20 striving to optimize and dynamically adapt the autonomic deployment 21 of active measurement probes in a way that maximizes the likelihood 22 of detecting service level violations with a given resource budget to 23 perform active measurements, and is able to do so without any outside 24 guidance or intervention. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on December 28, 2017. 43 Copyright Notice 45 Copyright (c) 2017 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. 
Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 61 2. Definitions and Acronyms . . . . . . . . . . . . . . . . . . 5 62 3. Current Approaches . . . . . . . . . . . . . . . . . . . . . 5 63 4. Use Case Description . . . . . . . . . . . . . . . . . . . . 6 64 5. A Distributed Autonomic Solution . . . . . . . . . . . . . . 7 65 6. Intended User Experience . . . . . . . . . . . . . . . . . . 9 66 7. Implementation Considerations . . . . . . . . . . . . . . . . 10 67 7.1. Device Based Self-Knowledge and Decisions . . . . . . . . 10 68 7.2. Interaction with other devices . . . . . . . . . . . . . 11 69 8. Comparison with current solutions . . . . . . . . . . . . . . 11 70 9. Related IETF Work . . . . . . . . . . . . . . . . . . . . . . 11 71 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 72 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 73 12. Security Considerations . . . . . . . . . . . . . . . . . . . 12 74 13. Informative References . . . . . . . . . . . . . . . . . . . 13 75 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14 77 1. Introduction 79 The Internet has been growing dramatically in terms of size, 80 capacity, and accessibility in the last years. Communication 81 requirements of distributed services and applications running on top 82 of the Internet have become increasingly demanding. Some examples 83 are real-time interactive video or financial trading. Providing such 84 services involves stringent requirements in terms of acceptable 85 latency, loss, or jitter. 87 Performance requirements lead to the articulation of Service Level 88 Objectives (SLOs) which must be met. 
Those SLOs are part of Service 89 Level Agreements (SLAs) that define a contract between the provider 90 and the consumer of a service. SLOs, in effect, constitute a service 91 level guarantee that the consumer of the service can expect to 92 receive (and often has to pay for). Likewise, the provider of a 93 service needs to ensure that the service level guarantee and 94 associated SLOs are met. Some examples of clauses that relate to 95 service level objectives can be found in [RFC7297]. 97 Violations of SLOs can be associated with significant financial loss, 98 which can be divided into two categories. First, there is the loss 99 that can be incurred by the user of a service when the agreed service 100 levels are not provided. For example, a financial brokerage 101 might suffer losses when it is unable to execute stock 102 transactions in a timely manner. An electronic retailer may lose 103 customers when its online presence is perceived by customers as 104 sluggish. An online gaming provider may not be able to provide fair 105 access to online players, resulting in frustrated players who are 106 lost as customers. In each case, the failure of a service provider 107 to meet promised service level guarantees can have a substantial 108 financial impact on users of the service. Second, there 109 is the loss that is incurred by the provider of a service who is 110 unable to meet promised service level objectives. Those losses can 111 take several forms, such as penalties for not meeting the agreed service levels 112 and, often more importantly, loss of revenue due to reduced 113 customer satisfaction. Hence, service level objectives are a key 114 concern for the service provider. In order to ensure that SLOs are 115 not being violated, service levels need to be continuously monitored 116 at the network infrastructure layer in order to know, for example, 117 when mitigating actions need to be taken.
To that end, service level 118 measurements must take place. 120 Network measurements can be performed using active or passive 121 measurement techniques. In passive measurements, production traffic 122 is observed and no monitoring traffic is created by the measurement 123 process itself. That is, network conditions are checked in a 124 non-intrusive way. In the context of IP Flow Information eXport (IPFIX), 125 several documents were produced that define how to export data 126 associated with flow records, i.e., data that is collected as part of 127 passive measurement mechanisms, generally applied against flows of 128 production traffic (e.g., [RFC7011]). In addition, it would be 129 possible to collect real data traffic (not just summarized flow 130 records) with time-stamped packets, possibly sampled (e.g., per 131 [RFC5474]), as a means of measuring and inferring service levels. 132 Active measurements, on the other hand, are more intrusive to the 133 network in that they involve injecting synthetic test 134 traffic into the network to measure network service levels, as 135 opposed to simply observing production traffic. The IP Performance 136 Metrics (IPPM) WG produced documents that describe active measurement 137 mechanisms, such as the One-Way Active Measurement Protocol (OWAMP) 138 [RFC4656], the Two-Way Active Measurement Protocol (TWAMP) [RFC5357], and 139 the Cisco Service Level Assurance Protocol (SLA) [RFC6812]. In addition, 140 there are some mechanisms that do not cleanly fit into either active 141 or passive categories, such as Performance and Diagnostic Metrics 142 Destination Option (PDM) techniques 143 [draft-ietf-ippm-6man-pdm-option]. 145 Active measurement mechanisms offer a high level of control of what 146 and how to measure. They do not require inspecting production 147 traffic. Because of this, active measurements usually offer better 148 accuracy and privacy than passive measurement mechanisms.
Traffic 149 encryption and regulations that limit the amount of payload 150 inspection that can occur are non-issues. Furthermore, active 151 measurement mechanisms are able to detect end-to-end network 152 performance problems in a fine-grained way (e.g., simulating the 153 traffic that must be handled considering specific Service Level 154 Objectives - SLOs). As a result, active measurements are often 155 preferred over passive measurement for SLA monitoring. Measurement 156 probes must be hosted in network devices and measurement sessions 157 must be activated to compute the current network metrics (e.g., 158 considering those described in [RFC4148]). This activation should be 159 dynamic in order to follow changes in network conditions, such as 160 those related with routes being added or new customer demands. 162 While offering many advantages, active measurements are expensive in 163 terms of network resource consumption. Active measurements generally 164 involve measurement probes that generate synthetic test traffic that 165 is directed at a responder. The responder needs to timestamp test 166 traffic it receives and reflect it back to the originating 167 measurement probe. The measurement probe subsequently processes the 168 returned packets along with time stamping information in order to 169 compute service levels. Accordingly, active measurements consume 170 substantial CPU cycles as well as memory of network devices to 171 generate and process test traffic. In addition, synthetic traffic 172 increases network load. Active measurements thus compete for 173 resources with other functions, including routing and switching. 175 The resources required and traffic generated by the active 176 measurement sessions are to a large part a function of the number of 177 measured network destinations. (In addition, the amount of traffic 178 generated for each measurement plays a role, which in turn influences 179 the accuracy of the measurement.) 
The more destinations are being 180 measured, the larger the amount of resources consumed and traffic 181 needed to perform the measurements. Thus, better 182 monitoring coverage requires deploying more sessions, which 183 consequently increases consumed resources. Conversely, 184 observing just a small subset of all network flows can lead to 185 insufficient coverage. 187 Furthermore, while some end-to-end service levels can be determined 188 by adding up the service levels observed across different path 189 segments, the same is not true for all service levels. For example, 190 the end-to-end delay or packet loss from a node A to a node C routed 191 via a node B can often be computed simply by adding delays (or, as an approximation when loss rates are small, loss) 192 from A to B, and B to C. This allows a large set of 193 end-to-end measurements to be decomposed into a much smaller set of segment 194 measurements. However, end-to-end jitter and (for example) Mean 195 Opinion Scores cannot be decomposed as easily and, for higher 196 accuracy, must be measured end-to-end. 198 Hence, the decision of how to place measurement probes becomes an 199 important management activity. The goal is to obtain maximum 200 benefits of service level monitoring with a limited amount of 201 measurement overhead. Specifically, the goal is to maximize the 202 number of service level violations that are detected with a limited 203 amount of resources. 205 2. Definitions and Acronyms 207 Active Measurements: Techniques to measure service levels that 208 involve generating and observing synthetic test traffic 210 Passive Measurements: Techniques used to measure service levels based 211 on observation of production traffic 213 AN: Autonomic Network; a network containing exclusively autonomic 214 nodes, requiring no configuration and deriving all required 215 information through self-knowledge, discovery, or intent.
217 Measurement Session: A communications association between a Probe and 218 a Responder used to send and reflect synthetic test traffic for 219 active measurements 221 Probe: The source of synthetic test traffic in an active measurement 223 Responder: The destination for synthetic test traffic in an active 224 measurement 226 SLA: Service Level Agreement 228 SLO: Service Level Objective 230 P2P: Peer-to-Peer 232 3. Current Approaches 234 In feasible deployments of active 235 measurement solutions, the current best practice for distributing the available measurement 236 sessions across the network consists of relying entirely on the human 237 administrator's expertise to infer the best locations at which to 238 activate such sessions. This is done through several steps. First, 239 it is necessary to collect traffic information in order to grasp the 240 traffic matrix. Then, the administrator uses this information to 241 infer which are the best destinations for measurement sessions. 242 After that, the administrator activates sessions on the chosen subset 243 of destinations considering the available resources. This practice, 244 however, does not scale well because it is labor intensive and 245 error-prone for the administrator to determine which sessions should 246 be activated given the set of critical flows that needs to be 247 measured. Even worse, this practice completely fails in networks 248 whose critical flows are too short-lived and too dynamic in terms of 249 the network paths they traverse, as in modern cloud environments. This is 250 because fast reactions are necessary to reconfigure the sessions, 251 and administrators are not fast enough at computing and activating 252 the new set of required sessions every time the network traffic 253 pattern changes.
Finally, the current active measurements practice 254 usually covers only a fraction of the network flows that should be 255 observed, which invariably leads to the damaging consequence of 256 undetected SLA violations. 258 4. Use Case Description 260 The use case involves a service level provider who needs to monitor 261 the network to detect service level violations using active service 262 level measurements, and wants to be able to do so with minimal human 263 intervention. The goal is to conduct the measurements in an 264 effective manner maximizing the percentage of detected service level 265 violations. The service level provider has a bounded resource budget 266 with regards to measurements that can be performed, specifically, 267 with regards to the number of measurements that can be conducted 268 concurrently from any one network device, and possibly with regards 269 to the total amount of measurement traffic on the network. However, 270 while at any one point in time the number of measurements conducted 271 is limited, it is possible for a device to change which destinations 272 to measure over time. This can be exploited to achieve a balance of 273 eventually covering all possible destinations using a reasonable 274 amount of "sampling" where measurement coverage of a destination 275 cannot be continuous. The solution needs to be dynamic and be able 276 to cope with network conditions which may change over time. The 277 solution should also be embeddable inside network devices that 278 control the deployment of active measurement mechanisms. 280 The goal is to conduct the measurements in a smart manner that 281 ensures that the network is broadly covered and the likelihood of 282 detecting service level violations is maximized. 
In order to 283 maximize that likelihood, it is reasonable to focus measurement 284 resources on destinations that are more likely to incur a violation, 285 while spending fewer resources on destinations that are more likely to 286 be in compliance. In order to do so, there are various aspects that 287 can be exploited, including past measurements (destinations close to 288 a service level threshold requiring more focus than destinations 289 further from it), complementation with passive measurements such as 290 flow data (to identify network destinations that are currently 291 popular and critical), and observations from other parts of the 292 network. In addition, measurements can be coordinated among 293 different network devices to avoid hitting the same destination at 294 the same time and to be able to share results that may be useful in 295 future probe placement. 297 Clearly, static solutions will have severe limitations. At the same 298 time, human administrators cannot be in the loop for continuous 299 dynamic measurement probe reconfigurations. Accordingly, an 300 automated or, ideally, autonomic solution is needed in which network 301 measurements are automatically orchestrated and dynamically 302 reconfigured from within the network. 304 5. A Distributed Autonomic Solution 306 The use of Autonomic Networking (AN) [RFC7575] can help such 307 detection through an efficient activation of measurement sessions. 308 Such an approach, along with a detailed assessment confirming its 309 viability, has been described in [P2PBNM-Nobre-2012]. The problem to be 310 solved by AN in the present use case is how to steer the process of 311 Measurement Session activation by a complete solution that sets all 312 necessary parameters for this activation to operate efficiently, 313 reliably and securely, with no required human intervention other than 314 setting overall policy.
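The prioritization described in Section 4 (focusing measurement resources on destinations close to a service level threshold, within a bounded budget of concurrent sessions) can be illustrated with a minimal sketch. It is not part of the use case itself: the class Destination, the functions violation_risk and select_targets, the choice of delay as the only metric, and the ratio-based score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    name: str
    last_delay_ms: float  # most recent measured delay (hypothetical metric)
    slo_delay_ms: float   # delay threshold stated in the SLO (hypothetical)


def violation_risk(dest: Destination) -> float:
    """Return a score capped at 1.0; 1.0 means at or beyond the threshold."""
    if dest.slo_delay_ms <= 0:
        raise ValueError("SLO threshold must be positive")
    return min(dest.last_delay_ms / dest.slo_delay_ms, 1.0)


def select_targets(dests: list[Destination], budget: int) -> list[str]:
    """Pick the `budget` destinations closest to violating their SLO."""
    ranked = sorted(dests, key=violation_risk, reverse=True)
    return [d.name for d in ranked[:budget]]


# Example addresses come from the documentation range 192.0.2.0/24.
candidates = [
    Destination("192.0.2.1", last_delay_ms=95.0, slo_delay_ms=100.0),
    Destination("192.0.2.2", last_delay_ms=20.0, slo_delay_ms=100.0),
    Destination("192.0.2.3", last_delay_ms=70.0, slo_delay_ms=80.0),
]
chosen = select_targets(candidates, budget=2)
```

A purely greedy ranking such as this one would never revisit destinations that look healthy. A practical function would likely reserve part of the budget for the "sampling" of remaining destinations mentioned in Section 4.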
316 When a node first comes online, it has no information about which 317 measurements are more critical than others. In the absence of 318 information about past measurements and information from measurement 319 peers, it may start with an initial set of measurement sessions, 320 possibly randomly seeding a set of starter measurements, perhaps 321 taking a round robin approach for subsequent measurement rounds. 322 However, as measurements are collected, a node will gain increasing 323 information that it can utilize to refine its strategy of selecting 324 measurement targets going forward. First, it may take note of 325 which targets returned measurement results very close to service 326 level thresholds and may therefore require closer scrutiny compared 327 to others. Second, it may utilize observations that are made by its 328 measurement peers in order to conclude which measurement targets may 329 be more critical than others, and in order to ensure that proper 330 overall measurement coverage is obtained (so that not every node 331 incidentally measures the same targets, while other targets are not 332 measured at all). 334 We advocate for embedding Peer-to-Peer (P2P) technology in network 335 devices in order to conduct the Measurement Session activation 336 decisions using autonomic control loops. Specifically, we advocate 337 for network devices to implement an autonomic function to monitor 338 service levels for violations of service level objectives, 339 determining which Measurement Sessions to set up at any given point 340 in time based on current and past observations of the node, and of 341 other peer nodes. 343 By performing these functions locally and autonomically on the device 344 itself, the choice of which measurements to conduct can be modified quickly based 345 on local observations while taking local resource availability into 346 account.
This allows a solution to be more robust and react more 347 dynamically to rapidly changing service levels than a solution that 348 has to rely on central coordination. However, in order to optimize 349 decisions about which measurements to conduct, a node will need to 350 communicate with other nodes. This allows a node to take into 351 account other nodes' observations in addition to its own in its 352 decisions. 354 For example, remote destinations whose observed service levels are on 355 the verge of violating stated objectives may require closer 356 monitoring than remote destinations that are comfortably within a 357 range of tolerance. It also allows nodes to coordinate their probing 358 decisions to collectively achieve the best possible measurement 359 coverage. As the amount of resources available for monitoring and 360 for exchange of measurement data and coordination with other nodes 361 is limited, a node may further be interested in identifying other 362 nodes whose observations are most similar to and correlated with its 363 own. This helps a node prioritize the other nodes 364 with which to primarily coordinate and exchange data. All of this requires 365 the use of a P2P overlay. 367 A P2P overlay is essential for several reasons: 369 o It makes it possible for nodes in the network to autonomically set 370 up Measurement Sessions, without having to rely on a central 371 management system or controller to perform configuration 372 operations associated with configuring measurement probes and 373 responders. 375 o It facilitates the exchange of data between different nodes to 376 share measurement results so that each node can refine its 377 measurement strategy based not just on its own observations, but 378 also on observations from its peers. 380 o It allows nodes to coordinate their measurements to obtain the 381 best possible test coverage and avoid measurements that have a 382 very low likelihood of detecting service level violations.
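The data exchange and coordination roles listed above can be sketched as follows. This is only an illustration under stated assumptions: the functions merge_observations and plan_round, the representation of shared results as per-destination risk scores between 0 and 1, and the notion of peers announcing "claimed" destinations for a round are hypothetical and are not defined by this document or by any P2P overlay protocol.

```python
def merge_observations(own: dict[str, float],
                       peers: list[dict[str, float]]) -> dict[str, float]:
    """Combine risk scores, keeping the highest score seen per destination."""
    merged = dict(own)
    for report in peers:
        for dest, risk in report.items():
            merged[dest] = max(merged.get(dest, 0.0), risk)
    return merged


def plan_round(merged: dict[str, float],
               claimed_by_peers: set[str],
               budget: int) -> list[str]:
    """Choose up to `budget` destinations that no peer probes this round."""
    free = [d for d in merged if d not in claimed_by_peers]
    # Highest risk first; ties are broken by name so the plan is deterministic.
    free.sort(key=lambda d: (-merged[d], d))
    return free[:budget]


# Hypothetical scores: this node's own view and one report from a peer.
own = {"192.0.2.1": 0.40, "192.0.2.2": 0.90}
peer_reports = [{"192.0.2.1": 0.85, "192.0.2.3": 0.60}]

merged = merge_observations(own, peer_reports)
plan = plan_round(merged, claimed_by_peers={"192.0.2.2"}, budget=2)
```

In this example the node skips 192.0.2.2, which a peer already covers, and spends its budget on the two remaining destinations, one of which it learned about only through the peer's report.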
384 The provisioning of the P2P overlay should be transparent for the 385 network administrator. An Autonomic Control Plane such as defined in 386 [I-D.anima-autonomic-control-plane] provides an ideal candidate for 387 the P2P overlay to run on. 389 An autonomic solution for the distributed detection of SLA violations 390 provides several benefits. First, efficiency: this solution should 391 optimize the resource consumption and avoid resource starvation on 392 the network devices. A device that is "self-aware" of its available 393 resources will be able to adjust measurement activities rapidly as 394 needed, without requiring a separate control loop involving resource 395 monitoring by an external system. Second, placing the logic that decides where to 396 conduct measurements in the node enables rapid control loops in which 397 devices are able to react instantly to observations and adjust their 398 measurement strategy. For example, a device could decide to adjust 399 the amount of synthetic test traffic being sent during the 400 measurement itself depending on results observed so far on this and 401 on other concurrent measurement sessions. As a result, the solution 402 could decrease the time necessary to detect SLA violations. 403 Adaptivity features of an autonomic loop could capture 404 network dynamics faster than a human administrator or even a central 405 controller. Finally, the solution could help to reduce the workload 406 of human administrators or, at least, spare them from performing 407 operational tasks.
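As one possible reading of the example above, in which a device adjusts the amount of synthetic test traffic based on results observed so far, the following sketch scales the number of test packets in the next measurement interval with how close the observed value is to the threshold. The function next_probe_count, its linear scaling rule, and the default bounds of 5 and 50 packets are assumptions chosen for illustration and do not come from this document.

```python
def next_probe_count(observed_ms: float, slo_ms: float,
                     min_probes: int = 5, max_probes: int = 50) -> int:
    """Scale the number of test packets with proximity to the SLO threshold."""
    if slo_ms <= 0:
        raise ValueError("SLO threshold must be positive")
    # Fraction of the threshold already consumed, clamped to [0, 1].
    proximity = max(0.0, min(observed_ms / slo_ms, 1.0))
    return min_probes + round(proximity * (max_probes - min_probes))


# A destination far from its threshold gets few test packets ...
relaxed = next_probe_count(observed_ms=20.0, slo_ms=100.0)
# ... while one at (or beyond) the threshold gets the maximum.
critical = next_probe_count(observed_ms=100.0, slo_ms=100.0)
```

Clamping to the configured bounds keeps the device within its resource budget even when a destination is far beyond its threshold.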
409 In practice, these factors combine to maximize the likelihood of SLA 410 violations being detected while operating within a given resource 411 budget, allowing a continuous measurement strategy to be conducted that 412 takes into account past measurement results, observations of other 413 measures such as link utilization or flow data, sharing of 414 measurement results between network devices, and coordinating future 415 measurement activities among nodes. Combined, this can result in 416 efficient measurement decisions that achieve a golden balance between 417 broad network coverage and homing in on service level "hot spots". 419 6. Intended User Experience 421 The autonomic solution should not require any human intervention in 422 the distributed detection of SLA violations. By virtue of the 423 solution being autonomic, human users will not have to plan which 424 measurements to conduct in a network, often a very labor intensive 425 task today that requires detailed analysis of traffic matrices and 426 network topologies and is not prone to easy dynamic adjustment. 427 Likewise, they will not have to configure measurement probes and 428 responders. 430 There are some ways in which a human administrator may still interact 431 with the solution. First, the human administrator will of course 432 be notified and obtain reports about service level violations that 433 are observed. Second, a human administrator may set policies 434 regarding how closely to monitor the network for service level 435 violations and how many resources to spend. For example, an 436 administrator may set a resource budget that is assigned to network 437 devices for measurement operations. With that given budget, the 438 number of SLO violations that are detected will be maximized. 439 Alternatively, an administrator may set a target for the percentage 440 of SLO violations that must be detected, i.e.,
a target for the ratio 441 between the number of detected SLO violations and the number of 442 total SLO violations that are actually occurring (some of which might 443 go undetected). In that case, the solution will aim to minimize the 444 resources spent (i.e., the amount of test traffic and the number of Measurement 445 Sessions) that are required to achieve that target. 447 7. Implementation Considerations 449 The active measurement model assumes that a typical infrastructure 450 will have multiple network segments and Autonomous Systems (ASs), and 451 a reasonably large number of routers. It also considers that 452 multiple SLOs can be in place at a given time. Since 453 interoperability in a heterogeneous network is a goal, features found 454 in different active measurement mechanisms (e.g., OWAMP, TWAMP, and 455 IPSLA) and device programmability interfaces (such as Juniper's Junos 456 API or Cisco's Embedded Event Manager) could be used for the 457 implementation. The autonomic solution should include and/or 458 reference specific algorithms, protocols, metrics and technologies 459 for the implementation of distributed detection of SLA violations as 460 a whole. 462 Finally, it should be noted that there are multiple deployment 463 scenarios, including deployment scenarios that involve physical 464 devices hosting autonomic functions, or virtualized infrastructure 465 hosting the same. Co-deployment in conjunction with Virtual Network 466 Functions (VNF) is a possibility for further study. 468 7.1. Device Based Self-Knowledge and Decisions 470 Each device has self-knowledge about the local SLA monitoring. This 471 could be in the form of historical measurement data and SLOs. 472 Besides that, the devices would have algorithms that could decide 473 which probes should be activated at a given time. The choice of 474 which algorithm is better for a specific situation would also be 475 autonomic. 477 7.2.
Interaction with other devices 479 Network devices should share information about service level 480 measurement results. This information can speed up the detection of 481 SLA violations and increase the number of detected SLA violations. 482 For example, if one device detects that a remote destination is in 483 danger of violating an SLO, other devices may conduct additional 484 measurements to the same destination or other destinations in its 485 proximity. For any given network device, the exchange of data may be 486 more important with some devices (for example, devices in the same 487 network neighborhood, or devices that are "correlated" by some other 488 means) than with others. The definition of network devices that 489 exchange measurement data, i.e., management peers, creates a new 490 topology. Different approaches could be used to define this topology 491 (e.g., correlated peers [P2PBNM-Nobre-2012]). To bootstrap peer 492 selection, each device should use its known neighbor endpoints 493 (e.g., from FIB and RIB tables) as the initial seed to get possible peers. 494 It should be noted that a solution will benefit if topology 495 information and network discovery functions are provided by the 496 underlying autonomic framework. A solution will need to be able to 497 discover measurement peers as well as measurement targets, 498 specifically measurement targets that support active measurement 499 responders and which will be able to respond to measurement requests 500 and reflect measurement traffic as needed. 502 8. Comparison with current solutions 504 There is no standardized solution for distributed autonomic detection 505 of SLA violations. Current solutions are restricted to ad hoc 506 scripts running on a per-node basis to automate some 507 administrator actions. There are some proposals for passive probe 508 activation (e.g., DECON and CSAMP), but without the focus on 509 autonomic features. 511 9.
Related IETF Work 513 The following paragraphs discuss related IETF work and are provided 514 for reference. This section is not exhaustive; rather, it provides an 515 overview of the various initiatives and how they relate to autonomic 516 distributed detection of SLA violations. 518 1. [LMAP]: The Large-Scale Measurement of Broadband Performance 519 Working Group aims at standards for performance management. 520 Since their mechanisms also consist of deploying measurement 521 probes, the autonomic solution could be relevant for LMAP, 522 especially considering SLA violation screening. Besides that, a 523 solution to decrease the workload of human administrators in 524 service providers is probably highly desirable. 526 2. [IPFIX]: IP Flow Information EXport (IPFIX) aims at the 527 standardization of IP flows (i.e., netflows). IPFIX uses 528 measurement probes (i.e., metering exporters) to gather flow 529 data. In this context, the autonomic solution for the activation 530 of active measurement probes could possibly be extended to 531 also address passive measurement probes. Besides that, flow 532 information could be used in the decision making of probe 533 activation. 535 3. [ALTO]: The Application Layer Traffic Optimization Working Group 536 aims to provide topological information at a higher abstraction 537 layer, which can be based upon network policy, and with 538 application-relevant service functions located in it. Their work 539 could be leveraged for the definition of the topology regarding 540 the network devices which exchange measurement data. 542 10. Acknowledgements 544 We wish to acknowledge the helpful contributions, comments, and 545 suggestions that were received from Mohamed Boucadair, Hanlin Fang, 546 Bruno Klauser, Diego Lopez, Vincent Roca, and Eric Voit. 548 11. IANA Considerations 550 This memo includes no request to IANA. 552 12.
Security Considerations 554 Security of the solution hinges on the security of the network 555 underlay, i.e. the Autonomic Control Plane. If the Autonomic Control 556 Plane were to be compromised, an attacker could undermine the 557 effectiveness of measurement coordination by reporting fraudulent 558 measurement results to peers. This would cause measurement probes to 559 be deployed in an ineffective manner that would increase the 560 likelihood that violations of service level objectives go undetected. 562 Likewise, security of the solution hinges on the security of the 563 deployment mechanism for autonomic functions, in this case, the 564 autonomic function that conducts the service level measurements. If 565 an attacker were able to hijack an autonomic function, it could try 566 to exhaust or exceed the resources that should be spent on autonomic 567 measurements in order to deplete network resources, including network 568 bandwidth due to higher-than-necessary volumes of synthetic test 569 traffic generated by measurement probes. Again, it could also lead 570 to reporting of misleading results, among other things resulting in 571 non-optimal selection of measurement targets and in turn an increase 572 in the likelihood that service level violations go undetected. 574 13. Informative References 576 [draft-anima-boot] 577 Pritikin, M., Richardson, M., Behringer, M., Bjarnason, 578 S., and K. Watsen, "draft-ietf-anima-bootstrapping- 579 keyinfra", draft-ietf-anima-bootstrapping-keyinfra-06 580 (work in progress), May 2017. 582 [draft-ietf-ippm-6man-pdm-option] 583 Elkins, N., Hamilton, R., and M. Ackermann, "draft-ietf- 584 ippm-6man-pdm-option", draft-ietf-ippm-6man-pdm-option-11 585 (work in progress), June 2017. 587 [I-D.anima-autonomic-control-plane] 588 Behringer, M., Eckert, T., and S. Bjarnason, "An Autonomic 589 Control Plane", draft-ietf-anima-autonomic-control- 590 plane-06 (work in progress), March 2017. 

   [P2PBNM-Nobre-2012]
              Nobre, J., Granville, L., Clemm, A., and A. Gonzalez
              Prieto, "Decentralized Detection of SLA Violations Using
              P2P Technology", 8th International Conference on Network
              and Service Management (CNSM), 2012.

   [RFC4148]  Stephan, E., "IP Performance Metrics (IPPM) Metrics
              Registry", BCP 108, RFC 4148, DOI 10.17487/RFC4148,
              August 2005.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008.

   [RFC5474]  Duffield, N., Ed., Chiou, D., Claise, B., Greenberg, A.,
              Grossglauser, M., and J. Rexford, "A Framework for Packet
              Selection and Reporting", RFC 5474, DOI 10.17487/RFC5474,
              March 2009.

   [RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
              S., and E. Yedavalli, "Cisco Service-Level Assurance
              Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013.

   [RFC7297]  Boucadair, M., Jacquenet, C., and N. Wang, "IP
              Connectivity Provisioning Profile (CPP)", RFC 7297,
              DOI 10.17487/RFC7297, July 2014.

   [RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
              Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
              Networking: Definitions and Design Goals", RFC 7575,
              DOI 10.17487/RFC7575, June 2015.
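
Appendix A.  Illustrative Sketch of Peer Bootstrapping

   This appendix is non-normative.  As an illustration of the behavior
   described under "Interaction with other devices", the following
   minimal Python sketch shows one way a device might seed its set of
   management peers from FIB and RIB neighbors and then react to a
   measurement result shared by a peer.  Everything in it is an
   assumption made for illustration: the Device class, its field names,
   the representation of FIB and RIB contents as sets of next-hop
   names, the boolean "at risk" flag, and the simple probe budget check
   are hypothetical and are not defined by this document or by any
   autonomic framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical model of one autonomic node (illustrative names only)."""

    name: str
    fib_next_hops: set[str]   # assumed: neighbors currently used for forwarding
    rib_next_hops: set[str]   # assumed: neighbors learned via routing
    probe_budget: int         # assumed: max number of active probe targets
    peers: set[str] = field(default_factory=set)
    targets: list[str] = field(default_factory=list)

    def bootstrap_peers(self) -> set[str]:
        """Seed the management-peer set from FIB and RIB neighbors."""
        self.peers = (self.fib_next_hops | self.rib_next_hops) - {self.name}
        return self.peers

    def on_peer_report(self, sender: str, destination: str,
                       at_risk: bool) -> bool:
        """Handle a measurement result shared by a peer.

        Returns True if an additional probe toward `destination` was
        scheduled, and False otherwise.
        """
        # Ignore reports from unknown senders and reports that show no risk.
        if sender not in self.peers or not at_risk:
            return False
        # Respect the resource budget and avoid duplicate targets.
        if destination in self.targets:
            return False
        if len(self.targets) >= self.probe_budget:
            return False
        self.targets.append(destination)
        return True


if __name__ == "__main__":
    r1 = Device("r1", fib_next_hops={"r2"}, rib_next_hops={"r2", "r3"},
                probe_budget=1)
    r1.bootstrap_peers()
    print(sorted(r1.peers))                                   # ['r2', 'r3']
    print(r1.on_peer_report("r2", "192.0.2.10", at_risk=True))  # True
```

   A real solution would likely refine the seed set over time (for
   example, toward correlated peers) and would rely on the Autonomic
   Control Plane to authenticate the sender of each report; the sketch
   omits both.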

Authors' Addresses

   Jeferson Campos Nobre
   University of Vale do Rio dos Sinos
   Porto Alegre
   Brazil

   Email: jcnobre@unisinos.br


   Lisandro Zambenedetti Granville
   Federal University of Rio Grande do Sul
   Porto Alegre
   Brazil

   Email: granville@inf.ufrgs.br


   Alexander Clemm
   Huawei
   Santa Clara, California
   USA

   Email: ludwig@clemm.org


   Alberto Gonzalez Prieto
   Santa Clara, California
   USA

   Email: alberto.gonzalezprieto@yahoo.com