OPSAWG                                                           H. Song
Internet-Draft                                                 Futurewei
Intended status: Informational                                    F. Qin
Expires: August 23, 2021                                    China Mobile
                                                       P. Martinez-Julia
                                                                    NICT
                                                            L. Ciavaglia
                                                                   Nokia
                                                                 A. Wang
                                                           China Telecom
                                                       February 19, 2021

                      Network Telemetry Framework
                        draft-ietf-opsawg-ntf-07

Abstract

   Network telemetry is a technology for gaining network insight and
   facilitating efficient and automated network management. It
   encompasses various techniques for remote data generation,
   collection, correlation, and consumption. This document describes an
   architectural framework for network telemetry, motivated by
   challenges that are encountered as part of the operation of networks
   and by the requirements that ensue. Network telemetry, as
   necessitated by best industry practices, covers technologies and
   protocols that extend beyond conventional network Operations,
   Administration, and Maintenance (OAM). The presented network
   telemetry framework promises flexibility, scalability, accuracy,
   coverage, and performance. In addition, it facilitates the
   implementation of automated control loops to address both today's and
   tomorrow's network operational needs. This document clarifies the
   terminology and classifies the modules and components of a network
   telemetry system from several different perspectives. The framework
   and taxonomy help to set a common ground for the collection of
   related work and provide guidance for related technique and standard
   developments.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 23, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Glossary
   3.  Background
     3.1.  Telemetry Data Coverage
     3.2.  Use Cases
     3.3.  Challenges
     3.4.  Network Telemetry
   4.  The Necessity of a Network Telemetry Framework
   5.  Network Telemetry Framework
     5.1.  Top Level Modules
       5.1.1.  Management Plane Telemetry
       5.1.2.  Control Plane Telemetry
       5.1.3.  Forwarding Plane Telemetry
       5.1.4.  External Data Telemetry
     5.2.  Second Level Function Components
     5.3.  Data Acquisition Mechanism and Type Abstraction
     5.4.  Mapping Existing Mechanisms into the Framework
   6.  Evolution of Network Telemetry Applications
   7.  Security Considerations
   8.  IANA Considerations
   9.  Contributors
   10. Acknowledgments
   11. Informative References
   Appendix A.  A Survey on Existing Network Telemetry Techniques
     A.1.  Management Plane Telemetry
       A.1.1.  Push Extensions for NETCONF
       A.1.2.  gRPC Network Management Interface
     A.2.  Control Plane Telemetry
       A.2.1.  BGP Monitoring Protocol
     A.3.  Data Plane Telemetry
       A.3.1.  The Alternate Marking (AM) technology
       A.3.2.  Dynamic Network Probe
       A.3.3.  IP Flow Information Export (IPFIX) protocol
       A.3.4.  In-Situ OAM
       A.3.5.  Postcard Based Telemetry
     A.4.  External Data and Event Telemetry
       A.4.1.  Sources of External Events
       A.4.2.  Connectors and Interfaces
   Authors' Addresses

1.  Introduction

   Network visibility is the ability of management tools to see the
   state and behavior of a network, which is essential for successful
   network operation.
   Network Telemetry revolves around network data
   that can help provide insights about the current state of the
   network, including network devices, forwarding, control, and
   management planes. This data can be generated and obtained through a
   variety of techniques, including but not limited to network
   instrumentation and measurements, and can be processed for
   purposes ranging from service assurance to network security using a
   wide variety of techniques including machine learning, data analysis,
   and correlation. In this document, Network Telemetry refers to both
   the data itself (i.e., "Network Telemetry Data") and the techniques
   and processes used to generate, export, collect, and consume that
   data for use by potentially automated management applications.
   Network telemetry extends beyond the conventional network Operations,
   Administration, and Maintenance (OAM) techniques and is expected to
   support better flexibility, scalability, accuracy, coverage, and
   performance.

   However, the term "network telemetry" lacks a solid and unambiguous
   definition. Its scope and coverage cause confusion and
   misunderstandings. It is beneficial to clarify the concept and
   provide a clear architectural framework for network telemetry, so we
   can articulate the technical field and better align the related
   techniques and standards work.

   To fulfill such an undertaking, we first discuss some key
   characteristics of network telemetry that set it clearly apart
   from conventional network OAM and show that some conventional OAM
   technologies can be considered a subset of the network telemetry
   technologies. We then provide an architectural framework for network
   telemetry which includes four modules, each concerned with a
   different category of telemetry data and corresponding procedures.
   All the modules are internally structured in the same way, including
   components that allow data sources to be configured with regard to
   what data to generate and how to make it available to client
   applications, components that instrument the underlying data sources,
   and components that perform the actual rendering, encoding, and
   exporting of the generated data. We show how the network telemetry
   framework can benefit current and future network operations.
   Based on the distinction of modules and function components, we can
   map the existing and emerging techniques and protocols into the
   framework. The framework can also simplify the tasks of designing,
   maintaining, and understanding a network telemetry system. Finally,
   we outline the evolution stages of the network telemetry system and
   discuss the potential security concerns.

   The purpose of the framework and taxonomy is to set a common ground
   for the collection of related work and provide guidance for future
   technique and standard developments. To the best of our knowledge,
   this document is the first such effort for network telemetry in
   industry standards organizations.

2.  Glossary

   Before further discussion, we list some key terminology and acronyms
   used in this document. We intentionally differentiate between
   the terms network telemetry and OAM. However, it should be
   understood that there is no hard-line distinction between the two
   concepts. Rather, network telemetry is considered an extension
   of OAM. It covers all the existing OAM protocols but puts more
   emphasis on the newer and emerging techniques and protocols
   concerning all aspects of network data from acquisition to
   consumption.

   AI:  Artificial Intelligence. In the network domain, AI refers to
      machine-learning based technologies for automated network
      operation and other tasks.

   AM:  Alternate Marking, a flow performance measurement method,
      specified in [RFC8321].

   BMP:  BGP Monitoring Protocol, specified in [RFC7854].

   DNP:  Dynamic Network Probe, referring to programmable in-network
      sensors for network monitoring and measurement.

   DPI:  Deep Packet Inspection, referring to techniques that
      examine packets beyond the L3/L4 headers.

   gNMI:  gRPC Network Management Interface, a network management
      protocol from the OpenConfig Operator Working Group, mainly
      contributed by Google. See [gnmi] for details.

   gRPC:  gRPC Remote Procedure Call, an open source high performance
      RPC framework that gNMI is based on. See [grpc] for details.

   IPFIX:  IP Flow Information Export Protocol, specified in [RFC7011].

   IOAM:  In-situ OAM, a dataplane on-path telemetry technique.

   NETCONF:  Network Configuration Protocol, specified in [RFC6241].

   NetFlow:  A Cisco protocol for flow record collecting, described in
      [RFC3954].

   Network Telemetry:  The process and instrumentation for acquiring and
      utilizing network data remotely for network monitoring and
      operation. A general term for a large set of network visibility
      techniques and protocols, concerning aspects like data generation,
      collection, correlation, and consumption. Network telemetry
      addresses the current network operation issues and enables smooth
      evolution toward future intent-driven autonomous networks.

   NMS:  Network Management System, referring to applications that allow
      network administrators to manage a network.

   OAM:  Operations, Administration, and Maintenance. A group of
      network management functions that provide network fault
      indication, fault localization, performance information, and data
      and diagnosis functions. Most conventional network monitoring
      techniques and protocols belong to network OAM.

   PBT:  Postcard-Based Telemetry, a dataplane on-path telemetry
      technique.

   SMIv2:  Structure of Management Information Version 2, specified in
      [RFC2578].

   SNMP:  Simple Network Management Protocol. Versions 1 and 2 are
      specified in [RFC1157] and [RFC3416], respectively.

   YANG:  The abbreviation of "Yet Another Next Generation". YANG is a
      data modeling language for the definition of data sent over
      network management protocols such as NETCONF and RESTCONF.
      YANG is defined in [RFC6020].

   YANG ECA:  A YANG model for Event-Condition-Action policies, defined
      in [I-D.wwx-netmod-event-yang].

   YANG PUSH:  A method to subscribe to data pushed from remote YANG
      datastores on network devices. Details are specified in [RFC8641]
      and [RFC8639].

3.  Background

   The term "big data" is used to describe the extremely large volume of
   data sets that can be analyzed computationally to reveal patterns,
   trends, and associations. Networks are undoubtedly a source of big
   data because of their scale and the volume of network traffic they
   forward. It is easy to see that network operations can benefit from
   network big data.

   Today one can access advanced big data analytics capability through a
   plethora of commercial and open source platforms (e.g., Apache
   Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
   learning). Thanks to the advance of computing and storage
   technologies, network big data analytics gives network operators an
   opportunity to gain network insights and move towards network
   autonomy. Some operators have started to explore the application of
   Artificial Intelligence (AI) to make sense of network data. Software
   tools can use the network data to detect and react to network faults,
   anomalies, and policy violations, as well as to predict future
   events. In turn, the network policy updates for planning, intrusion
   prevention, optimization, and self-healing may be applied.

   It is conceivable that an autonomic network [RFC7575] is the logical
   next step for network evolution following Software-Defined Networking
   (SDN), aiming to reduce (or even eliminate) human labor, make more
   efficient use of network resources, and provide better services more
   aligned with customer requirements. Intent-based Networking (IBN)
   [I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility
   and telemetry data in order to ensure that the network is behaving as
   intended. Although it will take time to reach the ultimate goal, the
   journey has started.

   However, while data processing capability has improved and
   applications are hungry for more data, networks lag behind in
   extracting and translating network data into useful and actionable
   information in efficient ways. The system bottleneck is shifting
   from data consumption to data supply. Both the number of network
   nodes and the traffic bandwidth keep increasing at a fast pace. The
   network configuration and policy change at shorter intervals than
   before. More subtle events and fine-grained data through all network
   planes need to be captured and exported in real time. In a nutshell,
   it is a challenge to get enough high-quality data out of the network
   in a manner that is efficient, timely, and flexible. Therefore, we
   need to survey the existing technologies and protocols and identify
   any potential gaps.

   In the remainder of this section, we first clarify the scope of
   network data (i.e., telemetry data) concerned in this context. Then,
   we discuss several key use cases for today's and future network
   operations. Next, we show why the current network OAM techniques and
   protocols are insufficient for these use cases. The discussion
   underlines the need for new methods, techniques, and protocols, which
   we place under the umbrella term "Network Telemetry".

3.1.  Telemetry Data Coverage

   Any information that can be extracted from networks (including data
   plane, control plane, and management plane) and used to gain
   visibility or as a basis for actions is considered telemetry data. It
   includes statistics, event records and logs, snapshots of state,
   configuration data, etc. It also covers the outputs of any active
   and passive measurements [RFC7799]. In particular, raw data can be
   processed in-network before being sent to a data consumer. Such
   processed data is also considered telemetry data. A classification
   of telemetry data is provided in Section 5.

3.2.  Use Cases

   The following set of use cases is essential for network operations.
   While the list is by no means exhaustive, it is enough to highlight
   the requirements for data velocity, variety, volume, and veracity in
   networks.

   o  Security: Network intrusion detection and prevention systems need
      to monitor network traffic and activities and act upon anomalies.
      Given increasingly sophisticated attack vectors coupled with
      increasingly severe consequences of security breaches, new tools
      and techniques need to be developed, relying on wider and deeper
      visibility into networks.

   o  Policy and Intent Compliance: Network policies are the rules that
      constrain the services for network access, provide service
      differentiation, or enforce specific treatment on the traffic.
      For example, a service function chain is a policy that requires
      the selected flows to pass through a set of ordered network
      functions. Intent, as defined in
      [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational
      goals that a network should meet and outcomes that a network is
      supposed to deliver, defined in a declarative manner without
      specifying how to achieve or implement them. An intent requires a
      complex translation and mapping process before being applied on
      networks.
      While a policy or an intent is enforced, the compliance
      needs to be verified and monitored continuously, relying on
      visibility that is provided through network telemetry data, and
      any violation needs to be reported immediately.

   o  SLA Compliance: A Service-Level Agreement (SLA) defines the level
      of service a user expects from a network operator, which includes
      the metrics for the service measurement and remedy/penalty
      procedures when the service level misses the agreement. Users
      need to check if they get the service as promised and network
      operators need to evaluate how they can deliver the services that
      can meet the SLA based on real-time network telemetry data,
      including data from network measurements.

   o  Root Cause Analysis: Any network failure can be the effect of a
      sequence of chained events. Troubleshooting and recovery require
      quick identification of the root cause of any observable issues.
      However, the root cause is not always straightforward to identify,
      especially when the failure is sporadic and the number of event
      messages, both related and unrelated to the same cause, is
      overwhelming. While machine learning technologies can be used for
      root cause analysis, it is up to the network to sense and provide
      the relevant data to feed into machine learning applications.

   o  Network Optimization: This covers all short-term and long-term
      network optimization techniques, including load balancing, Traffic
      Engineering (TE), and network planning. Network operators are
      motivated to optimize their network utilization and differentiate
      services for better Return On Investment (ROI) or lower Capital
      Expenditures (CAPEX). The first step is to know the real-time
      network conditions before applying policies for traffic
      manipulation.
      In some cases, micro-bursts need to be detected in
      a very short time-frame so that fine-grained traffic control can
      be applied to avoid network congestion. Long-term planning of
      network capacity and topology requires analysis of real-world
      network telemetry data that is obtained over long periods of time.

   o  Event Tracking and Prediction: The visibility into traffic path
      and performance is critical for services and applications that
      rely on healthy network operation. Numerous related network
      events are of interest to network operators. For example, network
      operators want to learn where and why packets are dropped for an
      application flow. They also want to be warned of issues in
      advance so proactive actions can be taken to avoid catastrophic
      consequences.

3.3.  Challenges

   For a long time, network operators have relied upon SNMP [RFC3416],
   Command-Line Interface (CLI), or Syslog to monitor the network. Some
   other OAM techniques as described in [RFC7276] are also used to
   facilitate network troubleshooting. These conventional techniques
   are not sufficient to support the above use cases for the following
   reasons:

   o  Most use cases need to continuously monitor the network and
      dynamically refine the data collection in real time. Poll-based
      low-frequency data collection is ill-suited for these
      applications. Subscription-based streaming data directly pushed
      from the data source (e.g., the forwarding chip) is preferred to
      provide enough data quantity and precision at scale.

   o  Comprehensive data is needed from packet processing engine to
      traffic manager, from line cards to main control board, from user
      flows to control protocol packets, from device configurations to
      operations, and from physical layer to application layer.
      Conventional OAM only covers a narrow range of data (e.g., SNMP
      only handles data from the Management Information Base (MIB)).
      Traditional network devices cannot provide all the necessary
      probes. More open and programmable network devices are therefore
      needed.

   o  Many application scenarios need to correlate network-wide data
      from multiple sources (i.e., from distributed network devices,
      different components of a network device, or different network
      planes). A piecemeal solution often lacks the capability to
      consolidate the data from multiple sources. The composition of a
      complete solution, as partly proposed by Autonomic Resource
      Control Architecture (ARCA)
      [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
      guided by a comprehensive framework.

   o  Some of the conventional OAM techniques (e.g., CLI and Syslog)
      lack a formal data model. Unstructured data hinders tool
      automation and application extensibility. Standardized data
      models are essential to support programmable networks.

   o  Although some conventional OAM techniques support data push (e.g.,
      SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data
      are limited to only predefined management plane warnings (e.g.,
      SNMP Trap) or sampled user packets (e.g., sFlow). Network
      operators require data with arbitrary source, granularity, and
      precision, which is beyond the capability of the existing
      techniques.

   o  The conventional passive measurement techniques can either consume
      excessive network resources and render excessive redundant data,
      or lead to inaccurate results; on the other hand, the conventional
      active measurement techniques can interfere with the user traffic
      and their results are indirect. Techniques that can collect
      direct and on-demand data from user traffic are more favorable.

   These challenges were addressed by newer standards and techniques
   (e.g., IPFIX/NetFlow, PSAMP, IOAM, and YANG-Push) and more are
   emerging.
   These standards and techniques need to be recognized and
   accommodated in a new framework.

3.4.  Network Telemetry

   Network telemetry has emerged as a mainstream technical term to refer
   to network data collection and consumption techniques. Several
   network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
   gRPC [grpc]) have been widely deployed. Network telemetry allows
   separate entities to acquire data from network devices so that data
   can be visualized and analyzed to support network monitoring and
   operation. Network telemetry covers the conventional network OAM and
   has a wider scope. It is expected that network telemetry can provide
   the necessary network insight for autonomous networks and address the
   shortcomings of conventional OAM techniques.

   Network telemetry usually assumes machines as data consumers rather
   than human operators. Hence, network telemetry can directly
   trigger automated network operation, while in contrast some
   conventional OAM tools are designed and used to help human operators
   to monitor and diagnose the networks and guide manual network
   operations. Such a proposition leads to very different techniques.

   Although new network telemetry techniques are emerging and subject to
   continuous evolution, several characteristics of network telemetry
   have been well accepted. Note that network telemetry is intended to
   be an umbrella term covering a wide spectrum of techniques, so the
   following characteristics are not expected to be held by every
   specific technique.

   o  Push and Streaming: Instead of polling data from network devices,
      telemetry collectors subscribe to streaming data pushed from data
      sources in network devices.

   o  Volume and Velocity: The telemetry data is intended to be consumed
      by machines rather than by human beings.
      Therefore, the data
      volume can be huge and the processing is optimized for the needs
      of automation in real time.

   o  Normalization and Unification: Telemetry aims to address the
      overall network automation needs. Efforts are made to normalize
      the data representation and unify the protocols, so as to simplify
      data analysis and provide integrated analysis across heterogeneous
      devices and data sources across a network.

   o  Model-based: The telemetry data is modeled in advance, which
      allows applications to configure and consume data with ease.

   o  Data Fusion: The data for a single application can come from
      multiple data sources (e.g., cross-domain, cross-device, and
      cross-layer) and needs to be correlated to take effect.

   o  Dynamic and Interactive: Since network telemetry is meant to be
      used in a closed control loop for network automation, it needs to
      run continuously and adapt to the dynamic and interactive queries
      from the network operation controller.

   In addition, an ideal network telemetry solution may also have the
   following features or properties:

   o  In-Network Customization: The data that is generated can be
      customized in network at run-time to cater to the specific needs
      of applications. This needs the support of a programmable data
      plane which allows probes with custom functions to be deployed at
      flexible locations.

   o  In-Network Data Aggregation and Correlation: Network devices and
      aggregation points can work out which events and what data need
      to be stored, reported, or discarded, thus reducing the load on
      the central collection and processing points while still ensuring
      that the right information is ready to be processed in a timely
      way.

   o  In-Network Processing: Sometimes it is not necessary or feasible
      to gather all information to a central point to be processed and
      acted upon.
      It is possible for the data processing to be done in
      network, allowing reactive actions to be taken locally.

   o  Direct Data Plane Export: The data originated from the data plane
      forwarding chips can be directly exported to the data consumer for
      efficiency, especially when the data bandwidth is large and
      real-time processing is required.

   o  In-band Data Collection: In addition to the passive and active
      data collection approaches, the new hybrid approach allows data
      to be collected directly for any target flow on its entire
      forwarding path [I-D.song-opsawg-ifit-framework].

   It is worth noting that a network telemetry system should not be
   intrusive to normal network operations; it must avoid the pitfall of
   the "observer effect". That is, it should not change the network
   behavior or affect the forwarding performance. Otherwise, the whole
   purpose of network telemetry is compromised.

   Although in many cases a system for network telemetry involves a
   remote data collecting and consuming entity, it is important to
   understand that there are no inherent assumptions about how a system
   should be architected. Telemetry data producers and consumers can
   work in distributed or peer-to-peer fashions rather than assuming a
   centralized data consuming entity. In such cases, a network node can
   be the direct consumer of telemetry data from other nodes.

4.  The Necessity of a Network Telemetry Framework

   Network data analytics and machine-learning technologies are applied
   for network operation automation, relying on abundant and coherent
   data from networks. Data acquisition that is limited to a single
   source and static in nature will in many cases not be sufficient to
   meet an application's telemetry data needs. As a result, multiple
   data sources, involving a variety of techniques and standards, will
   need to be integrated.
   It is desirable to have a framework that
   classifies and organizes different telemetry data sources and types,
   defines different components of a network telemetry system and their
   interactions, and helps coordinate and integrate multiple telemetry
   approaches across layers. This allows flexible combinations of data
   for different applications, while normalizing and simplifying
   interfaces. In detail, such a framework would benefit application
   development for the following reasons:

   o  Future networks, autonomous or otherwise, depend on holistic and
      comprehensive network visibility. All the use cases and
      applications are better supported uniformly and coherently
      under a single intelligent agent using an integrated, converged
      mechanism and common telemetry data representations wherever
      feasible. Therefore, the protocols and mechanisms should be
      consolidated into a minimum yet comprehensive set. A telemetry
      framework can help to normalize the technique developments.

   o  Network visibility presents multiple viewpoints. For example, the
      device viewpoint takes the network infrastructure as the
      monitoring object from which the network topology and device
      status can be acquired; the traffic viewpoint takes the flows or
      packets as the monitoring object from which the traffic quality
      and path can be acquired. An application may need to switch its
      viewpoint during operation. It may also need to correlate a
      service and its impact on user experience to acquire the
      comprehensive information.

   o  Applications require network telemetry to be elastic in order to
      make efficient use of network resources and reduce the impact of
      processing related to network telemetry on network performance.
      For example, routine network monitoring should cover the entire
      network with a low data sampling rate.
      Only when issues arise or critical trends emerge should telemetry
      data sources be modified and telemetry data rates boosted as
      needed.

   o  Efficient data fusion is critical for applications to reduce the
      overall quantity of data and improve the accuracy of analysis.

   A telemetry framework brings together all of the telemetry-related
   work from different sources and working groups within the IETF.
   This makes it possible to assemble a comprehensive network telemetry
   system and to avoid repetitious or redundant work.  The framework
   should cover the concepts and components from the standardization
   perspective.  This document describes the modules which make up a
   network telemetry framework and decomposes the telemetry system into
   a set of distinct components that existing and future work can easily
   map to.

5.  Network Telemetry Framework

   The top level network telemetry framework partitions the network
   telemetry into four modules based on the telemetry data object source
   and represents their relationship.  At the next level, the framework
   decomposes each module into separate components.  Each of the modules
   follows the same underlying structure, with one component dedicated
   to the configuration of data subscriptions and data sources, a second
   component dedicated to encoding and exporting data, and a third
   component instrumenting the generation of telemetry related to the
   underlying resources.  Throughout the framework, the same set of
   abstract data acquiring mechanisms and data types are applied.  The
   two-level architecture with the uniform data abstraction helps
   accurately pinpoint a protocol or technique to its position in a
   network telemetry system or disaggregate a network telemetry system
   into manageable parts.

5.1.  Top Level Modules

   Telemetry can be applied on the forwarding plane, the control plane,
   and the management plane in a network, as well as other sources out
   of the network, as shown in Figure 1.  Therefore, we categorize the
   network telemetry into four distinct modules with each having its own
   interface to Network Operation Applications.

             +------------------------------+
             |                              |
             |       Network Operation      |<-------+
             |          Applications        |        |
             |                              |        |
             +------------------------------+        |
                  ^      ^           ^               |
                  |      |           |               |
                  V      |           V               V
             +-----------|---+--------------+  +-----------+
             |           |   |              |  |           |
             | Control Pl|ane|              |  | External  |
             | Telemetry | <--->            |  | Data and  |
             |           |   |              |  | Event     |
             |      ^    V   |  Management  |  | Telemetry |
             +------|--------+  Plane       |  |           |
             |      V        |  Telemetry   |  +-----------+
             | Forwarding    |              |
             | Plane       <--->            |
             | Telemetry     |              |
             |               |              |
             +---------------+--------------+

               Figure 1: Modules in Layer Category of NTF

   The rationale of this partition lies in the different telemetry data
   objects which result in different data source and export locations.
   Such differences have profound implications on in-network data
   programming and processing capability, data encoding and transport
   protocol, and required data bandwidth and latency.

   We summarize the major differences of the four modules in the
   following table.  They are compared from six angles:

   o  Data Object

   o  Data Export Location

   o  Data Model

   o  Data Encoding

   o  Telemetry Protocol

   o  Transport Method

   Data Object is the target and source of each module.  Because the
   data source varies, the location where data is most conveniently
   exported also varies.  For example, forwarding plane data mainly
   originates from the fast path (e.g., forwarding chips) while control
   plane data mainly originates from the slow path (e.g., main control
   CPU).
   For convenience and efficiency, it is preferred to export the
   data from locations near the source.  Because the locations that can
   export data have different capabilities, the appropriate data model,
   encoding, and transport method differ as well.  For example,
   the forwarding chip has high throughput but limited capacity for
   processing complex data and maintaining states, while the main
   control CPU is capable of complex data and state processing, but has
   limited bandwidth for high throughput data.  As a result, the
   suitable telemetry protocol for each module can be different.  Some
   representative techniques are shown in the corresponding table blocks
   to highlight the technical diversity of these modules.  Note that the
   selected techniques just reflect the de facto state of the art and
   are not exhaustive.  The key point is that one cannot expect to use a
   universal protocol to cover all the network telemetry requirements.

   +---------+--------------+--------------+--------------+-----------+
   | Module  | Control      | Management   | Forwarding   | External  |
   |         | Plane        | Plane        | Plane        | Data      |
   +---------+--------------+--------------+--------------+-----------+
   |Object   | control      | config. &    | flow & packet| terminal, |
   |         | protocol &   | operation    | QoS, traffic | social &  |
   |         | signaling,   | state, MIB   | stat., buffer| environ-  |
   |         | RIB, ACL     |              | & queue stat.| mental    |
   +---------+--------------+--------------+--------------+-----------+
   |Export   | main control | main control | fwding chip  | various   |
   |Location | CPU,         | CPU          | or linecard  |           |
   |         | linecard CPU |              | CPU; main    |           |
   |         | or fwding    |              | control CPU  |           |
   |         | chip         |              | unlikely     |           |
   +---------+--------------+--------------+--------------+-----------+
   |Data     | YANG,        | MIB, syslog, | template,    | YANG      |
   |Model    | custom       | YANG,        | YANG,        |           |
   |         |              | custom       | custom       |           |
   +---------+--------------+--------------+--------------+-----------+
   |Data     | GPB, JSON,   | GPB, JSON,   | plain        | GPB, JSON,|
   |Encoding | XML, plain   | XML          |              | XML, plain|
   +---------+--------------+--------------+--------------+-----------+
   |Protocol | gRPC,NETCONF,| gRPC,NETCONF | IPFIX, mirror| gRPC      |
   |         | IPFIX,mirror |              |              |           |
   +---------+--------------+--------------+--------------+-----------+
   |Transport| HTTP, TCP,   | HTTP, TCP    | UDP          | HTTP, TCP,|
   |         | UDP          |              |              | UDP       |
   +---------+--------------+--------------+--------------+-----------+

             Figure 2: Comparison of the Data Object Modules

   Note that the interaction with the applications that consume network
   telemetry data can be indirect.  Some in-device data transfer is
   possible.  For example, in the management plane telemetry, the
   management plane may need to acquire data from the data plane.  Some
   of the operational states can only be derived from data plane data
   sources such as the interface status and statistics.  As another
   example, obtaining control plane telemetry data may require the
   ability to access the Forwarding Information Base (FIB) of the data
   plane.
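
   As an informal illustration of Figure 2, the comparison can be
   captured in a small lookup structure.  The following is a minimal
   sketch, not part of the framework itself: the names ModuleProfile,
   MODULES, modules_using, and common_protocols are hypothetical, and
   the values are transcribed from the representative (not exhaustive)
   entries in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleProfile:
    """One column of Figure 2 (hypothetical container, for illustration)."""
    export_locations: tuple
    data_models: tuple
    encodings: tuple
    protocols: tuple
    transports: tuple


# Values transcribed from Figure 2; representative, not exhaustive.
MODULES = {
    "control": ModuleProfile(
        export_locations=("main control CPU", "linecard CPU", "forwarding chip"),
        data_models=("YANG", "custom"),
        encodings=("GPB", "JSON", "XML", "plain"),
        protocols=("gRPC", "NETCONF", "IPFIX", "mirror"),
        transports=("HTTP", "TCP", "UDP"),
    ),
    "management": ModuleProfile(
        export_locations=("main control CPU",),
        data_models=("MIB", "syslog", "YANG", "custom"),
        encodings=("GPB", "JSON", "XML"),
        protocols=("gRPC", "NETCONF"),
        transports=("HTTP", "TCP"),
    ),
    "forwarding": ModuleProfile(
        export_locations=("forwarding chip", "linecard CPU"),
        data_models=("template", "YANG", "custom"),
        encodings=("plain",),
        protocols=("IPFIX", "mirror"),
        transports=("UDP",),
    ),
    "external": ModuleProfile(
        export_locations=("various",),
        data_models=("YANG",),
        encodings=("GPB", "JSON", "XML", "plain"),
        protocols=("gRPC",),
        transports=("HTTP", "TCP", "UDP"),
    ),
}


def modules_using(protocol):
    """Return the sorted names of modules whose entry lists the protocol."""
    return sorted(name for name, p in MODULES.items() if protocol in p.protocols)


def common_protocols():
    """Return the protocols listed for every module (may be empty)."""
    return set.intersection(*(set(p.protocols) for p in MODULES.values()))
```

   With the table entries as transcribed, common_protocols() returns an
   empty set, which mirrors the point made above that no single
   protocol covers all the modules.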

   On the other hand, an application may involve more than one plane and
   interact with multiple planes simultaneously.  For example, an SLA
   compliance application may require both the data plane telemetry and
   the control plane telemetry.

   The requirements and challenges for each module are summarized as
   follows (note that the requirements may pertain across all telemetry
   modules; however, we emphasize those that are most pronounced for a
   particular plane).

5.1.1.  Management Plane Telemetry

   The management plane of network elements interacts with the Network
   Management System (NMS), and provides information such as performance
   data, network logging data, network warning and defect data, and
   network statistics and state data.  The management plane includes
   many protocols, including some that are considered "legacy", such as
   SNMP and syslog.  Regardless of the protocol, management plane
   telemetry must address the following requirements:

   o  Convenient Data Subscription: An application should have the
      freedom to choose the data export means such as the data types and
      the export frequency.

   o  Structured Data: For automatic network operation, machines will
      replace humans for network data comprehension.  Schema languages
      such as YANG can efficiently describe structured data and
      normalize data encoding and transformation.

   o  High Speed Data Transport: In order to keep up with the velocity
      of information, a server needs to be able to send large amounts of
      data at high frequency.  Compact encoding formats are needed to
      compress the data and improve the data transport efficiency.  The
      subscription mode, by replacing the query mode, reduces the
      interactions between clients and servers and helps to improve the
      server's efficiency.

5.1.2.  Control Plane Telemetry

   The control plane telemetry refers to the health condition monitoring
   of different network control protocols covering Layer 2 to Layer 7.
   Keeping track of the running status of these protocols is beneficial
   for detecting, localizing, and even predicting various network
   issues, as well as network optimization, in real time and at fine
   granularity.  Some particular challenges and issues faced by the
   control plane telemetry are as follows:

   o  One challenging problem for the control plane telemetry is how to
      correlate the End-to-End (E2E) Key Performance Indicators (KPI) to
      a specific layer's KPIs.  For example, an IPTV user may describe
      their User Experience (UE) by the video fluency and definition.
      Then in case of an unusually poor UE KPI or a service
      disconnection, it is non-trivial to delimit and pinpoint the issue
      in the responsible protocol layer (e.g., the Transport Layer or
      the Network Layer), the responsible protocol (e.g., ISIS or BGP at
      the Network Layer), and finally the responsible device(s) with
      specific reasons.

   o  Traditional OAM-based approaches for control plane KPI measurement
      include Ping (L3), Traceroute (L3), Y.1731 (L2), and so on.  One
      common issue behind these methods is that they only measure the
      KPIs instead of reflecting the actual running status of these
      protocols, making them less effective or efficient for control
      plane troubleshooting and network optimization.

   o  An example of the control plane telemetry is the BGP Monitoring
      Protocol (BMP), which is currently used to monitor BGP routes
      and enables rich applications, such as BGP peer analysis, AS
      analysis, prefix analysis, security analysis, and so on.  However,
      the monitoring of other layers and protocols, as well as the
      cross-layer, cross-protocol KPI correlations, are still in their
      infancy (e.g., IGP monitoring is missing) and require further
      research.
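
   To make the BMP example above slightly more concrete, the following
   is a minimal sketch of how a collector might decode the fixed part
   of a BMP message.  It assumes the 6-octet common header of BMP
   version 3 as specified in RFC 7854 (one octet of version, four
   octets of message length, one octet of message type).  The function
   name parse_bmp_common_header and the returned dictionary layout are
   hypothetical; a real collector would go on to decode the per-peer
   header and the message body.

```python
import struct

# Message type codes from the BMP specification (RFC 7854).
BMP_MESSAGE_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}

# Network byte order: version (1 octet), length (4 octets), type (1 octet).
_COMMON_HEADER = struct.Struct("!BIB")


def parse_bmp_common_header(data):
    """Decode the BMP common header at the start of `data` (hypothetical helper)."""
    if len(data) < _COMMON_HEADER.size:
        raise ValueError("a BMP common header needs at least 6 octets")
    version, length, msg_type = _COMMON_HEADER.unpack_from(data)
    if version != 3:
        raise ValueError("unsupported BMP version: %d" % version)
    if length < _COMMON_HEADER.size:
        raise ValueError("message length is shorter than the common header")
    return {
        "version": version,
        "length": length,  # total message length in octets, header included
        "type": msg_type,
        "type_name": BMP_MESSAGE_TYPES.get(msg_type, "Unknown"),
    }
```

   A collector reading from a TCP stream would typically read 6 octets,
   call such a helper, and then read the remaining length minus 6
   octets before dispatching on the message type.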

5.1.3.  Forwarding Plane Telemetry

   An effective forwarding plane telemetry system relies on the data
   that the network device can expose.  The quality, quantity, and
   timeliness of data must meet some stringent requirements.  This
   poses challenges for the network data plane devices where the
   first-hand data originates.

   o  A data plane device's main function is user traffic processing and
      forwarding.  While supporting network visibility is important, the
      telemetry is just an auxiliary function, and it should not impede
      normal traffic processing and forwarding (i.e., the performance is
      not lowered and the behavior is not altered due to the telemetry
      functions).

   o  Network operation applications require end-to-end visibility
      across various sources, which can result in a huge volume of data.
      However, the sheer data quantity should not exhaust the network
      bandwidth, regardless of the data delivery approach (i.e., whether
      through in-band or out-of-band channels).

   o  The data plane devices must provide timely data with the minimum
      possible delay.  Long processing, transport, storage, and analysis
      delay can impact the effectiveness of the control loop and even
      render the data useless.

   o  The data should be structured and labeled, and easy for
      applications to parse and consume.  At the same time, the data
      types needed by applications can vary significantly.  The data
      plane devices need to provide enough flexibility and
      programmability to support the precise data provision for
      applications.

   o  The data plane telemetry should support incremental deployment and
      work even when some devices are unaware of the system.  This
      challenge is highly relevant to the standards and legacy networks.

   Although not specific to the forwarding plane, these challenges are
   more difficult for the forwarding plane because of its limited
   resources and flexibility.
   The data plane programmability is
   essential to support network telemetry.  Newer data plane forwarding
   chips are equipped with advanced telemetry features and provide
   flexibility to support customized telemetry functions.

   Technique Taxonomy: Depending on how the telemetry is instrumented,
   the forwarding plane telemetry techniques can be classified along
   multiple dimensions.

   o  Active, Passive, and Hybrid: This dimension concerns end-to-end
      measurement.  Active and passive methods (as well as
      the hybrid types) are well documented in [RFC7799].  Passive
      methods include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic
      mirroring.  These methods usually have low data coverage, and
      improving the coverage comes at a very high bandwidth cost.
      On the other hand, active methods include Ping, OWAMP [RFC4656],
      TWAMP [RFC5357], and Cisco's SLA Protocol [RFC6812].  These
      methods are intrusive and only provide indirect network
      measurement results.  Hybrid methods, including in-situ OAM
      [I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and
      Multipoint Alternate Marking
      [I-D.fioccola-ippm-multipoint-alt-mark], provide a well-balanced
      and more flexible approach.  However, these methods are also more
      complex to implement.

   o  In-Band and Out-of-Band: The telemetry data, before being exported
      to some collector, can be carried in user packets.  Such methods
      are considered in-band (e.g., in-situ OAM
      [I-D.ietf-ippm-ioam-data]).  If the telemetry data is directly
      exported to some collector without modifying the user packets,
      such methods are considered out-of-band (e.g., postcard-based
      INT).  It is possible to have hybrid methods.  For example, only
      the telemetry instruction or partial data is carried by user
      packets (e.g., AM [RFC8321]).

   o  E2E and In-Network: Some E2E methods start from and end at the
      network end hosts (e.g., Ping).
      The other methods work in
      networks and are transparent to end hosts.  However, if needed,
      in-network methods can be easily extended into end hosts.

   o  Data Subject: Depending on the telemetry objective, the methods
      can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]),
      path-based (e.g., Traceroute), or node-based (e.g., IPFIX
      [RFC7011]).  The various data objects can be packets, flow
      records, measurements, states, and signals.

5.1.4.  External Data Telemetry

   Events that occur outside the boundaries of the network system are
   another important source of network telemetry.  Correlating both
   internal telemetry data and external events with the requirements of
   network systems, as presented in
   [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and
   functional advantage to management operations.

   As with other sources of telemetry information, the data and events
   must meet strict requirements, especially in terms of timeliness,
   which is essential to properly incorporate external event information
   into management cycles.  The specific challenges are described as
   follows:

   o  The role of the external event detector can be played by multiple
      elements, including hardware (e.g., physical sensors, such as
      seismometers) and software (e.g., Big Data sources that analyze
      streams of information, such as Twitter messages).  Thus, the
      transmitted data must support different shapes but, at the same
      time, follow a common but extensible schema.

   o  Since the main function of the external event detectors is to
      perform the notifications, their timeliness is assumed.  However,
      once messages have been dispatched, they must be quickly collected
      and inserted into the control plane with variable priority, which
      will be high for important sources and/or important events and low
      for secondary ones.

   o  The schema used by external detectors must be easily adopted by
      current and future devices and applications.  Therefore, it must
      be easily mapped to current information models, such as those
      expressed in YANG.

   Organizing together both internal and external telemetry information
   will be key for the general exploitation of the management
   possibilities of current and future network systems, as reflected in
   the incorporation of cognitive capabilities to new hardware and
   software (virtual) elements.

5.2.  Second Level Function Components

   Reflecting the best current practice, the telemetry module at each
   plane is further partitioned into five distinct components:

   o  Data Query, Analysis, and Storage: This component works at the
      application layer.  It is a part of the network management system
      at the receiver side.  On the one hand, it is responsible for
      issuing data requirements.  The data of interest can be modeled
      data through configuration or custom data through programming.
      The data requirements can be queries for one-shot data or
      subscriptions for events or streaming data.  On the other hand, it
      receives, stores, and processes the returned data from network
      devices.  Data analysis can be interactive to initiate further
      data queries.  This component can reside in either network devices
      or remote controllers.  It can be centralized or distributed, and
      involve one or more instances.

   o  Data Configuration and Subscription: This component deploys data
      queries on devices.  It determines the protocol and channel for
      applications to acquire desired data.  This component is also
      responsible for configuring the desired data that might not be
      directly available from data sources.  The subscription data can
      be described by models, templates, or programs.

   o  Data Encoding and Export: This component determines how telemetry
      data is delivered to the data analysis and storage component.  The
      data encoding and the transport protocol may vary due to the data
      exporting location.

   o  Data Generation and Processing: The requested data needs to be
      captured, processed, and formatted in network devices from raw
      data sources.  This may involve in-network computing and
      processing on either the fast path or the slow path in network
      devices.

   o  Data Object and Source: This component determines the monitoring
      object and original data source.  The data source usually just
      provides raw data which needs further processing.  A data source
      can be considered a probe.  Some data sources can be dynamically
      installed, while others will be more static.

             +----------------------------------------+
           +----------------------------------------+ |
           |                                        | |
           |    Data Query, Analysis, & Storage     | |
           |                                        | +
           +-------+++ -----------------------------+
                   |||                   ^^^
                   |||                   |||
                   ||V                   |||
                +--+V--------------------+++------------+
             +-----V---------------------+------------+ |
           +---------------------+-------+----------+ | |
           | Data Configuration  |                  | | |
           | & Subscription      | Data Encoding    | | |
           | (model, template,   | & Export         | | |
           | & program)          |                  | | |
           +---------------------+------------------| | |
           |                                        | | |
           |           Data Generation              | | |
           |           & Processing                 | | |
           |                                        | | |
           +----------------------------------------| | |
           |                                        | | |
           |        Data Object and Source          | |-+
           |                                        |-+
           +----------------------------------------+

        Figure 3: Components in the Network Telemetry Framework

5.3.  Data Acquisition Mechanism and Type Abstraction

   Broadly speaking, network data can be acquired through subscription
   (push) and query (poll).  A subscription is a contract between
   publisher and subscriber.
   After initial setup, the subscribed data
   is automatically delivered to registered subscribers until the
   subscription expires.  Subscription can be partitioned into two
   sub-modes: the Publish-Subscription (Pub-Sub) mode and the
   Subscription-Publish (Sub-Pub) mode.  In the Pub-Sub mode, a
   publisher publishes pre-defined data and any qualified subscriber
   can subscribe to the data as-is.  In the Sub-Pub mode, a subscriber
   initiates a data request and sends it to a publisher; the publisher
   will deliver the requested data when available.  While in both modes
   the subscribed data is pushed to the subscriber, the Sub-Pub mode
   allows subscribers to customize their subscriptions.

   In contrast, query is used when a querier expects immediate and
   one-off feedback from network devices.  The queried data may be
   directly extracted from some specific data source, or synthesized
   and processed from raw data.  Query is suited to interactive network
   telemetry applications.

   There are four types of data from network devices that a telemetry
   data consumer can subscribe to or query:

   o  Simple Data: The data that are steadily available from some data
      store or static probes in network devices.  Such data can be
      specified by a YANG model.

   o  Complex Data: The data need to be synthesized or processed in
      the network from raw data from one or more network devices.  The
      data processing function can be statically or dynamically loaded
      into network devices.

   o  Event-triggered Data: The data are conditionally acquired based on
      the occurrence of some events.  They can be actively pushed
      through subscription or passively polled through query.  There are
      many ways to model events, including using Finite State Machine
      (FSM) or Event Condition Action (ECA)
      [I-D.wwx-netmod-event-yang].

   o  Streaming Data: The data are continuously generated.
      They can be
      time series or the dump of databases.  The streaming data reflect
      real-time network states and metrics and require large bandwidth
      and processing power.  The streaming data are always actively
      pushed to the subscribers.

   The above data types are not mutually exclusive.  Rather, they often
   overlap.  For example, event-triggered data can be simple or complex,
   and streaming data can be simple, complex, or triggered by events.
   The relationships of these data types are illustrated in Figure 4.

                        +--------------+
                +------>| Simple Data  |<------+
                |       +--------------+       |
                |              ^               |
                |              |               |
                |       +------+-------+       |
                |   +-->| Complex Data |<--+   |
                |   |   +--------------+   |   |
                |   |                      |   |
                |   |                      |   |
        +-------+---+----------+     +-----+---+-------+
        | Event-triggered Data |<----+ Streaming Data  |
        +----------------------+     +-----------------+

                    Figure 4: Data Type Relationship

   Subscription usually deals with event-triggered data and streaming
   data, and query usually deals with simple data and complex data.
   However, other combinations are also possible.  The conventional OAM
   techniques are mostly about querying simple data.  While these
   techniques are still useful, more advanced network telemetry
   techniques are designed mainly for event-triggered or streaming data
   subscription, and complex data query.

5.4.  Mapping Existing Mechanisms into the Framework

   The following two tables show how the existing mechanisms (mainly
   published in IETF and with the emphasis on the latest new
   technologies) are positioned in the framework.  Given the vast body
   of existing work, we cannot provide an exhaustive list, so the
   mechanisms in the tables should be considered as just examples.
   Also, some comprehensive protocols and techniques may cover multiple
   aspects or modules of the framework, so a name in a block only
   emphasizes one particular characteristic of it.  More details about
   some listed mechanisms can be found in Appendix A.

   The first table is based on the data acquisition mechanisms and data
   types.

          +----------------------+-----------+--------------+
          |                      | Query     | Subscription |
          +----------------------+-----------+--------------+
          | Simple Data          | SNMP      | YANG         |
          +----------------------+-----------+--------------+
          | Complex Data         | DNP       | YANG PUSH    |
          +----------------------+-----------+--------------+
          | Event-triggered Data | DNP       | YANG PUSH    |
          +----------------------+-----------+--------------+
          | Streaming Data       | N/A       | gRPC         |
          +----------------------+-----------+--------------+

                    Figure 5: Existing Work Mapping I

   The second table is based on the telemetry modules and components.

     +-------------+-----------------+---------------+--------------+
     |             | Management      | Control       | Forwarding   |
     |             | Plane           | Plane         | Plane        |
     +-------------+-----------------+---------------+--------------+
     | data config.| gRPC, NETCONF,  | NETCONF/YANG, | NETCONF/YANG,|
     | & subscribe | SMIv2,YANG PUSH | YANG PUSH     | YANG PUSH    |
     +-------------+-----------------+---------------+--------------+
     | data gen. & | DNP,            | DNP,          | IOAM, PSAMP, |
     | process     | YANG            | YANG          | PBT, AM,     |
     |             |                 |               | DNP          |
     +-------------+-----------------+---------------+--------------+
     | data        | gRPC, NETCONF,  | BMP, NETCONF  | IPFIX        |
     | export      | YANG PUSH       |               |              |
     +-------------+-----------------+---------------+--------------+

                   Figure 6: Existing Work Mapping II

6.  Evolution of Network Telemetry Applications

   Network telemetry is a fast evolving technical area.
   As the network
   moves towards automated operation, network telemetry applications
   undergo several stages of evolution, each of which adds a new layer
   of requirements to the underlying network telemetry techniques.  Each
   stage is built upon the techniques adopted by the previous stages
   plus some new requirements.

   Stage 0 - Static Telemetry:  The telemetry data source and type are
      determined at design time.  The network operator can only
      configure how to use it with limited flexibility.

   Stage 1 - Dynamic Telemetry:  The custom telemetry data can be
      dynamically programmed or configured at runtime without
      interrupting the network operation, allowing a tradeoff among
      resource, performance, flexibility, and coverage.  DNP is an
      effort in this direction.

   Stage 2 - Interactive Telemetry:  The network operator can
      continuously customize and fine-tune the telemetry data in real
      time to reflect the network operation's visibility requirements.
      Compared with Stage 1, the changes are frequent and based on
      real-time feedback.  At this stage, some tasks can be automated,
      but human operators still need to sit in the middle to make
      decisions.

   Stage 3 - Closed-loop Telemetry:  The telemetry is free from the
      interference of human operators, except for generating the
      reports.  The intelligent network operation engine automatically
      issues the telemetry data requests, analyzes the data, and updates
      the network operations in closed control loops.

   Existing technologies are ready for stage 0 and stage 1.  Individual
   stage 2 and stage 3 applications are also possible now.  However, the
   future autonomic networks may need a comprehensive operation
   management system which works at stage 2 and stage 3 to cover all the
   network operation tasks.  A well-defined network telemetry framework
   is the first step in this direction.

7.  Security Considerations

   The complexity of network telemetry raises significant security
   implications.  For example, telemetry data can be manipulated to
   exhaust various network resources at each plane as well as the data
   consumer; falsified or tampered data can mislead the decision making
   and paralyze networks; wrong configuration and programming for
   telemetry is equally harmful.

   Given that this document has proposed a framework for network
   telemetry and the telemetry mechanisms discussed are more extensive
   (in both message frequency and traffic amount) than the conventional
   network OAM concepts, we must also recognize that various new
   security considerations may arise.  A number of techniques already
   exist for securing the forwarding plane, the control plane, and the
   management plane in a network, but it is important to consider if any
   new threat vectors are now being enabled via the use of network
   telemetry procedures and mechanisms.

   Security considerations for networks that use telemetry methods may
   include:

   o  Telemetry framework trust and policy model;

   o  Role management and access control for enabling and disabling
      telemetry capabilities;

   o  Protocol transport used for telemetry data and its inherent
      security capabilities;

   o  Telemetry data stores, storage encryption and methods of access;

   o  Tracking telemetry events and any abnormalities that might
      identify malicious attacks using telemetry interfaces;

   o  Authentication and signing of telemetry data to make data more
      trustworthy.

   Some of the security considerations highlighted above may be
   minimized or negated with policy management of network telemetry.  In
   a network telemetry deployment it would be advantageous to separate
   telemetry capabilities into different classes of policies, i.e., Role
   Based Access Control and Event-Condition-Action policies.
   Also,
   potential conflicts between network telemetry mechanisms must be
   detected accurately and resolved quickly to avoid unnecessary network
   telemetry traffic propagation escalating into an unintended or
   intended denial of service attack.

   Further study of the security issues will be required, and it is
   expected that the security mechanisms and protocols are developed and
   deployed along with a network telemetry system.

8.  IANA Considerations

   This document includes no request to IANA.

9.  Contributors

   The other contributors of this document are listed as follows.

   o  Tianran Zhou

   o  Zhenbin Li

   o  Zhenqiang Li

   o  Daniel King

   o  Adrian Farrel

   o  Alexander Clemm

10.  Acknowledgments

   We would like to thank Greg Mirsky, Randy Presuhn, Joe Clarke, Victor
   Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu,
   Parviz Yegani, Young Lee, Qin Wu, and many others who have provided
   helpful comments and suggestions to improve this document.

11.  Informative References

   [gnmi]     "gNMI - gRPC Network Management Interface".

   [grpc]     "gRPC, A high performance, open-source universal RPC
              framework".

   [I-D.fioccola-ippm-multipoint-alt-mark]
              Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
              "Multipoint Alternate Marking method for passive and
              hybrid performance monitoring",
              draft-fioccola-ippm-multipoint-alt-mark-04 (work in
              progress), June 2018.

   [I-D.ietf-grow-bmp-adj-rib-out]
              Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S.
              Zhuang, "Support for Adj-RIB-Out in BGP Monitoring
              Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work
              in progress), August 2019.

   [I-D.ietf-grow-bmp-local-rib]
              Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
              "Support for Local RIB in BGP Monitoring Protocol (BMP)",
              draft-ietf-grow-bmp-local-rib-09 (work in progress),
              January 2021.

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields
              for In-situ OAM", draft-ietf-ippm-ioam-data-11 (work in
              progress), November 2020.

   [I-D.ietf-netconf-distributed-notif]
              Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois,
              "Subscription to Distributed Notifications",
              draft-ietf-netconf-distributed-notif-01 (work in
              progress), November 2020.

   [I-D.ietf-netconf-udp-notif]
              Zheng, G., Zhou, T., Graf, T., Francois, P., and P.
              Lucente, "UDP-based Transport for Configured
              Subscriptions", draft-ietf-netconf-udp-notif-01 (work in
              progress), November 2020.

   [I-D.irtf-nmrg-ibn-concepts-definitions]
              Clemm, A., Ciavaglia, L., Granville, L., and J. Tantsura,
              "Intent-Based Networking - Concepts and Definitions",
              draft-irtf-nmrg-ibn-concepts-definitions-02 (work in
              progress), September 2020.

   [I-D.kumar-rtgwg-grpc-protocol]
              Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC
              Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in
              progress), July 2016.

   [I-D.openconfig-rtgwg-gnmi-spec]
              Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
              C., and C. Morrow, "gRPC Network Management Interface
              (gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in
              progress), March 2018.

   [I-D.pedro-nmrg-anticipated-adaptation]
              Martinez-Julia, P., "Exploiting External Event Detectors
              to Anticipate Resource Requirements for the Elastic
              Adaptation of SDN/NFV Systems",
              draft-pedro-nmrg-anticipated-adaptation-02 (work in
              progress), June 2018.

   [I-D.song-ippm-postcard-based-telemetry]
              Song, H., Zhou, T., Li, Z., Mirsky, G., Shin, J., and K.
              Lee, "Postcard-based On-Path Flow Data Telemetry using
              Packet Marking",
              draft-song-ippm-postcard-based-telemetry-08 (work in
              progress), October 2020.

   [I-D.song-opsawg-dnp4iq]
              Song, H. and J.
              Gong, "Requirements for Interactive Query with Dynamic
              Network Probes", draft-song-opsawg-dnp4iq-01 (work in
              progress), June 2017.

   [I-D.song-opsawg-ifit-framework]
              Song, H., Qin, F., Chen, H., Jin, J., and J. Shin,
              "In-situ Flow Information Telemetry",
              draft-song-opsawg-ifit-framework-13 (work in progress),
              October 2020.

   [I-D.wwx-netmod-event-yang]
              WU, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise,
              "A YANG Data model for ECA Policy Management",
              draft-wwx-netmod-event-yang-10 (work in progress),
              November 2020.

   [RFC1157]  Case, J., Fedor, M., Schoffstall, M., and J. Davin,
              "Simple Network Management Protocol (SNMP)", RFC 1157,
              DOI 10.17487/RFC1157, May 1990.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999.

   [RFC2981]  Kavasseri, R., Ed., "Event MIB", RFC 2981,
              DOI 10.17487/RFC2981, October 2000.

   [RFC3416]  Presuhn, R., Ed., "Version 2 of the Protocol Operations
              for the Simple Network Management Protocol (SNMP)",
              STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002.

   [RFC3594]  Duffy, P., "PacketCable Security Ticket Control Sub-Option
              for the DHCP CableLabs Client Configuration (CCC) Option",
              RFC 3594, DOI 10.17487/RFC3594, September 2003.

   [RFC3877]  Chisholm, S. and D. Romascanu, "Alarm Management
              Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
              September 2004.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

   [RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
              S., and E. Yedavalli, "Cisco Service-Level Assurance
              Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
              Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
              Networking: Definitions and Design Goals", RFC 7575,
              DOI 10.17487/RFC7575, June 2015.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
              Monitoring Protocol (BMP)", RFC 7854,
              DOI 10.17487/RFC7854, June 2016.

   [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T.
              Mizrahi, "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

   [RFC8639]  Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard,
              E., and A. Tripathy, "Subscription to YANG Notifications",
              RFC 8639, DOI 10.17487/RFC8639, September 2019.

   [RFC8641]  Clemm, A. and E. Voit, "Subscription to YANG Notifications
              for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
              September 2019.

Appendix A.  A Survey on Existing Network Telemetry Techniques

   In this non-normative appendix, we provide an overview of some
   existing techniques and standard proposals for each network
   telemetry module.

A.1.  Management Plane Telemetry

A.1.1.  Push Extensions for NETCONF

   NETCONF [RFC6241] is a popular network management protocol that is
   recommended by the IETF.  Although it can be used for data
   collection, its main strength is configuration.  YANG Push [RFC8641]
   [RFC8639] extends NETCONF and enables subscriber applications to
   request a continuous, customized stream of updates from a YANG
   datastore.  Providing such visibility into changes made to YANG
   configuration and operational objects enables new capabilities based
   on the remote mirroring of configuration and operational state.
   Moreover, a distributed data collection mechanism
   [I-D.ietf-netconf-distributed-notif] using a UDP-based publication
   channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency
   for NETCONF-based telemetry.

A.1.2.  gRPC Network Management Interface

   gRPC Network Management Interface (gNMI)
   [I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol
   based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote
   Procedure Call) framework.  With a single gRPC service definition,
   both configuration and telemetry can be covered.
   gRPC is an open-source microservice communication framework based on
   HTTP/2 [RFC7540].  It provides a number of capabilities that are
   well suited for network telemetry, including:

   o  A full-duplex streaming transport model combined with a binary
      encoding mechanism, which further improves telemetry efficiency.

   o  Consistency of higher-level features across platforms, which
      common HTTP/2 libraries typically do not provide.  This
      characteristic is especially valuable because telemetry data
      collectors normally reside on a large variety of platforms.

   o  Built-in load-balancing and failover mechanisms.

A.2.  Control Plane Telemetry

A.2.1.  BGP Monitoring Protocol

   BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP
   sessions and is intended to provide a convenient interface for
   obtaining route views.

   The BGP routing information is collected from the monitored
   device(s) to the BMP monitoring station by setting up the BMP TCP
   session.  The BGP peers are monitored by the BMP Peer Up and Peer
   Down Notifications.  The BGP routes (including Adjacency_RIB_In
   [RFC7854], Adjacency_RIB_Out [I-D.ietf-grow-bmp-adj-rib-out], and
   Local_RIB [I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP
   Route Monitoring Message and the BMP Route Mirroring Message, in the
   form of both an initial table dump and real-time route updates.  In
   addition, BGP statistics are reported through the BMP Stats Report
   Message, which can be either timer-triggered or event-driven.  More
   BMP extensions can be explored to enrich the applications of BGP
   monitoring.

A.3.  Data Plane Telemetry

A.3.1.  The Alternate Marking (AM) Technology

   The Alternate Marking method efficiently performs packet loss,
   delay, and jitter measurements in both IP and overlay networks, as
   presented in [RFC8321] and [I-D.fioccola-ippm-multipoint-alt-mark].

   This technique can be applied to point-to-point and
   multipoint-to-multipoint flows.  Alternate Marking creates batches
   of packets by alternating the value of 1 bit (or a label) of the
   packet header.  These batches of packets are unambiguously
   recognized over the network, and comparing the packet counters for
   each batch yields the packet loss.  The same idea can be applied to
   delay measurement by selecting ad hoc packets with a marking bit
   dedicated to delay measurements.

   The Alternate Marking method needs two counters per marking period
   for each monitored flow.  For instance, with n measurement points
   and m monitored flows, the number of packet counters for each time
   interval is on the order of n*m*2 (one per color).

   Since networks offer rich sets of network performance measurement
   data (e.g., packet counters), traditional approaches run into
   limitations.  One reason is that the bottleneck is the generation
   and export of the data and the amount of data that can reasonably be
   collected from the network.  In addition, management tasks related
   to determining and configuring which data to generate lead to
   significant deployment challenges.

   The Multipoint Alternate Marking approach, described in
   [I-D.fioccola-ippm-multipoint-alt-mark], aims to resolve this issue
   and makes performance monitoring more flexible when a detailed
   analysis is not needed.

   An application orchestrates network performance measurement tasks
   across the network to allow optimized monitoring, and it can
   calibrate how deeply monitoring data is obtained from the network by
   configuring measurement points coarsely or finely.
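
   As a minimal sketch of the batch-counter comparison described
   earlier in this section, the following Python fragment counts
   packets per flow and per color at two measurement points and derives
   the loss of a finished batch.  The names mark, MeasurementPoint,
   batch_loss, and counters_per_interval are hypothetical and are not
   defined by [RFC8321]; marking is simplified to one bit that flips
   every period, and counter collection is assumed to happen only after
   a batch has ended.

```python
from collections import defaultdict


def mark(period_index: int) -> int:
    """Return the marking bit (color) for a period; colors 0 and 1 alternate."""
    return period_index % 2


class MeasurementPoint:
    """Hypothetical measurement point keeping one counter per (flow, color)."""

    def __init__(self) -> None:
        self.counters = defaultdict(int)  # (flow, color) -> packet count

    def observe(self, flow: str, color: int) -> None:
        """Count one packet of the given flow carrying the given color."""
        self.counters[(flow, color)] += 1

    def read_and_reset(self, flow: str, color: int) -> int:
        """Read the counter of a finished batch and clear it for reuse."""
        return self.counters.pop((flow, color), 0)


def batch_loss(upstream: MeasurementPoint, downstream: MeasurementPoint,
               flow: str, color: int) -> int:
    """Packets of one finished batch seen upstream but not downstream."""
    sent = upstream.read_and_reset(flow, color)
    received = downstream.read_and_reset(flow, color)
    return sent - received


def counters_per_interval(n_points: int, m_flows: int) -> int:
    """Counter count per interval, n*m*2: one counter per color."""
    return n_points * m_flows * 2


if __name__ == "__main__":
    up, down = MeasurementPoint(), MeasurementPoint()
    color = mark(0)
    for i in range(5):
        up.observe("flow-a", color)
        if i != 2:  # assume the third packet is lost between the two points
            down.observe("flow-a", color)
    print(batch_loss(up, down, "flow-a", color))  # prints 1
```

   Because each color is reused every second period, a counter is read
   while the other color is active, so no packet of the batch being
   compared is still in flight under the assumption above.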

   Using Alternate Marking, it is possible to monitor a multipoint
   network without examining it in depth by using network clustering
   (clusters are subnetworks that preserve the same property as the
   entire network).  If there is packet loss or the delay is too high,
   the filtering criteria can be made more specific in order to perform
   a detailed analysis using a different combination of clusters, up to
   a per-flow measurement as described in Alternate Marking (AM)
   [RFC8321].

   In summary, an application can configure end-to-end network
   monitoring.  If the network does not experience issues, this
   approximate monitoring is good enough and is very cheap in terms of
   network resources.  However, in case of problems, the application
   becomes aware of the issues from this approximate monitoring and, in
   order to localize the portion of the network that has issues,
   configures the measurement points more exhaustively, so that a new,
   detailed monitoring is performed.  After the problem is detected and
   resolved, the initial approximate monitoring can be used again.

A.3.2.  Dynamic Network Probe

   Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq]
   provides a programmable means to customize the data that an
   application collects from the data plane.  A direct benefit of DNP
   is the reduction of the exported data.  A full DNP solution covers
   several components, including data source, data subscription, and
   data generation.  The data subscription needs to define the complex
   data that can be composed and derived from the raw data sources.
   The data generation takes advantage of moderate in-network computing
   to produce the desired data.

   While DNP can introduce unforeseeable flexibility to the data plane
   telemetry, it also faces some challenges.
   It requires a flexible data plane that can be dynamically
   reprogrammed at run-time.  The programming API is yet to be defined.

A.3.3.  IP Flow Information Export (IPFIX) Protocol

   Traffic on a network can be seen as a set of flows passing through
   network elements.  IP Flow Information Export (IPFIX) [RFC7011]
   provides a means of transmitting traffic flow information for
   administrative or other purposes.  A typical IPFIX-enabled system
   includes a pool of Metering Processes that collect data packets at
   one or more Observation Points, optionally filter them, and
   aggregate information about these packets.  An Exporter then gathers
   each of the Observation Points together into an Observation Domain
   and sends this information via the IPFIX protocol to a Collector.

A.3.4.  In-Situ OAM

   Traditional passive and active monitoring and measurement techniques
   are either inaccurate or resource-consuming.  It is preferable to
   directly acquire data associated with a flow's packets when the
   packets pass through a network.  In-situ OAM (IOAM)
   [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new
   instruction header in user packets, and the instruction directs the
   network nodes to add the requested data to the packets.  Thus, at
   the path end, the packet's experience gained on the entire
   forwarding path can be collected.  Such firsthand data is invaluable
   to many network OAM applications.

   However, IOAM also faces some challenges.  Issues of performance
   impact, security, scalability and overhead limits, encapsulation
   difficulties in some protocols, and cross-domain deployment need to
   be addressed.

A.3.5.  Postcard Based Telemetry

   Postcard-Based Telemetry (PBT)
   [I-D.song-ippm-postcard-based-telemetry] is an alternative to IOAM.
   PBT directly exports data at each node through an independent
   packet.  PBT solves several issues of IOAM.
   It can also help to identify the drop location when a packet is
   dropped on its forwarding path.

A.4.  External Data and Event Telemetry

A.4.1.  Sources of External Events

   To ensure that the information provided by external event detectors
   and used by the network management solutions is meaningful for
   management purposes, the network telemetry framework must ensure
   that such detectors (sources) are easily connected to the management
   solutions (sinks).  This requires specifying a simple taxonomy of
   detectors and matching it to the connectors and/or interfaces
   required to connect them.

   Once detectors are classified in such a taxonomy, their definitions
   are enlarged with the qualities and other aspects used to handle
   them and are represented in the ontology and information model
   (e.g., YANG).  Therefore, differentiating several types of detectors
   as potential sources of external events is essential for the
   integrity of the management framework.  We thus differentiate the
   following source types of external events:

   o  Smart objects and sensors.  With the consolidation of the
      Internet of Things (IoT), any network system will have many smart
      objects attached to its physical surroundings and logical
      operation environments.  Most of these objects will be
      essentially based on sensors of many kinds (e.g., temperature,
      humidity, presence), and the information they provide can be very
      useful for the management of the network, even when they are not
      specifically deployed for that purpose.  Elements of this source
      type will usually provide a specific protocol for interaction,
      especially one of the protocols related to IoT, such as the
      Constrained Application Protocol (CoAP).  The telemetry framework
      will use it to interact with the relevant objects.

   o  Online news reporters.
      Several online news services have the ability to provide an
      enormous quantity of information about different events occurring
      in the world.  Some of those events can have an impact on the
      network system managed by a specific framework, which will
      therefore be interested in getting such information.  For
      instance, diverse security reports, such as the Common
      Vulnerabilities and Exposures (CVE), can be issued by the
      corresponding authority and used by the management solution to
      update the managed system if needed.  Instead of a specific
      protocol and data format, the sources of this kind of information
      usually follow a relaxed but structured format.  This format will
      be part of both the ontology and information model of the
      telemetry framework.

   o  Global event analyzers.  The advance of Big Data analyzers
      provides a huge amount of information and, more interestingly,
      the identification of events detected by analyzing many data
      streams from different origins.  In contrast with the other types
      of sources, which are focused on specific events, the detectors
      of this source type will detect very generic events.  For
      example, a sports event takes place, some unexpected development
      makes it highly interesting, and many people connect to sites
      that are covering the event.  The systems supporting the services
      that cover the event can be affected by such a situation, so
      their management solutions should be aware of it.  In contrast
      with the other source types, a new information model, format, and
      reporting protocol is required to integrate the detectors of this
      type with the management solution.

   Additional detector types can be added to the system, but they will
   generally be the result of composing the properties offered by these
   main classes.
   In any case, future revisions of the network telemetry framework
   will include the required types that cover new circumstances and
   that cannot be obtained by composition.

A.4.2.  Connectors and Interfaces

   To allow external event detectors to be properly integrated with
   other management solutions, both elements must expose interfaces and
   protocols that are subject to their particular objective.  Since
   external event detectors will be focused on providing their
   information to their main consumers, which generally will not be
   limited to the network management solutions, the framework must
   include the definition of the connectors required to ensure that the
   interconnection between detectors (sources) and their consumers
   within the management systems (sinks) is effective.

   In some situations, the interconnection between the external event
   detectors and the management system is via the management plane.
   For those situations, there will be a special connector that
   provides the typical interfaces found in most other elements
   connected to the management plane.  For instance, the interfaces
   will comply with a specific information model (YANG) and a specific
   telemetry protocol, such as NETCONF, SNMP, or gRPC.

Authors' Addresses

   Haoyu Song
   Futurewei
   2330 Central Expressway
   Santa Clara
   USA

   Email: hsong@futurewei.com


   Fengwei Qin
   China Mobile
   No. 32 Xuanwumenxi Ave., Xicheng District
   Beijing, 100032
   P.R. China

   Email: qinfengwei@chinamobile.com


   Pedro Martinez-Julia
   NICT
   4-2-1, Nukui-Kitamachi
   Koganei, Tokyo 184-8795
   Japan

   Email: pedro@nict.go.jp


   Laurent Ciavaglia
   Nokia
   Villarceaux 91460
   France

   Email: laurent.ciavaglia@nokia.com


   Aijun Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing, 102209
   P.R.
   China

   Email: wangaj.bri@chinatelecom.cn