OPSAWG                                                           H. Song
Internet-Draft                                                 Futurewei
Intended status: Informational                                    F. Qin
Expires: 10 April 2022                                      China Mobile
                                                       P. Martinez-Julia
                                                                    NICT
                                                            L. Ciavaglia
                                                                   Nokia
                                                                 A. Wang
                                                           China Telecom
                                                          7 October 2021


                      Network Telemetry Framework
                        draft-ietf-opsawg-ntf-08

Abstract

   Network telemetry is a technology for gaining network insight and
   facilitating efficient and automated network management. It
   encompasses various techniques for remote data generation,
   collection, correlation, and consumption. This document describes an
   architectural framework for network telemetry, motivated by
   challenges that are encountered as part of the operation of networks
   and by the requirements that ensue. This document clarifies the
   terminologies and classifies the modules and components of a network
   telemetry system from several different perspectives. The framework
   and taxonomy help to set a common ground for the collection of
   related work and provide guidance for related technique and standard
   developments.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Glossary
   3.  Background
     3.1.  Telemetry Data Coverage
     3.2.  Use Cases
     3.3.  Challenges
     3.4.  Network Telemetry
     3.5.  The Necessity of a Network Telemetry Framework
   4.  Network Telemetry Framework
     4.1.  Top Level Modules
       4.1.1.  Management Plane Telemetry
       4.1.2.  Control Plane Telemetry
       4.1.3.  Forwarding Plane Telemetry
       4.1.4.  External Data Telemetry
     4.2.  Second Level Function Components
     4.3.  Data Acquisition Mechanism and Type Abstraction
     4.4.  Mapping Existing Mechanisms into the Framework
   5.  Evolution of Network Telemetry Applications
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgments
   10. Informative References
   Appendix A.  A Survey on Existing Network Telemetry Techniques
     A.1.  Management Plane Telemetry
       A.1.1.  Push Extensions for NETCONF
       A.1.2.  gRPC Network Management Interface
     A.2.  Control Plane Telemetry
       A.2.1.  BGP Monitoring Protocol
     A.3.  Data Plane Telemetry
       A.3.1.  The Alternate Marking (AM) technology
       A.3.2.  Dynamic Network Probe
       A.3.3.  IP Flow Information Export (IPFIX) protocol
       A.3.4.  In-Situ OAM
       A.3.5.  Postcard Based Telemetry
     A.4.  External Data and Event Telemetry
       A.4.1.  Sources of External Events
       A.4.2.  Connectors and Interfaces
   Authors' Addresses

1.  Introduction

   Network visibility is the ability of management tools to see the
   state and behavior of a network, which is essential for successful
   network operation. Network telemetry revolves around network data
   that can help provide insights into the current state of the
   network, including network devices and the forwarding, control, and
   management planes. Such data can be generated and obtained through a
   variety of techniques, including but not limited to network
   instrumentation and measurements, and can be processed for purposes
   ranging from service assurance to network security using a wide
   variety of techniques including machine learning, data analysis,
   and correlation.
   In this document, Network Telemetry refers to both
   the data itself (i.e., "Network Telemetry Data") and the techniques
   and processes used to generate, export, collect, and consume that
   data for use by potentially automated management applications.
   Network telemetry extends beyond the historical network Operations,
   Administration, and Maintenance (OAM) techniques and is expected to
   support better flexibility, scalability, accuracy, coverage, and
   performance.

   However, the term "network telemetry" lacks an unambiguous
   definition. Its scope and coverage cause confusion and
   misunderstandings. It is beneficial to clarify the concept and
   provide a clear architectural framework for network telemetry, so
   that we can articulate the technical field and better align the
   related techniques and standards work.

   To fulfill such an undertaking, we first discuss some key
   characteristics of network telemetry that set it clearly apart from
   conventional network OAM and show that some conventional OAM
   technologies can be considered a subset of the network telemetry
   technologies. We then provide an architectural framework for network
   telemetry that includes four modules, each concerned with a
   different category of telemetry data and corresponding procedures.
   All the modules are internally structured in the same way. Each
   includes components that allow data sources to be configured with
   regard to what data to generate and how to make it available to
   client applications, components that instrument the underlying data
   sources, and components that perform the actual rendering, encoding,
   and exporting of the generated data. We show how the network
   telemetry framework can benefit current and future network
   operations. Based on the distinction of modules and function
   components, we can map the existing and emerging techniques and
   protocols into the framework.
   The framework can also simplify the tasks of designing,
   maintaining, and understanding a network telemetry system. Finally,
   we outline the evolution stages of the network telemetry system and
   discuss the potential security concerns.

   The purpose of the framework and taxonomy is to set a common ground
   for the collection of related work and provide guidance for future
   technique and standard developments. To the best of our knowledge,
   this document is the first such effort for network telemetry in
   industry standards organizations.

2.  Glossary

   Before further discussion, we list some key terminology and acronyms
   used in this document. We intentionally differentiate between the
   terms network telemetry and OAM. However, it should be understood
   that there is no hard-line distinction between the two concepts.
   Rather, network telemetry is considered an extension of OAM. It
   covers all the existing OAM protocols but puts more emphasis on the
   newer and emerging techniques and protocols concerning all aspects
   of network data from acquisition to consumption.

   AI:  Artificial Intelligence. In the network domain, AI refers to
      the machine-learning-based technologies for automated network
      operation and other tasks.

   AM:  Alternate Marking, a flow performance measurement method,
      specified in [RFC8321].

   BMP:  BGP Monitoring Protocol, specified in [RFC7854].

   DPI:  Deep Packet Inspection, referring to the techniques that
      examine packets beyond the L3/L4 headers.

   gNMI:  gRPC Network Management Interface, a network management
      protocol from OpenConfig Operator Working Group, mainly
      contributed by Google. See [gnmi] for details.

   GPB:  Google Protocol Buffer, an extensible mechanism for serializing
      structured data.

   gRPC:  gRPC Remote Procedure Call, an open source high performance
      RPC framework that gNMI is based on. See [grpc] for details.

   IPFIX:  IP Flow Information Export Protocol, specified in [RFC7011].

   IOAM:  In-situ OAM, a dataplane on-path telemetry technique.

   JSON:  JavaScript Object Notation, an open standard file format and
      data interchange format that uses human-readable text to store
      and transmit data objects.

   MIB:  Management Information Base, a database used for managing the
      entities in a network.

   NETCONF:  Network Configuration Protocol, specified in [RFC6241].

   NetFlow:  A Cisco protocol for flow record collecting, described in
      [RFC3954].

   Network Telemetry:  The process and instrumentation for acquiring and
      utilizing network data remotely for network monitoring and
      operation. A general term for a large set of network visibility
      techniques and protocols, concerning aspects like data generation,
      collection, correlation, and consumption. Network telemetry
      addresses the current network operation issues and enables smooth
      evolution toward future intent-driven autonomous networks.

   NMS:  Network Management System, referring to applications that allow
      network administrators to manage a network.

   OAM:  Operations, Administration, and Maintenance. A group of
      network management functions that provide network fault
      indication, fault localization, performance information, and data
      and diagnosis functions. Most conventional network monitoring
      techniques and protocols belong to network OAM.

   PBT:  Postcard-Based Telemetry, a dataplane on-path telemetry
      technique.

   SMIv2:  Structure of Management Information Version 2, defining MIB
      objects, specified in [RFC2578].

   SNMP:  Simple Network Management Protocol. Versions 1 and 2 are
      specified in [RFC1157] and [RFC3416], respectively.

   YANG:  YANG is a data modeling language for the definition of data
      sent over network management protocols such as NETCONF and
      RESTCONF. YANG is defined in [RFC6020] and [RFC7950].

   YANG ECA:  A YANG model for Event-Condition-Action policies, defined
      in [I-D.wwx-netmod-event-yang].

   YANG-Push:  A mechanism that allows subscriber applications to
      request a stream of updates from a YANG datastore on a network
      device. Details are specified in [RFC8641] and [RFC8639].

3.  Background

   The term "big data" is used to describe the extremely large volume of
   data sets that can be analyzed computationally to reveal patterns,
   trends, and associations. Networks are undoubtedly a source of big
   data because of their scale and the volume of network traffic they
   forward. It is easy to see that network operations can benefit from
   network big data to gather insights into flows without breaching
   privacy.

   Today one can access advanced big data analytics capability through a
   plethora of commercial and open source platforms (e.g., Apache
   Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
   learning). Thanks to the advance of computing and storage
   technologies, network big data analytics gives network operators an
   opportunity to gain network insights and move towards network
   autonomy. Some operators have started to explore the application of
   Artificial Intelligence (AI) to make sense of network data. Software
   tools can use the network data to detect and react to network faults,
   anomalies, and policy violations, as well as to predict future
   events. In turn, the network policy updates for planning, intrusion
   prevention, optimization, and self-healing may be applied.

   It is conceivable that an autonomic network [RFC7575] is the logical
   next step for network evolution following Software-Defined Networking
   (SDN), aiming to reduce (or even eliminate) human labor, make more
   efficient use of network resources, and provide better services more
   aligned with customer requirements.
   The related technique of
   Intent-based Networking (IBN)
   [I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility
   and telemetry data in order to ensure that the network is behaving as
   intended.

   However, while the data processing capability is improved and
   applications are hungry for more data, the networks lag behind in
   extracting and translating network data into useful and actionable
   information in efficient ways. The system bottleneck is shifting
   from data consumption to data supply. Both the number of network
   nodes and the traffic bandwidth keep increasing at a fast pace. The
   network configuration and policy change at shorter time intervals
   than before. More subtle events and fine-grained data through all
   network planes need to be captured and exported in real time. In a
   nutshell, it is a challenge to get enough high-quality data out of
   the network in a manner that is efficient, timely, and flexible.
   Therefore, we need to survey the existing technologies and protocols
   and identify any potential gaps.

   In the remainder of this section, we first clarify the scope of
   network data (i.e., telemetry data) concerned in this context. Then,
   we discuss several key use cases for today's and future network
   operations. Next, we show why the current network OAM techniques and
   protocols are insufficient for these use cases. The discussion
   underlines the need for new methods, techniques, and protocols, as
   well as the extensions of existing ones, which we assign under the
   umbrella term "Network Telemetry".

3.1.  Telemetry Data Coverage

   Any information that can be extracted from networks (including data
   plane, control plane, and management plane) and used to gain
   visibility or as a basis for actions is considered telemetry data. It
   includes statistics, event records and logs, snapshots of state,
   configuration data, etc.
   It also covers the outputs of any active
   and passive measurements [RFC7799]. In some cases, raw data is
   processed in the network before being sent to a data consumer. Such
   processed data is also considered telemetry data. The value of
   telemetry data varies. A smaller amount of high-quality data is
   often better than a large amount of low-quality data. A
   classification of telemetry data is provided in Section 4.

3.2.  Use Cases

   The following set of use cases is essential for network operations.
   While the list is by no means exhaustive, it is enough to highlight
   the requirements for data velocity, variety, volume, and veracity in
   networks.

   *  Security: Network intrusion detection and prevention systems need
      to monitor network traffic and activities and act upon anomalies.
      Given increasingly sophisticated attack vectors coupled with
      increasingly severe consequences of security breaches, new tools
      and techniques need to be developed, relying on wider and deeper
      visibility into networks. The ultimate goal is to achieve the
      ideal security with no or minimal human intervention.

   *  Policy and Intent Compliance: Network policies are the rules that
      constrain the services for network access, provide service
      differentiation, or enforce specific treatment on the traffic.
      For example, a service function chain is a policy that requires
      the selected flows to pass through a set of ordered network
      functions. Intent, as defined in
      [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational
      goals that a network should meet and outcomes that a network is
      supposed to deliver, defined in a declarative manner without
      specifying how to achieve or implement them. An intent requires a
      complex translation and mapping process before being applied on
      networks.
      While a policy or an intent is enforced, compliance
      needs to be verified and monitored continuously, relying on the
      visibility that network telemetry data provides. Any violation
      needs to be reported immediately, and updates need to be applied
      to ensure that the policy or intent remains in force.

   *  SLA Compliance: A Service-Level Agreement (SLA) defines the level
      of service a user expects from a network operator, which includes
      the metrics for the service measurement and remedy/penalty
      procedures when the service level misses the agreement. Users
      need to check whether they get the service as promised, and
      network operators need to evaluate how they can deliver services
      that meet the SLA based on real-time network telemetry data,
      including data from network measurements.

   *  Root Cause Analysis: Any network failure can be the effect of a
      sequence of chained events. Troubleshooting and recovery require
      quick identification of the root cause of any observable issues.
      However, the root cause is not always straightforward to identify,
      especially when the failure is sporadic and the number of event
      messages, both related and unrelated to the same cause, is
      overwhelming. While machine learning technologies can be used for
      root cause analysis, it is up to the network to sense and provide
      the relevant diagnostic data, which are either actively fed into
      or passively retrieved by machine learning applications.

   *  Network Optimization: This covers all short-term and long-term
      network optimization techniques, including load balancing, Traffic
      Engineering (TE), and network planning. Network operators are
      motivated to optimize their network utilization and differentiate
      services for better Return On Investment (ROI) or lower Capital
      Expenditures (CAPEX). The first step is to know the real-time
      network conditions before applying policies for traffic
      manipulation.
      In some cases, micro-bursts need to be detected in
      a very short time frame so that fine-grained traffic control can
      be applied to avoid network congestion. Long-term planning of
      network capacity and topology requires analysis of real-world
      network telemetry data that is obtained over long periods of time.

   *  Event Tracking and Prediction: The visibility into traffic path
      and performance is critical for services and applications that
      rely on healthy network operation. Numerous related network
      events are of interest to network operators. For example, network
      operators want to learn where and why packets are dropped for an
      application flow. They also want to be warned of issues in
      advance so proactive actions can be taken to avoid catastrophic
      consequences.

3.3.  Challenges

   For a long time, network operators have relied upon SNMP [RFC3416],
   Command-Line Interface (CLI), or Syslog to monitor the network. Some
   other OAM techniques as described in [RFC7276] are also used to
   facilitate network troubleshooting. These conventional techniques
   are not sufficient to support the above use cases for the following
   reasons:

   *  Most use cases need to continuously monitor the network and
      dynamically refine the data collection in real time. Poll-based,
      low-frequency data collection is ill-suited for these
      applications. Subscription-based streaming data directly pushed
      from the data source (e.g., the forwarding chip) is preferred to
      provide enough data quantity and precision at scale.

   *  Comprehensive data is needed from packet processing engine to
      traffic manager, from line cards to main control board, from user
      flows to control protocol packets, from device configurations to
      operations, and from physical layer to application layer.
      Conventional OAM only covers a narrow range of data (e.g., SNMP
      only handles data from the Management Information Base (MIB)).
      Traditional network devices cannot provide all the necessary
      probes. More open and programmable network devices are therefore
      needed.

   *  Many application scenarios need to correlate network-wide data
      from multiple sources (i.e., from distributed network devices,
      different components of a network device, or different network
      planes). A piecemeal solution often lacks the capability to
      consolidate the data from multiple sources. The composition of a
      complete solution, as partly proposed by Autonomic Resource
      Control Architecture (ARCA)
      [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
      guided by a comprehensive framework.

   *  Some of the conventional OAM techniques (e.g., CLI and Syslog)
      lack a formal data model. The unstructured data hinder tool
      automation and application extensibility. Standardized data
      models are essential to support programmable networks.

   *  Although some conventional OAM techniques support data push (e.g.,
      SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data
      are limited to only predefined management plane warnings (e.g.,
      SNMP Trap) or sampled user packets (e.g., sFlow). Network
      operators require data with arbitrary source, granularity, and
      precision, which is beyond the capability of the existing
      techniques.

   *  The conventional passive measurement techniques can either consume
      excessive network resources and render excessive redundant data,
      or lead to inaccurate results; on the other hand, the conventional
      active measurement techniques can interfere with the user traffic
      and their results are indirect. Techniques that can collect
      direct and on-demand data from user traffic are more favorable.

   These challenges have been addressed by newer standards and
   techniques (e.g., IPFIX/NetFlow, PSAMP, IOAM, and YANG-Push) and
   more are emerging.
   These standards and techniques need to be recognized and
   accommodated in a new framework.

3.4.  Network Telemetry

   Network telemetry has emerged as a mainstream technical term to refer
   to the network data collection and consumption techniques. Several
   network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
   gRPC [grpc]) have been widely deployed. Network telemetry allows
   separate entities to acquire data from network devices so that data
   can be visualized and analyzed to support network monitoring and
   operation. Network telemetry covers the conventional network OAM and
   has a wider scope. It is expected that network telemetry can provide
   the necessary network insight for autonomous networks and address the
   shortcomings of conventional OAM techniques.

   Network telemetry usually assumes machines as data consumers rather
   than human operators. Hence, network telemetry can directly trigger
   automated network operation, whereas some conventional OAM tools are
   designed and used to help human operators monitor and diagnose the
   networks and guide manual network operations. Such a proposition
   leads to very different techniques.

   Although new network telemetry techniques are emerging and subject to
   continuous evolution, several characteristics of network telemetry
   have been well accepted. Note that network telemetry is intended to
   be an umbrella term covering a wide spectrum of techniques, so the
   following characteristics are not expected to be held by every
   specific technique.

   *  Push and Streaming: Instead of polling data from network devices,
      telemetry collectors subscribe to streaming data pushed from data
      sources in network devices.

   *  Volume and Velocity: The telemetry data is intended to be consumed
      by machines rather than by humans.
      Therefore, the data
      volume can be huge and the processing is optimized for the needs
      of automation in real time.

   *  Normalization and Unification: Telemetry aims to address the
      overall network automation needs. Efforts are made to normalize
      the data representation and unify the protocols, so as to simplify
      data analysis and provide integrated analysis across heterogeneous
      devices and data sources across a network.

   *  Model-based: The telemetry data is modeled in advance, which
      allows applications to configure and consume data with ease.

   *  Data Fusion: The data for a single application can come from
      multiple data sources (e.g., cross-domain, cross-device, and
      cross-layer) and needs to be correlated to take effect.

   *  Dynamic and Interactive: Since network telemetry is meant to be
      used in a closed control loop for network automation, it needs to
      run continuously and adapt to the dynamic and interactive queries
      from the network operation controller.

   In addition, an ideal network telemetry solution may also have the
   following features or properties:

   *  In-Network Customization: The data that is generated can be
      customized in the network at run-time to cater to the specific
      needs of applications. This requires the support of a
      programmable data plane, which allows probes with custom
      functions to be deployed at flexible locations.

   *  In-Network Data Aggregation and Correlation: Network devices and
      aggregation points can work out which events and what data need
      to be stored, reported, or discarded, thus reducing the load on
      the central collection and processing points while still ensuring
      that the right information is ready to be processed in a timely
      way.

   *  In-Network Processing: Sometimes it is not necessary or feasible
      to gather all information to a central point to be processed and
      acted upon.
      It is possible for the data processing to be done in the
      network, allowing reactive actions to be taken locally.

   *  Direct Data Plane Export: The data originated from the data plane
      forwarding chips can be directly exported to the data consumer for
      efficiency, especially when the data bandwidth is large and
      real-time processing is required.

   *  In-band Data Collection: In addition to the passive and active
      data collection approaches, the new hybrid approach allows data
      to be collected directly for any target flow on its entire
      forwarding path [I-D.song-opsawg-ifit-framework].

   It is worth noting that a network telemetry system should not be
   intrusive to normal network operations; it should avoid the pitfall
   of the "observer effect". That is, it should neither change the
   network behavior nor affect the forwarding performance. Otherwise,
   the whole purpose of network telemetry is compromised.

   Although in many cases a system for network telemetry involves a
   remote data collecting and consuming entity, it is important to
   understand that there are no inherent assumptions about how a system
   should be architected. Telemetry data producers and consumers can
   work in distributed or peer-to-peer fashions rather than assuming a
   centralized data consuming entity. In such cases, a network node can
   be the direct consumer of telemetry data from other nodes.

3.5.  The Necessity of a Network Telemetry Framework

   Network data analytics and machine-learning technologies are applied
   for network operation automation, relying on abundant and coherent
   data from networks. Data acquisition that is limited to a single
   source and static in nature will in many cases not be sufficient to
   meet an application's telemetry data needs. As a result, multiple
   data sources, involving a variety of techniques and standards, will
   need to be integrated.
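
   As an illustration only, the following Python sketch shows one way
   that records from more than one telemetry source might be integrated
   by grouping them per device and time window. The record fields, the
   source names, the sample values, and the fuse() helper are all
   assumptions made for this example; none of them is defined by this
   document or by any telemetry protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryRecord:
    """One hypothetical telemetry sample; field names are illustrative."""
    source: str       # assumed label, e.g. "forwarding-plane"
    device: str       # device that produced the sample
    timestamp: float  # seconds since an arbitrary epoch
    metric: str
    value: float


def fuse(records, window=1.0):
    """Group records from all sources by (device, time window).

    Returns a dict that maps (device, window_index) to a dict of
    {"source/metric": value}, so that samples taken by different
    sources within the same window can be analyzed together.
    """
    fused = defaultdict(dict)
    for rec in records:
        key = (rec.device, int(rec.timestamp // window))
        fused[key][f"{rec.source}/{rec.metric}"] = rec.value
    return dict(fused)


# Made-up samples from two sources on the same device.
records = [
    TelemetryRecord("forwarding-plane", "r1", 10.2, "queue_depth", 87.0),
    TelemetryRecord("management-plane", "r1", 10.7, "cpu_percent", 42.0),
    TelemetryRecord("forwarding-plane", "r1", 11.1, "queue_depth", 12.0),
]
fused = fuse(records)
```

   In this sketch, the two samples taken during the same one-second
   window end up under a single key, which is the kind of cross-source
   view that a single, static data source cannot offer.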
   It is desirable to have a framework that
   classifies and organizes different telemetry data sources and types,
   defines different components of a network telemetry system and their
   interactions, and helps coordinate and integrate multiple telemetry
   approaches across layers. This allows flexible combinations of data
   for different applications, while normalizing and simplifying
   interfaces. In detail, such a framework would benefit application
   development for the following reasons:

   *  Future networks, autonomous or otherwise, depend on holistic and
      comprehensive network visibility. All the use cases and
      applications are better supported uniformly and coherently
      under a single intelligent agent using an integrated, converged
      mechanism and common telemetry data representations wherever
      feasible. Therefore, the protocols and mechanisms should be
      consolidated into a minimum yet comprehensive set. A telemetry
      framework can help to normalize the technique developments.

   *  Network visibility presents multiple viewpoints. For example, the
      device viewpoint takes the network infrastructure as the
      monitoring object from which the network topology and device
      status can be acquired; the traffic viewpoint takes the flows or
      packets as the monitoring object from which the traffic quality
      and path can be acquired. An application may need to switch its
      viewpoint during operation. It may also need to correlate a
      service and its impact on user experience to acquire the
      comprehensive information.

   *  Applications require network telemetry to be elastic in order to
      make efficient use of network resources and reduce the impact of
      processing related to network telemetry on network performance.
      For example, routine network monitoring should cover the entire
      network with a low data sampling rate.
  Only when issues arise or critical trends emerge should telemetry
  data sources be modified and telemetry data rates boosted as
  needed.

* Efficient data fusion is critical for applications to reduce the
  overall quantity of data and improve the accuracy of analysis.

A telemetry framework collects together all of the telemetry-related
works from different sources and working groups within the IETF.
This makes it possible to assemble a comprehensive network telemetry
system and to avoid repetitious or redundant work. The framework
should cover the concepts and components from the standardization
perspective. This document describes the modules which make up a
network telemetry framework and decomposes the telemetry system into
a set of distinct components that existing and future work can easily
map to.

4. Network Telemetry Framework

The top-level network telemetry framework partitions network
telemetry into four modules based on the telemetry data object source
and represents their relationship. At the next level, the framework
decomposes each module into separate components. Each of the modules
follows the same underlying structure, with one component dedicated
to the configuration of data subscriptions and data sources, a second
component dedicated to encoding and exporting data, and a third
component instrumenting the generation of telemetry related to the
underlying resources. Throughout the framework, the same set of
abstract data acquiring mechanisms and data types (Section 4.3) are
applied. The two-level architecture with the uniform data
abstraction helps accurately pinpoint a protocol or technique to its
position in a network telemetry system or disaggregate a network
telemetry system into manageable parts.

4.1. Top Level Modules

Telemetry can be applied on the forwarding plane, the control plane,
and the management plane in a network, as well as other sources
outside the network, as shown in Figure 1. Therefore, we categorize
network telemetry into four distinct modules, each having its own
interface to Network Operation Applications.

+------------------------------+
|                              |
|       Network Operation      |<-------+
|          Applications        |        |
|                              |        |
+------------------------------+        |
       ^            ^      ^            |
       |            |      |            |
       V            V      |            V
+--------------+-----------|---+  +-----------+
|              | Control   |   |  |           |
|              | Plane     |   |  | External  |
|            <--->         |   |  | Data and  |
|              | Telemetry |   |  | Event     |
| Management   |       ^   V   |  | Telemetry |
| Plane        +-------|-------+  |           |
| Telemetry    |       V       |  +-----------+
|              | Forwarding    |
|              | Plane         |
|            <--->             |
|              | Telemetry     |
|              |               |
+--------------+---------------+

Figure 1: Modules in Layer Category of NTF

The rationale of this partition lies in the different telemetry data
objects, which result in different data sources and export locations.
Such differences have profound implications on in-network data
programming and processing capability, data encoding and transport
protocol, and required data bandwidth and latency. Data can be sent
directly, or proxied via the control and management planes. There
are advantages and disadvantages to both approaches.

We summarize the major differences of the four modules in the
following table. They are compared from six angles:

* Data Object

* Data Export Location

* Data Model

* Data Encoding

* Telemetry Protocol

* Transport Method

Data Object is the target and source of each module. Because the
data source varies, the location where data is most conveniently
exported also varies.
For example, forwarding plane data mainly originates as data exported
from the forwarding ASICs, while control plane data mainly originates
from the protocol daemons running on the control CPU(s). For
convenience and efficiency, it is preferred to export the data off
the device from locations near the source. Because the locations
that can export data have different capabilities, different choices
of data model, encoding, and transport method are made to balance the
performance and cost. For example, the forwarding chip has high
throughput but limited capacity for processing complex data and
maintaining states, while the main control CPU is capable of complex
data and state processing, but has limited bandwidth for high
throughput data. As a result, the suitable telemetry protocol for
each module can be different. Some representative techniques are
shown in the corresponding table blocks to highlight the technical
diversity of these modules. Note that the selected techniques just
reflect the de-facto state of the art and are not exhaustive. The
key point is that one cannot expect to use a universal protocol to
cover all the network telemetry requirements.

+---------+--------------+--------------+---------------+-----------+
| Module  | Control      | Management   | Forwarding    | External  |
|         | Plane        | Plane        | Plane         | Data      |
+---------+--------------+--------------+---------------+-----------+
|Object   | control      | config. &    | flow & packet | terminal, |
|         | protocol &   | operation    | QoS, traffic  | social &  |
|         | signaling,   | state        | stat., buffer | environ-  |
|         | RIB, ACL     |              | & queue stat.,| mental    |
|         |              |              | ACL, FIB      |           |
+---------+--------------+--------------+---------------+-----------+
|Export   | main control | main control | fwding chip   | various   |
|Location | CPU,         | CPU          | or linecard   |           |
|         | linecard CPU |              | CPU; main     |           |
|         | or fwding    |              | control CPU   |           |
|         | chip         |              | unlikely      |           |
+---------+--------------+--------------+---------------+-----------+
|Data     | YANG,        | YANG, MIB,   | template,     | YANG,     |
|Model    | custom       | syslog       | YANG,         | custom    |
|         |              |              | custom        |           |
+---------+--------------+--------------+---------------+-----------+
|Data     | GPB, JSON,   | GPB, JSON,   | plain         | GPB, JSON,|
|Encoding | XML, plain   | XML          |               | XML, plain|
+---------+--------------+--------------+---------------+-----------+
|Protocol | gRPC,NETCONF,| gRPC,NETCONF | IPFIX, mirror,| gRPC      |
|         | IPFIX,mirror |              | gRPC, NETFLOW |           |
+---------+--------------+--------------+---------------+-----------+
|Transport| HTTP, TCP,   | HTTP, TCP    | UDP           | HTTP, TCP,|
|         | UDP          |              |               | UDP       |
+---------+--------------+--------------+---------------+-----------+

Figure 2: Comparison of the Data Object Modules

Note that the interaction with the applications that consume network
telemetry data can be indirect. Some in-device data transfer is
possible. For example, in the management plane telemetry, the
management plane will need to acquire data from the data plane. Some
of the operational states can only be derived from data plane data
sources such as the interface status and statistics. As another
example, obtaining control plane telemetry data may require the
ability to access the Forwarding Information Base (FIB) of the data
plane.
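
As a rough illustration of the comparison in Figure 2, the "Protocol"
row can be held in a small lookup table that an application consults
when deciding how to reach a given module. This is only a sketch:
the names MODULE_PROTOCOLS, candidate_protocols, and modules_listing
are hypothetical and not defined by this framework, and the entries
simply copy the representative, non-exhaustive techniques listed in
the figure.

```python
# Illustrative only: the "Protocol" row of Figure 2 as a lookup table.
# MODULE_PROTOCOLS and both helper functions are hypothetical names.
MODULE_PROTOCOLS = {
    "control": ["gRPC", "NETCONF", "IPFIX", "mirror"],
    "management": ["gRPC", "NETCONF"],
    "forwarding": ["IPFIX", "mirror", "gRPC", "NETFLOW"],
    "external": ["gRPC"],
}


def candidate_protocols(module):
    """Return the representative protocols Figure 2 lists for a module."""
    return list(MODULE_PROTOCOLS[module])


def modules_listing(protocol):
    """Return the modules for which Figure 2 lists the given protocol."""
    return [m for m, protos in MODULE_PROTOCOLS.items() if protocol in protos]


if __name__ == "__main__":
    print(candidate_protocols("forwarding"))
    print(modules_listing("IPFIX"))
```

A consumer that needs data from several modules would call
candidate_protocols() once per module and may end up with a different
protocol for each, which is the diversity the figure is meant to
highlight.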

On the other hand, an application may involve more than one plane and
interact with multiple planes simultaneously. For example, an SLA
compliance application may require both the data plane telemetry and
the control plane telemetry.

The requirements and challenges for each module are summarized as
follows (note that the requirements may pertain across all telemetry
modules; however, we emphasize those that are most pronounced for a
particular plane).

4.1.1. Management Plane Telemetry

The management plane of network elements interacts with the Network
Management System (NMS), and provides information such as performance
data, network logging data, network warning and defects data, and
network statistics and state data. The management plane includes
many protocols, including some that are considered "legacy", such as
SNMP and syslog. Regardless of the protocol, management plane
telemetry must address the following requirements:

* Convenient Data Subscription: An application should have the
  freedom to choose which data is exported, such as the data types
  (as described in Figure 4), as well as the export means and
  frequency (e.g., on-change or periodic subscription).

* Structured Data: For automatic network operation, machines will
  replace humans for network data comprehension. Data modeling
  languages, such as YANG, can efficiently describe structured data
  and normalize data encoding and transformation.

* High Speed Data Transport: In order to keep up with the velocity
  of information, a server needs to be able to send large amounts of
  data at high frequency. Compact encoding formats or data
  compression schemes are needed to reduce the data volume and
  improve the data transport efficiency. The subscription mode, by
  replacing the query mode, reduces the interactions between clients
  and servers and helps to improve the server's efficiency.

4.1.2. Control Plane Telemetry

The control plane telemetry refers to the health condition monitoring
of different network control protocols at all layers of the protocol
stack. Keeping track of the operational status of these protocols is
beneficial for detecting, localizing, and even predicting various
network issues, as well as for network optimization, in real time and
with fine granularity. Some particular challenges and issues faced
by the control plane telemetry are as follows:

* One challenging problem for the control plane telemetry is how to
  correlate the End-to-End (E2E) Key Performance Indicators (KPI) to
  a specific layer's KPIs. For example, an IPTV user may describe
  their User Experience (UE) by the video fluency and definition.
  Then, in case of an unusually poor UE KPI or a service
  disconnection, it is non-trivial to delimit and pinpoint the issue
  in the responsible protocol layer (e.g., the Transport Layer or
  the Network Layer), the responsible protocol (e.g., IS-IS or BGP
  at the Network Layer), and finally the responsible device(s) with
  specific reasons.

* Traditional OAM-based approaches for control plane KPI measurement
  include Ping (L3), Traceroute (L3), Y.1731 (L2), and so on. One
  common issue behind these methods is that they only measure the
  KPIs instead of reflecting the actual running status of these
  protocols, making them less effective or efficient for control
  plane troubleshooting and network optimization.

* An example of the control plane telemetry is the BGP Monitoring
  Protocol (BMP), which is currently used for monitoring BGP routes
  and enables rich applications, such as BGP peer analysis, AS
  analysis, prefix analysis, and security analysis.
  However, the monitoring of other layers and protocols, and the
  cross-layer, cross-protocol KPI correlations, are still in their
  infancy (e.g., IGP monitoring is not as extensive as BMP) and
  require further research.

4.1.3. Forwarding Plane Telemetry

An effective forwarding plane telemetry system relies on the data
that the network device can expose. The quality, quantity, and
timeliness of data must meet some stringent requirements. This
raises some challenges for the network data plane devices where the
first-hand data originates.

* A data plane device's main function is user traffic processing and
  forwarding. While supporting network visibility is important, the
  telemetry is just an auxiliary function, and it should strive to
  not impede normal traffic processing and forwarding (i.e., the
  forwarding behavior should not be altered and the tradeoff between
  forwarding and telemetry should be well balanced).

* Network operation applications require end-to-end visibility
  across various sources, which can result in a huge volume of data.
  However, the sheer quantity of data must not exhaust the network
  bandwidth, regardless of the data delivery approach (i.e., whether
  through in-band or out-of-band channels).

* The data plane devices must provide timely data with the minimum
  possible delay. Long processing, transport, storage, and analysis
  delay can impact the effectiveness of the control loop and even
  render the data useless.

* The data should be structured and labeled, and easy for
  applications to parse and consume. At the same time, the data
  types needed by applications can vary significantly. The data
  plane devices need to provide enough flexibility and
  programmability to support the precise data provision for
  applications.

* The data plane telemetry should support incremental deployment and
  work even when some devices are unaware of the system.

Although not specific to the forwarding plane, these challenges are
more difficult for the forwarding plane because of its limited
resources and flexibility. Data plane programmability is essential
to support network telemetry. Newer data plane forwarding chips are
equipped with advanced telemetry features and provide flexibility to
support customized telemetry functions.

Technique Taxonomy: Depending on how the telemetry is instrumented,
there are multiple possible dimensions along which to classify the
forwarding plane telemetry techniques.

* Active, Passive, and Hybrid: This dimension concerns the
  end-to-end measurement. Active and passive methods (as well as
  the hybrid types) are well documented in [RFC7799]. Passive
  methods include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic
  mirroring. These methods usually have low data coverage, and
  improving the coverage comes at a very high bandwidth cost. On
  the other hand, active methods include Ping, OWAMP [RFC4656],
  TWAMP [RFC5357], and Cisco's SLA Protocol [RFC6812]. These
  methods are intrusive and only provide indirect network
  measurements. Hybrid methods, including in-situ OAM
  [I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and
  Multipoint Alternate Marking [I-D.ietf-ippm-multipoint-alt-mark],
  provide a well-balanced and more flexible approach. However,
  these methods are also more complex to implement.

* In-Band and Out-of-Band: Telemetry data carried in user packets
  before being exported to a data collector is considered in-band
  (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data
  that is directly exported to a data collector without modifying
  user packets is considered out-of-band (e.g., the postcard-based
  approach described in the Appendix).
It is also possible to have 884 hybrid methods, where only the telemetry instruction or partial 885 data is carried by user packets (e.g., AM [RFC8321]). 887 * End-to-End and In-Network: End-to-End methods start from, and end 888 at, the network end hosts (e.g., Ping). In-Network methods work 889 in networks and are transparent to end hosts. However, if needed, 890 In-Network methods can be easily extended into end hosts. 892 * Data Subject: Depending on the telemetry objective, the methods 893 can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), 894 path-based (e.g., Traceroute), and node-based (e.g., IPFIX 895 [RFC7011]). The various data objects can be packet, flow record, 896 measurement, states, and signal. 898 4.1.4. External Data Telemetry 900 Events that occur outside the boundaries of the network system are 901 another important source of network telemetry. Correlating both 902 internal telemetry data and external events with the requirements of 903 network systems, as presented in 904 [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and 905 functional advantage to management operations. 907 As with other sources of telemetry information, the data and events 908 must meet strict requirements, especially in terms of timeliness, 909 which is essential to properly incorporate external event information 910 into network management applications. The specific challenges are 911 described as follows: 913 * The role of the external event detector can be played by multiple 914 elements, including hardware (e.g. physical sensors, such as 915 seismometers) and software (e.g. Big Data sources that analyze 916 streams of information, such as Twitter messages). Thus, the 917 transmitted data must support different shapes but, at the same 918 time, follow a common but extensible schema. 920 * Since the main function of the external event detectors is to 921 perform the notifications, their timeliness is assumed. 
  However, once messages have been dispatched, they must be quickly
  collected and inserted into the control plane with variable
  priority, which is higher for important sources and events and
  lower for secondary ones.

* The schema used by external detectors must be easily adopted by
  current and future devices and applications. Therefore, it must
  be easily mapped to current data models, such as those expressed
  in YANG.

Organizing both internal and external telemetry information together
will be key for the general exploitation of the management
possibilities of current and future network systems, as reflected in
the incorporation of cognitive capabilities into new hardware and
software (virtual) elements.

4.2. Second Level Function Components

The telemetry module at each plane can be further partitioned into
five distinct conceptual components:

* Data Query, Analysis, and Storage: This component works at the
  application layer. It is normally a part of the network
  management system at the receiver side. On the one hand, it is
  responsible for issuing data requirements. The data of interest
  can be modeled data through configuration or custom data through
  programming. The data requirements can be queries for one-shot
  data or subscriptions for events or streaming data. On the other
  hand, it receives, stores, and processes the returned data from
  network devices. Data analysis can be interactive to initiate
  further data queries. This component can reside in either network
  devices or remote controllers. It can be centralized or
  distributed, and involve one or more instances.

* Data Configuration and Subscription: This component manages data
  queries on devices. It determines the protocol and channel for
  applications to acquire desired data.
  This component is also responsible for configuring the desired
  data that might not be directly available from data sources. The
  subscription data can be described by models, templates, or
  programs.

* Data Encoding and Export: This component determines how telemetry
  data is delivered to the data analysis and storage component with
  access control. The data encoding and the transport protocol may
  vary due to the data export location.

* Data Generation and Processing: The requested data needs to be
  captured, filtered, processed, and formatted in network devices
  from raw data sources. This may involve in-network computing and
  processing on either the fast path or the slow path in network
  devices.

* Data Object and Source: This component determines the monitoring
  objects and original data sources provisioned in the device. A
  data source usually just provides raw data which needs further
  processing. Each data source can be considered a probe. Some
  data sources can be dynamically installed, while others will be
  more static.

  +----------------------------------------+
+----------------------------------------+ |
|                                        | |
|    Data Query, Analysis, & Storage     | |
|                                        | +
+-------+++------------------------------+
        |||                   ^^^
        |||                   |||
        ||V                   |||
     +--+V--------------------+++------------+
  +-----V---------------------+------------+ |
+---------------------+-------+----------+ | |
| Data Configuration  |                  | | |
| & Subscription      | Data Encoding    | | |
| (model, template,   | & Export         | | |
| & program)          |                  | | |
+---------------------+------------------| | |
|                                        | | |
|           Data Generation              | | |
|           & Processing                 | | |
|                                        | | |
+----------------------------------------| | |
|                                        | | |
|       Data Object and Source           | |-+
|                                        |-+
+----------------------------------------+

Figure 3: Components in the Network Telemetry Framework

4.3. Data Acquisition Mechanism and Type Abstraction

Broadly speaking, network data can be acquired through subscription
(push) and query (poll). A subscription is a contract between
publisher and subscriber. After initial setup, the subscribed data
is automatically delivered to registered subscribers until the
subscription expires. There are two variations of subscription:
subscriptions can be pre-defined, or subscribers can be allowed to
configure and tailor the published data to their specific needs.

In contrast, queries are used when a client expects immediate and
one-off feedback from network devices. The queried data may be
directly extracted from some specific data source, or synthesized and
processed from raw data. Queries work well for interactive network
telemetry applications.

In general, data can be pulled (i.e., queried) whenever needed, but
in many cases, pushing the data (i.e., subscription) is more
efficient, and can reduce the latency of a client detecting a change.
From the data consumer point of view, there are four types of data
from network devices that a telemetry data consumer can subscribe to
or query:

* Simple Data: The data that are steadily available from some
  datastore or static probes in network devices.

* Derived Data: The data need to be synthesized or processed in the
  network from raw data from one or more network devices. The data
  processing function can be statically or dynamically loaded into
  network devices.

* Event-triggered Data: The data are conditionally acquired based on
  the occurrence of some events. For example, a network interface
  changing its operational state from up to down can be a trigger
  event. Such data can be actively pushed through subscription or
  passively polled through query.
  There are many ways to model events, including using Finite State
  Machine (FSM) or Event Condition Action (ECA)
  [I-D.wwx-netmod-event-yang].

* Streaming Data: The data are continuously generated. It can be
  time series or the dump of databases. For example, an interface
  packet counter is exported every second. The streaming data
  reflect real-time network states and metrics and require large
  bandwidth and processing power. The streaming data are always
  actively pushed to the subscribers.

The above data types are not mutually exclusive. Rather, they are
often composite. Derived data is composed of simple data;
event-triggered data can be simple or derived; streaming data can be
based on some recurring event. The relationships of these data types
are illustrated in Figure 4.

+----------------------+     +-----------------+
| Event-triggered Data |<----+ Streaming Data  |
+-------+---+----------+     +-----+---+-------+
        |   |                      |   |
        |   |                      |   |
        |   |   +--------------+   |   |
        |   +-->| Derived Data |<--+   |
        |       +------+-------+       |
        |              |               |
        |              V               |
        |       +--------------+       |
        +------>| Simple Data  |<------+
                +--------------+

Figure 4: Data Type Relationship

Subscription usually deals with event-triggered data and streaming
data, and query usually deals with simple data and derived data.
However, other combinations are also possible. Advanced network
telemetry techniques are designed mainly for event-triggered or
streaming data subscription, and derived data query.

4.4. Mapping Existing Mechanisms into the Framework

The following table shows how the existing mechanisms (mainly
published in the IETF and with the emphasis on the latest new
technologies) are positioned in the framework. Given the vast body
of existing work, we cannot provide an exhaustive list, so the
mechanisms in the table should be considered as just examples.
Also, some comprehensive protocols and techniques may cover multiple
aspects or modules of the framework, so a name in a block only
emphasizes one particular characteristic of it. More details about
some listed mechanisms can be found in Appendix A.

+-------------+-----------------+---------------+--------------+
|             | Management      | Control       | Forwarding   |
|             | Plane           | Plane         | Plane        |
+-------------+-----------------+---------------+--------------+
| data config.| gNMI, NETCONF,  | gNMI, NETCONF,| NETCONF,     |
| & subscribe | SNMP, YANG-Push | YANG-Push     | YANG-Push    |
+-------------+-----------------+---------------+--------------+
| data gen. & | MIB,            | YANG          | IOAM, PSAMP, |
| process     | YANG            |               | PBT, AM      |
+-------------+-----------------+---------------+--------------+
| data encode.| gRPC, HTTP, TCP | BMP, TCP      | IPFIX, UDP   |
| & export    |                 |               |              |
+-------------+-----------------+---------------+--------------+

Figure 5: Existing Work Mapping

5. Evolution of Network Telemetry Applications

Network telemetry is an evolving technical area. As the network
moves towards automated operation, network telemetry applications
undergo several stages of evolution, each of which adds a new layer
of requirements to the underlying network telemetry techniques. Each
stage is built upon the techniques adopted by the previous stages
plus some new requirements.

Stage 0 - Static Telemetry: The telemetry data source and type are
  determined at design time. The network operator can only
  configure how to use it with limited flexibility.

Stage 1 - Dynamic Telemetry: The custom telemetry data can be
  dynamically programmed or configured at runtime without
  interrupting the network operation, allowing a tradeoff among
  resource, performance, flexibility, and coverage.
1129 Stage 2 - Interactive Telemetry: The network operator can 1130 continuously customize and fine tune the telemetry data in real 1131 time to reflect the network operation's visibility requirements. 1132 Compared with Stage 1, the changes are frequent based on the real- 1133 time feedback. At this stage, some tasks can be automated, but 1134 human operators still need to sit in the middle to make decisions. 1136 Stage 3 - Closed-loop Telemetry: The telemetry is free from the 1137 interference of human operators, except for generating the 1138 reports. The intelligent network operation engine automatically 1139 issues the telemetry data requests, analyzes the data, and updates 1140 the network operations in closed control loops. 1142 Existing technologies are ready for stage 0 and stage 1. Individual 1143 stage 2 and stage 3 applications are also possible now. However, the 1144 future autonomic networks may need a comprehensive operation 1145 management system which works at stage 2 and stage 3 to cover all the 1146 network operation tasks. A well-defined network telemetry framework 1147 is the first step towards this direction. 1149 6. Security Considerations 1151 The complexity of network telemetry raises significant security 1152 implications. For example, telemetry data can be manipulated to 1153 exhaust various network resources at each plane as well as the data 1154 consumer; falsified or tampered data can mislead the decision making 1155 and paralyze networks; wrong configuration and programming for 1156 telemetry is equally harmful. The telemetry data is highly 1157 sensitive, which exposes a lot of information about the network and 1158 its configuration. Some of that information can make designing 1159 attacks against the network much easier (e.g., exact details of what 1160 software and patches have been installed), and allows an attacker to 1161 determine whether a device may be subject to unprotected security 1162 vulnerability. 
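
The risk of falsified or tampered telemetry data noted above is
commonly addressed by authenticating each exported record. The
following is a minimal sketch, assuming a pre-shared key between the
exporter and the collector and HMAC-SHA256 computed over a canonical
JSON encoding. The function names (sign_record, verify_record), the
key, and the record fields are hypothetical; this framework does not
mandate any particular mechanism.

```python
import hashlib
import hmac
import json


def _canonical(record):
    # Sort keys so that exporter and collector serialize identically.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record, key):
    """Return a hex HMAC-SHA256 tag computed over the record."""
    return hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()


def verify_record(record, tag, key):
    """Compare tags in constant time; False means the record or tag differs."""
    return hmac.compare_digest(sign_record(record, key), tag)


# Hypothetical key and record, for illustration only.
key = b"example-pre-shared-key"
record = {"node": "r1", "ifname": "eth0", "in_octets": 1024}
tag = sign_record(record, key)
tampered = dict(record, in_octets=0)
```

A collector would call verify_record() on every received record and
discard those that fail. Key distribution and replay protection are
outside the scope of this sketch.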

Given that this document has proposed a framework for network
telemetry and the telemetry mechanisms discussed are more extensive
(in both message frequency and traffic amount) than the conventional
network OAM concepts, we must also anticipate that new security
considerations may arise. A number of techniques already exist for
securing the forwarding plane, the control plane, and the management
plane in a network, but it is important to consider whether any new
threat vectors are being enabled via the use of network telemetry
procedures and mechanisms.

Security considerations for networks that use telemetry methods may
include:

* Telemetry framework trust and policy model;

* Role management and access control for enabling and disabling
  telemetry capabilities;

* Protocol transport used for telemetry data and its inherent
  security capabilities;

* Telemetry data stores, storage encryption, and methods of access;

* Tracking telemetry events and any abnormalities that might
  identify malicious attacks using telemetry interfaces;

* Authentication and signing of telemetry data to make data more
  trustworthy;

* Segregating the telemetry data traffic from the data traffic
  carried over the network (e.g., historically management access and
  management data may be carried via an independent management
  network).

Some of the security considerations highlighted above may be
minimized or negated with policy management of network telemetry. In
a network telemetry deployment it would be advantageous to separate
telemetry capabilities into different classes of policies, i.e., Role
Based Access Control and Event-Condition-Action policies.
Also, potential conflicts between network telemetry mechanisms must
be detected accurately and resolved quickly to avoid unnecessary
network telemetry traffic propagation escalating into an unintended
or intended denial of service attack.

Further study of the security issues will be required, and it is
expected that the security mechanisms and protocols are developed and
deployed along with a network telemetry system.

In addition to security, privacy is also an important issue. Network
telemetry is meant to improve network operation, which can ultimately
benefit the end user's quality of experience. The network operators
must be held accountable and strive for a balance between managing
the network and maintaining the privacy of that network's users.

7. IANA Considerations

This document includes no request to IANA.

8. Contributors

The other contributors of this document are listed as follows.

* Tianran Zhou

* Zhenbin Li

* Zhenqiang Li

* Daniel King

* Adrian Farrel

* Alexander Clemm

9. Acknowledgments

We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe
Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe
Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, and many others
who have provided helpful comments and suggestions to improve this
document.

10. Informative References

[gnmi] "gNMI - gRPC Network Management Interface".

[grpc] "gRPC, A high performance, open-source universal RPC
  framework".

[I-D.ietf-grow-bmp-adj-rib-out] Evens, T., Bayraktar, S., Lucente,
  P., Mi, P., and S. Zhuang, "Support for Adj-RIB-Out in the BGP
  Monitoring Protocol (BMP)", Work in Progress, Internet-Draft,
  draft-ietf-grow-bmp-adj-rib-out-07, 5 August 2019.

[I-D.ietf-grow-bmp-local-rib] Evens, T., Bayraktar, S., Bhardwaj, M.,
  and P.
Lucente, "Support for Local RIB in BGP Monitoring Protocol (BMP)", Work in Progress, Internet-Draft, draft-ietf-grow-bmp-local-rib-13, 31 August 2021.

[I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields for In-situ OAM", Work in Progress, Internet-Draft, draft-ietf-ippm-ioam-data-14, 24 June 2021.

[I-D.ietf-ippm-multipoint-alt-mark] Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, "Multipoint Alternate-Marking Method for Passive and Hybrid Performance Monitoring", Work in Progress, Internet-Draft, draft-ietf-ippm-multipoint-alt-mark-09, 23 March 2020.

[I-D.ietf-netconf-distributed-notif] Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, "Subscription to Distributed Notifications", Work in Progress, Internet-Draft, draft-ietf-netconf-distributed-notif-02, 6 May 2021.

[I-D.ietf-netconf-udp-notif] Zheng, G., Zhou, T., Graf, T., Francois, P., and P. Lucente, "UDP-based Transport for Configured Subscriptions", Work in Progress, Internet-Draft, draft-ietf-netconf-udp-notif-03, 12 July 2021.

[I-D.irtf-nmrg-ibn-concepts-definitions] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", Work in Progress, Internet-Draft, draft-irtf-nmrg-ibn-concepts-definitions-05, 2 September 2021.

[I-D.kumar-rtgwg-grpc-protocol] Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC Protocol", Work in Progress, Internet-Draft, draft-kumar-rtgwg-grpc-protocol-00, 8 July 2016.

[I-D.openconfig-rtgwg-gnmi-spec] Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, C., and C. Morrow, "gRPC Network Management Interface (gNMI)", Work in Progress, Internet-Draft, draft-openconfig-rtgwg-gnmi-spec-01, 5 March 2018.

[I-D.pedro-nmrg-anticipated-adaptation] Martinez-Julia, P., "Exploiting External Event Detectors to Anticipate Resource Requirements for the Elastic Adaptation of SDN/NFV Systems", Work in Progress, Internet-Draft, draft-pedro-nmrg-anticipated-adaptation-02, 29 June 2018.

[I-D.song-ippm-postcard-based-telemetry] Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou, T., Li, Z., Shin, J., and K. Lee, "Postcard-based On-Path Flow Data Telemetry using Packet Marking", Work in Progress, Internet-Draft, draft-song-ippm-postcard-based-telemetry-10, 9 July 2021.

[I-D.song-opsawg-dnp4iq] Song, H. and J. Gong, "Requirements for Interactive Query with Dynamic Network Probes", Work in Progress, Internet-Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017.

[I-D.song-opsawg-ifit-framework] Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In-situ Flow Information Telemetry", Work in Progress, Internet-Draft, draft-song-opsawg-ifit-framework-15, 28 September 2021.

[I-D.wwx-netmod-event-yang] Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise, "A YANG Data model for ECA Policy Management", Work in Progress, Internet-Draft, draft-wwx-netmod-event-yang-10, 1 November 2020.

[RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, "Simple Network Management Protocol (SNMP)", RFC 1157, DOI 10.17487/RFC1157, May 1990.

[RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. Schoenwaelder, Ed., "Structure of Management Information Version 2 (SMIv2)", STD 58, RFC 2578, DOI 10.17487/RFC2578, April 1999.

[RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, DOI 10.17487/RFC2981, October 2000.

[RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations for the Simple Network Management Protocol (SNMP)", STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002.

[RFC3594] Duffy, P., "PacketCable Security Ticket Control Sub-Option for the DHCP CableLabs Client Configuration (CCC) Option", RFC 3594, DOI 10.17487/RFC3594, September 2003.

[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, September 2004.

[RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. Zekauskas, "A One-way Active Measurement Protocol (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006.

[RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, DOI 10.17487/RFC5357, October 2008.

[RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for the Network Configuration Protocol (NETCONF)", RFC 6020, DOI 10.17487/RFC6020, October 2010.

[RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

[RFC6812] Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare, S., and E. Yedavalli, "Cisco Service-Level Assurance Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013.

[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013.

[RFC7276] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. Weingarten, "An Overview of Operations, Administration, and Maintenance (OAM) Tools", RFC 7276, DOI 10.17487/RFC7276, June 2014.

[RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015.

[RFC7575] Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, June 2015.

[RFC7799] Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, May 2016.

[RFC7854] Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP Monitoring Protocol (BMP)", RFC 7854, DOI 10.17487/RFC7854, June 2016.

[RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", RFC 7950, DOI 10.17487/RFC7950, August 2016.

[RFC8321] Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli, L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi, "Alternate-Marking Method for Passive and Hybrid Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321, January 2018.

[RFC8639] Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard, E., and A. Tripathy, "Subscription to YANG Notifications", RFC 8639, DOI 10.17487/RFC8639, September 2019.

[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641, September 2019.

Appendix A. A Survey on Existing Network Telemetry Techniques

In this non-normative appendix, we provide an overview of some existing techniques and standard proposals for each network telemetry module.

A.1. Management Plane Telemetry

A.1.1. Push Extensions for NETCONF

NETCONF [RFC6241] is a popular network management protocol recommended by the IETF. Its core strength is managing configuration, but it can also be used for data collection.
YANG-Push [RFC8641] [RFC8639] extends NETCONF and enables subscriber applications to request a continuous, customized stream of updates from a YANG datastore. Providing such visibility into changes made to YANG configuration and operational objects enables new capabilities based on the remote mirroring of configuration and operational state. Moreover, a distributed data collection mechanism [I-D.ietf-netconf-distributed-notif] using a UDP-based publication channel [I-D.ietf-netconf-udp-notif] improves the efficiency of NETCONF-based telemetry.

A.1.2. gRPC Network Management Interface

gRPC Network Management Interface (gNMI) [I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote Procedure Call) framework. With a single gRPC service definition, both configuration and telemetry can be covered. gRPC is an open-source microservice communication framework based on HTTP/2 [RFC7540]. It provides a number of capabilities that are well suited for network telemetry, including:

* A full-duplex streaming transport model combined with a binary encoding mechanism, which provides good telemetry efficiency.

* Consistency of higher-level features across platforms, which common HTTP/2 libraries typically do not provide. This characteristic is especially valuable because telemetry data collectors normally reside on a large variety of platforms.

* Built-in load-balancing and failover mechanisms.

A.2. Control Plane Telemetry

A.2.1. BGP Monitoring Protocol

BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP sessions and is intended to provide a convenient interface for obtaining route views.

BGP routing information is sent from the monitored device(s) to the BMP monitoring station over a BMP TCP session.
The BGP peers are monitored by the BMP Peer Up and Peer Down Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], Adjacency_RIB_Out [I-D.ietf-grow-bmp-adj-rib-out], and Local_RIB [I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route Monitoring Message and the BMP Route Mirroring Message, providing both an initial table dump and real-time route updates. In addition, BGP statistics are reported through the BMP Stats Report Message, which could be either timer-triggered or event-driven. Future BMP extensions could further enrich BGP monitoring applications.

A.3. Data Plane Telemetry

A.3.1. The Alternate Marking (AM) technology

The Alternate Marking method enables efficient measurements of packet loss, delay, and jitter in both IP and overlay networks, as presented in [RFC8321] and [I-D.ietf-ippm-multipoint-alt-mark].

This technique can be applied to point-to-point and multipoint-to-multipoint flows. Alternate Marking creates batches of packets by alternating the value of 1 bit (or a label) of the packet header. These batches of packets are unambiguously recognized over the network, and comparing the packet counters for each batch at different measurement points allows packet loss to be calculated. The same idea can be applied to delay measurement by selecting ad hoc packets with a marking bit dedicated to delay measurements.

The Alternate Marking method needs two counters per marking period for each monitored flow. For instance, with n measurement points and m monitored flows, the number of packet counters for each time interval is on the order of n*m*2 (1 per color).

Since networks offer rich sets of network performance measurement data (e.g., packet counters), traditional approaches run into limitations.
The bottleneck is the generation and export of the data and the amount of data that can be reasonably collected from the network. In addition, management tasks related to determining and configuring which data to generate lead to significant deployment challenges.

The Multipoint Alternate Marking approach, described in [I-D.ietf-ippm-multipoint-alt-mark], aims to resolve this issue and make performance monitoring more flexible in cases where a detailed analysis is not needed.

An application orchestrates network performance measurement tasks across the network to allow for optimized monitoring. The application can choose how roughly or precisely to configure measurement points depending on the application's requirements.

Using Alternate Marking, it is possible to monitor a multipoint network without in-depth examination by using network clustering (clusters are subnetworks, that is, portions of the entire network that preserve the same property as the entire network). If there is packet loss or the delay is too high, specific filtering criteria can be applied to gather a more detailed analysis by using a different combination of clusters, up to a per-flow measurement as described in Alternate Marking (AM) [RFC8321].

In summary, an application can configure end-to-end network monitoring. If the network does not experience issues, this approximate monitoring is good enough and is very cheap in terms of network resources. However, in case of problems, the application becomes aware of the issues from this approximate monitoring and, in order to localize the portion of the network that has issues, configures the measurement points more extensively, allowing more detailed monitoring to be performed.
After the detection and resolution of the problem, the initial approximate monitoring can be used again.

A.3.2. Dynamic Network Probe

Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq] proposes a programmable means to customize the data that an application collects from the data plane. A direct benefit of DNP is the reduction of the exported data. A full DNP solution covers several components, including data source, data subscription, and data generation. The data subscription needs to define the derived data, which can be composed and derived from the raw data sources. The data generation takes advantage of moderate in-network computing to produce the desired data.

While DNP can introduce unprecedented flexibility to data plane telemetry, it also faces some challenges. It requires a flexible data plane that can be dynamically reprogrammed at run-time. The programming API is yet to be defined.

A.3.3. IP Flow Information Export (IPFIX) protocol

Traffic on a network can be seen as a set of flows passing through network elements. IP Flow Information Export (IPFIX) [RFC7011] provides a means of transmitting traffic flow information for administrative or other purposes. A typical IPFIX-enabled system includes a pool of Metering Processes that collect data packets at one or more Observation Points, optionally filter them, and aggregate information about these packets. An Exporter then gathers each of the Observation Points together into an Observation Domain and sends this information via the IPFIX protocol to a Collector.

A.3.4. In-Situ OAM

Traditional passive and active monitoring and measurement techniques are either inaccurate or resource-consuming. It is preferable to directly acquire data associated with a flow's packets when the packets pass through a network.
In-situ OAM (IOAM) [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new instruction header in user packets, and the instruction directs the network nodes to add the requested data to the packets. Thus, at the path end, the packet's experience gained on the entire forwarding path can be collected. Such firsthand data is invaluable to many network OAM applications.

However, IOAM also faces some challenges. Issues of performance impact, security, scalability and overhead limits, encapsulation difficulties in some protocols, and cross-domain deployment need to be addressed.

A.3.5. Postcard Based Telemetry

Postcard-Based Telemetry (PBT) [I-D.song-ippm-postcard-based-telemetry] is a proposed complementary technique to IOAM. PBT directly exports data at each node through an independent packet. At the cost of higher bandwidth overhead and the need for data correlation, PBT shows several advantages over IOAM. It can also help to identify the packet drop location in case a packet is dropped on its forwarding path.

A.4. External Data and Event Telemetry

A.4.1. Sources of External Events

To ensure that the information provided by external event detectors and used by the network management solutions is meaningful for management purposes, the network telemetry framework must ensure that such detectors (sources) are easily connected to the management solutions (sinks). This requires specifying a list of potential external data sources that could be of interest in network management and matching it to the connectors and/or interfaces required to connect them.

Categories of external event sources that may be of interest to network management include:

* Smart objects and sensors.
With the consolidation of the Internet of Things (IoT), any network system will have many smart objects attached to its physical surroundings and logical operation environments. Most of these objects will be essentially based on sensors of many kinds (e.g., temperature, humidity, presence), and the information they provide can be very useful for the management of the network, even when they are not specifically deployed for such purpose. Elements of this source type will usually provide a specific protocol for interaction, especially one of those protocols related to IoT, such as the Constrained Application Protocol (CoAP).

* Online news reporters. Several online news services have the ability to provide an enormous quantity of information about different events occurring in the world. Some of those events can have an impact on the network system managed by a specific framework and, therefore, such information may be of interest to the management solution. For instance, diverse security reports, such as the Common Vulnerabilities and Exposures (CVE), can be issued by the corresponding authority and used by the management solution to update the managed system if needed. Instead of a specific protocol and data format, the sources of this kind of information usually follow a relaxed but structured format. This format will be part of both the ontology and information model of the telemetry framework.

* Global event analyzers. The advance of Big Data analyzers provides a huge amount of information and, more interestingly, the identification of events detected by analyzing many data streams from different origins. In contrast with the other types of sources, which are focused on specific events, the detectors of this source type will detect generic events.
For example, during a sports event, some unexpected development makes the event highly interesting, and many people connect to sites that are reporting on it. The underlying networks supporting the services that cover the event can be affected by such a situation, so their management solutions should be aware of it. In contrast with the other source types, a new information model, format, and reporting protocol are required to integrate the detectors of this type with the management solution.

Additional detector types can be added to the system, but they will generally be the result of composing the properties offered by these main classes.

A.4.2. Connectors and Interfaces

To allow external event detectors to be properly integrated with other management solutions, both elements must expose interfaces and protocols that are appropriate to their particular objectives. Since external event detectors will be focused on providing their information to their main consumers, which generally will not be limited to the network management solutions, the framework must include the definition of the connectors required to ensure that the interconnection between detectors (sources) and their consumers within the management systems (sinks) is effective.

In some situations, the interconnection between the external event detectors and the management system is via the management plane. For those situations there will be a special connector that provides the typical interfaces found in most other elements connected to the management plane. For instance, the interfaces could accomplish this with a specific data model (YANG) and a specific telemetry protocol, such as NETCONF, YANG-Push, or gRPC.
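
As a purely illustrative sketch of the connector idea in A.4.2, the following Python fragment shows one way that heterogeneous detectors (sources) might be normalized into a common record before delivery to management-system consumers (sinks). Every name here (ExternalEvent, Connector, sensor_adapter, advisory_adapter, and the record fields) is hypothetical and is not defined by this document or by any of the referenced protocols; a real connector would more likely expose a YANG data model over NETCONF, YANG-Push, or gRPC.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExternalEvent:
    """Common record delivered to sinks (hypothetical schema)."""
    source_type: str          # e.g. "sensor", "news", "global-analyzer"
    name: str                 # short identifier of what was observed
    attributes: Dict[str, str] = field(default_factory=dict)


class Connector:
    """Normalizes raw detector output and fans it out to subscribed sinks."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[[dict], ExternalEvent]] = {}
        self._sinks: List[Callable[[ExternalEvent], None]] = []

    def register_adapter(self, source_type: str,
                         adapter: Callable[[dict], ExternalEvent]) -> None:
        # One adapter per source category translates its native format.
        self._adapters[source_type] = adapter

    def subscribe(self, sink: Callable[[ExternalEvent], None]) -> None:
        # A sink is any consumer inside the management system.
        self._sinks.append(sink)

    def publish(self, source_type: str, raw: dict) -> ExternalEvent:
        # Translate the raw input, then deliver it to every sink.
        event = self._adapters[source_type](raw)
        for sink in self._sinks:
            sink(event)
        return event


def sensor_adapter(raw: dict) -> ExternalEvent:
    """Adapter for a smart object reading (assumed input keys)."""
    return ExternalEvent("sensor", raw["kind"], {"value": str(raw["value"])})


def advisory_adapter(raw: dict) -> ExternalEvent:
    """Adapter for a loosely structured security advisory (assumed keys)."""
    return ExternalEvent("news", raw["id"], {"summary": raw["summary"]})
```

In this sketch a management application would register one adapter per source category, subscribe its own handler as a sink, and call publish() whenever a detector reports something. Keeping the adapters separate from the sinks reflects the point made above that detectors serve consumers other than network management, so the translation into a management-friendly form belongs in the connector.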

Authors' Addresses

Haoyu Song
Futurewei
2330 Central Expressway
Santa Clara,
United States of America

Email: haoyu.song@futurewei.com

Fengwei Qin
China Mobile
No. 32 Xuanwumenxi Ave., Xicheng District
Beijing, 100032
P.R. China

Email: qinfengwei@chinamobile.com

Pedro Martinez-Julia
NICT
4-2-1, Nukui-Kitamachi, Tokyo
184-8795
Japan

Email: pedro@nict.go.jp

Laurent Ciavaglia
Nokia
91460 Villarceaux
France

Email: laurent.ciavaglia@nokia.com

Aijun Wang
China Telecom
Beiqijia Town, Changping District
Beijing, 102209
P.R. China

Email: wangaj.bri@chinatelecom.cn