OPSAWG                                                           H. Song
Internet-Draft                                                 Futurewei
Intended status: Informational                                    F. Qin
Expires: July 25, 2021                                      China Mobile
                                                       P. Martinez-Julia
                                                                    NICT
                                                            L. Ciavaglia
                                                                   Nokia
                                                                 A. Wang
                                                           China Telecom
                                                        January 21, 2021


                      Network Telemetry Framework
                        draft-ietf-opsawg-ntf-06

Abstract

   Network telemetry is a technology for gaining network insight and
   facilitating efficient and automated network management. It
   encompasses various techniques for remote data generation,
   collection, correlation, and consumption. This document describes an
   architectural framework for network telemetry, motivated by
   challenges that are encountered as part of the operation of networks
   and by the requirements that ensue. Network telemetry, as
   necessitated by best industry practices, covers technologies and
   protocols that extend beyond conventional network Operations,
   Administration, and Maintenance (OAM). The presented network
   telemetry framework promises better flexibility, scalability,
   accuracy, coverage, and performance. In addition, it facilitates the
   implementation of automated control loops to address both today's and
   tomorrow's network operational needs. This document clarifies the
   terminology and classifies the modules and components of a network
   telemetry system from several different perspectives. The framework
   and taxonomy help to set a common ground for the collection of
   related work and provide guidance for the development of related
   techniques and standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 25, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Background
     2.1. Telemetry Data Coverage
     2.2. Use Cases
     2.3. Challenges
     2.4. Glossary
     2.5. Network Telemetry
   3. The Necessity of a Network Telemetry Framework
   4. Network Telemetry Framework
     4.1. Top Level Modules
       4.1.1. Management Plane Telemetry
       4.1.2. Control Plane Telemetry
       4.1.3. Forwarding Plane Telemetry
       4.1.4. External Data Telemetry
     4.2. Second Level Function Components
     4.3. Data Acquiring Mechanism and Type Abstraction
     4.4. Existing Works Mapped in the Framework
   5. Evolution of Network Telemetry
   6. Security Considerations
   7. IANA Considerations
   8. Contributors
   9. Acknowledgments
   10. Informative References
   Appendix A. A Survey on Existing Network Telemetry Techniques
     A.1. Management Plane Telemetry
       A.1.1. Push Extensions for NETCONF
       A.1.2. gRPC Network Management Interface
     A.2. Control Plane Telemetry
       A.2.1. BGP Monitoring Protocol
     A.3. Data Plane Telemetry
       A.3.1. The Alternate Marking technology
       A.3.2. Dynamic Network Probe
       A.3.3. IP Flow Information Export (IPFIX) protocol
       A.3.4. In-Situ OAM
       A.3.5. Postcard Based Telemetry
     A.4. External Data and Event Telemetry
       A.4.1. Sources of External Events
       A.4.2. Connectors and Interfaces
   Authors' Addresses

1. Introduction

   Network visibility is the ability of management tools to see the
   state and behavior of a network, which is essential for successful
   network operation.
   Network telemetry revolves around network data that can help
   provide insights about the current state of the network, including
   network devices and the forwarding, control, and management planes.
   This data can be generated and obtained through a variety of
   techniques, including but not limited to network instrumentation
   and measurements, and can be processed for purposes ranging from
   service assurance to network security using a wide variety of
   techniques, including machine learning, data analysis, and
   correlation. In this document, network telemetry refers to both
   the data itself (i.e., "Network Telemetry Data") and the techniques
   and processes used to generate, export, collect, and consume that
   data for use by potentially automated management applications.
   Network telemetry extends beyond the conventional network
   Operations, Administration, and Maintenance (OAM) techniques and is
   expected to support better flexibility, scalability, accuracy,
   coverage, and performance.

   However, the term "network telemetry" lacks a solid and unambiguous
   definition. Its scope and coverage cause confusion and
   misunderstandings. It is beneficial to clarify the concept and
   provide a clear architectural framework for network telemetry, so
   that we can articulate the technical field and better align the
   related techniques and standards work.

   To fulfill such an undertaking, we first discuss some key
   characteristics of network telemetry that set it clearly apart from
   conventional network OAM and show that some conventional OAM
   technologies can be considered a subset of the network telemetry
   technologies. We then provide an architectural framework for network
   telemetry which includes four modules, each concerned with a
   different category of telemetry data and corresponding procedures.
   All the modules are internally structured in the same way, including
   components that allow data sources to be configured with regard to
   what data to generate and how to make it available to client
   applications, components that instrument the underlying data sources,
   and components that perform the actual rendering, encoding, and
   exporting of the generated data. We show how the network telemetry
   framework can benefit current and future network operations.
   Based on the distinction of modules and function components, we can
   map the existing and emerging techniques and protocols into the
   framework. The framework can also simplify the tasks of designing,
   maintaining, and understanding a network telemetry system. Finally,
   we outline the evolution stages of the network telemetry system and
   discuss the potential security concerns.

   The purpose of the framework and taxonomy is to set a common ground
   for the collection of related work and provide guidance for future
   technique and standard developments. To the best of our knowledge,
   this document is the first such effort for network telemetry in
   industry standards organizations.

2. Background

   The term "big data" is used to describe extremely large data sets
   that can be analyzed computationally to reveal patterns, trends, and
   associations. Networks are undoubtedly a source of big data because
   of their scale and the volume of network traffic they forward. It
   is easy to see that network operations can benefit from network big
   data.

   Today one can access advanced big data analytics capability through a
   plethora of commercial and open source platforms (e.g., Apache
   Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
   learning).
   Thanks to the advance of computing and storage
   technologies, network big data analytics gives network operators an
   opportunity to gain network insights and move towards network
   autonomy. Some operators have started to explore the application of
   Artificial Intelligence (AI) to make sense of network data. Software
   tools can use the network data to detect and react to network faults,
   anomalies, and policy violations, as well as to predict future
   events. In turn, network policy updates for planning, intrusion
   prevention, optimization, and self-healing may be applied.

   It is conceivable that an autonomic network [RFC7575] is the logical
   next step for network evolution following Software-Defined Networking
   (SDN), aiming to reduce (or even eliminate) human labor, make more
   efficient use of network resources, and provide better services more
   aligned with customer requirements. Intent-based Networking (IBN)
   [I-D.irtf-nmrg-ibn-concepts-definitions] provides the necessary
   mechanisms. Although it takes time to reach the ultimate goal, the
   journey has started nevertheless.

   However, while data processing capability has improved and
   applications are hungry for more data, networks lag behind in
   extracting and translating network data into useful and actionable
   information in efficient ways. The system bottleneck is shifting
   from data consumption to data supply. Both the number of network
   nodes and the traffic bandwidth keep increasing at a fast pace.
   Network configuration and policy change at shorter intervals than
   before. More subtle events and fine-grained data through all network
   planes need to be captured and exported in real time. In a nutshell,
   it is a challenge to get enough high-quality data out of the network
   in a manner that is efficient, timely, and flexible.
   Therefore, we need to survey the existing technologies and protocols
   and identify any potential gaps.

   In the remainder of this section, we first clarify the scope of
   network data (i.e., telemetry data) considered in this context. Then,
   we discuss several key use cases for today's and future network
   operations. Next, we show why the current network OAM techniques and
   protocols are insufficient for these use cases. The discussion
   underlines the need for new methods, techniques, and protocols, which
   we group under the umbrella term "Network Telemetry".

2.1. Telemetry Data Coverage

   Any information that can be extracted from networks (including the
   data plane, control plane, and management plane) and used to gain
   visibility or as a basis for actions is considered telemetry data.
   It includes statistics, event records and logs, snapshots of state,
   configuration data, etc. It also covers the outputs of any active
   and passive measurements [RFC7799]. In particular, raw data can be
   processed in-network before being sent to a data consumer. Such
   processed data is also considered telemetry data. A classification
   of telemetry data is provided in Section 4.

2.2. Use Cases

   The following set of use cases is essential for network operations.
   While the list is by no means exhaustive, it is enough to highlight
   the requirements for data velocity, variety, volume, and veracity in
   networks.

   Security: Network intrusion detection and prevention systems need to
      monitor network traffic and activities and act upon anomalies.
      Given increasingly sophisticated attack vectors coupled with
      increasingly severe consequences of security breaches, new tools
      and techniques need to be developed, relying on wider and deeper
      visibility in networks.

   Policy and Intent Compliance: Network policies are the rules that
      constrain the services for network access, provide service
      differentiation, or enforce specific treatment on the traffic.
      For example, a service function chain is a policy that requires
      the selected flows to pass through a set of ordered network
      functions. Intent, as defined in
      [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational
      goals that a network should meet and outcomes that a network is
      supposed to deliver, defined in a declarative manner without
      specifying how to achieve or implement them. An intent requires a
      complex translation and mapping process before being applied on
      networks. While a policy or an intent is enforced, the compliance
      needs to be verified and monitored continuously, and any violation
      needs to be reported immediately.

   SLA Compliance: A Service-Level Agreement (SLA) defines the level of
      service a user expects from a network operator, which includes the
      metrics for the service measurement and remedy/penalty procedures
      when the service level misses the agreement. Users need to check
      if they get the service as promised and network operators need to
      evaluate how they can deliver the services that can meet the SLA
      based on real-time network measurement.

   Root Cause Analysis: Any network failure can be the effect of a
      sequence of chained events. Troubleshooting and recovery require
      quick identification of the root cause of any observable issues.
      However, the root cause is not always straightforward to identify,
      especially when the failure is sporadic and the number of event
      messages, both related and unrelated to the same cause, is
      overwhelming. While machine learning technologies can be used for
      root cause analysis, it is up to the network to sense and provide
      the relevant data.

   Network Optimization: This covers all short-term and long-term
      network optimization techniques, including load balancing, Traffic
      Engineering (TE), and network planning. Network operators are
      motivated to optimize their network utilization and differentiate
      services for better Return On Investment (ROI) or lower Capital
      Expenditures (CAPEX). The first step is to know the real-time
      network conditions before applying policies for traffic
      manipulation. In some cases, micro-bursts need to be detected in
      a very short time frame so that fine-grained traffic control can
      be applied to avoid network congestion. Long-term planning of
      network capacity and topology requires analysis of real-world
      network telemetry data that is obtained over long periods of time.

   Event Tracking and Prediction: The visibility of traffic path and
      performance is critical for services and applications that rely on
      healthy network operation. Numerous related network events are of
      interest to network operators. For example, network operators
      want to learn where and why packets are dropped for an application
      flow. They also want to be warned of issues in advance so that
      proactive actions can be taken to avoid catastrophic consequences.

2.3. Challenges

   For a long time, network operators have relied upon SNMP [RFC3416],
   Command-Line Interface (CLI), or Syslog to monitor the network. Some
   other OAM techniques as described in [RFC7276] are also used to
   facilitate network troubleshooting. These conventional techniques
   are not sufficient to support the above use cases for the following
   reasons:

   o  Most use cases need to continuously monitor the network and
      dynamically refine the data collection in real time. Poll-based
      low-frequency data collection is ill-suited for these
      applications.
      Subscription-based streaming data directly pushed
      from the data source (e.g., the forwarding chip) is preferred to
      provide enough data quantity and precision at scale.

   o  Comprehensive data is needed from packet processing engine to
      traffic manager, from line cards to main control board, from user
      flows to control protocol packets, from device configurations to
      operations, and from physical layer to application layer.
      Conventional OAM only covers a narrow range of data (e.g., SNMP
      only handles data from the Management Information Base (MIB)).
      Traditional network devices cannot provide all the necessary
      probes. More open and programmable network devices are therefore
      needed.

   o  Many application scenarios need to correlate network-wide data
      from multiple sources (i.e., from distributed network devices,
      different components of a network device, or different network
      planes). A piecemeal solution often lacks the capability to
      consolidate the data from multiple sources. The composition of a
      complete solution, as partly proposed by Autonomic Resource
      Control Architecture (ARCA)
      [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
      guided by a comprehensive framework.

   o  Some of the conventional OAM techniques (e.g., CLI and Syslog)
      lack a formal data model. Unstructured data hinders tool
      automation and application extensibility. Standardized data
      models are essential to support programmable networks.

   o  Although some conventional OAM techniques support data push (e.g.,
      SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data
      are limited to only predefined management plane warnings (e.g.,
      SNMP Trap) or sampled user packets (e.g., sFlow). Network
      operators require data with arbitrary source, granularity, and
      precision, which is beyond the capability of the existing
      techniques.

   o  The conventional passive measurement techniques can either consume
      excessive network resources and render excessive redundant data,
      or lead to inaccurate results; on the other hand, the conventional
      active measurement techniques can interfere with the user traffic
      and their results are indirect. Techniques that can collect
      direct and on-demand data from user traffic are more favorable.

   These challenges were addressed by newer standards and techniques
   (e.g., IPFIX/Netflow, PSAMP, IOAM, and YANG-Push) and more are
   emerging. These standards and techniques need to be recognized and
   accommodated in a new framework.

2.4. Glossary

   Before further discussion, we list some key terminology and acronyms
   used in this document. We make an intentional differentiation
   between the terms network telemetry and OAM. However, it should be
   understood that there is not a hard-line distinction between the two
   concepts. Rather, network telemetry is considered an extension
   of OAM. It covers all the existing OAM protocols but puts more
   emphasis on the newer and emerging techniques and protocols
   concerning all aspects of network data from acquisition to
   consumption.

   AI: Artificial Intelligence. In the network domain, AI refers to
      machine-learning-based technologies for automated network
      operation and other tasks.

   AM: Alternate Marking, a flow performance measurement method,
      specified in [RFC8321].

   BMP: BGP Monitoring Protocol, specified in [RFC7854].

   DNP: Dynamic Network Probe, referring to programmable in-network
      sensors for network monitoring and measurement.

   DPI: Deep Packet Inspection, referring to the techniques that
      examine packets beyond the L3/L4 headers.

   gNMI: gRPC Network Management Interface, a network management
      protocol from OpenConfig Operator Working Group, mainly
      contributed by Google. See [gnmi] for details.

   gRPC: gRPC Remote Procedure Call, an open source high performance RPC
      framework that gNMI is based on. See [grpc] for details.

   IPFIX: IP Flow Information Export Protocol, specified in [RFC7011].

   IOAM: In-situ OAM, a dataplane on-path telemetry technique.

   NETCONF: Network Configuration Protocol, specified in [RFC6241].

   NetFlow: A Cisco protocol for flow record collecting, described in
      [RFC3954].

   Network Telemetry: The process and instrumentation for acquiring and
      utilizing network data remotely for network monitoring and
      operation. A general term for a large set of network visibility
      techniques and protocols, concerning aspects like data generation,
      collection, correlation, and consumption. Network telemetry
      addresses the current network operation issues and enables smooth
      evolution toward future intent-driven autonomous networks.

   NMS: Network Management System, referring to applications that allow
      network administrators to manage a network.

   OAM: Operations, Administration, and Maintenance. A group of
      network management functions that provide network fault
      indication, fault localization, performance information, and data
      and diagnosis functions. Most conventional network monitoring
      techniques and protocols belong to network OAM.

   PBT: Postcard-Based Telemetry, a dataplane on-path telemetry
      technique.

   SMIv2: Structure of Management Information Version 2, specified in
      [RFC2578].

   SNMP: Simple Network Management Protocol. Versions 1 and 2 are
      specified in [RFC1157] and [RFC3416], respectively.

   YANG: The abbreviation of "Yet Another Next Generation". YANG is a
      data modeling language for the definition of data sent over
      network management protocols such as NETCONF and RESTCONF.
      YANG is defined in [RFC6020].

   YANG ECA: A YANG model for Event-Condition-Action policies, defined
      in [I-D.wwx-netmod-event-yang].

   YANG FSM: A YANG model that describes events, operations, and finite
      state machine of YANG-defined network elements.

   YANG PUSH: A method to subscribe to pushed data from remote YANG
      datastores on network devices. Details are specified in [RFC8641]
      and [RFC8639].

2.5. Network Telemetry

   Network telemetry has emerged as a mainstream technical term to refer
   to the network data collection and consumption techniques. Several
   network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
   gRPC [grpc]) have been widely deployed. Network telemetry allows
   separate entities to acquire data from network devices so that data
   can be visualized and analyzed to support network monitoring and
   operation. Network telemetry covers the conventional network OAM and
   has a wider scope. It is expected that network telemetry can provide
   the necessary network insight for autonomous networks and address the
   shortcomings of conventional OAM techniques.

   Network telemetry usually assumes machines as data consumers rather
   than human operators. Hence, network telemetry can directly
   trigger automated network operation, while in contrast some
   conventional OAM tools are designed and used to help human operators
   to monitor and diagnose the networks and guide manual network
   operations. Such a proposition leads to very different techniques.

   Although new network telemetry techniques are emerging and subject to
   continuous evolution, several characteristics of network telemetry
   have been well accepted. Note that network telemetry is intended to
   be an umbrella term covering a wide spectrum of techniques, so the
   following characteristics are not expected to be held by every
   specific technique.

   o  Push and Streaming: Instead of polling data from network devices,
      telemetry collectors subscribe to streaming data pushed from data
      sources in network devices.

   o  Volume and Velocity: The telemetry data is intended to be consumed
      by machines rather than by human beings. Therefore, the data
      volume is huge and the processing is often in real time.

   o  Normalization and Unification: Telemetry aims to address the
      overall network automation needs. Efforts are made to normalize
      the data representation and unify the protocols, so as to simplify
      data analysis and its integration with automation solutions.

   o  Model-based: The telemetry data is modeled in advance, which
      allows applications to configure and consume data with ease.

   o  Data Fusion: The data for a single application can come from
      multiple data sources (e.g., cross-domain, cross-device, and
      cross-layer) and needs to be correlated to take effect.

   o  Dynamic and Interactive: Since network telemetry is meant to be
      used in a closed control loop for network automation, it needs to
      run continuously and adapt to the dynamic and interactive queries
      from the network operation controller.

   In addition, an ideal network telemetry solution may also have the
   following features or properties:

   o  In-Network Customization: The data can be customized in network at
      run-time to cater to the specific need of applications. This
      needs the support of a programmable data plane which allows probes
      with custom functions to be deployed at flexible locations.

   o  In-Network Data Aggregation and Correlation: Network devices and
      aggregation points can work out which events and what data need
      to be stored, reported, or discarded, thus reducing the load on
      the central collection and processing points while still ensuring
      that the right information is ready to be processed in a timely
      way.

   o  In-Network Processing: Sometimes it is not necessary or feasible
      to gather all information to a central point to be processed and
      acted upon.
      It is possible for the data processing to be done in
      network, allowing reactive actions to be taken locally.

   o  Direct Data Plane Export: The data originating from the data plane
      forwarding chips can be directly exported to the data consumer for
      efficiency, especially when the data bandwidth is large and
      real-time processing is required.

   o  In-band Data Collection: In addition to the passive and active
      data collection approaches, the new hybrid approach allows data
      to be collected directly for any target flow on its entire
      forwarding path [I-D.song-opsawg-ifit-framework].

   It is worth noting that a network telemetry system should not be
   intrusive to normal network operations and should avoid the pitfall
   of the "observer effect". That is, it should not change the network
   behavior or affect the forwarding performance. Otherwise, the whole
   purpose of network telemetry is defeated.

   Although in many cases a system for network telemetry involves a
   remote data collecting and consuming entity, it is important to
   understand that there are no inherent assumptions about how a system
   should be architected. Telemetry data producers and consumers can
   work in distributed or peer-to-peer fashions rather than assuming a
   centralized data consuming entity. In such cases, a network node can
   be the direct consumer of telemetry data from other nodes.

3. The Necessity of a Network Telemetry Framework

   Network data analytics and machine-learning technologies are applied
   for network operation automation, relying on abundant and coherent
   data from networks. Data acquisition that is limited to a single
   source and static in nature will in many cases not be sufficient to
   meet an application's telemetry data needs. As a result, multiple
   data sources, involving a variety of techniques and standards, will
   need to be integrated.
   It is desirable to have a framework that
   classifies and organizes different telemetry data sources and types,
   defines different components of a network telemetry system and their
   interactions, and helps coordinate and integrate multiple telemetry
   approaches across layers. This allows flexible combinations of data
   for different applications, while normalizing and simplifying
   interfaces. In detail, such a framework would benefit application
   development for the following reasons:

   o  Future networks, autonomous or otherwise, depend on holistic and
      comprehensive network visibility. All the use cases and
      applications are better supported uniformly and coherently
      under a single intelligent agent. Therefore, the protocols and
      mechanisms should be consolidated into a minimum yet comprehensive
      set. A telemetry framework can help to normalize the development
      of techniques.

   o  Network visibility presents multiple viewpoints. For example, the
      device viewpoint takes the network infrastructure as the
      monitoring object from which the network topology and device
      status can be acquired; the traffic viewpoint takes the flows or
      packets as the monitoring object from which the traffic quality
      and path can be acquired. An application may need to switch its
      viewpoint during operation. It may also need to correlate a
      service and its impact on network experience to acquire
      comprehensive information.

   o  Applications require network telemetry to be elastic in order to
      make efficient use of network resources and reduce the impact of
      processing related to network telemetry on network performance.
      For example, routine network monitoring should cover the entire
      network with a low data sampling rate. Only when issues arise or
      critical trends emerge should telemetry data sources be modified
      and telemetry data rates boosted as needed.

   o  Efficient data fusion is critical for applications to reduce the
      overall quantity of data and improve the accuracy of analysis.

   A telemetry framework brings together all of the telemetry-related
   work from different sources and working groups within the IETF.
   This makes it possible to assemble a comprehensive network telemetry
   system and to avoid repetitious or redundant work. The framework
   should cover the concepts and components from the standardization
   perspective. This document describes the modules which make up a
   network telemetry framework and decomposes the telemetry system into
   a set of distinct components that existing and future work can easily
   map to.

4. Network Telemetry Framework

   The top level network telemetry framework partitions network
   telemetry into four modules based on the telemetry data object source
   and represents their relationships. At the next level, the framework
   decomposes each module into separate components. Each of the modules
   follows the same underlying structure, with one component dedicated
   to the configuration of data subscriptions and data sources, a second
   component dedicated to encoding and exporting data, and a third
   component instrumenting the generation of telemetry related to the
   underlying resources. Throughout the framework, the same set of
   abstract data acquiring mechanisms and data types is applied. The
   two-level architecture with the uniform data abstraction helps
   accurately pinpoint a protocol or technique to its position in a
   network telemetry system or disaggregate a network telemetry system
   into manageable parts.

4.1. Top Level Modules

   Telemetry can be applied on the forwarding plane, the control plane,
   and the management plane in a network, as well as to other sources
   outside the network, as shown in Figure 1.
   Therefore, we categorize network telemetry into four distinct
   modules, each having its own interface to Network Operation
   Applications.

            +------------------------------+
            |                              |
            |       Network Operation      |<-------+
            |          Applications        |        |
            |                              |        |
            +------------------------------+        |
                 ^      ^           ^               |
                 |      |           |               |
                 V      |           V               V
            +-----------|---+--------------+  +-----------+
            |           |   |              |  |           |
            | Control Pl|ane|              |  | External  |
            | Telemetry | <--->            |  | Data and  |
            |           |   |              |  | Event     |
            |      ^    V   |  Management  |  | Telemetry |
            +------|--------+  Plane       |  |           |
            |      V        |  Telemetry   |  +-----------+
            | Forwarding    |              |
            | Plane       <--->            |
            | Telemetry     |              |
            |               |              |
            +---------------+--------------+

               Figure 1: Modules in Layer Category of NTF

   The rationale of this partition lies in the different telemetry data
   objects, which result in different data sources and export locations.
   Such differences have profound implications for in-network data
   programming and processing capability, data encoding and transport
   protocol, and data bandwidth and latency.

   We summarize the major differences of the four modules in the
   following table. They are compared from six aspects:

   o  Data Object

   o  Data Export Location

   o  Data Model

   o  Data Encoding

   o  Telemetry Protocol

   o  Transport Method

   Data object is the target and source of each module. Because the
   data source varies, the data export location varies. For example,
   the forwarding plane data are mainly from the fast path (e.g.,
   forwarding chips) while the control plane data are mainly from the
   slow path (e.g., main control CPU). For convenience and efficiency,
   it is preferred to export the data from locations near the source.
   Because each data export location has different capabilities, the
   proper data model, encoding, and transport method cannot be kept the
   same.
   For example, the forwarding chip has high throughput but limited
   capacity for processing complex data and maintaining states, while
   the main control CPU is capable of complex data and state
   processing, but has limited bandwidth for high throughput data. As a
   result, the suitable telemetry protocol for each module can be
   different. Some representative techniques are shown in the
   corresponding table blocks to highlight the technical diversity of
   these modules. Note that the selected techniques just reflect the
   de-facto state of the art and are not exhaustive. The key point is
   that one cannot expect to use a universal protocol to cover all the
   network telemetry requirements.

   +---------+--------------+--------------+--------------+-----------+
   | Module  | Control      | Management   | Forwarding   | External  |
   |         | Plane        | Plane        | Plane        | Data      |
   +---------+--------------+--------------+--------------+-----------+
   |Object   | control      | config. &    | flow & packet| terminal, |
   |         | protocol &   | operation    | QoS, traffic | social &  |
   |         | signaling,   | state, MIB   | stat., buffer| environ-  |
   |         | RIB, ACL     |              | & queue stat.| mental    |
   +---------+--------------+--------------+--------------+-----------+
   |Export   | main control | main control | fwding chip  | various   |
   |Location | CPU,         | CPU          | or linecard  |           |
   |         | linecard CPU |              | CPU; main    |           |
   |         | or fwding    |              | control CPU  |           |
   |         | chip         |              | unlikely     |           |
   +---------+--------------+--------------+--------------+-----------+
   |Data     | YANG,        | MIB, syslog, | template,    | YANG      |
   |Model    | custom       | YANG,        | YANG,        |           |
   |         |              | custom       | custom       |           |
   +---------+--------------+--------------+--------------+-----------+
   |Data     | GPB, JSON,   | GPB, JSON,   | plain        | GPB, JSON,|
   |Encoding | XML, plain   | XML          |              | XML, plain|
   +---------+--------------+--------------+--------------+-----------+
   |Protocol | gRPC,NETCONF,| gRPC, NETCONF| IPFIX, mirror| gRPC      |
   |         | IPFIX,mirror |              |              |           |
   +---------+--------------+--------------+--------------+-----------+
   |Transport| HTTP, TCP,   | HTTP, TCP    | UDP          | HTTP, TCP,|
   |         | UDP          |              |              | UDP       |
   +---------+--------------+--------------+--------------+-----------+

             Figure 2: Comparison of the Data Object Modules

   Note that the interaction with the network operation applications can
   be indirect. Some in-device data transfer is possible. For example,
   in the management plane telemetry, the management plane may need to
   acquire data from the data plane. Some of the operational states can
   only be derived from the data plane, such as the interface status and
   statistics. As another example, the control plane telemetry may
   need to access the Forwarding Information Base (FIB) in the data
   plane.

   On the other hand, an application may involve more than one plane and
   interact with multiple planes simultaneously. For example, an SLA
   compliance application may require both the data plane telemetry and
   the control plane telemetry.

   The requirements and challenges for each module are summarized as
   follows.

4.1.1. Management Plane Telemetry

   The management plane of network elements interacts with the Network
   Management System (NMS), and provides information such as performance
   data, network logging data, network warning and defects data, and
   network statistics and state data. The management plane includes
   many protocols, including some that are considered "legacy", such as
   SNMP and syslog. Regardless of the protocol, management plane
   telemetry must address the following requirements:

   Convenient Data Subscription: An application should have the freedom
      to choose the data export means such as the data types and the
      export frequency.

   Structured Data: For automatic network operation, machines will
      replace humans for network data comprehension.
      Schema languages such as YANG can efficiently describe structured
      data and normalize data encoding and transformation.

   High Speed Data Transport: In order to keep up with the velocity of
      information, a server needs to be able to send large amounts of
      data at high frequency. Compact encoding formats are needed to
      compress the data and improve the data transport efficiency. The
      subscription mode, by replacing the query mode, reduces the
      interactions between clients and servers and helps to improve the
      server's efficiency.

4.1.2. Control Plane Telemetry

   The control plane telemetry refers to the health condition monitoring
   of different network control protocols covering Layer 2 to Layer 7.
   Keeping track of the running status of these protocols is beneficial
   for detecting, localizing, and even predicting various network
   issues, as well as for network optimization, in real time and at
   fine granularity.

   One of the most challenging problems for the control plane telemetry
   is how to correlate the End-to-End (E2E) Key Performance Indicators
   (KPI) to a specific layer's KPIs. For example, an IPTV user may
   describe their User Experience (UE) by the video fluency and
   definition. Then, in case of an unusually poor UE KPI or a service
   disconnection, it is non-trivial to delimit and pinpoint the issue in
   the responsible protocol layer (e.g., the Transport Layer or the
   Network Layer), the responsible protocol (e.g., IS-IS or BGP at the
   Network Layer), and finally the responsible device(s) with specific
   reasons.

   Traditional OAM-based approaches for control plane KPI measurement
   include Ping (L3), Traceroute (L3), Y.1731 (L2), and so on.
   One common issue behind these methods is that they only measure the
   KPIs instead of reflecting the actual running status of these
   protocols, making them less effective or efficient for control plane
   troubleshooting and network optimization.

   An example of control plane telemetry is the BGP Monitoring Protocol
   (BMP), which is currently used to monitor BGP routes and enables rich
   applications, such as BGP peer analysis, AS analysis, prefix
   analysis, security analysis, and so on. However, the monitoring of
   other layers and protocols, and the cross-layer, cross-protocol KPI
   correlations, are still in their infancy (e.g., IGP monitoring is
   missing) and require further research.

4.1.3. Forwarding Plane Telemetry

   An effective forwarding plane telemetry system relies on the data
   that the network device can expose. The quality, quantity, and
   timeliness of data must meet some stringent requirements. This
   raises some challenges for the network data plane devices where the
   first-hand data originate.

   o  A data plane device's main function is user traffic processing and
      forwarding. While supporting network visibility is important, the
      telemetry is just an auxiliary function, and it should not impede
      normal traffic processing and forwarding (i.e., the performance is
      not lowered and the behavior is not altered due to the telemetry
      functions).

   o  Network operation applications require end-to-end visibility
      across various sources, which can result in a huge volume of data.
      However, the sheer data quantity should not exhaust the network
      bandwidth, regardless of the data delivery approach (i.e., whether
      through in-band or out-of-band channels).

   o  The data plane devices must provide timely data with the minimum
      possible delay.
      Long processing, transport, storage, and analysis delays can
      impact the effectiveness of the control loop and even render the
      data useless.

   o  The data should be structured and labeled, and easy for
      applications to parse and consume. At the same time, the data
      types needed by applications can vary significantly. The data
      plane devices need to provide enough flexibility and
      programmability to support the precise data provision for
      applications.

   o  The data plane telemetry should support incremental deployment and
      work even if some devices are unaware of the system. This
      challenge is highly relevant to the standards and legacy networks.

   Although not specific to the forwarding plane, these challenges are
   more difficult for the forwarding plane because of its limited
   resources and flexibility. Data plane programmability is essential
   to support network telemetry. Newer data plane forwarding chips are
   equipped with advanced telemetry features and provide flexibility to
   support customized telemetry functions.

4.1.3.1. Technique Taxonomy

   There can be multiple possible dimensions to classify the forwarding
   plane telemetry techniques.

   Active, Passive, and Hybrid: Active and passive methods (as well as
      the hybrid types) are well documented in [RFC7799]. Passive
      methods include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic
      mirroring. These methods usually have low data coverage, and
      improving the coverage comes at a very high bandwidth cost. On
      the other hand, active methods include Ping, OWAMP [RFC4656],
      TWAMP [RFC5357], and Cisco's SLA Protocol [RFC6812]. These
      methods are intrusive and only provide indirect network
      measurement results. Hybrid methods, including in-situ OAM
      [I-D.ietf-ippm-ioam-data], IPFPM [RFC8321], and Multipoint
      Alternate Marking [I-D.fioccola-ippm-multipoint-alt-mark], provide
      a well-balanced and more flexible approach.
      However, these methods are also more complex to implement.

   In-Band and Out-of-Band: The telemetry data, before being exported
      to some collector, can be carried in user packets. Such methods
      are considered in-band (e.g., in-situ OAM
      [I-D.ietf-ippm-ioam-data]). If the telemetry data is directly
      exported to some collector without modifying the user packets,
      such methods are considered out-of-band (e.g., postcard-based
      INT). It is possible to have hybrid methods. For example, only
      the telemetry instruction or partial data is carried by user
      packets (e.g., IPFPM [RFC8321]).

   E2E and In-Network: Some E2E methods start from and end at the
      network end hosts (e.g., Ping). The other methods work in
      networks and are transparent to end hosts. However, if needed,
      in-network methods can be easily extended into end hosts.

   Information Type: Depending on the telemetry objective, the methods
      can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]),
      path-based (e.g., Traceroute), or node-based (e.g., IPFIX
      [RFC7011]). The various data objects can be packets, flow
      records, measurements, states, and signals.

4.1.4. External Data Telemetry

   Events that occur outside the boundaries of the network system are
   another important source of network telemetry. Correlating both
   internal telemetry data and external events with the requirements of
   network systems, as presented in
   [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and
   functional advantage to management operations.

   As with other sources of telemetry information, the data and events
   must meet strict requirements, especially in terms of timeliness,
   which is essential to properly incorporate external event information
   into management cycles. The specific challenges are described as
   follows:

   o  The role of external event detector can be played by multiple
      elements, including hardware (e.g.
      physical sensors, such as seismometers) and software (e.g. Big
      Data sources that analyze streams of information, such as Twitter
      messages). Thus, the transmitted data must support different
      shapes but, at the same time, follow a common but extensible
      schema.

   o  Since the main function of the external event detectors is to
      perform the notifications, their timeliness is assumed. However,
      once messages have been dispatched, they must be quickly collected
      and inserted into the control plane with variable priority, which
      will be high for important sources and/or important events and low
      for secondary ones.

   o  The schema used by external detectors must be easily adopted by
      current and future devices and applications. Therefore, it must
      be easily mapped to current information models, such as in terms
      of YANG.

   Organizing together both internal and external telemetry information
   will be key for the general exploitation of the management
   possibilities of current and future network systems, as reflected in
   the incorporation of cognitive capabilities to new hardware and
   software (virtual) elements.

4.2. Second Level Function Components

   Reflecting the best current practice, the telemetry module at each
   plane is further partitioned into five distinct components:

   Data Query, Analysis, and Storage: This component works at the
      application layer. It is a part of the network management system
      at the receiver side. On the one hand, it is responsible for
      issuing data requirements. The data of interest can be modeled
      data through configuration or custom data through programming.
      The data requirements can be queries for one-shot data or
      subscriptions for events or streaming data. On the other hand, it
      receives, stores, and processes the returned data from network
      devices. Data analysis can be interactive to initiate further
      data queries.
      This component can reside in either network devices or remote
      controllers. It can be centralized or distributed, and can
      involve one or more instances.

   Data Configuration and Subscription: This component deploys data
      queries on devices. It determines the protocol and channel for
      applications to acquire desired data. This component is also
      responsible for configuring the desired data that might not be
      directly available from data sources. The subscription data can
      be described by models, templates, or programs.

   Data Encoding and Export: This component determines how telemetry
      data are delivered to the data analysis and storage component.
      The data encoding and the transport protocol may vary due to the
      data exporting location.

   Data Generation and Processing: The requested data needs to be
      captured, processed, and formatted in network devices from raw
      data sources. This may involve in-network computing and
      processing on either the fast path or the slow path in network
      devices.

   Data Object and Source: This component determines the monitoring
      object and original data source. The data source usually just
      provides raw data which needs further processing. A data source
      can be considered a probe. A probe can be statically installed or
      dynamically installed.
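
   As a purely illustrative sketch (not part of the framework itself),
   the five components can be pictured as stages of a small pipeline.
   All function names, the counter values, and the choice of JSON as
   the encoding are assumptions made only for this example.

```python
import json

# Hypothetical raw counters exposed by a probe (Data Object and Source).
RAW_COUNTERS = [
    {"ifname": "eth0", "rx_bytes": 1200, "tx_bytes": 300},
    {"ifname": "eth1", "rx_bytes": 50, "tx_bytes": 9000},
]


def configure(fields, min_rx_bytes=0):
    """Data Configuration and Subscription: describe the desired data."""
    return {"fields": list(fields), "min_rx_bytes": min_rx_bytes}


def generate(raw, subscription):
    """Data Generation and Processing: filter and project the raw data."""
    wanted = ["ifname"] + subscription["fields"]
    return [
        {key: row[key] for key in wanted}
        for row in raw
        if row["rx_bytes"] >= subscription["min_rx_bytes"]
    ]


def export(records):
    """Data Encoding and Export: JSON is assumed here only for brevity."""
    return json.dumps(records, sort_keys=True).encode("utf-8")


def analyze(payload):
    """Data Query, Analysis, and Storage: decode and index by interface."""
    return {rec["ifname"]: rec for rec in json.loads(payload.decode("utf-8"))}


subscription = configure(["rx_bytes"], min_rx_bytes=100)
result = analyze(export(generate(RAW_COUNTERS, subscription)))
print(result)
```

   In a real system each stage would typically run in a different
   place (for example, generation on a forwarding chip and analysis on
   a remote collector); the sketch collapses them into one process only
   to show how the responsibilities are separated.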

          +----------------------------------------+
        +----------------------------------------+ |
        |                                        | |
        |    Data Query, Analysis, & Storage     | |
        |                                        | +
        +-------+++ -----------------------------+
                |||                   ^^^
                |||                   |||
                ||V                   |||
             +--+V--------------------+++------------+
          +-----V---------------------+------------+ |
        +---------------------+-------+----------+ | |
        | Data Configuration  |                  | | |
        | & Subscription      | Data Encoding    | | |
        | (model, template,   | & Export         | | |
        | & program)          |                  | | |
        +---------------------+------------------| | |
        |                                        | | |
        |          Data Generation               | | |
        |          & Processing                  | | |
        |                                        | | |
        +----------------------------------------| | |
        |                                        | | |
        |        Data Object and Source          | |-+
        |                                        |-+
        +----------------------------------------+

        Figure 3: Components in the Network Telemetry Framework

4.3. Data Acquiring Mechanism and Type Abstraction

   Broadly speaking, network data can be acquired through subscription
   (push) and query (poll). Subscription is a contract between
   publisher and subscriber. After initial setup, the subscribed data
   is automatically delivered to registered subscribers until the
   subscription expires. Subscription can be partitioned into two sub
   modes: the Publish-Subscription (Pub-Sub) mode and the Subscription-
   Publish (Sub-Pub) mode. In the Pub-Sub mode, a publisher publishes
   pre-defined data and any qualified subscriber can subscribe to the
   data as-is. In the Sub-Pub mode, a subscriber initiates a data
   request and sends it to a publisher; the publisher will deliver the
   requested data when available.

   In contrast, query is used when a querier expects immediate and one-
   off feedback from network devices. The queried data may be directly
   extracted from some specific data source, or synthesized and
   processed from raw data. Query suits interactive network telemetry
   applications.
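
   The three acquisition modes described above can be contrasted with
   a minimal in-memory sketch. The Publisher class, its method names,
   and the data paths are hypothetical and do not correspond to any
   real protocol API; subscription expiry is deliberately omitted.

```python
class Publisher:
    """Hypothetical in-memory telemetry publisher (not a real protocol API)."""

    def __init__(self, predefined_paths):
        self.offered = set(predefined_paths)  # data sets published as-is
        self.state = {}                       # latest value per data path
        self.subscribers = {}                 # data path -> list of callbacks

    def subscribe(self, path, callback):
        """Pub-Sub mode: subscribe to a data set the publisher already offers."""
        if path not in self.offered:
            raise KeyError(f"{path} is not a pre-defined data set")
        self.subscribers.setdefault(path, []).append(callback)

    def request(self, path, callback):
        """Sub-Pub mode: the subscriber defines the data it wants delivered."""
        self.offered.add(path)
        self.subscribe(path, callback)

    def update(self, path, value):
        """Record a new value and push it to every registered subscriber."""
        self.state[path] = value
        for callback in self.subscribers.get(path, []):
            callback(path, value)

    def query(self, path):
        """Query (poll) mode: return the current value once, on demand."""
        return self.state.get(path)


pushed = []
publisher = Publisher(["interfaces/eth0/rx-bytes"])
publisher.subscribe("interfaces/eth0/rx-bytes",
                    lambda p, v: pushed.append((p, v)))
publisher.request("queues/q3/depth", lambda p, v: pushed.append((p, v)))

publisher.update("interfaces/eth0/rx-bytes", 1200)  # pushed to subscriber
publisher.update("queues/q3/depth", 17)             # pushed to requester
polled = publisher.query("interfaces/eth0/rx-bytes")  # one-off poll
```

   The only difference between the two subscription modes in this
   sketch is who defines the data set: the publisher in Pub-Sub, the
   subscriber in Sub-Pub. A query returns once and leaves no state
   behind on the publisher.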

   There are four types of data from network devices:

   Simple Data: The data that are steadily available from some data
      store or static probes in network devices. Such data can be
      specified by a YANG model.

   Complex Data: The data need to be synthesized or processed in the
      network from raw data from one or more network devices. The data
      processing function can be statically or dynamically loaded into
      network devices.

   Event-triggered Data: The data are conditionally acquired based on
      the occurrence of some events. They can be actively pushed
      through subscription or passively polled through query. There
      are many ways to model events, including using Finite State
      Machine (FSM) or Event Condition Action (ECA)
      [I-D.wwx-netmod-event-yang].

   Streaming Data: The data are continuously generated. They can be
      time series or dumps of databases. The streaming data reflect
      real-time network states and metrics and require large bandwidth
      and processing power. The streaming data are always actively
      pushed to the subscribers.

   The above data types are not mutually exclusive. Rather, they often
   overlap. For example, event-triggered data can be simple or complex,
   and streaming data can be simple, complex, or triggered by events.
   The relationships of these data types are illustrated in Figure 4.

                         +--------------+
                 +------>| Simple Data  |<------+
                 |       +--------------+       |
                 |              ^               |
                 |              |               |
                 |       +------+-------+       |
                 |   +-->| Complex Data |<--+   |
                 |   |   +--------------+   |   |
                 |   |                      |   |
                 |   |                      |   |
         +-------+---+----------+     +-----+---+-------+
         | Event-triggered Data |<----+ Streaming Data  |
         +----------------------+     +-----------------+

                    Figure 4: Data Type Relationship

   Subscription usually deals with event-triggered data and streaming
   data, and query usually deals with simple data and complex data.
   However, other combinations are also possible.
   The conventional OAM techniques are mostly about querying simple
   data. While these techniques are still useful, more advanced network
   telemetry techniques are designed mainly for event-triggered or
   streaming data subscription, and complex data query.

4.4. Existing Works Mapped in the Framework

   The following two tables provide a non-exhaustive list of existing
   works (mainly published in the IETF and with the emphasis on the
   latest new technologies) and show their positions in the framework.
   More details can be found in Appendix A.

   The first table is based on the data acquiring mechanisms and data
   types.

          +-----------------+---------------+----------------+
          |                 | Query         | Subscription   |
          |                 |               |                |
          +-----------------+---------------+----------------+
          | Simple Data     | SNMP, NETCONF,| SNMP, NETCONF, |
          |                 | YANG, BMP,    | YANG, gRPC     |
          |                 | SMIv2, gRPC   |                |
          +-----------------+---------------+----------------+
          | Complex Data    | DNP, YANG FSM,| DNP, YANG PUSH,|
          |                 | gRPC, NETCONF | gRPC, NETCONF  |
          +-----------------+---------------+----------------+
          | Event-triggered | DNP, NETCONF, | gRPC, NETCONF, |
          | Data            | YANG FSM      | YANG PUSH, DNP,|
          |                 |               | YANG FSM       |
          +-----------------+---------------+----------------+
          | Streaming Data  |               | gRPC, NETCONF, |
          |                 | N/A           | IOAM, PBT, DNP,|
          |                 |               | IPFIX, IPFPM   |
          +-----------------+---------------+----------------+

                   Figure 5: Existing Work Mapping I

   The second table is based on the telemetry modules and components.

    +-------------+-----------------+---------------+--------------+
    |             | Management      | Control       | Forwarding   |
    |             | Plane           | Plane         | Plane        |
    +-------------+-----------------+---------------+--------------+
    | data config.| gRPC, NETCONF,  | NETCONF/YANG, | NETCONF/YANG,|
    | & subscribe | SMIv2,YANG PUSH | YANG PUSH     | YANG PUSH    |
    +-------------+-----------------+---------------+--------------+
    | data gen. & | DNP,            | DNP,          | IOAM, PSAMP, |
    | process     | YANG            | YANG          | PBT, IPFPM,  |
    |             |                 |               | DNP          |
    +-------------+-----------------+---------------+--------------+
    | data        | gRPC, NETCONF,  | BMP, NETCONF  | IPFIX        |
    | export      | YANG PUSH       |               |              |
    +-------------+-----------------+---------------+--------------+

                   Figure 6: Existing Work Mapping II

5. Evolution of Network Telemetry

   Network telemetry is a fast evolving technical area. As networks
   move towards automated operation, network telemetry undergoes
   several stages of evolution. Each stage is built upon the techniques
   enabled by previous stages.

   Stage 0 - Static Telemetry: The telemetry data source and type are
      determined at design time. The network operator can only
      configure how to use it with limited flexibility.

   Stage 1 - Dynamic Telemetry: The custom telemetry data can be
      dynamically programmed or configured at runtime without
      interrupting the network operation, allowing a tradeoff among
      resource, performance, flexibility, and coverage. DNP is an
      effort towards this direction.

   Stage 2 - Interactive Telemetry: The network operator can
      continuously customize and fine tune the telemetry data in real
      time to reflect the network operation's visibility requirements.
      Compared with Stage 1, the changes are frequent and based on
      real-time feedback. At this stage, some tasks can be automated,
      but human operators still need to sit in the middle to make
      decisions.

   Stage 3 - Closed-loop Telemetry: The telemetry is free from the
      interference of human operators, except for generating the
      reports. The intelligent network operation engine automatically
      issues the telemetry data requests, analyzes the data, and updates
      the network operations in closed control loops.

   Most of the existing technologies belong to stage 0 and stage 1.
   Individual stage 2 and stage 3 applications are also possible now.
   However, the future autonomic networks may need a comprehensive
   operation management system which relies on stage 2 and stage 3
   telemetry to cover all the network operation tasks. A well-defined
   network telemetry framework is the first step towards this direction.

6. Security Considerations

   The complexity of network telemetry raises significant security
   implications. For example, telemetry data can be manipulated to
   exhaust various network resources at each plane as well as the data
   consumer; falsified or tampered data can mislead the decision making
   and paralyze networks; wrong configuration and programming for
   telemetry is equally harmful.

   Given that this document has proposed a framework for network
   telemetry and the telemetry mechanisms discussed are more extensive
   (in both message frequency and traffic amount) than the conventional
   network OAM concepts, we must also recognize that various new
   security considerations may arise. A number of techniques already
   exist for securing the forwarding plane, the control plane, and the
   management plane in a network, but it is important to consider if any
   new threat vectors are now being enabled via the use of network
   telemetry procedures and mechanisms.

   Security considerations for networks that use telemetry methods may
   include:

   o  Telemetry framework trust and policy model;

   o  Role management and access control for enabling and disabling
      telemetry capabilities;

   o  Protocol transport used for telemetry data and its inherent
      security capabilities;

   o  Telemetry data stores, storage encryption and methods of access;

   o  Tracking telemetry events and any abnormalities that might
      identify malicious attacks using telemetry interfaces;

   o  Authentication and signing of telemetry data to make data more
      trustworthy.

   Some of the security considerations highlighted above may be
   minimized or negated with policy management of network telemetry. In
   a network telemetry deployment it would be advantageous to separate
   telemetry capabilities into different classes of policies, i.e., Role
   Based Access Control and Event-Condition-Action policies. Also,
   potential conflicts between network telemetry mechanisms must be
   detected accurately and resolved quickly to avoid unnecessary network
   telemetry traffic propagation escalating into an unintended or
   intended denial of service attack.

   Further study of the security issues will be required, and it is
   expected that the security mechanisms and protocols are developed and
   deployed along with a network telemetry system.

7. IANA Considerations

   This document includes no request to IANA.

8. Contributors

   The other contributors of this document are listed as follows.

   o  Tianran Zhou

   o  Zhenbin Li

   o  Zhenqiang Li

   o  Daniel King

   o  Adrian Farrel

   o  Alexander Clemm

9. Acknowledgments

   We would like to thank Greg Mirsky, Randy Presuhn, Joe Clarke, Victor
   Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu,
   Parviz Yegani, Young Lee, Qin Wu, and many others who have provided
   helpful comments and suggestions to improve this document.

10. Informative References

   [gnmi]     "gNMI - gRPC Network Management Interface".

   [grpc]     "gRPC, A high performance, open-source universal RPC
              framework".

   [I-D.fioccola-ippm-multipoint-alt-mark]
              Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
              "Multipoint Alternate Marking method for passive and
              hybrid performance monitoring",
              draft-fioccola-ippm-multipoint-alt-mark-04 (work in
              progress), June 2018.

   [I-D.ietf-grow-bmp-adj-rib-out]
              Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S.
              Zhuang, "Support for Adj-RIB-Out in BGP Monitoring
              Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work
              in progress), August 2019.

   [I-D.ietf-grow-bmp-local-rib]
              Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
              "Support for Local RIB in BGP Monitoring Protocol (BMP)",
              draft-ietf-grow-bmp-local-rib-08 (work in progress),
              November 2020.

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields
              for In-situ OAM", draft-ietf-ippm-ioam-data-11 (work in
              progress), November 2020.

   [I-D.ietf-netconf-distributed-notif]
              Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois,
              "Subscription to Distributed Notifications",
              draft-ietf-netconf-distributed-notif-01 (work in
              progress), November 2020.

   [I-D.ietf-netconf-udp-notif]
              Zheng, G., Zhou, T., Graf, T., Francois, P., and P.
              Lucente, "UDP-based Transport for Configured
              Subscriptions", draft-ietf-netconf-udp-notif-01 (work in
              progress), November 2020.

   [I-D.irtf-nmrg-ibn-concepts-definitions]
              Clemm, A., Ciavaglia, L., Granville, L., and J.
              Tantsura, "Intent-Based Networking - Concepts and
              Definitions", draft-irtf-nmrg-ibn-concepts-definitions-02
              (work in progress), September 2020.

   [I-D.kumar-rtgwg-grpc-protocol]
              Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC
              Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in
              progress), July 2016.

   [I-D.openconfig-rtgwg-gnmi-spec]
              Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
              C., and C. Morrow, "gRPC Network Management Interface
              (gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in
              progress), March 2018.

   [I-D.pedro-nmrg-anticipated-adaptation]
              Martinez-Julia, P., "Exploiting External Event Detectors
              to Anticipate Resource Requirements for the Elastic
              Adaptation of SDN/NFV Systems",
              draft-pedro-nmrg-anticipated-adaptation-02 (work in
              progress), June 2018.

   [I-D.song-ippm-postcard-based-telemetry]
              Song, H., Zhou, T., Li, Z., Mirsky, G., Shin, J., and K.
              Lee, "Postcard-based On-Path Flow Data Telemetry using
              Packet Marking",
              draft-song-ippm-postcard-based-telemetry-08 (work in
              progress), October 2020.

   [I-D.song-opsawg-dnp4iq]
              Song, H. and J. Gong, "Requirements for Interactive Query
              with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01
              (work in progress), June 2017.

   [I-D.song-opsawg-ifit-framework]
              Song, H., Qin, F., Chen, H., Jin, J., and J. Shin,
              "In-situ Flow Information Telemetry",
              draft-song-opsawg-ifit-framework-13 (work in progress),
              October 2020.

   [I-D.wwx-netmod-event-yang]
              WU, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise,
              "A YANG Data model for ECA Policy Management",
              draft-wwx-netmod-event-yang-10 (work in progress),
              November 2020.

   [RFC1157]  Case, J., Fedor, M., Schoffstall, M., and J. Davin,
              "Simple Network Management Protocol (SNMP)", RFC 1157,
              DOI 10.17487/RFC1157, May 1990.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999.

   [RFC2981]  Kavasseri, R., Ed., "Event MIB", RFC 2981,
              DOI 10.17487/RFC2981, October 2000.

   [RFC3416]  Presuhn, R., Ed., "Version 2 of the Protocol Operations
              for the Simple Network Management Protocol (SNMP)",
              STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002.

   [RFC3594]  Duffy, P., "PacketCable Security Ticket Control Sub-Option
              for the DHCP CableLabs Client Configuration (CCC) Option",
              RFC 3594, DOI 10.17487/RFC3594, September 2003.

   [RFC3877]  Chisholm, S. and D. Romascanu, "Alarm Management
              Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
              September 2004.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

   [RFC6812]  Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
              S., and E. Yedavalli, "Cisco Service-Level Assurance
              Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P.
              Aitken, "Specification of the IP Flow Information Export
              (IPFIX) Protocol for the Exchange of Flow Information",
              STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC7575]  Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
              Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
              Networking: Definitions and Design Goals", RFC 7575,
              DOI 10.17487/RFC7575, June 2015.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
              Monitoring Protocol (BMP)", RFC 7854,
              DOI 10.17487/RFC7854, June 2016.

   [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

   [RFC8639]  Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard,
              E., and A. Tripathy, "Subscription to YANG Notifications",
              RFC 8639, DOI 10.17487/RFC8639, September 2019.

   [RFC8641]  Clemm, A. and E. Voit, "Subscription to YANG Notifications
              for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
              September 2019.

Appendix A.  A Survey on Existing Network Telemetry Techniques

   In this non-normative appendix, we provide an overview of some
   existing techniques and standard proposals for each network telemetry
   module.

A.1.  Management Plane Telemetry

A.1.1.  Push Extensions for NETCONF

   NETCONF [RFC6241] is a popular network management protocol
   recommended by the IETF.  Although it can be used for data
   collection, NETCONF is primarily geared toward configuration.  YANG
   Push [RFC8641] [RFC8639] extends NETCONF and enables subscriber
   applications to request a continuous, customized stream of updates
   from a YANG datastore.  Providing such visibility into changes made
   to YANG configuration and operational objects enables new
   capabilities based on the remote mirroring of configuration and
   operational state.

   Moreover, the distributed data collection mechanism
   [I-D.ietf-netconf-distributed-notif] over a UDP-based publication
   channel [I-D.ietf-netconf-udp-notif] improves the efficiency of
   NETCONF-based telemetry.

A.1.2.  gRPC Network Management Interface

   gRPC Network Management Interface (gNMI)
   [I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol
   based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote
   Procedure Call) framework.  With a single gRPC service definition,
   both configuration and telemetry can be covered.  gRPC is an
   open-source microservice communication framework based on HTTP/2
   [RFC7540].  It provides a number of capabilities that are well suited
   to network telemetry, including:

   o  A full-duplex streaming transport model combined with a binary
      encoding mechanism, which further improves telemetry efficiency.

   o  Consistent higher-level features across platforms, which common
      HTTP/2 libraries typically do not provide.  This is especially
      valuable because telemetry data collectors normally reside on a
      large variety of platforms.

   o  Built-in load-balancing and failover mechanisms.

A.2.  Control Plane Telemetry

A.2.1.  BGP Monitoring Protocol

   BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP
   sessions and is intended to provide a convenient interface for
   obtaining route views.

   BGP routing information is collected from the monitored device(s) by
   the BMP monitoring station over a BMP TCP session.  BGP peers are
   monitored through the BMP Peer Up and Peer Down Notifications.  BGP
   routes (including Adj-RIB-In [RFC7854], Adj-RIB-Out
   [I-D.ietf-grow-bmp-adj-rib-out], and Local RIB
   [I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route
   Monitoring Message and the BMP Route Mirroring Message, in the form
   of both an initial table dump and real-time route updates.  In
   addition, BGP statistics are reported through the BMP Stats Report
   Message, which can be either timer-triggered or event-driven.  More
   BMP extensions can be explored to enrich the applications of BGP
   monitoring.

A.3.  Data Plane Telemetry

A.3.1.  The Alternate Marking technology

   The Alternate Marking method efficiently performs packet loss, delay,
   and jitter measurements in both IP and overlay networks, as presented
   in [RFC8321] and [I-D.fioccola-ippm-multipoint-alt-mark].

   This technique can be applied to point-to-point and
   multipoint-to-multipoint flows.  Alternate Marking creates batches of
   packets by alternating the value of 1 bit (or a label) of the packet
   header.  These batches of packets are unambiguously recognized across
   the network, and comparing the packet counters for each batch at
   different measurement points yields the packet loss.  The same idea
   can be applied to delay measurement by selecting ad hoc packets with
   a marking bit dedicated to delay measurements.

   The Alternate Marking method needs two counters per marking period
   for each monitored flow.
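
   The batch comparison described above can be illustrated with a small
   simulation.  This is only a sketch under simplifying assumptions, not
   the procedure specified in [RFC8321]: the MeasurementPoint class, the
   flow name "flow-1", the batch size, and the set of dropped packets
   are all hypothetical, and counters are read immediately at the end of
   each batch rather than after a guard interval.

```python
from collections import defaultdict


class MeasurementPoint:
    """Hypothetical measurement point: one packet counter per (flow, color)."""

    def __init__(self):
        self.counters = defaultdict(int)

    def observe(self, flow_id, color):
        # Count a packet of the given flow carrying the given marking bit.
        self.counters[(flow_id, color)] += 1

    def read_and_reset(self, flow_id, color):
        # Read the counter of a finished batch and clear it for reuse.
        return self.counters.pop((flow_id, color), 0)


def batch_loss(upstream, downstream, flow_id, color):
    """Packets lost between two points for one batch (one color)."""
    sent = upstream.read_and_reset(flow_id, color)
    received = downstream.read_and_reset(flow_id, color)
    return sent - received


def counter_count(points, flows):
    """Counters held per marking period: two colors per flow per point."""
    return points * flows * 2


def simulate(batch_size=5, total=10, dropped=(3, 7, 8)):
    """Send `total` packets of one flow; `dropped` holds lost packet indices."""
    up, down = MeasurementPoint(), MeasurementPoint()
    losses = []
    for i in range(total):
        color = (i // batch_size) % 2  # flip the marking bit every batch
        up.observe("flow-1", color)
        if i not in dropped:
            down.observe("flow-1", color)
        if (i + 1) % batch_size == 0:  # batch finished: compare its counters
            losses.append(batch_loss(up, down, "flow-1", color))
    return losses


print(simulate())            # [1, 2]
print(counter_count(4, 10))  # 80
```

   Because each batch is identified by its color, the two points never
   need to agree on individual packets, only on per-batch totals.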
   For instance, with n measurement points and m monitored flows, the
   number of packet counters for each time interval is on the order of
   n*m*2 (one per color).

   Since networks offer rich sets of network performance measurement
   data (e.g., packet counters), traditional approaches run into
   limitations.  One reason is that the bottleneck lies in generating
   and exporting the data and in the amount of data that can reasonably
   be collected from the network.  In addition, management tasks related
   to determining and configuring which data to generate lead to
   significant deployment challenges.

   The Multipoint Alternate Marking approach, described in
   [I-D.fioccola-ippm-multipoint-alt-mark], aims to resolve this issue
   and makes performance monitoring more flexible when a detailed
   analysis is not needed.

   An application orchestrates network performance measurement tasks
   across the network to optimize monitoring, and it can calibrate how
   deeply monitoring data is obtained from the network by configuring
   measurement points coarsely or finely.

   Using Alternate Marking, it is possible to monitor a multipoint
   network without examining it in depth by using network clustering
   (clusters are subnetworks, portions of the entire network, that
   preserve the same property as the entire network).  If there is
   packet loss or the delay is too high, the filtering criteria can be
   made more specific in order to perform a detailed analysis by using a
   different combination of clusters, up to a per-flow measurement as
   described in IPFPM [RFC8321].

   In summary, an application can configure end-to-end network
   monitoring.  If the network does not experience issues, this
   approximate monitoring is good enough and is very cheap in terms of
   network resources.
   However, in case of problems, the application becomes aware of the
   issues from this approximate monitoring and, in order to localize the
   portion of the network that has issues, configures the measurement
   points more exhaustively.  A new, detailed round of monitoring is
   then performed.  After the problem is detected and resolved, the
   initial approximate monitoring can be used again.

A.3.2.  Dynamic Network Probe

   Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq]
   provides a programmable means to customize the data that an
   application collects from the data plane.  A direct benefit of DNP is
   the reduction of the exported data.  A full DNP solution covers
   several components, including data source, data subscription, and
   data generation.  The data subscription needs to define the complex
   data that can be composed and derived from the raw data sources.  The
   data generation takes advantage of moderate in-network computing to
   produce the desired data.

   While DNP can introduce unprecedented flexibility to data plane
   telemetry, it also faces some challenges.  It requires a flexible
   data plane that can be dynamically reprogrammed at run-time.  The
   programming API is yet to be defined.

A.3.3.  IP Flow Information Export (IPFIX) protocol

   Traffic on a network can be seen as a set of flows passing through
   network elements.  IP Flow Information Export (IPFIX) [RFC7011]
   provides a means of transmitting traffic flow information for
   administrative or other purposes.  A typical IPFIX-enabled system
   includes a pool of Metering Processes that collect data packets at
   one or more Observation Points, optionally filter them, and aggregate
   information about these packets.  An Exporter then gathers each of
   the Observation Points together into an Observation Domain and sends
   this information via the IPFIX protocol to a Collector.

A.3.4.  In-Situ OAM

   Traditional passive and active monitoring and measurement techniques
   are either inaccurate or resource-consuming.  It is preferable to
   directly acquire data associated with a flow's packets when the
   packets pass through a network.  In-situ OAM (IOAM)
   [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new
   instruction header in user packets; the instruction directs the
   network nodes to add the requested data to the packets.  Thus, at the
   path end, the packet's experience gained on the entire forwarding
   path can be collected.  Such firsthand data is invaluable to many
   network OAM applications.

   However, IOAM also faces some challenges.  Issues of performance
   impact, security, scalability and overhead limits, encapsulation
   difficulties in some protocols, and cross-domain deployment need to
   be addressed.

A.3.5.  Postcard Based Telemetry

   Postcard-Based Telemetry (PBT)
   [I-D.song-ippm-postcard-based-telemetry] is an alternative to IOAM.
   PBT directly exports data at each node through an independent packet.
   PBT solves several issues of IOAM.  It can also help to identify the
   drop location when a packet is dropped on its forwarding path.

A.4.  External Data and Event Telemetry

A.4.1.  Sources of External Events

   To ensure that the information provided by external event detectors
   and used by the network management solutions is meaningful for
   management purposes, the network telemetry framework must ensure that
   such detectors (sources) are easily connected to the management
   solutions (sinks).  This requires specifying a simple taxonomy of
   detectors and matching it to the connectors and/or interfaces
   required to connect them.
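
   As a rough illustration of matching a detector taxonomy to
   connectors, the following sketch maps the three source types that
   this section distinguishes to a connector description and tags each
   event before handing it to a sink.  All names here (DetectorType,
   Connector, TAXONOMY, connect) are hypothetical and not defined by the
   framework; only CoAP is named in the text, and a protocol of None
   marks a connector for which the text names no specific protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class DetectorType(Enum):
    SMART_OBJECT = "smart objects and sensors"
    NEWS_REPORTER = "online news reporters"
    GLOBAL_ANALYZER = "global event analyzers"


@dataclass(frozen=True)
class Connector:
    protocol: Optional[str]  # None: no specific protocol named in the text
    data_format: str         # free-text description, for illustration only


# Hypothetical taxonomy-to-connector table.
TAXONOMY = {
    DetectorType.SMART_OBJECT: Connector("CoAP", "sensor readings"),
    DetectorType.NEWS_REPORTER: Connector(None, "relaxed but structured"),
    DetectorType.GLOBAL_ANALYZER: Connector(None, "new model to be defined"),
}


def connect(detector_type: DetectorType,
            sink: Callable[[dict], None]) -> Callable[[dict], None]:
    """Return a function that tags an event with its source and feeds the sink."""
    connector = TAXONOMY[detector_type]

    def deliver(event: dict) -> None:
        sink({**event,
              "source_type": detector_type.value,
              "protocol": connector.protocol})

    return deliver


# Usage: a management solution (sink) that simply records incoming events.
received = []
deliver = connect(DetectorType.SMART_OBJECT, received.append)
deliver({"temperature": 41})
print(received)
```

   Keeping the table separate from the delivery logic means a new
   detector type only needs a new table entry.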

   Once detectors are classified in such a taxonomy, their definitions
   are extended with the qualities and other aspects used to handle them
   and are represented in the ontology and information model (e.g.,
   YANG).  Therefore, differentiating several types of detectors as
   potential sources of external events is essential for the integrity
   of the management framework.  We thus differentiate the following
   source types of external events:

   o  Smart objects and sensors.  With the consolidation of the Internet
      of Things (IoT), any network system will have many smart objects
      attached to its physical surroundings and logical operation
      environments.  Most of these objects will be essentially based on
      sensors of many kinds (e.g., temperature, humidity, presence), and
      the information they provide can be very useful for the management
      of the network, even when they are not specifically deployed for
      such a purpose.  Elements of this source type will usually provide
      a specific protocol for interaction, especially one of those
      protocols related to IoT, such as the Constrained Application
      Protocol (CoAP).  The telemetry framework will use this protocol
      to interact with the relevant objects.

   o  Online news reporters.  Several online news services can provide
      an enormous quantity of information about different events
      occurring in the world.  Some of those events can impact the
      network system managed by a specific framework, which will
      therefore be interested in obtaining such information.  For
      instance, diverse security reports, such as the Common
      Vulnerabilities and Exposures (CVE), can be issued by the
      corresponding authority and used by the management solution to
      update the managed system if needed.  Instead of a specific
      protocol and data format, the sources of this kind of information
      usually follow a relaxed but structured format.
      This format will be part of both the ontology and information
      model of the telemetry framework.

   o  Global event analyzers.  The advance of Big Data analyzers
      provides a huge amount of information and, more interestingly, the
      identification of events detected by analyzing many data streams
      from different origins.  In contrast with the other types of
      sources, which are focused on specific events, the detectors of
      this source type will detect very generic events.  For example, a
      sports event takes place, some unexpected movement makes it highly
      interesting, and many people connect to sites that are covering
      the event.  The systems supporting the services that cover the
      event can be affected by this situation, so their management
      solutions should be aware of it.  In contrast with the other
      source types, a new information model, format, and reporting
      protocol are required to integrate the detectors of this type with
      the management solution.

   Additional detector types can be added to the system, but they will
   generally be the result of composing the properties offered by these
   main classes.  In any case, future revisions of the network telemetry
   framework will include the required types that cover new
   circumstances and that cannot be obtained by composition.

A.4.2.  Connectors and Interfaces

   To allow external event detectors to be properly integrated with
   other management solutions, both elements must expose interfaces and
   protocols that are appropriate to their particular objective.
   Since external event detectors will be focused on providing their
   information to their main consumers, which generally will not be
   limited to the network management solutions, the framework must
   include the definition of the connectors required to ensure that the
   interconnection between detectors (sources) and their consumers
   within the management systems (sinks) is effective.

   In some situations, the interconnection between the external event
   detectors and the management system is via the management plane.  For
   those situations, there will be a special connector that provides the
   typical interfaces found in most other elements connected to the
   management plane.  For instance, the interfaces will comply with a
   specific information model (YANG) and a specific telemetry protocol,
   such as NETCONF, SNMP, or gRPC.

Authors' Addresses

   Haoyu Song
   Futurewei
   2330 Central Expressway
   Santa Clara
   USA

   Email: hsong@futurewei.com


   Fengwei Qin
   China Mobile
   No. 32 Xuanwumenxi Ave., Xicheng District
   Beijing, 100032
   P.R. China

   Email: qinfengwei@chinamobile.com


   Pedro Martinez-Julia
   NICT
   4-2-1, Nukui-Kitamachi
   Koganei, Tokyo 184-8795
   Japan

   Email: pedro@nict.go.jp


   Laurent Ciavaglia
   Nokia
   Villarceaux 91460
   France

   Email: laurent.ciavaglia@nokia.com


   Aijun Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing, 102209
   P.R. China

   Email: wangaj.bri@chinatelecom.cn