OPSAWG                                                      H. Song, Ed.
Internet-Draft                                                 Futurewei
Intended status: Informational                                    F. Qin
Expires: May 7, 2020                                        China Mobile
                                                                 H. Chen
                                                           China Telecom
                                                                  J. Jin
                                                                   LG U+
                                                                 J. Shin
                                                              SK Telecom
                                                        November 4, 2019


                   In-situ Flow Information Telemetry
                  draft-song-opsawg-ifit-framework-07

Abstract

   For efficient network operation, most network operators rely on
   traditional Operation, Administration and Maintenance (OAM) methods,
   which include proactive and reactive techniques, running in active
   and passive modes. As networks increase in scale, they become more
   susceptible to measurement inaccuracy and misconfiguration errors.

   With the advent of the programmable data plane, emerging on-path
   telemetry techniques provide unprecedented flow insight and fast
   notification of network issues (e.g., jitter, increased latency,
   packet loss, significant bit error variations, and unequal load
   balancing).

   This document outlines an In-situ Flow Information Telemetry (iFIT)
   reference framework, which enumerates several high-level components
   and describes how these components can be assembled to achieve a
   complete and closed-loop working solution for on-path telemetry.

   iFIT addresses several deployment challenges for on-path telemetry
   techniques, especially in carrier networks. As an open framework, it
   does not detail the implementation of the components or the
   interfaces between them.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119][RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 7, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Requirements and Challenges
   2.  Glossary
   3.  iFIT Framework Overview
     3.1.  Passport vs. Postcard
   4.  Architectural Components of iFIT
     4.1.  Smart Flow and Data Selection
       4.1.1.  Example: Sketch-guided Elephant Flow Selection
       4.1.2.  Example: Adaptive Packet Sampling
     4.2.  Smart Data Export
       4.2.1.  Example: Event-based Anomaly Monitor
     4.3.  Dynamic Network Probe
       4.3.1.  Examples
     4.4.  Encapsulation and Tunneling
     4.5.  On-demand Technique Selection and Integration
   5.  iFIT Closed-Loop Architecture
     5.1.  Example: Intelligent Multipoint Performance Monitoring
     5.2.  Example: Intent-based Network Monitoring
   6.  Summary and Future Work
   7.  Security Considerations
   8.  IANA Considerations
   9.  Contributors
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
     11.3.  URIs
   Authors' Addresses

1.  Requirements and Challenges

   The sheer complexity of today's networks requires radical rethinking
   of existing methods used for network monitoring and troubleshooting.
   Current dynamic networks require "on-path" fault monitoring and
   traffic measurement solutions for a wide range of use cases which
   include intelligent management of existing network traffic, and
   better traffic visibility of emerging applications such as large
   scale Virtual Server (VS) mobility, fluid content distribution, and
   elastic bandwidth allocation.

   Furthermore, the ability to expedite failure detection, fault
   localization, and recovery is required, particularly when soft
   failures or path degradation occur without causing extreme or
   obvious disruption. This is extremely important since these types
   of network issues are often difficult to localize with existing
   Operation, Administration and Maintenance (OAM) methods and reduce
   overall network efficiency.

   Future networks must also support application-aware networking.
   Application-aware networking is an emerging industry term, typically
   used to describe the capacity of an intelligent network to maintain
   current information about the user and application connections that
   use network resources. As a result, the operator can optimize
   network resource usage and monitoring to ensure application and
   traffic optimality.

   Application-aware network operation is important for user SLA
   compliance, service path enforcement, fault diagnosis, and network
   resource optimization. A family of on-path flow telemetry
   techniques is emerging, including In-situ OAM (IOAM)
   [I-D.brockners-inband-oam-data], Postcard Based Telemetry (PBT)
   [I-D.song-ippm-postcard-based-telemetry], In-band Flow Analyzer (IFA)
   [I-D.kumar-ippm-ifa], Enhanced Alternate Marking (EAM)
   [I-D.zhou-ippm-enhanced-alternate-marking], and Hybrid Two Steps
   (HTS) [I-D.mirsky-ippm-hybrid-two-step]. These techniques can
   provide flow information on the entire forwarding path on a
   per-packet basis in real time. They are very different from the
   previous active and passive OAM schemes in that they directly modify
   the user packets. Given these unique characteristics, we may
   categorize these on-path telemetry techniques as the hybrid OAM type
   III, supplementing the classification defined in [RFC7799].

   These techniques are invaluable for application-aware network
   operations not only in data center and enterprise networks but also
   in carrier networks which may cross multiple domains. Carrier
   network operators have shown strong interest in utilizing such
   techniques for various purposes. For example, it is vital for
   operators who offer bandwidth-intensive, latency- and loss-sensitive
   services such as video streaming and gaming to closely monitor the
   relevant flows in real time as the indispensable first step for any
   further measure.

   However, successfully applying such techniques in carrier networks
   requires consideration of performance, deployability, and
   flexibility. Specifically, several practical challenges need to be
   addressed:

   o  C1: On-path flow telemetry incurs extra packet processing which
      may strain the network data plane. The potential impact on the
      forwarding performance creates an unfavorable "observer effect"
      which not only damages the fidelity of the measurement but also
      defeats the purpose of the measurement.

   o  C2: On-path flow telemetry can generate a huge amount of OAM data
      which may claim too much transport bandwidth and inundate the
      servers for data collection, storage, and analysis. Increasing
      the data handling capacity is technically viable but expensive.
      For example, assume IOAM is applied to all the traffic. One node
      will collect a few tens of bytes as telemetry data for each
      packet. The whole forwarding path might accumulate a data trace
      with a size similar to the average size of the original packets.
      Exporting the telemetry data will consume almost half of the
      network bandwidth.

   o  C3: The collectible data defined currently are essential but
      limited.
      As network operation evolves to be declarative (intent-based)
      and automated, and the trends of network virtualization, network
      convergence, and packet-optical integration continue, more data
      will be needed in an on-demand and interactive fashion.
      Flexibility and extensibility in data definition, acquisition,
      and filtering must be considered.

   o  C4: If we were to apply some on-path telemetry technique in
      today's carrier networks, we must provide solutions tailored to
      the provider's deployed network base and support an incremental
      deployment strategy. That is, we need to support established
      encapsulation schemes for various predominant protocols such as
      Ethernet, IPv4, and MPLS with backward compatibility and properly
      handle various transport tunnels.

   o  C5: Applying only a single underlying telemetry technique may lead
      to defective results. For example, a packet drop can cause the
      loss of the flow telemetry data, so the location and reason of
      the drop remain unknown if only the In-situ OAM trace option is
      used. A comprehensive solution needs the flexibility to switch
      between different underlying techniques and adjust the
      configurations and parameters at runtime.

   o  C6: Simplified on-path telemetry primitives and models need to be
      developed, including query primitives for telemetry data (e.g.,
      nodes, links, ports, paths, flows, timestamps). These may be used
      by an API-based telemetry service for external applications, for
      example to measure the end-to-end latency of network paths and to
      calculate application latency.

2.  Glossary

   This section defines and explains some terms used in this document.

   On-path Telemetry:  Acquiring data about a packet on its forwarding
      path. The term refers to a class of data plane telemetry
      techniques which collect data about user flows and packets along
      their forwarding paths.
      IOAM, PBT, IFA, EAM, and HTS are all on-path telemetry
      techniques. Such techniques may need to mark user packets, or
      insert instructions or data into the headers of user packets.

   iFIT:  In-situ Flow Information Telemetry

   iFIT Framework:  A reference framework that supports network OAM
      applications in applying data plane on-path telemetry techniques.

   iFIT Application:  A network OAM application that applies the iFIT
      framework.

   iFIT Domain:  The network domain that participates in an iFIT
      application.

   iFIT Node:  A network node that is in an iFIT domain and is capable
      of iFIT-specific functions.

   iFIT Head Node:  The entry node to an iFIT domain. Usually the
      instruction header encapsulation, if needed, happens here.

   iFIT End Node:  The exit node of an iFIT domain. Usually the
      instruction header decapsulation, if needed, happens here.

3.  iFIT Framework Overview

   To address the aforementioned challenges, we propose an architectural
   framework based on multiple network operators' requirements and
   common industry practice, which can help to build a workable on-path
   flow telemetry solution. We name the framework "In-situ Flow
   Information Telemetry" (iFIT) to reflect the fact that this framework
   is dedicated to the on-path telemetry data about user/application
   flow experience. As an architectural framework for building a
   complete solution, iFIT works a level higher than specific data plane
   OAM techniques, be they active, passive, or hybrid. The framework is
   built upon a few high-level architectural components (Section 4).
   By assembling these components, a closed loop can be formed to
   provide a complete solution for static, dynamic, and interactive
   telemetry applications (Section 5).

   iFIT is an open framework. It does not enforce any specific
   implementation on each component, nor does it define interfaces
   (e.g., API, protocol) between components.
   The choice of underlying on-path telemetry techniques and other
   implementation details are determined by the application
   implementer.

   The network architecture that applies iFIT is shown in Figure 1. The
   iFIT domain is confined between the iFIT head nodes and the iFIT end
   nodes. An iFIT domain may cross multiple network domains. An iFIT
   application uses a controller to configure all the iFIT nodes. The
   configuration determines what telemetry data are collected. After
   processing and analyzing the telemetry data, the iFIT application may
   instruct the controller to modify the iFIT node configuration and
   thereby affect future telemetry data collection. How applications
   communicate with the controller is out of scope for this document.

   iFIT supports two basic on-path telemetry data collection modes:
   passport mode (e.g., IOAM trace option and IFA), in which telemetry
   data are carried in user packets and exported at the iFIT end nodes,
   and postcard mode (e.g., PBT), in which each node in the iFIT domain
   may export telemetry data through independent OAM packets. Note that
   the boundary between the two modes can be blurry. An application
   may need to mix the two modes.

                 +-------------------------------------+
                 |          iFIT Application           |
                 | +------------+        +-----------+ |
                 | |            |        |           | |
                 | | Controller |<-------| Collector | |
                 | |            |        |           | |
                 | +-----:------+        +-----------+ |
                 |       :                     ^       |
                 +-------:---------------------|-------+
                         :configuration        |telemetry data
                         :                     |
          ...............:.....................|..........
          :              :                :    |         :
          :   +----------:---+------------:---++---------:---+
          :   |          :   |            :   |          :   |
          V   |          V   |            V   |          V   |
       +------+-+     +------+-+       +------+-+     +------+-+
packets| iFIT   |     | Path   |       | Path   |     | iFIT   |
    ==>| Head   |====>| Node   |==//==>| Node   |====>| End    |==>
       | Node   |     |   A    |       |   B    |     | Node   |
       +--------+     +--------+       +--------+     +--------+

       |<---                  iFIT Domain                  --->|

                  Figure 1: iFIT Network Architecture

3.1.  Passport vs. Postcard

   [passport-postcard] first used the analogy of passport and postcard
   to describe how the packet trace data can be collected and exported.
   In the passport mode, each node on the path adds the telemetry data
   to the user packets. The accumulated data trace is exported at a
   configured end node. In the postcard mode, each node directly
   exports the telemetry data using an independent packet, leaving the
   user packets intact.

   A prominent advantage of the passport mode is that it naturally
   retains the telemetry data correlation along the entire path. The
   passport mode also reduces the number of data export packets and the
   bandwidth consumed by the data export packets. These properties make
   the work of the data collector and analyzer easier. On the other
   hand, the passport mode requires more processing on the user packets
   and increases the size of user packets, which can cause various
   problems. Some other issues are documented in
   [I-D.song-ippm-postcard-based-telemetry].

   The postcard mode complements the passport mode. It addresses most
   of the issues faced by the passport mode, at the cost of extra
   effort to correlate the postcard packets.

4.  Architectural Components of iFIT

   The high-level components of iFIT are listed as follows:

   o  Smart flow and data selection policy to address the challenge C1
      described in Section 1.

   o  Smart data export to address the challenge C2.

   o  Dynamic network probe to address C3.

   o  Encapsulation and tunneling to address C4.

   o  On-demand technique selection and integration to address C5.

   Note that this document does not directly address the challenge C6,
   which is left as a concern for iFIT application implementers.

   Next we provide a detailed description of each component.

4.1.  Smart Flow and Data Selection

   In most cases, it is impractical to enable the data collection for
   all the flows and for all the packets in a flow due to the potential
   performance and bandwidth impact. Therefore, a workable solution
   must select only a subset of flows and flow packets to enable the
   data collection, even though this means the loss of some information.

   In the data plane, the Access Control List (ACL) provides an ideal
   means to determine the subset of flow(s).
   [I-D.song-ippm-ioam-data-validation-option] describes how one can set
   a sample rate or probability for a flow to allow only a subset of
   flow packets to be monitored, how one can collect a different set of
   data for different packets, and how one can disable or enable data
   collection on any specific network node. The document further
   introduces an enhancement to IOAM to allow any node to accept or deny
   the data collection in full or partially.

   Based on these flexible mechanisms, iFIT allows applications to apply
   smart flow and data selection policies to suit the requirements. The
   applications can dynamically change the policies at any time based on
   the network load, processing capability, focus of interest, and any
   other criteria.

4.1.1.  Example: Sketch-guided Elephant Flow Selection

   Network operators are usually more interested in elephant flows,
   which consume more resources and are sensitive to changes in network
   conditions. A CountMin Sketch [CMSketch] can be used on the data
   path of the head nodes, which identifies and reports the elephant
   flows periodically.
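
   As an illustration only, the following minimal sketch shows how a
   CountMin Sketch could flag elephant flows in software. The class
   name, the flow key strings, the table dimensions, and the byte
   threshold are all assumptions made for this example; the document
   does not prescribe any implementation, and a real head node would
   run such a structure in the data plane.

```python
import hashlib


class CountMinSketch:
    """Approximate per-flow counter; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash function per row by salting with the row number.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over the rows is the least-inflated counter.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )


def elephant_flows(sketch, candidate_keys, threshold):
    """Return the candidate flows whose estimated volume reaches threshold."""
    return {key for key in candidate_keys if sketch.estimate(key) >= threshold}


# Hypothetical flow keys and byte counts, for illustration only.
sketch = CountMinSketch()
for _ in range(1000):
    sketch.add("10.0.0.1->10.0.0.2", 1500)   # heavy flow
sketch.add("10.0.0.3->10.0.0.4", 64)         # light flow

heavy = elephant_flows(
    sketch, ["10.0.0.1->10.0.0.2", "10.0.0.3->10.0.0.4"], threshold=1_000_000
)
```

   Because hash collisions can only inflate a counter, a true elephant
   flow is never missed; the price is that a small flow may
   occasionally be reported as well.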
   The controller maintains a current set of elephant flows and
   dynamically enables the on-path telemetry for only these flows.

4.1.2.  Example: Adaptive Packet Sampling

   Applying on-path telemetry on all packets of selected flows can still
   be out of reach. A sample rate should be set for these flows so that
   telemetry is enabled only on the sampled packets. However, the head
   nodes have no knowledge of the proper sampling rate. An overly high
   rate would exhaust the network resources and even cause packet
   drops; an overly low rate, on the contrary, would result in the loss
   of information and inaccuracy of measurements.

   An adaptive approach can be used based on the network conditions to
   dynamically adjust the sampling rate. Every node gives user traffic
   forwarding higher priority than telemetry data export. In case of
   network congestion, the telemetry can sense some signals from the
   data collected (e.g., deep buffer size, long delay, packet drop, and
   data loss). The controller may use these signals to adjust the
   packet sampling rate. In each adjustment period (i.e., RTT of the
   feedback loop), the sampling rate is either decreased or increased in
   response to the signals. An Additive Increase/Multiplicative
   Decrease (AIMD) policy, similar to the one used by TCP congestion
   control, can be used for the rate adjustment.

4.2.  Smart Data Export

   The flow telemetry data can catch the dynamics of the network and the
   interactions between user traffic and network. Nevertheless, the
   data inevitably contain redundancy. It is advisable to remove the
   redundancy from the data in order to reduce the data transport
   bandwidth and server processing load.

   In addition to efficient export data encoding (e.g., IPFIX [RFC7011]
   or protobuf [1]), iFIT nodes have several other ways to reduce the
   export data by taking advantage of the network device's capability
   and programmability.
   iFIT nodes can cache the data and send the accumulated data in
   batches if the data are not time sensitive. Various deduplication
   and compression techniques can be applied to the batched data.

   From the application perspective, an application may only be
   interested in some special events which can be derived from the
   telemetry data. For example, if the events of interest are that the
   forwarding delay of a packet exceeds a threshold or that a flow
   changes its forwarding path, it is unnecessary to send the original
   raw data to the data collecting and processing servers. Rather, iFIT
   takes advantage of the in-network computing capability of network
   devices to process the raw data and only push the event notifications
   to the subscribing applications.

   Such events can be expressed as policies. A policy can request data
   export only on change, on exception, on timeout, or on threshold.

4.2.1.  Example: Event-based Anomaly Monitor

   Network operators are interested in anomalies such as path change,
   network congestion, and packet drop. Such anomalies are hidden in
   raw telemetry data (e.g., path trace, timestamp). They can be
   described as events and programmed into the device data plane. Only
   the triggered events are exported. For example, if a new flow
   appears at any node, a path change event is triggered; if the packet
   delay exceeds a predefined threshold in a node, the congestion event
   is triggered; if a packet is dropped due to buffer overflow, a packet
   drop event is triggered.

   The export data reduction due to such optimization is substantial.
   For example, given a single 5-hop 10Gbps path, assume a moderate
   number of 1 million packets per second are monitored, and the
   telemetry data plus the export packet overhead consume less than 30
   bytes per hop.
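
   As a rough check of the figures in this example, the unoptimized
   export volume can be worked out as below. Taking 30 bytes per hop
   as the working value is an assumption made here (the text gives it
   as an upper bound), and the function and constant names are
   illustrative only.

```python
HOPS = 5
PACKETS_PER_SECOND = 1_000_000
BYTES_PER_HOP = 30                     # assumed working value, see lead-in
PATH_BANDWIDTH_BPS = 10_000_000_000    # 10 Gbps path


def raw_export_bps(hops, packets_per_second, bytes_per_hop):
    """Bits per second exported when every monitored packet reports at every hop."""
    return hops * packets_per_second * bytes_per_hop * 8


raw_bps = raw_export_bps(HOPS, PACKETS_PER_SECOND, BYTES_PER_HOP)
share = raw_bps / PATH_BANDWIDTH_BPS
print(f"{raw_bps / 1e9:.1f} Gbps, {share:.0%} of the path")
# prints "1.2 Gbps, 12% of the path"
```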
   Without such optimization, the bandwidth consumed by the telemetry
   data can easily exceed 1 Gbps (more than 10% of the path bandwidth).
   When the optimization is used, the bandwidth consumed by the
   telemetry data is negligible. Moreover, the pre-processed telemetry
   data greatly simplify the work of data analyzers.

4.3.  Dynamic Network Probe

   Due to limited data plane resources and network bandwidth, it is
   unlikely one can monitor all the data all the time. On the other
   hand, the data needed by applications may be arbitrary but ephemeral.
   It is critical to meet the dynamic data requirements with limited
   resources.

   Fortunately, data plane programmability allows iFIT to dynamically
   load new data probes. These on-demand probes are called Dynamic
   Network Probes (DNP) [I-D.song-opsawg-dnp4iq]. DNP is a technique
   that enables probes for customized data collection in different
   network planes. When working with IOAM or PBT, DNP is loaded to the
   data plane through incremental programming or configuration. The DNP
   can effectively conduct data generation, processing, and aggregation.

   DNP introduces flexibility and extensibility to iFIT. It can
   implement the optimizations for export data reduction mentioned in
   the previous section. It can also generate custom data as required
   by today's and tomorrow's applications.

4.3.1.  Examples

   The following are some possible DNPs that can be dynamically
   deployed to support iFIT applications.

   On-demand Flow Sketch:  A flow sketch is a compact online data
      structure for approximate flow statistics which can be used to
      facilitate flow selection. The aforementioned CountMin Sketch is
      such an example. Since a sketch consumes data plane resources, it
      should only be deployed when needed.

   Smart Flow Filter:  The policies that choose flows and packet
      sampling rate can change during the lifetime of an application.

   Smart Statistics:  An application may need to interactively count
      flows based on different flow granularity or maintain hit counters
      for selected flow table entries.

   Smart Data Reduction:  DNP can be used to program the events that
      conditionally trigger data export.

4.4.  Encapsulation and Tunneling

   Since the introduction of IOAM, encapsulation schemes for the IOAM
   option header have been proposed for various network protocols, but
   some protocols that are still prevalent in carrier networks, such as
   MPLS and IPv4, have been omitted. iFIT provides solutions to apply
   the on-path flow telemetry techniques in such networks. PBT-M
   [I-D.song-ippm-postcard-based-telemetry] does not introduce new
   headers to the packets, so the trouble of encapsulation for a new
   header is avoided. In case a technique that requires a new header is
   preferred, [I-D.song-mpls-extension-header] provides a means to
   encapsulate the extra header using an MPLS extension header. As for
   IPv4, it is possible to encapsulate the new header in an IP option.
   For example, the Router Alert Option (RAO) [RFC2113] can be used to
   indicate the presence of the new header. A recent proposal
   [I-D.herbert-ipv4-eh] that introduces the IPv4 extension header may
   lead to a long-term solution.

   In carrier networks, it is common for user traffic to traverse
   various tunnels for QoS, traffic engineering, or security. iFIT
   supports both the uniform mode and the pipe mode for tunnel support
   as described in [I-D.song-ippm-ioam-tunnel-mode]. With such
   flexibility, the operator can either gain true end-to-end visibility
   or apply a hierarchical approach which isolates the monitoring
   domain between customer and provider.

4.5.  On-demand Technique Selection and Integration

   With multiple underlying data collection and export techniques at its
   disposal, iFIT can flexibly adapt to different network conditions and
   different application requirements.

   For example, depending on the types of data that are of interest,
   iFIT may choose either IOAM or PBT to collect the data; if an
   application needs to track down where the packets are lost, it may
   switch from IOAM to PBT.

   iFIT can further integrate multiple data plane monitoring and
   measurement techniques together and present a comprehensive data
   plane telemetry solution to network operating applications.

5.  iFIT Closed-Loop Architecture

   The iFIT architectural components can work together to form
   closed-loop applications, as shown in Figure 2.

                +---------------------+
                |                     |
         +------+  iFIT Applications  |<------+
         |      |                     |       |
         |      +---------------------+       |
         |        Technique Selection         |
         |        and Integration             |
         |                                    |
         |Smart Flow                    Smart |
         |and Data    closed-loop       Data  |
         |Selection                     Export|
         |                                    |
         |                               +----+----+
         V                              +---------+|
   +----------+ Encapsulation          +---------+||
   |  iFIT    | and Tunneling          |  iFIT   |||
   |  Head    |----------------------->|         ||+
   |  Node    |                        |  Nodes  |+
   +----------+                        +---------+
       DNP                                 DNP

                Figure 2: iFIT Closed-Loop Architecture

   An iFIT application may pick a suite of telemetry techniques based on
   its requirements and apply an initial technique to the data plane.
   It then configures the iFIT head nodes to decide the initial target
   flows/packets and telemetry data set, the encapsulation and tunneling
   scheme based on the underlying network architecture, and the
   iFIT-capable nodes to decide the initial telemetry data export
   policy.
   Based on the network condition and the analysis results of the
   telemetry data, the iFIT application can change the telemetry
   technique, the flow/data selection policy, and the data export
   approach in real time without breaking the normal network operation.
   Many such dynamic changes can be done through loading and unloading
   DNPs.

   We should avoid confusion between this closed telemetry loop and the
   closed control loop. The latter term is often used in the context of
   network automation. In such a closed control loop, telemetry also
   plays an important role. Based on the telemetry results,
   applications can automatically change the network policy or
   configuration. In such a context, iFIT is just a part of the loop.

   The closed-loop nature of the iFIT framework enables numerous new
   applications that support future network operation architectures.

5.1.  Example: Intelligent Multipoint Performance Monitoring

   [I-D.ietf-ippm-multipoint-alt-mark] describes an intelligent
   performance management scheme based on the network condition. The
   idea is to split the monitored network into clusters. The cluster
   partition, which can be applied to every type of network graph, and
   the possibility to combine clusters at different levels enable the
   so-called Network Zooming. It allows a controller to calibrate the
   network telemetry, so that it can start without examining in depth
   and monitor the network as a whole. When necessary (e.g., on packet
   loss or excessive delay), the monitoring can be immediately
   reconfigured for a detailed analysis. In particular, the controller,
   which is aware of the network topology, can set up the most suitable
   cluster partition by changing the traffic filter or activating new
   measurement points, so that the problem can be localized step by
   step.
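
   The step-by-step localization described above can be pictured with
   the following minimal sketch. It is not taken from
   [I-D.ietf-ippm-multipoint-alt-mark]; the Cluster type, its fields,
   and the zoom() function are hypothetical names introduced for this
   example. For simplicity the counters of all cluster levels are
   given up front, whereas a real controller would only activate the
   finer measurement points after a coarse cluster reports loss.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    packets_in: int    # packets counted at the cluster's input measurement points
    packets_out: int   # packets counted at the cluster's output measurement points
    children: list = field(default_factory=list)  # finer partition of this cluster

    @property
    def loss(self):
        return self.packets_in - self.packets_out


def zoom(cluster, loss_threshold=0):
    """Return the names of the smallest clusters whose loss exceeds the threshold."""
    if cluster.loss <= loss_threshold:
        return []  # healthy cluster: no need to look inside it
    suspects = []
    for child in cluster.children:
        suspects.extend(zoom(child, loss_threshold))
    # With no finer partition, or no child explaining the loss, report this cluster.
    return suspects or [cluster.name]


# Hypothetical counters for one measurement period.
network = Cluster("network", 1000, 990, [
    Cluster("east", 600, 600),
    Cluster("west", 400, 390, [
        Cluster("west-1", 400, 400),
        Cluster("west-2", 400, 390),
    ]),
])
suspects = zoom(network)   # ["west-2"]
```

   The coarse view (one cluster for the whole network) is cheap to
   maintain; finer clusters are only examined where the coarse view
   shows a problem.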

   An iFIT application on top of the controllers can manage such a
   mechanism, and the iFIT closed-loop architecture allows it to operate
   dynamically and flexibly.

5.2.  Example: Intent-based Network Monitoring

                      User Intents
                            |
                            V          Per-packet
                      +------------+   Telemetry
               ACL    |            |   Data
             +--------+ Controller |<--------+
             |        |            |         |
             |        +--+---------+         |
             |           |       ^           |
             |           |DNPs   |Network    |
             |           |       |Information|
             |           V       |           |
      +------+-------------------+-----------+---+
      |      |                                   |
      |      V                      +------+     |
      | +-------+                  +------+|     |
      | | iFIT  |   iFIT Domain   +------+||     |
      | | Head  |                 |iFIT  ||+     |
      | | Node  |                 |Nodes |+      |
      | +-------+                 +------+       |
      +------------------------------------------+

                   Figure 3: Intent-based Monitoring

   In this example, a user can express high-level intents for network
   monitoring.  The controller translates an intent and configures the
   corresponding DNPs in iFIT nodes, which collect the necessary network
   information.  Based on the real-time information feedback, the
   controller runs a local algorithm to identify suspicious flows.  It
   then deploys ACLs to the iFIT head node to initiate high-precision
   per-packet on-path telemetry for these flows.

6.  Summary and Future Work

   iFIT is an open framework for applying on-path telemetry techniques.
   Combined with algorithmic and architectural schemes that fit into
   the framework components, the iFIT framework enables a practical
   telemetry solution based on two basic on-path traffic data collection
   modes: passport and postcard.

   The operation of iFIT differs from both active OAM and passive OAM as
   defined in [RFC7799].  It neither generates active probe packets nor
   passively observes unmodified user packets.  Instead, it modifies
   selected user packets to collect useful information about them.
   Therefore, the iFIT operation can be considered a Hybrid Type I
   method, which can provide more flexible and accurate network OAM.

   More challenges and corresponding solutions for iFIT may need to be
   covered.  One example is how iFIT fits into the bigger picture of
   autonomous networking and supports closed control loops.  A complete
   iFIT framework should also consider cross-domain operations.  We
   leave these topics for future revisions.

7.  Security Considerations

   No specific security issues are identified other than those that have
   been discussed in the drafts on on-path flow information telemetry.

8.  IANA Considerations

   This document includes no request to IANA.

9.  Contributors

   Other major contributors to this document include Giuseppe Fioccola,
   Daniel King, Zhenqiang Li, Zhenbin Li, Tianran Zhou, and James
   Guichard.

10.  Acknowledgments

   We thank Shwetha Bhandari, Joe Clarke, and Frank Brockners for their
   constructive suggestions for improving this document.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

11.2.  Informative References

   [CMSketch]
              Cormode, G. and S. Muthukrishnan, "An improved data stream
              summary: the count-min sketch and its applications", 2005.

   [I-D.brockners-inband-oam-data]
              Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
              Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
              P., Chang, R., and d.
              daniel.bernier@bell.ca, "Data Fields for In-situ OAM",
              draft-brockners-inband-oam-data-07 (work in progress),
              July 2017.

   [I-D.herbert-ipv4-eh]
              Herbert, T., "IPv4 Extension Headers and Flow Label",
              draft-herbert-ipv4-eh-01 (work in progress), May 2019.

   [I-D.ietf-ippm-multipoint-alt-mark]
              Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
              "Multipoint Alternate Marking method for passive and
              hybrid performance monitoring",
              draft-ietf-ippm-multipoint-alt-mark-02 (work in progress),
              July 2019.

   [I-D.kumar-ippm-ifa]
              Kumar, J., Anubolu, S., Lemon, J., Manur, R., Holbrook,
              H., Ghanwani, A., Cai, D., Ou, H., and L. Yizhou, "Inband
              Flow Analyzer", draft-kumar-ippm-ifa-01 (work in
              progress), February 2019.

   [I-D.mirsky-ippm-hybrid-two-step]
              Mirsky, G., Lingqiang, W., and G. Zhui, "Hybrid Two-Step
              Performance Measurement Method",
              draft-mirsky-ippm-hybrid-two-step-04 (work in progress),
              October 2019.

   [I-D.song-ippm-ioam-data-validation-option]
              Song, H. and T. Zhou, "In-situ OAM Data Validation
              Option", draft-song-ippm-ioam-data-validation-option-02
              (work in progress), April 2018.

   [I-D.song-ippm-ioam-tunnel-mode]
              Song, H., Li, Z., Zhou, T., and Z. Wang, "In-situ OAM
              Processing in Tunnels",
              draft-song-ippm-ioam-tunnel-mode-00 (work in progress),
              June 2018.

   [I-D.song-ippm-postcard-based-telemetry]
              Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee,
              "Postcard-based On-Path Flow Data Telemetry",
              draft-song-ippm-postcard-based-telemetry-06 (work in
              progress), October 2019.

   [I-D.song-mpls-extension-header]
              Song, H., Li, Z., Zhou, T., and L. Andersson, "MPLS
              Extension Header", draft-song-mpls-extension-header-02
              (work in progress), February 2019.

   [I-D.song-opsawg-dnp4iq]
              Song, H. and J. Gong, "Requirements for Interactive Query
              with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01
              (work in progress), June 2017.

   [I-D.zhou-ippm-enhanced-alternate-marking]
              Zhou, T., Fioccola, G., Li, Z., Lee, S., and M. Cociglio,
              "Enhanced Alternate Marking Method",
              draft-zhou-ippm-enhanced-alternate-marking-04 (work in
              progress), October 2019.

   [passport-postcard]
              Handigol, N., Heller, B., Jeyakumar, V., Mazieres, D., and
              N. McKeown, "Where is the debugger for my software-defined
              network?", 2012.

   [RFC2113]  Katz, D., "IP Router Alert Option", RFC 2113,
              DOI 10.17487/RFC2113, February 1997.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013.

11.3.  URIs

   [1] https://developers.google.com/protocol-buffers/

Authors' Addresses

   Haoyu Song (editor)
   Futurewei
   2330 Central Expressway
   Santa Clara
   USA

   Email: haoyu.song@futurewei.com

   Fengwei Qin
   China Mobile
   No. 32 Xuanwumenxi Ave., Xicheng District
   Beijing, 100032
   P.R. China

   Email: qinfengwei@chinamobile.com

   Huanan Chen
   China Telecom
   P. R. China

   Email: chenhuan6@chinatelecom.cn

   Jaehwan Jin
   LG U+
   South Korea

   Email: daenamu1@lguplus.co.kr

   Jongyoon Shin
   SK Telecom
   South Korea

   Email: jongyoon.shin@sk.com