idnits 2.17.1

draft-song-ippm-postcard-based-telemetry-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 161 has weird spacing: '...sr pkts gen p...'

  -- The document date (15 November 2021) is 892 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-13) exists of
     draft-ietf-6man-spring-srv6-oam-11

  == Outdated reference: A later version (-17) exists of
     draft-ietf-ippm-ioam-data-16

  == Outdated reference: A later version (-11) exists of
     draft-ietf-ippm-ioam-direct-export-07

  == Outdated reference: A later version (-07) exists of
     draft-spiegel-ippm-ioam-rawexport-05

  -- Obsolete informational reference (is this intentional?): RFC 2925
     (Obsoleted by RFC 4560)

  -- Obsolete informational reference (is this intentional?): RFC 8321
     (Obsoleted by RFC 9341)

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    IPPM                                                             H. Song
3    Internet-Draft                                    Futurewei Technologies
4    Intended status: Informational                                 G. Mirsky
5    Expires: 19 May 2022                                            Ericsson
6                                                                 C. Filsfils
7                                                               A. Abdelsalam
8                                                         Cisco Systems, Inc.
9                                                                     T. Zhou
10                                                                      Z. Li
11                                                                     Huawei
12                                                                    J. Shin
13                                                                 SK Telecom
14                                                                     K. Lee
15                                                                      LG U+
16                                                           15 November 2021

18                   In-Situ OAM Marking-based Direct Export
19                 draft-song-ippm-postcard-based-telemetry-11

21   Abstract

23      This document describes a packet-marking variation of the IOAM DEX
24      option, referred to as IOAM Marking. Similar to IOAM DEX, IOAM
25      Marking does not carry the telemetry data in user packets but sends
26      the telemetry data through a dedicated packet. Unlike IOAM DEX, IOAM
27      Marking does not require an extra instruction header. IOAM Marking
28      raises some unique issues that need to be considered. This document
29      formally describes the high-level scheme and covers the common
30      requirements and issues when applying IOAM Marking in different
31      networks. IOAM Marking is complementary to the other on-path
32      telemetry schemes such as IOAM trace and E2E options.

34   Status of This Memo

36      This Internet-Draft is submitted in full conformance with the
37      provisions of BCP 78 and BCP 79.

39      Internet-Drafts are working documents of the Internet Engineering
40      Task Force (IETF). Note that other groups may also distribute
41      working documents as Internet-Drafts. The list of current Internet-
42      Drafts is at https://datatracker.ietf.org/drafts/current/.

44      Internet-Drafts are draft documents valid for a maximum of six months
45      and may be updated, replaced, or obsoleted by other documents at any
46      time. It is inappropriate to use Internet-Drafts as reference
47      material or to cite them other than as "work in progress."

48      This Internet-Draft will expire on 19 May 2022.

50   Copyright Notice

52      Copyright (c) 2021 IETF Trust and the persons identified as the
53      document authors. All rights reserved.

55      This document is subject to BCP 78 and the IETF Trust's Legal
56      Provisions Relating to IETF Documents (https://trustee.ietf.org/
57      license-info) in effect on the date of publication of this document.
58      Please review these documents carefully, as they describe your rights
59      and restrictions with respect to this document. Code Components
60      extracted from this document must include Simplified BSD License text
61      as described in Section 4.e of the Trust Legal Provisions and are
62      provided without warranty as described in the Simplified BSD License.

64   Table of Contents

66      1.  Motivation
67      2.  IOAM Marking: Marking-based IOAM Direct Export
68      3.  New Challenges
69      4.  IOAM Marking Design Considerations
70        4.1.  Packet Marking
71        4.2.  Flow Path Discovery
72        4.3.  Packet Identity for Export Data Correlation
73        4.4.  Control the Load
74      5.  Implementation Recommendation
75        5.1.  Configuration
76        5.2.  Postcard Format
77        5.3.  Data Correlation
78      6.  Security Considerations
79      7.  IANA Considerations
80      8.  Contributors
81      9.  Acknowledgments
82      10. Informative References
83      Authors' Addresses

85   1.  Motivation

87      To gain detailed data plane visibility to support effective network
88      OAM, it is essential to be able to examine the trace of user packets
89      along their forwarding paths. Such on-path flow data reflect the
90      state and status of each user packet's real-time experience and
91      provide valuable information for network monitoring, measurement, and
92      diagnosis.

94      The telemetry data include, but are not limited to, the detailed
95      forwarding path, the timestamp/latency at each network node, and, in
96      case of packet drop, the drop location and the reason. The emerging
97      programmable data plane devices allow user-defined data collection or
98      conditional data collection based on trigger events. Such on-path
99      flow data are from and about the live user traffic, which complements
100     the data acquired through other passive and active OAM mechanisms
101     such as IPFIX [RFC7011] and ICMP [RFC2925].

103     On-path telemetry was developed to meet the need for collecting
104     on-path flow data. There are two basic modes for on-path telemetry:
105     the passport mode and the postcard mode. In the passport mode, which
106     is represented by the IOAM trace option [I-D.ietf-ippm-ioam-data],
107     each node on the path adds the telemetry data to the user packets
108     (i.e., stamps the passport). The accumulated data trace carried by
109     user packets is exported at a configured end node. In the postcard
110     mode, which is represented by the IOAM direct export (DEX) option
111     [I-D.ietf-ippm-ioam-direct-export], each node directly exports the
112     telemetry data using an independent packet (i.e., sends a postcard)
113     to avoid carrying the data with user packets. The postcard mode is
114     complementary to the passport mode.

116     IOAM DEX uses an instruction header to explicitly specify the
117     telemetry data to be collected. This document describes another
118     variation of the postcard mode on-path telemetry, IOAM Marking.
119     Unlike IOAM DEX, IOAM Marking does not require a telemetry
120     instruction header. However, IOAM Marking has unique issues that
121     need to be considered. This document discusses the challenges, and
122     their solutions, that are common to the high-level scheme of IOAM
123     Marking.

125  2.  IOAM Marking: Marking-based IOAM Direct Export

127     As the name suggests, IOAM Marking only needs a marking bit in the
128     existing headers of user packets to trigger the telemetry data
129     collection and export. The sketch of IOAM Marking is as follows. If
130     on-path data need to be collected, the user packet is marked at the
131     path head node. At each IOAM Marking-aware node, if the mark is
132     detected, a postcard (i.e., the dedicated OAM packet triggered by a
133     marked user packet) is generated and sent to a collector. The
134     postcard contains the data requested by the management plane, which
135     configures the requested data set on each node. Once the
136     collector receives all the postcards for a single user packet, it can
137     infer the packet's forwarding path and analyze the data set. The
138     path end node is configured to unmark the packets, restoring their
139     original format, if necessary.

141     The overall architecture of IOAM Marking is depicted in Figure 1.

143                          +------------+        +-----------+
144                          | Network    |        | Telemetry |
145                          | Management |(-------| Data      |
146                          |            |        | Collector |
147                          +-----:------+        +-----------+
148                                :                     ^
149                                :configurations       |postcards
150                                :                     |(OAM pkts)
151                 ...............:.....................|........
152                 :             :               :      |       :
153                 :   +---------:---+-----------:---+--+-------:---+
154                 :   |         :   |           :   |          :   |
155                 V   |         V   |           V   |          V   |
156              +------+-+     +-----+--+     +------+-+     +------+-+
157     usr pkts |  Head  |     |  Path  |     |  Path  |     |  End   |
158         ====>|  Node  |====>|  Node  |====>|  Node  |====>|  Node  |===>
159              |        |     |   A    |     |   B    |     |        |
160              +--------+     +--------+     +--------+     +--------+
161            mark usr pkts  gen postcards  gen postcards  gen postcards
162            gen postcards                               unmark usr pkts

164                   Figure 1: Architecture of IOAM Marking

166     The advantages of IOAM Marking are summarized as follows.

168     *  1: IOAM Marking avoids augmenting user packets with new headers,
169        and the signaling for telemetry data collection remains in the
170        data plane.

172     *  2: IOAM Marking is extensible for collecting arbitrary new data to
173        support possible future use cases. The data set to be collected
174        can be configured through the management plane or control plane.

176     *  3: IOAM Marking can avoid interfering with the normal forwarding.
177        The collected data are free to be transported independently
178        through in-band or out-of-band channels. The data collecting,
179        processing, assembly, encapsulation, and transport are, therefore,
180        decoupled from the forwarding of the corresponding user packets
181        and can be performed in the data-plane slow path if necessary.

183     *  4: For IOAM Marking, the types of data collected from each node
184        can vary depending on application requirements and node
185        capability.

187     *  5: IOAM Marking makes it easy to secure the collected data without
188        exposing it to unnecessary entities. For example, both the
189        configuration and the telemetry data can be encrypted and/or
190        authenticated before being transported, so passive eavesdropping
191        and man-in-the-middle attacks can both be deterred.

193     *  6: Even if a user packet under inspection is dropped at some node
194        in the network, the postcards collected from the preceding nodes
195        are still valid and can be used to diagnose the packet drop
196        location and reason.

198  3.  New Challenges

200     Although IOAM Marking has some unique features compared to the
201     passport mode telemetry and the instruction-based IOAM DEX, it
202     introduces a few new challenges.

204     *  Challenge 1 (Packet Marking): A user packet needs to be marked to
205        trigger the path-associated data collection. Since IOAM Marking
206        does not augment user packets with any new header fields, it needs
207        to reserve or reuse bits from the existing header fields. This
208        raises an issue similar to that in the Alternate Marking scheme
209        [RFC8321].

211     *  Challenge 2 (Configuration): Since the packet header no longer
212        carries IOAM instructions, the data plane devices need to be
213        configured to know what data to collect. However, in general, the
214        forwarding path of a flow packet (due to ECMP or dynamic routing)
215        is unknown beforehand (note that there are some notable
216        exceptions, such as segment routing). If per-flow customized
217        data collection is required, configuring the data set for each
218        flow at all data plane devices might be expensive in terms of
219        configuration load and data plane resources.

221     *  Challenge 3 (Data Correlation): Due to the variable transport
222        latency, the dedicated postcard packets for a single user packet
223        may arrive at the collector out of order or be dropped in the
224        network. In order to infer the packet forwarding path, the
225        collector needs some information from the postcard packets to
226        identify the user packet affiliation and the order of path node
227        traversal.

229     *  Challenge 4 (Load Overhead): Since each postcard packet has its
230        own header, the overall network bandwidth overhead of IOAM Marking
231        can be high. A large number of postcards could add processing
232        pressure on data collecting servers. This could be exploited as
233        an attack vector for DoS.

235  4.  IOAM Marking Design Considerations

237     To address the above challenges, we propose several design details of
238     IOAM Marking.

240  4.1.  Packet Marking

242     To trigger the path-associated data collection, a single bit from
243     some header field is usually sufficient. When no such bit is
244     available, other packet-marking techniques are needed. We discuss
245     several possible application scenarios.

247     *  IPv4. Alternate Marking (AM) [RFC8321] is an IP flow performance
248        measurement framework that also requires a single bit for packet
249        coloring. The difference is that AM does in-network measurement
250        while IOAM Marking only collects and exports data at network nodes
251        (i.e., the data analysis is done at the collector rather than in
252        the network nodes). AM suggests using some reserved bit of the
253        Flags field or some unused bit of the TOS field. Actually, AM can
254        be considered a sub-case of IOAM Marking, so the same bit can
255        be used for IOAM Marking. The management plane is responsible for
256        configuring the actual operation mode.

258     *  SFC NSH. The OAM bit in the NSH header can be used to trigger the
259        on-path data collection [RFC8300]. IOAM Marking does not add any
260        other metadata to NSH.

262     *  MPLS. Instead of choosing a header bit, we take advantage of the
263        synonymous flow label [I-D.bryant-mpls-synonymous-flow-labels]
264        approach to mark the packets. A synonymous flow label indicates
265        that the on-path data should be collected and forwarded through a
266        postcard.

268     *  SRv6. A flag bit in SRH can be reserved to trigger the on-path
269        data collection [I-D.song-6man-srv6-pbt]. SRv6 OAM
270        [I-D.ietf-6man-spring-srv6-oam] has adopted the O-bit in SRH flags
271        as the marking bit to trigger the telemetry.

273  4.2.  Flow Path Discovery

275     In case the path that a flow traverses is unknown in advance, all
276     IOAM Marking-aware nodes should be configured to react to the marked
277     packets by exporting some basic data, such as node ID and TTL, before
278     a data set template for that flow is configured. This way, the
279     management plane can learn the flow path dynamically.

281     If the management plane wants to collect the on-path data for some
282     flow, it configures the head node(s) with a probability or time
283     interval for the flow packet marking. When the first marked packet
284     is forwarded in the network, the IOAM Marking-aware nodes will export
285     the basic data set to the collector. Hence, the flow path is
286     identified. If other data types need to be collected, the management
287     plane can further configure the data set template on the target
288     nodes on the flow's path. The IOAM Marking-aware nodes collect and
289     export data accordingly if the packet is marked and a data set
290     template is present.

292     If the flow path is changed for any reason, the new path can be
293     quickly learned by the collector. Consequently, the management plane
294     controller can be directed to configure the nodes on the new path.
295     The outdated configuration can be automatically timed out or
296     explicitly revoked by the management plane controller.

298  4.3.  Packet Identity for Export Data Correlation

300     The collector needs to correlate all the postcard packets for a
301     single user packet. Once this is done, the TTL (or the timestamp, if
302     the network time is synchronized) can be used to infer the flow
303     forwarding path. The key issue here is to correlate all the
304     postcards for the same user packet.

306     The first possible approach is to include the flow ID plus the user
307     packet ID in the OAM packets. For example, the flow ID can be the
308     5-tuple of the user traffic, and the user packet ID can be some
309     unique information pertaining to a user packet (e.g., the sequence
310     number of a TCP packet).

312     If the packet marking interval is large enough, the flow ID alone is
313     sufficient to identify a user packet. As a result, it can be assumed
314     that all the exported postcard packets for the same flow during a
315     short time interval belong to the same user packet.

317     Alternatively, if the network is synchronized, then the flow ID plus
318     the timestamp at each node can also be used to infer the postcard
319     affiliation. However, some errors may occur under some circumstances.
320     For example, two consecutive user packets from the same flow are
321     marked, but one exported postcard from a node is lost. It is then
322     difficult for the collector to decide to which user packet the
323     remaining postcard is related. In many cases, such a rare error has
324     no catastrophic consequence and is therefore tolerable.

326  4.4.  Control the Load

328     IOAM Marking should not be applied to all the packets all the time.
329     It is better used in an interactive environment where the
330     network telemetry applications dynamically decide which subset of
331     traffic is under scrutiny. The network devices can limit the packet
332     marking rate through sampling and metering. The postcard packets can
333     be distributed to different servers to balance the processing load.

335     It is important to understand that the total amount of data exported
336     by IOAM Marking is identical to that of the IOAM trace option. The
337     only extra overhead is the packet header of the postcards. The IOAM
338     trace option carries the data from each node throughout the path to
339     the end node before exporting the aggregated data. On the other
340     hand, IOAM Marking directly exports local data. The overall
341     network bandwidth impact depends on the network topology and scale,
342     and in some cases IOAM Marking could be more bandwidth efficient.

344  5.  Implementation Recommendation

346  5.1.  Configuration

348     The head node's ACL should be configured to select the target
349     flows for telemetry data collection. Optionally, a flow packet
350     sampling rate or probability could be configured to monitor a subset
351     of the flow packets.

353     The telemetry data set that should be exported by postcards at each
354     path node could be configured using the data set templates specified,
355     for example, in IPFIX [RFC7011]. In future revisions, we will
356     provide more details.

358     The IOAM Marking-aware path nodes could be configured to respond to
359     or ignore the marked packets.

361  5.2.  Postcard Format

363     The postcard should use the same data export format as that used by
364     IOAM. [I-D.spiegel-ippm-ioam-rawexport] proposes a raw format that
365     can be interpreted by IPFIX. In future revisions, we will provide
366     more details.

368  5.3.  Data Correlation

370     Enough information should be included in the postcards to help the
371     collector correlate and order the postcards for a single user packet.
372     Section 4.3 provides several possible means. The application
373     scenario and network protocol are important factors in determining
374     which means to use. In future revisions, we will provide details for
375     representative applications.

377  6.  Security Considerations

379     Several security issues need to be considered.

381     *  Eavesdropping and tampering: the postcards can be encrypted and
382        authenticated to avoid such security threats.

384     *  DoS attack: IOAM Marking can be limited to a single administrative
385        domain. The mark must be removed at the egress domain edge. The
386        node can rate-limit the extra traffic incurred by postcards.

388  7.  IANA Considerations

390     No requirement for IANA is identified.

392  8.  Contributors

394     We thank Alfred Morton, who provided valuable suggestions and
395     comments that helped improve this draft.

397  9.  Acknowledgments

399     TBD.

401  10.  Informative References

403     [I-D.bryant-mpls-synonymous-flow-labels]
404                Bryant, S., Swallow, G., Sivabalan, S., Mirsky, G., Chen,
405                M., and Z. Li, "RFC6374 Synonymous Flow Labels", Work in
406                Progress, Internet-Draft, draft-bryant-mpls-synonymous-
407                flow-labels-01, 4 July 2015.

411     [I-D.ietf-6man-spring-srv6-oam]
412                Ali, Z., Filsfils, C., Matsushima, S., Voyer, D., and M.
413                Chen, "Operations, Administration, and Maintenance (OAM)
414                in Segment Routing Networks with IPv6 Data plane (SRv6)",
415                Work in Progress, Internet-Draft, draft-ietf-6man-spring-
416                srv6-oam-11, 2 June 2021.

420     [I-D.ietf-ippm-ioam-data]
421                Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields
422                for In-situ OAM", Work in Progress, Internet-Draft, draft-
423                ietf-ippm-ioam-data-16, 8 November 2021.

427     [I-D.ietf-ippm-ioam-direct-export]
428                Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F.,
429                Bhandari, S., Sivakolundu, R., and T. Mizrahi, "In-situ
430                OAM Direct Exporting", Work in Progress, Internet-Draft,
431                draft-ietf-ippm-ioam-direct-export-07, 13 October 2021.

435     [I-D.song-6man-srv6-pbt]
436                Song, H., "Support Postcard-Based Telemetry for SRv6 OAM",
437                Work in Progress, Internet-Draft, draft-song-6man-srv6-
438                pbt-01, 14 October 2019.

441     [I-D.spiegel-ippm-ioam-rawexport]
442                Spiegel, M., Brockners, F., Bhandari, S., and R.
443                Sivakolundu, "In-situ OAM raw data export with IPFIX",
444                Work in Progress, Internet-Draft, draft-spiegel-ippm-ioam-
445                rawexport-05, 12 July 2021.

449     [RFC2925]  White, K., "Definitions of Managed Objects for Remote
450                Ping, Traceroute, and Lookup Operations", RFC 2925,
451                DOI 10.17487/RFC2925, September 2000.

454     [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
455                "Specification of the IP Flow Information Export (IPFIX)
456                Protocol for the Exchange of Flow Information", STD 77,
457                RFC 7011, DOI 10.17487/RFC7011, September 2013.

460     [RFC8300]  Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed.,
461                "Network Service Header (NSH)", RFC 8300,
462                DOI 10.17487/RFC8300, January 2018.

465     [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
466                L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
467                "Alternate-Marking Method for Passive and Hybrid
468                Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
469                January 2018.

471  Authors' Addresses

473     Haoyu Song
474     Futurewei Technologies
475     2330 Central Expressway
476     Santa Clara, 95050,
477     United States of America

479     Email: hsong@futurewei.com

481     Greg Mirsky
482     Ericsson

484     Email: gregimirsky@gmail.com

486     Clarence Filsfils
487     Cisco Systems, Inc.
488     Belgium

490     Email: cfilsfil@cisco.com

492     Ahmed Abdelsalam
493     Cisco Systems, Inc.
494     Italy

496     Email: ahabdels@cisco.com

498     Tianran Zhou
499     Huawei
500     156 Beiqing Road
501     Beijing, 100095
502     P.R. China

504     Email: zhoutianran@huawei.com

505     Zhenbin Li
506     Huawei
507     156 Beiqing Road
508     Beijing, 100095
509     P.R. China

511     Email: lizhenbin@huawei.com

513     Jongyoon Shin
514     SK Telecom
515     South Korea

517     Email: jongyoon.shin@sk.com

519     Kyungtae Lee
520     LG U+
521     South Korea

523     Email: coolee@lguplus.co.kr
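
Illustrative sketch for Section 4.3. The first correlation approach described there (flow ID plus user packet ID, with the TTL used to order the hops) might look roughly like the following on the collector side. This is a minimal Python sketch under stated assumptions, not a format or algorithm defined by this draft: the Postcard record, its field names (flow_id, packet_id, node_id, ttl), and the infer_paths helper are all hypothetical, and the sample addresses come from documentation ranges.

```python
from collections import defaultdict
from typing import NamedTuple


class Postcard(NamedTuple):
    """Hypothetical postcard record; field names are illustrative only."""
    flow_id: tuple   # e.g., the 5-tuple of the user traffic
    packet_id: int   # e.g., a TCP sequence number
    node_id: str     # ID of the exporting node
    ttl: int         # TTL observed at the exporting node


def infer_paths(postcards):
    """Group postcards per user packet, then order each group by TTL."""
    groups = defaultdict(list)
    for card in postcards:
        groups[(card.flow_id, card.packet_id)].append(card)
    # The TTL decreases hop by hop, so a descending sort yields path order
    # even when postcards reach the collector out of order.
    return {
        key: [c.node_id for c in sorted(cards, key=lambda c: c.ttl, reverse=True)]
        for key, cards in groups.items()
    }


# Assumed sample data: postcards for two marked packets of one flow,
# listed in the (shuffled) order in which they reached the collector.
flow = ("192.0.2.1", "198.51.100.7", 6, 40000, 443)
arrived = [
    Postcard(flow, 1001, "node-b", 61),
    Postcard(flow, 1001, "head", 63),
    Postcard(flow, 1002, "head", 63),
    Postcard(flow, 1001, "end", 60),
    Postcard(flow, 1001, "node-a", 62),
]
paths = infer_paths(arrived)
```

In this sketch a lost postcard simply leaves a gap in the inferred node list. Ordering by timestamp instead of TTL would only change the sort key, assuming synchronized clocks as noted in Section 4.3.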