IPPM                                                             H. Song
Internet-Draft                                    Futurewei Technologies
Intended status: Informational                                 G. Mirsky
Expires: January 10, 2022                                      ZTE Corp.
                                                             C. Filsfils
                                                           A. Abdelsalam
                                                     Cisco Systems, Inc.
                                                                 T. Zhou
                                                                   Z. Li
                                                                  Huawei
                                                                 J. Shin
                                                              SK Telecom
                                                                  K. Lee
                                                                   LG U+
                                                            July 9, 2021

    Postcard-based On-Path Flow Data Telemetry using Packet Marking
              draft-song-ippm-postcard-based-telemetry-10

Abstract

   This document describes a packet-marking variation of Postcard-Based
   Telemetry (PBT), referred to as PBT-M. Similar to the
   instruction-based PBT (i.e., IOAM DEX), PBT-M does not carry the
   telemetry data in user packets but sends the telemetry data through
   a dedicated packet. Unlike the instruction-based PBT, PBT-M does not
   require an extra instruction header. PBT-M raises some unique issues
   that need to be considered. This document formally describes the
   high-level scheme and covers the common requirements and issues when
   applying PBT-M in different networks. PBT-M is complementary to the
   other on-path telemetry schemes such as IOAM.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Motivation
   2.  PBT-M: Marking-based PBT
   3.  New Challenges
   4.  PBT-M Design Considerations
     4.1.  Packet Marking
     4.2.  Flow Path Discovery
     4.3.  Packet Identity for Export Data Correlation
     4.4.  Control the Load
   5.  Implementation Recommendation
     5.1.  Configuration
     5.2.  Postcard Format
     5.3.  Data Correlation
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Motivation

   To gain detailed data plane visibility to support effective network
   OAM, it is essential to be able to examine the trace of user packets
   along their forwarding paths. Such on-path flow data reflect the
   state and status of each user packet's real-time experience and
   provide valuable information for network monitoring, measurement, and
   diagnosis.

   The telemetry data include, but are not limited to, the detailed
   forwarding path, the timestamp/latency at each network node, and, in
   case of packet drop, the drop location and the reason. The emerging
   programmable data plane devices allow user-defined data collection or
   conditional data collection based on trigger events. Such on-path
   flow data are from and about the live user traffic, which complements
   the data acquired through other passive and active OAM mechanisms
   such as IPFIX [RFC7011] and ICMP [RFC2925].

   On-path telemetry was developed to cater to the need of collecting
   on-path flow data. There are two basic modes for on-path telemetry:
   the passport mode and the postcard mode. In the passport mode, each
   node on the path adds the telemetry data to the user packets (i.e.,
   stamps the passport). The accumulated data trace carried by user
   packets is exported at a configured end node. In the postcard mode,
   each node directly exports the telemetry data using an independent
   packet (i.e., sends a postcard) to avoid the need for carrying the
   data with user packets.

   The postcard mode is complementary to the passport mode. In the
   variant of the postcard-based telemetry (PBT) which uses an
   instruction header, the postcards that carry telemetry data can be
   generated by a node's slow path and transported in-band or
   out-of-band, independent of the original user packets. The IOAM
   direct export option (DEX) [I-D.ietf-ippm-ioam-direct-export] is a
   representative of instruction-based PBT.

   This document describes another variation of the postcard mode
   on-path telemetry, the marking-based PBT (PBT-M). Unlike the
   instruction-based PBT, PBT-M does not require a telemetry instruction
   header. However, PBT-M has unique issues that need to be considered.
   This document discusses the challenges and their solutions which are
   common to the high-level scheme of PBT-M.

2.  PBT-M: Marking-based PBT

   As the name suggests, PBT-M only needs a marking bit in the existing
   headers of user packets to trigger the telemetry data collection and
   export. PBT-M works as follows. If on-path data need to be
   collected, the user packet is marked at the path head node. At each
   PBT-aware node, if the mark is detected, a postcard (i.e., the
   dedicated OAM packet triggered by a marked user packet) is generated
   and sent to a collector. The postcard contains the data requested
   and configured by the management plane. Once the collector receives
   all the postcards for a single user packet, it can infer the packet's
   forwarding path and analyze the data set. The path end node is
   configured to unmark the packets, restoring their original format,
   if necessary.

   The overall architecture of PBT-M is depicted in Figure 1.

                      +------------+        +-----------+
                      | Network    |        | Telemetry |
                      | Management |(-------| Data      |
                      |            |        | Collector |
                      +-----:------+        +-----------+
                            :                     ^
                            :configurations       |postcards
                            :                     |(OAM pkts)
             ...............:.....................|........
             :             :               :      |       :
             :   +---------:---+-----------:---+--+-------:---+
             :   |         :   |           :   |          :   |
             V   |         V   |           V   |          V   |
          +------+-+     +-----+--+     +------+-+     +------+-+
 usr pkts | Head   |     | Path   |     | Path   |     | End    |
     ====>| Node   |====>| Node   |====>| Node   |====>| Node   |===>
          |        |     | A      |     | B      |     |        |
          +--------+     +--------+     +--------+     +--------+
        mark usr pkts  gen postcards  gen postcards  gen postcards
        gen postcards                                unmark usr pkts

                    Figure 1: Architecture of PBT-M

   The advantages of PBT-M are summarized as follows.

   o  1: PBT-M avoids augmenting user packets with new headers, and the
      signaling for telemetry data collection remains in the data plane.

   o  2: PBT-M is extensible for collecting arbitrary new data to
      support possible future use cases.
      The data set to be collected can be configured through the
      management plane or control plane.

   o  3: PBT-M can avoid interfering with the normal forwarding. The
      collected data are free to be transported independently through
      in-band or out-of-band channels. The data collecting, processing,
      assembly, encapsulation, and transport are, therefore, decoupled
      from the forwarding of the corresponding user packets and can be
      performed in the data-plane slow path if necessary.

   o  4: For PBT-M, the types of data collected from each node can vary
      depending on application requirements and node capability.

   o  5: PBT-M makes it easy to secure the collected data without
      exposing it to unnecessary entities. For example, both the
      configuration and the telemetry data can be encrypted and/or
      authenticated before being transported, so passive eavesdropping
      and man-in-the-middle attacks can both be deterred.

   o  6: Even if a user packet under inspection is dropped at some node
      in the network, the postcards collected from the preceding nodes
      are still valid and can be used to diagnose the packet drop
      location and reason.

3.  New Challenges

   Although PBT-M addresses the issues of the passport mode telemetry
   and the instruction-based PBT, it introduces a few new challenges.

   o  Challenge 1 (Packet Marking): A user packet needs to be marked to
      trigger the path-associated data collection. Since PBT-M does not
      augment user packets with any new header fields, it needs to
      reserve or reuse bits from the existing header fields. This
      raises a similar issue as in the Alternate Marking Scheme
      [RFC8321].

   o  Challenge 2 (Configuration): Since the packet header no longer
      carries OAM instructions, the data plane devices need to be
      configured to know what data to collect. However, in general, the
      forwarding path of a flow packet is unknown beforehand due to
      ECMP or dynamic routing (there are some notable exceptions, such
      as segment routing). If per-flow customized data collection is
      required, configuring the data set for each flow at all data
      plane devices might be expensive in terms of configuration load
      and data plane resources.

   o  Challenge 3 (Data Correlation): Due to the variable transport
      latency, the dedicated postcard packets for a single user packet
      may arrive at the collector out of order or may be dropped in the
      network. To infer the packet forwarding path, the collector needs
      some information from the postcard packets to identify the user
      packet they belong to and the order in which the path nodes were
      traversed.

   o  Challenge 4 (Load Overhead): Since each postcard packet has its
      own header, the overall network bandwidth overhead of PBT can be
      high. A large number of postcards could add processing pressure
      on data collecting servers, which can be used as an attack vector
      for DoS.

4.  PBT-M Design Considerations

   To address the above challenges, we propose several design details of
   PBT-M.

4.1.  Packet Marking

   To trigger the path-associated data collection, a single bit from
   some header field is usually sufficient. When no such bit is
   available, other packet-marking techniques are needed. We discuss
   several possible application scenarios.

   o  IPv4. Alternate Marking (AM) [RFC8321] is an IP flow performance
      measurement framework that also requires a single bit for packet
      coloring. The difference is that AM does in-network measurement
      while PBT-M only collects and exports data at network nodes (i.e.,
      the data analysis is done at the collector rather than in the
      network nodes). AM suggests using some reserved bit of the Flag
      field or some unused bit of the TOS field.
      In effect, AM can be considered a sub-case of PBT-M, so the same
      bit can be used for PBT-M. The management plane is responsible
      for configuring the actual operation mode.

   o  SFC NSH. The OAM bit in the NSH header can be used to trigger the
      on-path data collection [I-D.ietf-sfc-nsh]. PBT does not add any
      other metadata to NSH.

   o  MPLS. Instead of choosing a header bit, we take advantage of the
      synonymous flow label [I-D.bryant-mpls-synonymous-flow-labels]
      approach to mark the packets. A synonymous flow label indicates
      that the on-path data should be collected and forwarded through a
      postcard.

   o  SRv6. A flag bit in SRH can be reserved to trigger the on-path
      data collection [I-D.song-6man-srv6-pbt]. SRv6 OAM
      [I-D.ietf-6man-spring-srv6-oam] has adopted the O-bit in SRH flags
      as the marking bit to trigger the telemetry.

4.2.  Flow Path Discovery

   In case the path that a flow traverses is unknown in advance, all
   PBT-aware nodes should be configured to react to the marked packets
   by exporting some basic data, such as node ID and TTL, before a data
   set template for that flow is configured. This way, the management
   plane can learn the flow path dynamically.

   If the management plane wants to collect the on-path data for some
   flow, it configures the head node(s) with a probability or time
   interval for the flow packet marking. When the first marked packet
   is forwarded in the network, the PBT-aware nodes export the basic
   data set to the collector. Hence, the flow path is identified. If
   other data types need to be collected, the management plane can
   further configure the data set's template on the target nodes on the
   flow's path. The PBT-aware nodes collect and export data accordingly
   if the packet is marked and a data set template is present.

   If the flow path is changed for any reason, the new path can be
   quickly learned by the collector.
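
   As a non-normative illustration of the path discovery described in
   this section, the following Python sketch orders the basic postcards
   (node ID and TTL) of one flow by decreasing TTL. The Postcard
   record, its field names, and the infer_path function are
   hypothetical and are not defined by this document. The sketch
   assumes that the postcards were triggered by a single marked user
   packet and that every node on the path decrements the TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Postcard:
    """Hypothetical basic postcard: fields are illustrative only."""
    flow_id: str   # assumed flow key, e.g. derived from the 5-tuple
    node_id: str   # ID of the exporting PBT-aware node
    ttl: int       # TTL seen in the marked user packet at this node


def infer_path(postcards, flow_id):
    """Return the node IDs of one flow, ordered from head to end node."""
    first_seen = {}
    for pc in postcards:
        if pc.flow_id != flow_id:
            continue
        # Keep one postcard per node; later duplicates are ignored.
        first_seen.setdefault(pc.node_id, pc)
    # The TTL decreases hop by hop, so a descending TTL order
    # reproduces the traversal order even if postcards arrive shuffled.
    ordered = sorted(first_seen.values(), key=lambda pc: pc.ttl, reverse=True)
    return [pc.node_id for pc in ordered]


# Postcards arriving out of order, mixed with another flow.
postcards = [
    Postcard("flow-1", "B", 61),
    Postcard("flow-1", "head", 63),
    Postcard("flow-2", "X", 64),
    Postcard("flow-1", "end", 60),
    Postcard("flow-1", "A", 62),
]
print(infer_path(postcards, "flow-1"))  # ['head', 'A', 'B', 'end']
```

   A gap in the TTL sequence would indicate either a lost postcard or a
   node on the path that is not PBT-aware.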
   Once the collector has learned the new path, the management plane
   controller can be directed to configure the nodes on it. The
   outdated configuration can be automatically timed out or explicitly
   revoked by the management plane controller.

4.3.  Packet Identity for Export Data Correlation

   The collector needs to correlate all the postcard packets for a
   single user packet. Once this is done, the TTL (or the timestamp, if
   the network time is synchronized) can be used to infer the flow
   forwarding path. The key issue here is to correlate all the
   postcards for the same user packet.

   The first possible approach is to include the flow ID plus the user
   packet ID in the OAM packets. For example, the flow ID can be the
   5-tuple from the headers of the user traffic, and the user packet ID
   can be some unique information pertaining to a user packet (e.g.,
   the sequence number of a TCP packet).

   If the packet marking interval is large enough, the flow ID alone is
   enough to identify a user packet. In that case, it can be assumed
   that all the exported postcard packets for the same flow during a
   short time interval belong to the same user packet.

   Alternatively, if the network is synchronized, then the flow ID plus
   the timestamp at each node can also be used to infer the postcard
   affiliation. However, errors may occur under some circumstances.
   For example, two consecutive user packets from the same flow are
   marked, but one exported postcard from a node is lost. It is then
   difficult for the collector to decide to which user packet the
   remaining postcard is related. In many cases, such a rare error has
   no catastrophic consequence and is therefore tolerable.

4.4.  Control the Load

   PBT-M should not be applied to all the packets all the time. It is
   better used in an interactive environment where the network
   telemetry applications dynamically decide which subset of traffic is
   under scrutiny.
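
   As a non-normative illustration of keeping the marked subset small,
   the following Python sketch combines a marking probability, as
   configured at the head node in Section 4.2, with a token-bucket
   meter that caps the marking rate. The MarkingLimiter class, its
   parameters, and the choice of a token bucket are assumptions made
   for this sketch; this document does not mandate a metering
   algorithm.

```python
import random


class MarkingLimiter:
    """Hypothetical head-node helper: sample, then meter, marked packets."""

    def __init__(self, probability, rate_per_s, burst, rng=None):
        self.probability = probability   # chance that a packet is sampled
        self.rate = rate_per_s           # sustained marks per second
        self.burst = burst               # maximum marks in a burst
        self.tokens = float(burst)       # bucket starts full
        self.last = None                 # time of the previous decision
        self.rng = rng or random.Random()

    def should_mark(self, now):
        """Decide whether the packet seen at time `now` (seconds) is marked."""
        # Refill the bucket for the time elapsed since the last packet.
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        # Sampling: most packets are never considered for marking.
        if self.rng.random() >= self.probability:
            return False
        # Metering: a sampled packet is marked only if a token is left.
        if self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        return True


# With probability 1.0 only the meter limits marking: a burst of two
# marks is allowed, then one more mark per second.
limiter = MarkingLimiter(probability=1.0, rate_per_s=1.0, burst=2)
decisions = [limiter.should_mark(0.0) for _ in range(3)]
```

   Because every marked packet triggers one postcard per PBT-aware
   node, bounding the marking rate at the head node also bounds the
   postcard rate seen by the collector.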
   The network devices can limit the PBT rate through sampling and
   metering. The PBT packets can be distributed to different servers to
   balance the processing load.

   It is important to understand that the total amount of data exported
   by PBT-M is identical to that of IOAM. The only extra overhead is
   the packet header of the postcards. IOAM carries the data from each
   node along the path to the end node before exporting the aggregated
   data. PBT-M, on the other hand, exports local data directly. The
   overall network bandwidth impact depends on the network topology and
   scale, and PBT-M could be more bandwidth efficient.

5.  Implementation Recommendation

5.1.  Configuration

   The head node's ACL should be configured to select the target flows
   for telemetry data collection. Optionally, a flow packet sampling
   rate or probability could be configured to monitor a subset of the
   flow packets.

   The telemetry data set that should be exported by postcards at each
   path node could be configured using the data set templates specified,
   for example, in IPFIX [RFC7011]. In future revisions, we will
   provide more details.

   The PBT-aware path nodes could be configured to respond to or ignore
   the marked packets.

5.2.  Postcard Format

   The postcard should use the same data export format as that used by
   IOAM. [I-D.spiegel-ippm-ioam-rawexport] proposes a raw format that
   can be interpreted by IPFIX. In future revisions, we will provide
   more details.

5.3.  Data Correlation

   Enough information should be included to help the collector
   correlate and order the postcards for a single user packet.
   Section 4.3 provides several possible means. The application
   scenario and network protocol are important factors in determining
   which means to use. In future revisions, we will provide details for
   representative applications.

6.  Security Considerations

   Several security issues need to be considered.

   o  Eavesdropping and tampering: the postcards can be encrypted and
      authenticated to avoid such security threats.

   o  DoS attack: PBT can be limited to a single administrative domain.
      The mark must be removed at the egress domain edge. The node can
      rate-limit the extra traffic incurred by postcards.

7.  IANA Considerations

   No requirement for IANA is identified.

8.  Contributors

   We thank Alfred Morton, who provided valuable suggestions and
   comments that helped improve this draft.

9.  Acknowledgments

   TBD.

10.  Informative References

   [I-D.bryant-mpls-synonymous-flow-labels]
              Bryant, S., Swallow, G., Sivabalan, S., Mirsky, G., Chen,
              M., and Z. Li, "RFC6374 Synonymous Flow Labels",
              draft-bryant-mpls-synonymous-flow-labels-01 (work in
              progress), July 2015.

   [I-D.ietf-6man-spring-srv6-oam]
              Ali, Z., Filsfils, C., Matsushima, S., Voyer, D., and M.
              Chen, "Operations, Administration, and Maintenance (OAM)
              in Segment Routing Networks with IPv6 Data plane (SRv6)",
              draft-ietf-6man-spring-srv6-oam-07 (work in progress),
              July 2020.

   [I-D.ietf-ippm-ioam-direct-export]
              Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F.,
              Bhandari, S., Sivakolundu, R., and T. Mizrahi, "In-situ
              OAM Direct Exporting",
              draft-ietf-ippm-ioam-direct-export-00 (work in progress),
              February 2020.

   [I-D.ietf-sfc-nsh]
              Quinn, P., Elzur, U., and C. Pignataro, "Network Service
              Header (NSH)", draft-ietf-sfc-nsh-28 (work in progress),
              November 2017.

   [I-D.song-6man-srv6-pbt]
              Song, H., "Support Postcard-Based Telemetry for SRv6 OAM",
              draft-song-6man-srv6-pbt-01 (work in progress), October
              2019.

   [I-D.spiegel-ippm-ioam-rawexport]
              Spiegel, M., Brockners, F., Bhandari, S., and R.
              Sivakolundu, "In-situ OAM raw data export with IPFIX",
              draft-spiegel-ippm-ioam-rawexport-01 (work in progress),
              October 2018.

   [RFC2925]  White, K., "Definitions of Managed Objects for Remote
              Ping, Traceroute, and Lookup Operations", RFC 2925,
              DOI 10.17487/RFC2925, September 2000.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013.

   [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

Authors' Addresses

   Haoyu Song
   Futurewei Technologies
   2330 Central Expressway
   Santa Clara, 95050
   USA

   Email: hsong@futurewei.com

   Greg Mirsky
   ZTE Corp.

   Email: gregimirsky@gmail.com

   Clarence Filsfils
   Cisco Systems, Inc.
   Belgium

   Email: cfilsfil@cisco.com

   Ahmed Abdelsalam
   Cisco Systems, Inc.
   Italy

   Email: ahabdels@cisco.com

   Tianran Zhou
   Huawei
   156 Beiqing Road
   Beijing, 100095
   P.R. China

   Email: zhoutianran@huawei.com

   Zhenbin Li
   Huawei
   156 Beiqing Road
   Beijing, 100095
   P.R. China

   Email: lizhenbin@huawei.com

   Jongyoon Shin
   SK Telecom
   South Korea

   Email: jongyoon.shin@sk.com

   Kyungtae Lee
   LG U+
   South Korea

   Email: coolee@lguplus.co.kr