Network Working Group                                       F. Brockners
Internet-Draft                                               S. Bhandari
Intended status: Informational                                   S. Dara
Expires: January 19, 2017                                   C. Pignataro
                                                                   Cisco
                                                              H. Gredler
                                                            RtBrick Inc.
                                                                J. Leddy
                                                                 Comcast
                                                               S. Youell
                                                                    JMPC
                                                           July 18, 2016


                      Requirements for In-band OAM
               draft-brockners-inband-oam-requirements-01

Abstract

   This document discusses the motivation and requirements for including
   specific operational and telemetry information into data packets
   while the data packet traverses a path between two points in the
   network.
   This method is referred to as "in-band" Operations,
   Administration, and Maintenance (OAM), given that the OAM information
   is carried with the data packets as opposed to in "out-of-band"
   packets dedicated to OAM. In-band OAM complements other OAM
   mechanisms which use dedicated probe packets to convey OAM
   information.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 19, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  Motivation for In-band OAM
     3.1.  Path Congruency Issues with Dedicated OAM Packets
     3.2.  Results Sent to a System Other Than the Sender
     3.3.  Overlay and Underlay Correlation
     3.4.  SLA Verification
     3.5.  Analytics and Diagnostics
     3.6.  Frame Replication/Elimination Decision for Bi-casting/
           Active-active Networks
     3.7.  Proof of Transit
     3.8.  Use Cases
   4.  Considerations for In-band OAM
     4.1.  Type of information to be recorded
     4.2.  MTU and packet size
     4.3.  Administrative boundaries
     4.4.  Selective enablement
     4.5.  Optimization of node and interface identifiers
     4.6.  Loop communication path (IPv6-specifics)
   5.  Requirements for In-band OAM Data Types
     5.1.  Generic Requirements
     5.2.  In-band OAM Data with Per-hop Scope
     5.3.  In-band OAM with Selected Hop Scope
     5.4.  In-band OAM with End-to-end Scope
   6.  Security Considerations and Requirements
     6.1.  Proof of Transit
   7.  IANA Considerations
   8.  Acknowledgements
   9.  Informative References
   Authors' Addresses

1.  Introduction

   This document discusses requirements for "in-band" Operations,
   Administration, and Maintenance (OAM) mechanisms. "In-band" OAM
   means to record OAM and telemetry information within the data packet
   while the data packet traverses a network or a particular network
   domain. The term "in-band" refers to the fact that the OAM and
   telemetry data is carried within data packets rather than being sent
   within packets specifically dedicated to OAM. In-band OAM
   mechanisms, which are sometimes also referred to as embedded network
   telemetry, are a current topic of discussion. In-band network
   telemetry has been defined for P4 [P4]. The SPUD prototype
   [I-D.hildebrand-spud-prototype] uses a similar logic that allows
   network devices on the path between endpoints to participate
   explicitly in the tube outside the end-to-end context. Even the IPv4
   route-record option defined in [RFC0791] can be considered an in-band
   OAM mechanism. In-band OAM complements "out-of-band" mechanisms such
   as ping or traceroute, or more recent active probing mechanisms, as
   described in [I-D.lapukhov-dataplane-probe]. In-band OAM mechanisms
   can be leveraged where current out-of-band mechanisms do not apply or
   do not offer the desired characteristics. Examples include proving
   that a certain set of traffic takes a pre-defined path, cases where
   strict congruency is desired, checking service level agreements for
   the live data traffic, gathering detailed statistics on traffic
   distribution paths in networks that distribute traffic across
   multiple paths, and scenarios where probe traffic is potentially
   handled differently from regular data traffic by the network devices.
   [RFC7276] presents an overview of OAM tools.

   Compared to probably the most basic example of "in-band OAM", which
   is IPv4 route recording [RFC0791], an in-band OAM approach has the
   following capabilities:

   a.  A flexible data format to allow different types of information to
       be captured as part of an in-band OAM operation, including not
       only path tracing information, but additional operational and
       telemetry information such as timestamps, sequence numbers, or
       even generic data such as queue size, geo-location of the node
       that forwarded the packet, etc.

   b.  A data format to express node as well as link identifiers to
       record the path a packet takes with a fixed amount of added data.

   c.  The ability to detect whether any nodes were skipped while
       recording in-band OAM information (i.e., in-band OAM is not
       supported or not enabled on those nodes).

   d.  The ability to actively process information in the packet, for
       example to prove in a cryptographically secure way that a packet
       really took a pre-defined path using some traffic steering method
       such as service chaining or traffic engineering.

   e.  The ability to include OAM data beyond simple path information,
       such as timestamps or even generic data of a particular use case.

   f.  The ability to include OAM data in various different transport
       protocols.

2.  Conventions

   Abbreviations used in this document:

   ECMP:      Equal Cost Multi-Path

   MTU:       Maximum Transmission Unit

   NFV:       Network Function Virtualization

   OAM:       Operations, Administration, and Maintenance

   PMTU:      Path MTU

   SLA:       Service Level Agreement

   SFC:       Service Function Chain

   SR:        Segment Routing

   This document defines in-band Operations, Administration, and
   Maintenance (in-band OAM) as the subset of OAM in which OAM
   information is carried along with data packets. This is as opposed
   to "out-of-band OAM", where specific packets are dedicated to
   carrying OAM information.

3.  Motivation for In-band OAM

   In several scenarios it is beneficial to make information about which
   path a packet took through the network available to the operator.
   This includes not only tasks like debugging, troubleshooting, network
   planning, and network optimization, but also policy or service level
   agreement compliance checks. This section discusses the motivation
   to introduce new methods for enhanced in-band network diagnostics.

3.1.  Path Congruency Issues with Dedicated OAM Packets

   Mechanisms which add tracing information to the regular data traffic,
   sometimes also referred to as "in-band" or "passive OAM", can
   complement active, probe-based mechanisms such as ping or traceroute,
   which are sometimes considered as "out-of-band", because the messages
   are transported independently from regular data traffic. "In-band"
   mechanisms do not require extra packets to be sent and hence don't
   change the packet traffic mix within the network. Traceroute and
   ping, for example, use ICMP messages: new packets are injected to get
   tracing information. Those add to the number of messages in a
   network, which already might be highly loaded or suffering
   performance issues for a particular path or traffic type.

   Packet scheduling algorithms, especially for balancing traffic across
   equal cost paths or links, often leverage information contained
   within the packet, such as protocol number, IP address, or MAC
   address. Probe packets would thus either need to be sent from the
   exact same endpoints with the exact same parameters, or probe packets
   would need to be artificially constructed as "fake" packets and
   inserted along the path. Both approaches are often not feasible from
   an operational perspective, either because access to the end-system
   is not available or because the diversity of parameters and
   associated probe packets to be created is simply too large. An
   in-band mechanism is an alternative in those cases.
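
   The dependence of the forwarding path on packet header fields can be
   illustrated with a small sketch. It is not taken from any particular
   implementation: the FiveTuple type, the ecmp_next_hop function, the
   use of CRC32 as the hash, and the link names are all assumptions made
   for illustration. Real devices use implementation-specific hash
   functions and field selections.

```python
import zlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    """Header fields a device might feed into its load-balancing hash."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def ecmp_next_hop(flow: FiveTuple, next_hops: List[str]) -> str:
    """Select one of several equal-cost next hops for a flow.

    Hypothetical scheme: CRC32 over the serialized five-tuple, modulo
    the number of next hops. Real hashes are implementation specific.
    """
    key = "|".join(str(field) for field in flow).encode("ascii")
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    hops = ["link-A", "link-B", "link-C", "link-D"]
    # A TCP data flow and an ICMP probe between the same two endpoints.
    data = FiveTuple("192.0.2.1", "198.51.100.7", 6, 49152, 443)
    probe = FiveTuple("192.0.2.1", "198.51.100.7", 1, 0, 0)
    print("data flow  ->", ecmp_next_hop(data, hops))
    print("probe flow ->", ecmp_next_hop(probe, hops))
```

   Because the probe differs from the data flow in protocol number and
   ports, nothing in such a scheme ties the two to the same link. Only
   a packet carrying exactly the same header fields is certain to be
   hashed the same way, which is what carrying OAM data inside the data
   packet itself provides.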

   In-band mechanisms also don't suffer from implementations where
   probe traffic is handled differently (and potentially forwarded
   differently) by a router than regular data traffic.

3.2.  Results Sent to a System Other Than the Sender

   Traditional ping and traceroute tools return the OAM results to the
   sender of the probe. Even when the ICMP messages that are used with
   these tools are enhanced, and additional telemetry is collected
   (e.g., ICMP Multi-Part [RFC4884] supporting MPLS information
   [RFC4950], Interface and Next-Hop Identification [RFC5837], etc.), it
   would be advantageous to separate the sending of an OAM probe from
   the receiving of the telemetry data. In this context, it is
   desirable not to assume that there is a bidirectional working path.

3.3.  Overlay and Underlay Correlation

   Several network deployments leverage tunneling mechanisms to create
   overlay or service-layer networks. Examples include VXLAN-GPE, GRE,
   or LISP. One often observed attribute of overlay networks is that
   they do not offer the user of the overlay any insight into the
   underlay network. This means that neither the path that a particular
   tunneled packet takes nor other operational details such as the
   per-hop delay/jitter in the underlay are visible to the user of the
   overlay network, giving rise to diagnosis and debugging challenges in
   case of connectivity or performance issues. The scope of OAM tools
   like ping or traceroute is limited to either the overlay or the
   underlay, which means that the user of the overlay typically has no
   access to OAM in the underlay unless specific operational procedures
   are put in place. With in-band OAM, the operator of the underlay can
   offer details of the connectivity in the underlay to the user of the
   overlay. The operator of the egress tunnel router could choose to
   share the recorded information about the path with the user of the
   overlay.

   Coupled with mechanisms such as Segment Routing (SR)
   [I-D.ietf-spring-segment-routing], overlay network and underlay
   network can be more tightly coupled: The user of the overlay has
   detailed diagnostic information available in case of failure
   conditions. The user of the overlay can also use the path recording
   information as input to traffic steering or traffic engineering
   mechanisms, for example to achieve path symmetry for the traffic
   between two endpoints. [I-D.brockners-lisp-sr] is an example of how
   these methods can be applied to LISP.

3.4.  SLA Verification

   In-band OAM can help users of an overlay service to verify that
   negotiated SLAs for the real traffic are met by the underlay network
   provider. Different from solutions which rely on active probes to
   test an SLA, in-band OAM based mechanisms avoid wrong interpretations
   and "cheating", which can happen if the probe traffic that is used to
   perform the SLA check is prioritized by the network provider of the
   underlay.

3.5.  Analytics and Diagnostics

   Network planners and operators benefit from knowledge of the actual
   traffic distribution in the network. When deriving an overall
   network connectivity traffic matrix one typically needs to correlate
   data gathered from each individual device in the network. If the
   path of a packet is recorded while the packet is forwarded, the
   entire path that a packet took through the network is available to
   the egress system. This obviates the need to retrieve individual
   traffic statistics from every device in the network and correlate
   those statistics, or employ other mechanisms such as leveraging
   traffic engineering with null-bandwidth tunnels just to retrieve the
   appropriate statistics to generate the traffic matrix.
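
   As a minimal sketch of the egress-side processing described above,
   the following assumes that the egress system has already decoded,
   for each packet, the ordered list of node identifiers recorded along
   the path and the packet's size in octets. The PathRecord type and
   both function names are hypothetical; no record format is defined in
   this document.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class PathRecord(NamedTuple):
    """Hypothetical per-packet record as decoded at the egress system."""
    node_ids: Tuple[int, ...]  # node identifiers in the order visited
    octets: int                # size of the packet that carried the record


def traffic_matrix(records: Iterable[PathRecord]) -> Counter:
    """Sum octets per (first node, last node) pair of the recorded path."""
    matrix: Counter = Counter()
    for rec in records:
        if rec.node_ids:  # ignore packets that carried no path data
            matrix[(rec.node_ids[0], rec.node_ids[-1])] += rec.octets
    return matrix


def per_path_octets(records: Iterable[PathRecord]) -> Counter:
    """Sum octets per complete path, e.g. to compare ECMP branches."""
    paths: Counter = Counter()
    for rec in records:
        paths[rec.node_ids] += rec.octets
    return paths


if __name__ == "__main__":
    seen = [
        PathRecord((1, 2, 4), 100),
        PathRecord((1, 3, 4), 50),
        PathRecord((2, 4), 10),
    ]
    print(traffic_matrix(seen))
    print(per_path_octets(seen))
```

   Both aggregations run entirely at the egress system and need no
   counters from transit devices, which is the property the paragraph
   above relies on.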

   In addition, with individual path tracing, information is available
   at packet level granularity, rather than only at aggregate level - as
   is usually the case with IPFIX-style methods which employ
   flow-filters at the network elements. Data-center networks which use
   equal-cost multipath (ECMP) forwarding are one example where detailed
   statistics on flow distribution in the network are highly desired.
   If a network supports ECMP, one can create detailed statistics for
   the different paths packets take through the network at the egress
   system, without a need to correlate/aggregate statistics from every
   router in the system. Transit devices are off-loaded from the task
   of gathering packet statistics.

3.6.  Frame Replication/Elimination Decision for Bi-casting/
      Active-active Networks

   Bandwidth- and power-constrained, time-sensitive, or loss-intolerant
   networks (e.g., networks for industry automation/control, health
   care) require efficient OAM methods to decide when to replicate
   packets to a secondary path in order to keep the loss/error-rate for
   the receiver at a tolerable level - and also when to stop replication
   and eliminate the redundant flow. Many IoT networks are time
   sensitive and cannot leverage automatic repeat request (ARQ) to cope
   with transmission errors or lost packets. Transmitting the data over
   multiple disparate paths (often called bi-casting or live-live) is a
   method used to reduce the error rate observed by the receiver.
   Time-sensitive networking (TSN) receives a lot of attention from the
   manufacturing industry, as shown by the various standardization
   activities and industry forums being formed (see, e.g., IETF 6TiSCH,
   IEEE P802.1CB, AVnu).

3.7.  Proof of Transit

   Several deployments use traffic engineering, policy routing, segment
   routing or Service Function Chaining (SFC) [RFC7665] to steer packets
   through a specific set of nodes.
   In certain cases regulatory
   obligations or a compliance policy require proof that all packets
   that are supposed to follow a specific path are indeed being
   forwarded across the exact set of nodes specified. If a packet flow
   is supposed to go through a series of service functions or network
   nodes, it has to be proven that all packets of the flow actually went
   through the service chain or collection of nodes specified by the
   policy. In case the packets of a flow weren't appropriately
   processed, a verification device would be required to identify the
   policy violation and take the actions defined by the policy (e.g.,
   drop or redirect the packet, send an alert, etc.).
   In today's deployments, the proof that a packet traversed a
   particular service chain is typically delivered in an indirect way:
   Service appliances and network forwarding are in different trust
   domains. Physical hand-off-points are defined between these trust
   domains (i.e., physical interfaces). Or in other terms, in the
   "network forwarding domain" things are wired up in a way that traffic
   is delivered to the ingress interface of a service appliance and
   received back from an egress interface of a service appliance. This
   "wiring" is verified and trusted. The evolution to Network Function
   Virtualization (NFV) and modern service chaining concepts (using
   technologies such as LISP, NSH, Segment Routing, etc.) blurs the line
   between the different trust domains, because the hand-off-points are
   no longer clearly defined physical interfaces, but are virtual
   interfaces. For that very reason, network operators require
   that different trust layers not be mixed in the same device. For
   an NFV scenario a different proof is required.
   Offering a proof that
   a packet traversed a specific set of service functions would allow
   network operators to move away from the above described indirect
   methods of proving that a service chain is in place for a particular
   application.

   Deployed service chains without the presence of a "proof of transit"
   mechanism are typically operated as fail-open systems: The packets
   that arrive at the end of a service chain are processed. Adding
   "proof of transit" capabilities to a service chain allows an operator
   to turn a fail-open system into a fail-closed system, i.e., packets
   that did not properly traverse the service chain can be blocked.

   A solution approach could be based on OAM data which is added to
   every packet for achieving proof of transit. The OAM data is updated
   at every hop and is used to verify whether a packet traversed all
   required nodes. When the verifier receives a packet, it can
   validate whether the packet traversed the service chain correctly.
   The detailed mechanisms used for path verification along with the
   procedures applied to the OAM data carried in the packet for path
   verification are beyond the scope of this document. Details are
   addressed in [draft-brockners-proof-of-transit]. In this document
   the term "proof" refers to a discrete set of bits that represents an
   integer or string carried as OAM data. The OAM data is used to
   verify whether a packet traversed the nodes it is supposed to
   traverse.

3.8.  Use Cases

   In-band OAM could be leveraged for several use cases, including:

   o  Traffic Matrix: Derive the network traffic matrix: Traffic for a
      given time interval between any two edge nodes of a given domain.
      Could be performed for all traffic or per QoS-class.

   o  Flow Debugging: Discover which path(s) a particular set of traffic
      (identified by an n-tuple) takes in the network.
      Such a procedure
      is particularly useful in case traffic is balanced across multiple
      paths, like with link aggregation (LACP) or equal cost
      multi-pathing (ECMP).

   o  Loss Statistics per Path: Retrieve loss statistics per flow and
      path in the network.

   o  Path Heat Maps: Discover highly utilized links in the network.

   o  Trend Analysis on Traffic Patterns: Analyze if (and if so how) the
      forwarding path for a specific set of traffic changes over time
      (can give hints to routing issues, unstable links, etc.).

   o  Network Delay Distribution: Show delay distribution across the
      network by node or link. If enabled per application or for a
      specific flow, display the path taken along with the delay
      incurred at every hop.

   o  SLA Verification: Verify that a negotiated service level agreement
      (SLA), e.g., for packet drop rates or delay/jitter, is met by the
      actual traffic.

   o  Low-power Networks: Include application level OAM information
      (e.g., battery charge level, cache or buffer fill level) into data
      traffic to avoid sending extra OAM traffic which incurs an extra
      cost on the devices. Using the battery charge level as example,
      one could avoid sending extra OAM packets just to communicate
      battery health, and as such would save battery on sensors.

   o  Path Verification or Service Function Path Verification: Proof and
      verification of packets traversing check points in the network,
      where check points can be nodes in the network or service
      functions.

   o  Geo-location Policy: Network policy implemented based on which
      path packets took. Example: Only if packets originated and stayed
      within the trading-floor department, access to specific
      applications or servers is granted.

4.  Considerations for In-band OAM

   The implementation of an in-band OAM mechanism needs to take several
   considerations into account, including administrative boundaries, how
   information is recorded, Maximum Transmission Unit (MTU), Path MTU
   discovery and packet size, etc.

4.1.  Type of information to be recorded

   The information gathered for in-band OAM can be categorized into
   three main categories: Information with a per-hop scope, such as path
   tracing; information which applies to a specific set of nodes, such
   as path or service chain verification; information which only applies
   to the edges of a domain, such as sequence numbers.

   o  "edge to edge": Information that needs to be shared between
      network edges (the "edge" of a network could either be a host or a
      domain edge device). Examples of edge-to-edge data: packet and
      octet counts of data entering a well-defined domain and leaving it
      are helpful in building a traffic matrix; a sequence number (also
      called "path packet counter") is useful for detecting packet loss
      in a flow.

   o  "selected hops": Information that applies to a specific set of
      nodes only. In case of path verification, only the nodes which
      are "check points" are required to interpret and update the
      information in the packet.

   o  "per hop": Information that is gathered at every hop along the
      path a packet traverses within an administrative domain:

      *  Hop-by-hop information, e.g., nodes visited for path tracing,
         or timestamps at each hop to find delays along the path.

      *  Statistics collection at each hop to optimize communication in
         resource-constrained networks, e.g., battery, CPU, or memory
         status of each node piggybacked in a data packet is useful in
         low-power lossy networks where network nodes are mostly asleep
         and communication is expensive.

4.2.  MTU and packet size

   The recorded data at every hop may lead to the packet size exceeding
   the Maximum Transmission Unit (MTU). Depending on the transport
   protocol used, the MTU is obtained as a configuration parameter or
   the Path MTU (PMTU) is discovered dynamically. Example: IPv6
   recommends PMTU discovery before data packets are sent to prevent
   packet fragmentation. It specifies 1280 octets as the minimum MTU
   that every link must support. A detailed discussion of the
   implications of oversized IPv6 header chains is found in [RFC7112].

   The Path MTU restricts the amount of data that can be recorded for
   the purpose of OAM within a data packet. The total size of data to
   be recorded needs to be preset to avoid the packet size exceeding the
   MTU.

   It is recommended to pre-calculate the limit and configure network
   devices to limit the in-band OAM data that is attached to a packet.

4.3.  Administrative boundaries

   There are several challenges in enabling in-band OAM in the public
   Internet as well as in corporate/enterprise networks across
   administrative domains, which include but are not limited to:

   o  Depending on the deployment, the data fields that in-band OAM
      requires as part of a specific transport protocol may not be
      supported across administrative boundaries.

   o  Current OAM implementations are often done in the slow path, i.e.,
      OAM packets are punted to the router's CPU for processing. This
      leads to performance and scaling issues and opens up routers for
      attacks such as Denial of Service (DoS) attacks.

   o  Discovery of network topology and details of the network devices
      across administrative boundaries may open up attack vectors
      compromising network security.

   o  Specifically on IPv6: At the administrative boundaries IPv6
      packets with extension headers are dropped for several reasons
      described in [RFC7872].

   The following considerations will be discussed in a future version of
   this document: the case where a packet is dropped due to the presence
   of in-band OAM data; and the case where a policy failure is treated
   as feature disablement, so that any further recording is stopped but
   the packet itself is not dropped, which may lead to every node in the
   path having to make this policy decision.

4.4.  Selective enablement

   Depending on the deployment, in-band OAM could be used either for all
   traffic or for only a subset of the overall traffic. While it might
   be desirable to apply in-band OAM to all traffic and then selectively
   use the data gathered when needed, it might not always be feasible.
   Depending on the forwarding infrastructure used, in-band OAM can have
   an impact on forwarding performance. The SPUD prototype for example
   uses the notion of "pipes" to describe the portion of the traffic
   that could be subject to in-path inspection. Mechanisms to decide
   which traffic would be subject to in-band OAM are outside the scope
   of this document.

4.5.  Optimization of node and interface identifiers

   Since packets have a finite maximum size, the data recording or
   carrying capacity of one packet in which the in-band OAM meta data is
   present is limited. In-band OAM should use its own dedicated
   namespace (confined to the domain in-band OAM operates in) to
   represent node and interface IDs to save space in the header.
   Generic representations of node and interface identifiers which are
   globally unique (such as a UUID) would consume significantly more
   bits of in-band OAM data.

4.6.  Loop communication path (IPv6-specifics)

   When recorded data is required to be analyzed on a source node that
   issues a packet and inserts in-band OAM data, the recorded data needs
   to be carried back to the source node.

   One way to carry the in-band OAM data back to the source is to
   utilize an ICMP Echo Request/Reply (ping) or ICMPv6 Echo Request/
   Reply (ping6) mechanism. In order to run the in-band OAM mechanism
   appropriately on the ping/ping6 mechanism, the following two
   operations should be implemented by the ping/ping6 target node:

   1.  All of the in-band OAM fields would be copied from an Echo
       Request message to an Echo Reply message.

   2.  The Hop Limit field of the IPv6 header of these messages would be
       copied as a continuous sequence. Further considerations will be
       addressed in a future version of this document.

5.  Requirements for In-band OAM Data Types

   The above discussed use cases require different types of in-band OAM
   data. This section details requirements for in-band OAM derived from
   the discussion above.

5.1.  Generic Requirements

   REQ-G1:  Classification: It should be possible to enable in-band OAM
            on a selected set of traffic. The selected set of traffic
            can also be all traffic.

   REQ-G2:  Scope: If in-band OAM is used only within a specific domain,
            provisions need to be put in place to ensure that in-band
            OAM data stays within the specific domain only.

   REQ-G3:  Transport independence: Data formats for in-band OAM shall
            be defined in a transport independent way. In-band OAM
            applies to a variety of transport protocols. Encapsulations
            should be defined that specify how the generic data formats
            are carried by a specific protocol.

   REQ-G4:  Layering: It should be possible to have in-band OAM
            information for different transport protocol layers be
            present in several fields within a single packet. This
            could for example be the case when tunnels are employed and
            in-band OAM information is to be gathered for both the
            underlay as well as the overlay network.

   REQ-G5:  MTU size: With in-band OAM information added, packets should
            not become larger than the path MTU.

   REQ-G6:  Data Structure Reusability: The data types and data formats
            defined and used for in-band OAM ought to be reusable for
            out-of-band OAM telemetry as well.

5.2.  In-band OAM Data with Per-hop Scope

   REQ-H1:  Missing nodes detection: Data shall be present that allows a
            node to detect whether all nodes that should participate in
            in-band OAM operations have indeed participated.

   REQ-H2:  Node, instance or device identifier: Data shall be present
            that allows retrieval of the identity of the entity
            reporting telemetry information. The entity can be a
            device, or a subsystem/component within a device. The
            latter will allow for packet tracing within a device in much
            the same way as between devices.

   REQ-H3:  Ingress interface identifier: Data shall be present that
            allows the identification of the interface a particular
            packet was received from. The interface can be a logical or
            physical entity.

   REQ-H4:  Egress interface identifier: Data shall be present that
            allows the identification of the interface a particular
            packet was forwarded to. The interface can be a logical or
            physical entity.

   REQ-H5:  Time-related requirements

            REQ-H5.1:  Delay: Data shall be present that allows
                       retrieval of the delay between two or more points
                       of interest within the system. Those points can
                       be within the same device or on different
                       devices.

            REQ-H5.2:  Jitter: Data shall be present that allows
                       retrieval of the jitter between two or more
                       points of interest within the system. Those
                       points can be within the same device or on
                       different devices.

            REQ-H5.3:  Wall-clock time: Data shall be present that
                       allows retrieval of the wall-clock time at which
                       a packet visited a particular point of interest
                       in the system.

            REQ-H5.4:  Time precision: The precision of the time related
                       data should be configurable.
                       Depending on the use case, the required precision
                       could be, e.g., nanoseconds, microseconds,
                       milliseconds, or seconds.

   REQ-H6:  Generic data records (e.g., GPS/geo-location information):
            It should be possible to add user-defined OAM data to the
            packet at selected hops.  The semantics of the data are
            defined by the user.

5.3.  In-band OAM with Selected Hop Scope

   REQ-S1:  Proof of transit: Data shall be present that makes it
            possible to securely prove that a packet has visited one or
            more particular points of interest (i.e., a particular set
            of nodes).

            REQ-S1.1:  If "Shamir's secret sharing scheme" is used for
                       proof of transit, two data records, "random" and
                       "cumulative", shall be present.  The number of
                       bits used for "random" and "cumulative" data
                       records can vary between deployments and should
                       thus be configurable.

            REQ-S1.2:  Enable a fail-open service chaining system to be
                       converted into a fail-closed service chaining
                       system.

5.4.  In-band OAM with End-to-end Scope

   REQ-E1:  Sequence numbering:

            REQ-E1.1:  Reordering detection: It should be possible to
                       detect whether packets have been reordered while
                       traversing an in-band OAM domain.

            REQ-E1.2:  Duplicates detection: It should be possible to
                       detect whether packets have been duplicated while
                       traversing an in-band OAM domain.

            REQ-E1.3:  Detection of packet drops: It should be possible
                       to detect whether packets have been dropped while
                       traversing an in-band OAM domain.

6.  Security Considerations and Requirements

   General security considerations will be addressed in a later version
   of this document.  Security considerations for Proof of Transit alone
   are discussed below.

6.1.  Proof of Transit

   Threat Model: Attacks on the deployments could be due to malicious
   administrators or accidental misconfigurations resulting in bypassing
   of certain nodes.
   The solution approach should meet the following requirements:

   REQ-SEC1:  Sound Proof of Transit: A valid and verifiable proof that
              the packet definitively traversed all the nodes as
              expected.  Probabilistic methods to achieve this should be
              avoided, as they could be exploited by an attacker.

   REQ-SEC2:  Tampering with meta data: An active attacker should not be
              able to insert, modify, or delete meta data in whole or in
              part and bypass a few (or all) nodes.  Any deviation from
              the expected path should be accurately determined.

   REQ-SEC3:  Replay Attacks: An attacker (active/passive) should not be
              able to reuse the proof of transit bits in the packet by
              observing the OAM data in the packet, packet
              characteristics (such as IP addresses, octets transferred,
              or timestamps), or even the proof bits themselves.  The
              solution approach should be cautious about using these
              parameters to derive any secrets.  Mitigating replay
              attacks beyond a window of longer duration could be
              intractable to achieve with a fixed number of bits
              allocated for proof.

   REQ-SEC4:  Recycle Secrets: Any configuration of the secrets (such as
              cryptographic keys, initialisation vectors, etc.), either
              in the controller or in the service functions, should be
              reconfigurable.  The solution approach should provide the
              controls, API calls, etc. needed to perform such
              recycling.  It is desirable to provide recommendations on
              the duration of rotation cycles needed for the secure
              functioning of the overall system.

   REQ-SEC5:  Secret storage and distribution: Secrets should be shared
              with the devices over secure channels.  Methods should be
              put in place so that secrets cannot be retrieved from the
              devices by unauthorized personnel.

7.  IANA Considerations

   [RFC Editor: please remove this section prior to publication.]

   This document has no IANA actions.

8.  Acknowledgements

   The authors would like to thank Eric Vyncke, Nalini Elkins, Srihari
   Raghavan, Ranganathan T S, Karthik Babu Harichandra Babu, Akshaya
   Nadahalli, and Andrew Yourtchenko for the comments and advice.  This
   document leverages and builds on top of several concepts described in
   [draft-kitamura-ipv6-record-route].  The authors would like to
   acknowledge the work done by its author, Hiroshi Kitamura, and by the
   people involved in writing it.

9.  Informative References

   [draft-brockners-proof-of-transit]
              Brockners, F., Bhandari, S., and S. Dara, "Proof of
              transit", July 2016.

   [draft-kitamura-ipv6-record-route]
              Kitamura, H., "Record Route for IPv6 (PR6), Hop-by-Hop
              Option Extension", November 2000.

   [I-D.brockners-lisp-sr]
              Brockners, F., Bhandari, S., Maino, F., and D. Lewis,
              "LISP Extensions for Segment Routing",
              draft-brockners-lisp-sr-01 (work in progress), February
              2014.

   [I-D.hildebrand-spud-prototype]
              Hildebrand, J. and B. Trammell, "Substrate Protocol for
              User Datagrams (SPUD) Prototype",
              draft-hildebrand-spud-prototype-03 (work in progress),
              March 2015.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
              and R. Shakir, "Segment Routing Architecture",
              draft-ietf-spring-segment-routing-09 (work in progress),
              July 2016.

   [I-D.lapukhov-dataplane-probe]
              Lapukhov, P. and r. remy@barefootnetworks.com, "Data-plane
              probe for in-band telemetry collection",
              draft-lapukhov-dataplane-probe-01 (work in progress), June
              2016.

   [P4]       Kim, , "P4: In-band Network Telemetry (INT)", September
              2015.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
              DOI 10.17487/RFC4884, April 2007.

   [RFC4950]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro, "ICMP
              Extensions for Multiprotocol Label Switching", RFC 4950,
              DOI 10.17487/RFC4950, August 2007.

   [RFC5837]  Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen,
              N., and JR. Rivers, "Extending ICMP for Interface and
              Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837,
              April 2010.

   [RFC7112]  Gont, F., Manral, V., and R. Bonica, "Implications of
              Oversized IPv6 Header Chains", RFC 7112,
              DOI 10.17487/RFC7112, January 2014.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
              Chaining (SFC) Architecture", RFC 7665,
              DOI 10.17487/RFC7665, October 2015.

   [RFC7872]  Gont, F., Linkova, J., Chown, T., and W. Liu,
              "Observations on the Dropping of Packets with IPv6
              Extension Headers in the Real World", RFC 7872,
              DOI 10.17487/RFC7872, June 2016.

Authors' Addresses

   Frank Brockners
   Cisco Systems, Inc.
   Hansaallee 249, 3rd Floor
   DUESSELDORF, NORDRHEIN-WESTFALEN  40549
   Germany

   Email: fbrockne@cisco.com

   Shwetha Bhandari
   Cisco Systems, Inc.
   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
   Bangalore, KARNATAKA 560 087
   India

   Email: shwethab@cisco.com

   Sashank Dara
   Cisco Systems, Inc.
   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
   Bangalore, KARNATAKA 560 087
   India

   Email: sadara@cisco.com

   Carlos Pignataro
   Cisco Systems, Inc.
   7200-11 Kit Creek Road
   Research Triangle Park, NC  27709
   United States

   Email: cpignata@cisco.com

   Hannes Gredler
   RtBrick Inc.

   Email: hannes@rtbrick.com

   John Leddy
   Comcast

   Email: John_Leddy@cable.comcast.com

   Stephen Youell
   JP Morgan Chase
   25 Bank Street
   London  E14 5JP
   United Kingdom

   Email: stephen.youell@jpmorgan.com