Network Working Group                                       F. Brockners
Internet-Draft                                               S. Bhandari
Intended status: Informational                                   S. Dara
Expires: January 9, 2017                                    C. Pignataro
                                                                   Cisco
                                                              H. Gredler
                                                            RtBrick Inc.
                                                            July 8, 2016


                      Requirements for In-band OAM
               draft-brockners-inband-oam-requirements-00

Abstract

   This document discusses the motivation and requirements for including
   specific operational and telemetry information into data packets
   while the data packet traverses a path between two points in the
   network.
   This method is referred to as "in-band" Operations,
   Administration, and Maintenance (OAM), given that the OAM information
   is carried with the data packets as opposed to in "out-of-band"
   packets dedicated to OAM. In-band OAM complements other OAM
   mechanisms which use dedicated probe packets to convey OAM
   information.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  Motivation for In-band OAM
     3.1.
           Path Congruency Issues with Dedicated OAM Packets
     3.2.  Results Sent to a System Other Than the Sender
     3.3.  Overlay and Underlay Correlation
     3.4.  SLA Verification
     3.5.  Analytics and Diagnostics
     3.6.  Frame Replication/Elimination Decision for
           Bi-casting/Active-active Networks
     3.7.  Proof of Transit
     3.8.  Use Cases
   4.  Considerations for In-band OAM
     4.1.  Type of Information to Be Recorded
     4.2.  MTU and Packet Size
     4.3.  Administrative Boundaries
     4.4.  Selective Enablement
     4.5.  Optimization of Node and Interface Identifiers
     4.6.  Loop Communication Path (IPv6-specifics)
   5.  Requirements for In-band OAM Data Types
     5.1.  Generic Requirements
     5.2.  In-band OAM Data with Per-hop Scope
     5.3.  In-band OAM with Selected Hop Scope
     5.4.  In-band OAM with End-to-end Scope
   6.  Security Considerations and Requirements
     6.1.  Proof of Transit
   7.  IANA Considerations
   8.  Acknowledgements
   9.  Informative References
   Authors' Addresses

1. Introduction

   This document discusses requirements for "in-band" Operations,
   Administration, and Maintenance (OAM) mechanisms.
   "In-band" OAM
   means to record OAM and telemetry information within the data packet
   while the data packet traverses a network or a particular network
   domain. The term "in-band" refers to the fact that the OAM and
   telemetry data is carried within data packets rather than being sent
   within packets specifically dedicated to OAM. In-band OAM
   mechanisms, which are sometimes also referred to as embedded network
   telemetry, are a current topic of discussion. In-band network
   telemetry has been defined for P4 [P4]. The SPUD prototype
   [I-D.hildebrand-spud-prototype] uses a similar logic that allows
   network devices on the path between endpoints to participate
   explicitly in the tube outside the end-to-end context. Even the IPv4
   route-record option defined in [RFC0791] can be considered an in-band
   OAM mechanism. In-band OAM complements "out-of-band" mechanisms such
   as ping or traceroute, or more recent active probing mechanisms, as
   described in [I-D.lapukhov-dataplane-probe]. In-band OAM mechanisms
   can be leveraged where current out-of-band mechanisms do not apply or
   do not offer the desired characteristics or requirements, such as
   proving that a certain set of traffic takes a pre-defined path,
   ensuring strict path congruency, checking service level agreements
   for the live data traffic, gathering detailed statistics on traffic
   distribution paths in networks that distribute traffic across
   multiple paths, or covering scenarios where probe traffic is
   potentially handled differently from regular data traffic by the
   network devices. [RFC7276] presents an overview of OAM tools.

   Compared to probably the most basic example of "in-band OAM", which
   is IPv4 route recording [RFC0791], an in-band OAM approach has the
   following capabilities:

   a.
       A flexible data format to allow different types of information to
       be captured as part of an in-band OAM operation, including not
       only path tracing information, but additional operational and
       telemetry information such as timestamps, sequence numbers, or
       even generic data such as queue size, geo-location of the node
       that forwarded the packet, etc.

   b.  A data format to express node as well as link identifiers to
       record the path a packet takes with a fixed amount of added data.

   c.  The ability to detect whether any nodes were skipped while
       recording in-band OAM information (i.e., in-band OAM is not
       supported or not enabled on those nodes).

   d.  The ability to actively process information in the packet, for
       example to prove in a cryptographically secure way that a packet
       really took a pre-defined path using some traffic steering method
       such as service chaining or traffic engineering.

   e.  The ability to include OAM data beyond simple path information,
       such as timestamps or even generic data of a particular use case.

   f.  The ability to include OAM data in various different transport
       protocols.

2. Conventions

   Abbreviations used in this document:

   ECMP:      Equal Cost Multi-Path

   MTU:       Maximum Transmission Unit

   NFV:       Network Function Virtualization

   OAM:       Operations, Administration, and Maintenance

   PMTU:      Path MTU

   SLA:       Service Level Agreement

   SFC:       Service Function Chain

   SR:        Segment Routing

   This document defines in-band Operations, Administration, and
   Maintenance (in-band OAM) as the subset of OAM in which OAM
   information is carried along with data packets. This is as opposed
   to "out-of-band OAM", where specific packets are dedicated to
   carrying OAM information.

3. Motivation for In-band OAM

   In several scenarios it is beneficial to make information about which
   path a packet took through the network available to the operator.
   This includes not only tasks such as debugging, troubleshooting,
   network planning, and network optimization, but also policy or
   service level agreement compliance checks. This section discusses
   the motivation to introduce new methods for enhanced in-band network
   diagnostics.

3.1. Path Congruency Issues with Dedicated OAM Packets

   Mechanisms which add tracing information to the regular data traffic,
   sometimes also referred to as "in-band" or "passive OAM", can
   complement active, probe-based mechanisms such as ping or traceroute,
   which are sometimes considered as "out-of-band", because the messages
   are transported independently from regular data traffic. "In-band"
   mechanisms do not require extra packets to be sent and hence don't
   change the packet traffic mix within the network. Traceroute and
   ping, for example, use ICMP messages: New packets are injected to get
   tracing information. Those add to the number of messages in a
   network, which already might be highly loaded or suffering
   performance issues for a particular path or traffic type.

   Packet scheduling algorithms, especially for balancing traffic across
   equal cost paths or links, often leverage information contained
   within the packet, such as protocol number, IP-address or
   MAC-address. Probe packets would thus either need to be sent from
   the exact same endpoints with the exact same parameters, or probe
   packets would need to be artificially constructed as "fake" packets
   and inserted along the path. Both approaches are often not feasible
   from an operational perspective, be it that access to the end-system
   is not feasible, or that the diversity of parameters and associated
   probe packets to be created is simply too large. An in-band
   mechanism is an alternative in those cases.
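
   The congruency problem described above can be illustrated with a
   minimal sketch. It assumes a hypothetical hash-based selection
   function, ecmp_next_hop, computed over the protocol number,
   addresses, and ports; real forwarding hardware uses vendor-specific
   hash functions and field sets. The addresses, ports, and link names
   are illustrative only and do not come from this document.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Select one of several equal-cost next hops by hashing header fields.

    Hypothetical illustration only: the hash function and the chosen
    fields are assumptions, not the behavior of any specific router.
    """
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Four equal-cost links towards the destination (illustrative names).
NEXT_HOPS = ["link-A", "link-B", "link-C", "link-D"]

# A TCP data flow and an ICMP probe between the same two endpoints.
data_flow = ("192.0.2.10", "198.51.100.20", 6, 49152, 443)
icmp_probe = ("192.0.2.10", "198.51.100.20", 1, 0, 0)

data_link = ecmp_next_hop(*data_flow, NEXT_HOPS)
probe_link = ecmp_next_hop(*icmp_probe, NEXT_HOPS)

# The two selections are computed from different hash inputs, so
# nothing guarantees that the probe follows the data flow's link.
print("data flow uses:", data_link)
print("ICMP probe uses:", probe_link)
```

   Because the probe differs from the data flow in protocol number and
   ports, its hash input differs, and the link it exercises is not
   necessarily the one carrying the data traffic. In-band OAM data
   travels inside the data packet itself and is therefore subject to
   the same selection.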

   In-band mechanisms also don't suffer from implementations where
   probe traffic is handled differently (and potentially forwarded
   differently) by a router than regular data traffic.

3.2. Results Sent to a System Other Than the Sender

   Traditional ping and traceroute tools return the OAM results to the
   sender of the probe. Even when the ICMP messages that are used with
   these tools are enhanced, and additional telemetry is collected
   (e.g., ICMP Multi-Part [RFC4884] supporting MPLS information
   [RFC4950], Interface and Next-Hop Identification [RFC5837], etc.), it
   would be advantageous to separate the sending of an OAM probe from
   the receiving of the telemetry data. In this context, it is
   desirable not to assume that a bidirectional working path exists.

3.3. Overlay and Underlay Correlation

   Several network deployments leverage tunneling mechanisms to create
   overlay or service-layer networks. Examples include VXLAN-GPE, GRE,
   or LISP. One often observed attribute of overlay networks is that
   they do not offer the user of the overlay any insight into the
   underlay network. This means that neither the path that a particular
   tunneled packet takes nor other operational details such as the
   per-hop delay/jitter in the underlay are visible to the user of the
   overlay network, giving rise to diagnosis and debugging challenges in
   case of connectivity or performance issues. The scope of OAM tools
   like ping or traceroute is limited to either the overlay or the
   underlay, which means that the user of the overlay typically has no
   access to OAM in the underlay, unless specific operational procedures
   are put in place. With in-band OAM the operator of the underlay can
   offer details of the connectivity in the underlay to the user of the
   overlay. The operator of the egress tunnel router could choose to
   share the recorded information about the path with the user of the
   overlay.

   Coupled with mechanisms such as Segment Routing (SR)
   [I-D.ietf-spring-segment-routing], overlay network and underlay
   network can be more tightly coupled: The user of the overlay has
   detailed diagnostic information available in case of failure
   conditions. The user of the overlay can also use the path recording
   information as input to traffic steering or traffic engineering
   mechanisms, for example to achieve path symmetry for the traffic
   between two endpoints. [I-D.brockners-lisp-sr] is an example of how
   these methods can be applied to LISP.

3.4. SLA Verification

   In-band OAM can help users of an overlay-service to verify that
   negotiated SLAs for the real traffic are met by the underlay network
   provider. Different from solutions which rely on active probes to
   test an SLA, in-band OAM based mechanisms avoid wrong interpretations
   and "cheating", which can happen if the probe traffic that is used to
   perform the SLA check is prioritized by the network provider of the
   underlay.

3.5. Analytics and Diagnostics

   Network planners and operators benefit from knowledge of the actual
   traffic distribution in the network. When deriving an overall
   network connectivity traffic matrix one typically needs to correlate
   data gathered from each individual device in the network. If the
   path of a packet is recorded while the packet is forwarded, the
   entire path that a packet took through the network is available to
   the egress system. This obviates the need to retrieve individual
   traffic statistics from every device in the network and correlate
   those statistics, or employ other mechanisms such as leveraging
   traffic engineering with null-bandwidth tunnels just to retrieve the
   appropriate statistics to generate the traffic matrix.
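
   As a minimal sketch of this idea, the following example derives a
   traffic matrix at the egress system purely from the paths recorded
   in received packets. The record layout (a list of node identifiers
   plus an octet count), the function name traffic_matrix, and the
   sample values are assumptions made for illustration; this document
   does not define a data format.

```python
from collections import Counter

def traffic_matrix(packets):
    """Aggregate per-packet path records collected at the egress system.

    packets: iterable of (recorded_path, octets), where recorded_path is
    the list of node identifiers from ingress to egress (an assumed,
    illustrative record layout).
    Returns (octets per (ingress, egress) pair, packet count per path).
    """
    matrix = Counter()
    per_path = Counter()
    for path, octets in packets:
        if not path:
            continue  # no in-band OAM data recorded for this packet
        matrix[(path[0], path[-1])] += octets
        per_path[tuple(path)] += 1
    return matrix, per_path

# Illustrative records as they might be seen at egress node "D".
observed = [
    (["A", "B", "D"], 1500),
    (["A", "C", "D"], 500),
    (["A", "B", "D"], 1000),
    (["E", "C", "D"], 200),
]

matrix, per_path = traffic_matrix(observed)
print(matrix[("A", "D")])          # 3000 octets from A to D
print(per_path[("A", "B", "D")])   # 2 packets took the path A-B-D
```

   No transit device is queried in this sketch: both the edge-to-edge
   volume and the per-path packet counts are derived from data that
   arrives with the packets themselves.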

   In addition, with individual path tracing, information is available
   at packet level granularity, rather than only at aggregate level - as
   is usually the case with IPFIX-style methods which employ
   flow-filters at the network elements. Data-center networks which use
   equal-cost multipath (ECMP) forwarding are one example where detailed
   statistics on flow distribution in the network are highly desired.
   If a network supports ECMP, one can create detailed statistics for
   the different paths packets take through the network at the egress
   system, without a need to correlate/aggregate statistics from every
   router in the system. Transit devices are off-loaded from the task
   of gathering packet statistics.

3.6. Frame Replication/Elimination Decision for Bi-casting/Active-active
     Networks

   Bandwidth- and power-constrained, time-sensitive, or loss-intolerant
   networks (e.g., networks for industry automation/control, health
   care) require efficient OAM methods to decide when to replicate
   packets to a secondary path in order to keep the loss/error-rate for
   the receiver at a tolerable level - and also when to stop replication
   and eliminate the redundant flow. Many IoT networks are time
   sensitive and cannot leverage automatic retransmission requests (ARQ)
   to cope with transmission errors or lost packets. Transmitting the
   data over multiple disparate paths (often called bi-casting or
   live-live) is a method used to reduce the error rate observed by the
   receiver. Time-sensitive networking (TSN) receives a lot of
   attention from the manufacturing industry, as shown by various
   standardization activities and industry forums being formed (see
   e.g., IETF 6TiSCH, IEEE P802.1CB, AVnu).

3.7. Proof of Transit

   Several deployments use traffic engineering, policy routing, segment
   routing or Service Function Chaining (SFC) [RFC7665] to steer packets
   through a specific set of nodes.
   In certain cases regulatory
   obligations or a compliance policy require proof that all packets
   that are supposed to follow a specific path are indeed being
   forwarded across the exact set of nodes specified. If a packet flow
   is supposed to go through a series of service functions or network
   nodes, it has to be proven that all packets of the flow actually went
   through the service chain or collection of nodes specified by the
   policy. In case the packets of a flow weren't appropriately
   processed, a verification device would be required to identify the
   policy violation and take actions (e.g., drop or redirect the packet,
   send an alert, etc.) corresponding to the policy.
   In today's deployments, the proof that a packet traversed a
   particular service chain is typically delivered in an indirect way:
   Service appliances and network forwarding are in different trust
   domains. Physical hand-off-points are defined between these trust
   domains (i.e., physical interfaces). Or in other terms, in the
   "network forwarding domain" things are wired up in a way that traffic
   is delivered to the ingress interface of a service appliance and
   received back from an egress interface of a service appliance. This
   "wiring" is verified and trusted. The evolution to Network Function
   Virtualization (NFV) and modern service chaining concepts (using
   technologies such as LISP, NSH, Segment Routing, etc.) blurs the line
   between the different trust domains, because the hand-off-points are
   no longer clearly defined physical interfaces, but are virtual
   interfaces. For that very reason, network operators require that
   different trust layers not be mixed in the same device. For
   an NFV scenario a different proof is required.
   Offering a proof that
   a packet traversed a specific set of service functions would allow
   network operators to move away from the above-described indirect
   methods of proving that a service chain is in place for a particular
   application.

   A solution approach could be based on OAM data which is added to
   every packet for achieving proof of transit. The OAM data is updated
   at every hop and is used to verify whether a packet traversed all
   required nodes. When the verifier receives each packet, it can
   validate whether the packet traversed the service chain correctly.
   The detailed mechanisms used for path verification along with the
   procedures applied to the OAM data carried in the packet for path
   verification are beyond the scope of this document. Details are
   addressed in [draft-brockners-proof-of-transit]. In this document
   the term "proof" refers to a discrete set of bits that represents an
   integer or string carried as OAM data. The OAM data is used to
   verify whether a packet traversed the nodes it is supposed to
   traverse.

3.8. Use Cases

   In-band OAM could be leveraged for several use cases, including:

   o  Traffic Matrix: Derive the network traffic matrix: Traffic for a
      given time interval between any two edge nodes of a given domain.
      Could be performed for all traffic or per QoS-class.

   o  Flow Debugging: Discover which path(s) a particular set of traffic
      (identified by an n-tuple) takes in the network. Such a procedure
      is particularly useful in case traffic is balanced across multiple
      paths, like with link aggregation (LACP) or equal cost
      multi-pathing (ECMP).

   o  Loss Statistics per Path: Retrieve loss statistics per flow and
      path in the network.

   o  Path Heat Maps: Discover highly utilized links in the network.

   o  Trend Analysis on Traffic Patterns: Analyze if (and if so how) the
      forwarding path for a specific set of traffic changes over time
      (can give hints to routing issues, unstable links, etc.).

   o  Network Delay Distribution: Show the delay distribution across the
      network by node or link. If enabled per application or for a
      specific flow, then display the path taken along with the delay
      incurred at every hop.

   o  SLA Verification: Verify that a negotiated service level agreement
      (SLA), e.g., for packet drop rates or delay/jitter, is met by the
      actual traffic.

   o  Low-power Networks: Include application level OAM information
      (e.g., battery charge level, cache or buffer fill level) into data
      traffic to avoid sending extra OAM traffic which incurs an extra
      cost on the devices. Using the battery charge level as example,
      one could avoid sending extra OAM packets just to communicate
      battery health, and as such would save battery on sensors.

   o  Path Verification or Service Function Path Verification: Proof and
      verification of packets traversing check points in the network,
      where check points can be nodes in the network or service
      functions.

   o  Geo-location Policy: Network policy implemented based on which
      path packets took. Example: Only if packets originated and stayed
      within the trading-floor department, access to specific
      applications or servers is granted.

4. Considerations for In-band OAM

   The implementation of an in-band OAM mechanism needs to take several
   considerations into account, including administrative boundaries, how
   information is recorded, Maximum Transmission Unit (MTU), Path MTU
   discovery and packet size, etc.

4.1.
     Type of Information to Be Recorded

   The information gathered for in-band OAM can be categorized into
   three main categories: Information with a per-hop scope, such as path
   tracing; information which applies to a specific set of nodes, such
   as path or service chain verification; information which only applies
   to the edges of a domain, such as sequence numbers.

   o  "edge to edge": Information that needs to be shared between
      network edges (the "edge" of a network could either be a host or a
      domain edge device): Edge-to-edge data, e.g., packet and octet
      counts of data entering and leaving a well-defined domain, is
      helpful in building a traffic matrix; a sequence number (also
      called "path packet counter") is useful for detecting packet loss
      within a flow.

   o  "selected hops": Information that applies to a specific set of
      nodes only. In case of path verification, only the nodes which
      are "check points" are required to interpret and update the
      information in the packet.

   o  "per hop": Information that is gathered at every hop along the
      path a packet traverses within an administrative domain:

      *  Hop-by-hop information, e.g., nodes visited for path tracing,
         or timestamps at each hop to find delays along the path.

      *  Statistics collection at each hop to optimize communication in
         resource-constrained networks, e.g., battery, CPU, or memory
         status of each node piggybacked on a data packet is useful in
         low-power lossy networks where network nodes are mostly asleep
         and communication is expensive.

4.2. MTU and Packet Size

   The data recorded at every hop may lead to the packet size exceeding
   the Maximum Transmission Unit (MTU). Depending on the transport
   protocol used, the MTU is either a configuration parameter or the
   Path MTU (PMTU) is discovered dynamically. Example: IPv6 recommends
   PMTU discovery before data packets are sent to prevent packet
   fragmentation.
   It
   specifies 1280 octets as the minimum link MTU, and thus as the
   default size of a PDU that can be carried in an IPv6 datagram. A
   detailed discussion of the implications of oversized IPv6 header
   chains is found in [RFC7112].

   The Path MTU restricts the amount of data that can be recorded for
   the purpose of OAM within a data packet. The total size of the data
   to be recorded needs to be preset to avoid the packet size exceeding
   the MTU. It is recommended to pre-calculate this limit and configure
   network devices to limit the in-band OAM data that is attached to a
   packet.

4.3. Administrative Boundaries

   There are challenges in enabling in-band OAM in the public Internet
   across administrative domains:

   o  Depending on the deployment, the data fields that in-band OAM
      requires as part of a specific transport protocol may not be
      supported across administrative boundaries.

   o  Current OAM implementations are often done in the slow path, i.e.,
      OAM packets are punted to the router's CPU for processing. This
      leads to performance and scaling issues and opens up routers for
      attacks such as Denial of Service (DoS) attacks.

   o  Discovery of network topology and details of the network devices
      across administrative boundaries may open up attack vectors
      compromising network security.

   o  Specifically on IPv6: At the administrative boundaries IPv6
      packets with extension headers are dropped for several reasons
      described in [RFC7872].

   The following considerations will be discussed in a future version of
   this document: If the packet is dropped due to the presence of
   in-band OAM data; if the policy failure is treated as feature
   disablement and any further recording is stopped but the packet
   itself is not dropped, it may lead to every node in the path making
   this policy decision.

4.4. Selective Enablement

   Depending on the deployment, in-band OAM could be used either for
   all or for only a subset of the overall traffic.
   While it might be desirable to
   apply in-band OAM to all traffic and then selectively use the data
   gathered when needed, it might not always be feasible. Depending
   on the forwarding infrastructure used, in-band OAM can have an impact
   on forwarding performance. The SPUD prototype for example uses the
   notion of "pipes" to describe the portion of the traffic that could
   be subject to in-path inspection. Mechanisms to decide which traffic
   would be subject to in-band OAM are outside the scope of this
   document.

4.5. Optimization of Node and Interface Identifiers

   Since packets have a finite maximum size, the data recording or
   carrying capacity of one packet in which the in-band OAM metadata is
   present is limited. In-band OAM should use its own dedicated
   namespace (confined to the domain in-band OAM operates in) to
   represent node and interface IDs to save space in the header.
   Generic representations of node and interface identifiers which are
   globally unique (such as a UUID) would consume significantly more
   bits of in-band OAM data.

4.6. Loop Communication Path (IPv6-specifics)

   When recorded data is required to be analyzed on a source node that
   issues a packet and inserts in-band OAM data, the recorded data needs
   to be carried back to the source node.

   One way to carry the in-band OAM data back to the source is to
   utilize an ICMP Echo Request/Reply (ping) or ICMPv6 Echo
   Request/Reply (ping6) mechanism. In order to run the in-band OAM
   mechanism appropriately on the ping/ping6 mechanism, the following
   two operations should be implemented by the ping/ping6 target node:

   1.  All of the in-band OAM fields would be copied from an Echo
       Request message to an Echo Reply message.

   2.  The Hop Limit field of the IPv6 header of these messages would be
       copied as a continuous sequence. Further considerations will be
       addressed in a future version of this document.

5.
   Requirements for In-band OAM Data Types

   The use cases discussed above require different types of in-band OAM
   data. This section details requirements for in-band OAM derived from
   the discussion above.

5.1. Generic Requirements

   REQ-G1:  Classification: It should be possible to enable in-band OAM
            on a selected set of traffic. The selected set of traffic
            can also be all traffic.

   REQ-G2:  Scope: If in-band OAM is used only within a specific domain,
            provisions need to be put in place to ensure that in-band
            OAM data stays within the specific domain only.

   REQ-G3:  Transport independence: Data formats for in-band OAM shall
            be defined in a transport independent way. In-band OAM
            applies to a variety of transport protocols. Encapsulations
            should define how the generic data formats are carried by a
            specific protocol.

   REQ-G4:  Layering: It should be possible to have in-band OAM
            information for different transport protocol layers be
            present in several fields within a single packet. This
            could for example be the case when tunnels are employed and
            in-band OAM information is to be gathered for both the
            underlay as well as the overlay network.

   REQ-G5:  MTU size: With in-band OAM information added, packets should
            not become larger than the path MTU.

   REQ-G6:  Data Structure Reusability: The data types and data formats
            defined and used for in-band OAM ought to be reusable for
            out-of-band OAM telemetry as well.

5.2. In-band OAM Data with Per-hop Scope

   REQ-H1:  Missing nodes detection: Data shall be present that allows a
            node to detect whether all nodes that should participate in
            in-band OAM operations have indeed participated.

   REQ-H2:  Node, instance or device identifier: Data shall be present
            that allows the identity of the entity reporting telemetry
            information to be retrieved. The entity can be a device, or
            a subsystem/component within a device.
            The latter will allow
            for packet tracing within a device in much the same way as
            between devices.

   REQ-H3:  Ingress interface identifier: Data shall be present that
            allows the identification of the interface a particular
            packet was received from. The interface can be a logical or
            physical entity.

   REQ-H4:  Egress interface identifier: Data shall be present that
            allows the identification of the interface a particular
            packet was forwarded to. The interface can be a logical or
            physical entity.

   REQ-H5:  Time-related requirements

            REQ-H5.1:  Delay: Data shall be present that allows the
                       delay between two or more points of interest
                       within the system to be retrieved. Those points
                       can be within the same device or on different
                       devices.

            REQ-H5.2:  Jitter: Data shall be present that allows the
                       jitter between two or more points of interest
                       within the system to be retrieved. Those points
                       can be within the same device or on different
                       devices.

            REQ-H5.3:  Wall-clock time: Data shall be present that
                       allows the wall-clock time at which a packet
                       visited a particular point of interest in the
                       system to be retrieved.

            REQ-H5.4:  Time precision: The precision of the time related
                       data should be configurable. Depending on the
                       use case, the required precision could be, e.g.,
                       nano-seconds, micro-seconds, milli-seconds, or
                       seconds.

   REQ-H6:  Generic data records (e.g., GPS/Geo-location information):
            It should be possible to add user-defined OAM data at select
            hops to the packet. The semantics of the data are defined
            by the user.

5.3. In-band OAM with Selected Hop Scope

   REQ-S1:  Proof of transit: Data shall be present which allows to
            securely prove that a packet has visited one or several
            particular points of interest (i.e., a particular set of
            nodes).
            REQ-S1.1:  If "Shamir's secret sharing scheme" is used for
                       proof of transit, two data records, "random" and
                       "cumulative", shall be present.  The number of
                       bits used for the "random" and "cumulative" data
                       records can vary between deployments and should
                       thus be configurable.

5.4.  In-band OAM with End-to-end Scope

   REQ-E1:  Sequence numbering:

            REQ-E1.1:  Reordering detection: It should be possible to
                       detect whether packets have been reordered while
                       traversing an in-band OAM domain.

            REQ-E1.2:  Duplicates detection: It should be possible to
                       detect whether packets have been duplicated while
                       traversing an in-band OAM domain.

            REQ-E1.3:  Detection of packet drops: It should be possible
                       to detect whether packets have been dropped while
                       traversing an in-band OAM domain.

6.  Security Considerations and Requirements

   General security considerations will be addressed in a later version
   of this document.  Security considerations for Proof of Transit alone
   are discussed below.

6.1.  Proof of Transit

   Threat Model: Attacks on deployments could be due to malicious
   administrators or accidental misconfigurations that result in certain
   nodes being bypassed.  The solution approach should meet the
   following requirements:

   REQ-SEC1:  Sound Proof of Transit: A valid and verifiable proof that
              the packet definitively traversed all the nodes as
              expected.  Probabilistic methods to achieve this should be
              avoided, as they could be exploited by an attacker.

   REQ-SEC2:  Tampering of meta data: An active attacker should not be
              able to insert, modify, or delete meta data in whole or in
              part and thereby bypass some (or all) nodes.  Any
              deviation from the expected path should be accurately
              detected.
   REQ-SEC3:  Replay Attacks: An attacker (active or passive) should not
              be able to reuse the proof of transit bits in the packet
              by observing the OAM data in the packet, packet
              characteristics (like IP addresses, octets transferred,
              timestamps) or even the proof bits themselves.  The
              solution approach should be cautious about using these
              parameters to derive any secrets.  Mitigating replay
              attacks over a longer time window could be intractable
              with a fixed number of bits allocated for the proof.

   REQ-SEC4:  Recycle Secrets: Any configuration of the secrets (like
              cryptographic keys, initialisation vectors, etc.), either
              in the controller or in the service functions, should be
              reconfigurable.  The solution approach should provide the
              controls, API calls, etc. needed to perform such
              recycling.  It is desirable to provide recommendations on
              the duration of rotation cycles needed for the secure
              functioning of the overall system.

   REQ-SEC5:  Secret storage and distribution: Secrets should be shared
              with the devices over secure channels.  Methods should be
              put in place so that secrets cannot be retrieved from the
              devices by unauthorized personnel.

7.  IANA Considerations

   [RFC Editor: please remove this section prior to publication.]

   This document has no IANA actions.

8.  Acknowledgements

   The authors would like to thank Steve Youell, Eric Vyncke, Nalini
   Elkins, Srihari Raghavan, Ranganathan T S, Karthik Babu Harichandra
   Babu, Akshaya Nadahalli, and Andrew Yourtchenko for their comments
   and advice.  This document leverages and builds on top of several
   concepts described in [draft-kitamura-ipv6-record-route].  The
   authors would like to acknowledge the work done by its author,
   Hiroshi Kitamura, and the people involved in writing it.

9.  Informative References

   [draft-brockners-proof-of-transit]
              Brockners, F., Bhandari, S., and S.
              Dara, "Proof of transit", July 2016.

   [draft-kitamura-ipv6-record-route]
              Kitamura, H., "Record Route for IPv6 (PR6), Hop-by-Hop
              Option Extension", November 2000.

   [I-D.brockners-lisp-sr]
              Brockners, F., Bhandari, S., Maino, F., and D. Lewis,
              "LISP Extensions for Segment Routing", draft-brockners-
              lisp-sr-01 (work in progress), February 2014.

   [I-D.hildebrand-spud-prototype]
              Hildebrand, J. and B. Trammell, "Substrate Protocol for
              User Datagrams (SPUD) Prototype", draft-hildebrand-spud-
              prototype-03 (work in progress), March 2015.

   [I-D.ietf-spring-segment-routing]
              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
              and R. Shakir, "Segment Routing Architecture", draft-ietf-
              spring-segment-routing-09 (work in progress), July 2016.

   [I-D.lapukhov-dataplane-probe]
              Lapukhov, P. and r. remy@barefootnetworks.com, "Data-plane
              probe for in-band telemetry collection", draft-lapukhov-
              dataplane-probe-01 (work in progress), June 2016.

   [P4]       Kim, "P4: In-band Network Telemetry (INT)", September
              2015.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
              DOI 10.17487/RFC4884, April 2007.

   [RFC4950]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro, "ICMP
              Extensions for Multiprotocol Label Switching", RFC 4950,
              DOI 10.17487/RFC4950, August 2007.

   [RFC5837]  Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen,
              N., and JR. Rivers, "Extending ICMP for Interface and
              Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837,
              April 2010.

   [RFC7112]  Gont, F., Manral, V., and R. Bonica, "Implications of
              Oversized IPv6 Header Chains", RFC 7112,
              DOI 10.17487/RFC7112, January 2014.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
              Chaining (SFC) Architecture", RFC 7665,
              DOI 10.17487/RFC7665, October 2015.

   [RFC7872]  Gont, F., Linkova, J., Chown, T., and W. Liu,
              "Observations on the Dropping of Packets with IPv6
              Extension Headers in the Real World", RFC 7872,
              DOI 10.17487/RFC7872, June 2016.

Authors' Addresses

   Frank Brockners
   Cisco Systems, Inc.
   Hansaallee 249, 3rd Floor
   DUESSELDORF, NORDRHEIN-WESTFALEN  40549
   Germany

   Email: fbrockne@cisco.com

   Shwetha Bhandari
   Cisco Systems, Inc.
   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
   Bangalore, KARNATAKA  560 087
   India

   Email: shwethab@cisco.com

   Sashank Dara
   Cisco Systems, Inc.
   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
   Bangalore, KARNATAKA  560 087
   India

   Email: sadara@cisco.com

   Carlos Pignataro
   Cisco Systems, Inc.
   7200-11 Kit Creek Road
   Research Triangle Park, NC  27709
   United States

   Email: cpignata@cisco.com

   Hannes Gredler
   RtBrick Inc.

   Email: hannes@rtbrick.com