idnits 2.17.1 

draft-brockners-inband-oam-requirements-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 18, 2016) is 2840 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-15) exists of
     draft-ietf-spring-segment-routing-09


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                       F. Brockners
3	Internet-Draft                                               S. Bhandari
4	Intended status: Informational                                   S. Dara
5	Expires: January 19, 2017                                   C. Pignataro
6	                                                                   Cisco
7	                                                              H. Gredler
8	                                                            RtBrick Inc.
9	                                                                J. Leddy
10	                                                                 Comcast
11	                                                               S. Youell
12	                                                                    JMPC
13	                                                           July 18, 2016

15	                      Requirements for In-band OAM
16	               draft-brockners-inband-oam-requirements-01

18	Abstract

20	   This document discusses the motivation and requirements for including
21	   specific operational and telemetry information into data packets
22	   while the data packet traverses a path between two points in the
23	   network.  This method is referred to as "in-band" Operations,
24	   Administration, and Maintenance (OAM), given that the OAM information
25	   is carried with the data packets as opposed to in "out-of-band"
26	   packets dedicated to OAM.  In-band OAM complements other OAM
27	   mechanisms which use dedicated probe packets to convey OAM
28	   information.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on January 19, 2017.

47	Copyright Notice

49	   Copyright (c) 2016 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (http://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	   2.  Conventions
66	   3.  Motivation for In-band OAM
67	     3.1.  Path Congruency Issues with Dedicated OAM Packets
68	     3.2.  Results Sent to a System Other Than the Sender
69	     3.3.  Overlay and Underlay Correlation
70	     3.4.  SLA Verification
71	     3.5.  Analytics and Diagnostics
72	     3.6.  Frame Replication/Elimination Decision for Bi-casting
73	           /Active-active Networks
74	     3.7.  Proof of Transit
75	     3.8.  Use Cases
76	   4.  Considerations for In-band OAM
77	     4.1.  Type of information to be recorded
78	     4.2.  MTU and packet size
79	     4.3.  Administrative boundaries
80	     4.4.  Selective enablement
81	     4.5.  Optimization of node and interface identifiers
82	     4.6.  Loop communication path (IPv6-specifics)
83	   5.  Requirements for In-band OAM Data Types
84	     5.1.  Generic Requirements
85	     5.2.  In-band OAM Data with Per-hop Scope
86	     5.3.  In-band OAM with Selected Hop Scope
87	     5.4.  In-band OAM with End-to-end Scope
88	   6.  Security Considerations and Requirements
89	     6.1.  Proof of Transit
90	   7.  IANA Considerations
91	   8.  Acknowledgements
92	   9.  Informative References
93	   Authors' Addresses

95	1.  Introduction

97	   This document discusses requirements for "in-band" Operations,
98	   Administration, and Maintenance (OAM) mechanisms.  "In-band" OAM
99	   means to record OAM and telemetry information within the data packet
100	   while the data packet traverses a network or a particular network
101	   domain.  The term "in-band" refers to the fact that the OAM and
102	   telemetry data is carried within data packets rather than being sent
103	   within packets specifically dedicated to OAM.  In-band OAM
104	   mechanisms, which are sometimes also referred to as embedded network
105	   telemetry, are a current topic of discussion.  In-band network
106	   telemetry has been defined for P4 [P4].  The SPUD prototype
107	   [I-D.hildebrand-spud-prototype] uses a similar logic that allows
108	   network devices on the path between endpoints to participate
109	   explicitly in the tube outside the end-to-end context.  Even the IPv4
110	   route-record option defined in [RFC0791] can be considered an in-band
111	   OAM mechanism.  In-band OAM complements "out-of-band" mechanisms such
112	   as ping or traceroute, or more recent active probing mechanisms, as
113	   described in [I-D.lapukhov-dataplane-probe].  In-band OAM mechanisms
114	   can be leveraged where current out-of-band mechanisms do not apply or
115	   do not offer the desired characteristics, for example when proving
116	   that a certain set of traffic takes a pre-defined path, when strict
117	   path congruency is desired, when checking service level agreements
118	   for the live data traffic, when gathering detailed statistics on
119	   traffic distribution in networks that spread traffic across multiple
120	   paths, or when probe traffic is potentially handled differently from
121	   regular data traffic by the network devices.  [RFC7276] presents an
122	   overview of OAM tools.

124	   Compared to IPv4 route recording [RFC0791], probably the most basic
125	   example of "in-band OAM", the in-band OAM approach discussed here has
126	   the following capabilities:

128	   a.  A flexible data format to allow different types of information to
129	       be captured as part of an in-band OAM operation, including not
130	       only path tracing information, but additional operational and
131	       telemetry information such as timestamps, sequence numbers, or
132	       even generic data such as queue size, geo-location of the node
133	       that forwarded the packet, etc.

135	   b.  A data format to express node as well as link identifiers to
136	       record the path a packet takes with a fixed amount of added data.

138	   c.  The ability to detect whether any nodes were skipped while
139	       recording in-band OAM information (i.e., in-band OAM is not
140	       supported or not enabled on those nodes).

142	   d.  The ability to actively process information in the packet, for
143	       example to prove in a cryptographically secure way that a packet
144	       really took a pre-defined path using some traffic steering method
145	       such as service chaining or traffic engineering.

147	   e.  The ability to include OAM data beyond simple path information,
148	       such as timestamps or even generic data of a particular use case.

150	   f.  The ability to include OAM data in various different transport
151	       protocols.
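
   As an illustration of capabilities (b) and (c), the following minimal
   Python sketch records one fixed-size entry per hop and reports gaps.
   It assumes, hypothetically, that each entry stores a short node
   identifier together with the hop limit seen at that node, and that
   every forwarding node decrements the hop limit by one.  Neither the
   field names nor this gap heuristic are defined by this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HopRecord:
    """One per-hop entry (hypothetical layout, not defined by this document)."""
    node_id: int     # short, domain-local node identifier
    hop_limit: int   # hop limit / TTL observed when the node wrote the entry


def skipped_hops(trace: List[HopRecord]) -> List[int]:
    """For each pair of consecutive records, return how many hops lie
    between them without a record.

    Assumes every forwarding node decrements the hop limit by exactly one,
    whether or not it supports in-band OAM.
    """
    return [prev.hop_limit - cur.hop_limit - 1
            for prev, cur in zip(trace, trace[1:])]


# Nodes 1 and 2 are adjacent; two hops between node 2 and node 5 left no record.
trace = [HopRecord(1, 64), HopRecord(2, 63), HopRecord(5, 60)]
gaps = skipped_hops(trace)
```

   A non-zero entry in "gaps" indicates that one or more nodes between
   two recording nodes did not add in-band OAM data.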

153	2.  Conventions

155	   Abbreviations used in this document:

157	   ECMP:      Equal Cost Multi-Path

159	   MTU:       Maximum Transmission Unit

161	   NFV:       Network Function Virtualization

163	   OAM:       Operations, Administration, and Maintenance

165	   PMTU:      Path MTU

167	   SLA:       Service Level Agreement

169	   SFC:       Service Function Chain

171	   SR:        Segment Routing

173	   This document defines in-band Operations, Administration, and
174	   Maintenance (in-band OAM), as the subset in which OAM information is
175	   carried along with data packets.  This is as opposed to "out-of-band
176	   OAM", where specific packets are dedicated to carrying OAM
177	   information.

179	3.  Motivation for In-band OAM

181	   In several scenarios it is beneficial to make information about which
182	   path a packet took through the network available to the operator.
183	   This includes not only tasks like debugging, troubleshooting, network
184	   planning, and network optimization, but also policy or service level
185	   agreement compliance checks.  This section discusses
186	   the motivation to introduce new methods for enhanced in-band network
187	   diagnostics.

189	3.1.  Path Congruency Issues with Dedicated OAM Packets

191	   Mechanisms which add tracing information to the regular data traffic,
192	   sometimes also referred to as "in-band" or "passive OAM", can
193	   complement active, probe-based mechanisms such as ping or traceroute,
194	   which are sometimes considered as "out-of-band", because the messages
195	   are transported independently from regular data traffic.  "In-band"
196	   mechanisms do not require extra packets to be sent and hence don't
197	   change the packet traffic mix within the network.  Traceroute and
198	   ping for example use ICMP messages: New packets are injected to get
199	   tracing information.  Those add to the number of messages in a
200	   network, which already might be highly loaded or suffering
201	   performance issues for a particular path or traffic type.

203	   Packet scheduling algorithms, especially for balancing traffic across
204	   equal cost paths or links, often leverage information contained
205	   within the packet, such as protocol number, IP-address or MAC-
206	   address.  Probe packets would thus either need to be sent from the
207	   exact same endpoints with the exact same parameters, or probe packets
208	   would need to be artificially constructed as "fake" packets and
209	   inserted along the path.  Both approaches are often not feasible from
210	   an operational perspective, be it that access to the end-system is
211	   not feasible, or that the diversity of parameters and associated
212	   probe packets to be created is simply too large.  An in-band
213	   mechanism is an alternative in those cases.

215	   In-band mechanisms are also unaffected by implementations in which
216	   probe traffic is handled differently (and potentially forwarded
217	   differently) by a router than regular data traffic.

219	3.2.  Results Sent to a System Other Than the Sender

221	   Traditional ping and traceroute tools return the OAM results to the
222	   sender of the probe.  Even when the ICMP messages that are used with
223	   these tools are enhanced, and additional telemetry is collected
224	   (e.g., ICMP Multi-Part [RFC4884] supporting MPLS information
225	   [RFC4950], Interface and Next-Hop Identification [RFC5837], etc.), it
226	   would be advantageous to separate the sending of an OAM probe from
227	   the receiving of the telemetry data.  In this context, it is desired
228	   to not assume there is a bidirectional working path.

230	3.3.  Overlay and Underlay Correlation

232	   Several network deployments leverage tunneling mechanisms to create
233	   overlay or service-layer networks.  Examples include VXLAN-GPE, GRE,
234	   or LISP.  One often observed attribute of overlay networks is that
235	   they do not offer the user of the overlay any insight into the
236	   underlay network.  This means that neither the path that a particular
237	   tunneled packet takes nor other operational details such as the per-
238	   hop delay/jitter in the underlay are visible to the user of the
239	   overlay network, giving rise to diagnosis and debugging challenges in
240	   case of connectivity or performance issues.  The scope of OAM tools
241	   like ping or traceroute is limited to either the overlay or the
242	   underlay which means that the user of the overlay has typically no
243	   access to OAM in the underlay, unless specific operational procedures
244	   are put in place.  With in-band OAM the operator of the underlay can
245	   offer details of the connectivity in the underlay to the user of the
246	   overlay.  The operator of the egress tunnel router could choose to
247	   share the recorded information about the path with the user of the
248	   overlay.

250	   Coupled with mechanisms such as Segment Routing (SR)
251	   [I-D.ietf-spring-segment-routing], overlay network and underlay
252	   network can be more tightly coupled: The user of the overlay has
253	   detailed diagnostic information available in case of failure
254	   conditions.  The user of the overlay can also use the path recording
255	   information as input to traffic steering or traffic engineering
256	   mechanisms, for example to achieve path symmetry for the traffic
257	   between two endpoints.  [I-D.brockners-lisp-sr] is an example of how
258	   these methods can be applied to LISP.

260	3.4.  SLA Verification

262	   In-band OAM can help users of an overlay service to verify that
263	   negotiated SLAs for the real traffic are met by the underlay network
264	   provider.  Unlike solutions which rely on active probes to
265	   test an SLA, in-band OAM based mechanisms avoid wrong interpretations
266	   and "cheating", which can happen if the probe traffic that is used to
267	   perform the SLA check is prioritized by the network provider of the
268	   underlay.

270	3.5.  Analytics and Diagnostics

272	   Network planners and operators benefit from knowledge of the actual
273	   traffic distribution in the network.  When deriving an overall
274	   network connectivity traffic matrix one typically needs to correlate
275	   data gathered from each individual device in the network.  If the
276	   path of a packet is recorded while the packet is forwarded, the
277	   entire path that a packet took through the network is available to
278	   the egress system.  This obviates the need to retrieve individual
279	   traffic statistics from every device in the network and correlate
280	   those statistics, or employ other mechanisms such as leveraging
281	   traffic engineering with null-bandwidth tunnels just to retrieve the
282	   appropriate statistics to generate the traffic matrix.

284	   In addition, with individual path tracing, information is available
285	   at packet level granularity, rather than only at aggregate level - as
286	   is usually the case with IPFIX-style methods which employ flow-
287	   filters at the network elements.  Data-center networks which use
288	   equal-cost multipath (ECMP) forwarding are one example where detailed
289	   statistics on flow distribution in the network are highly desired.
290	   If a network supports ECMP, one can create detailed statistics for
291	   the different paths packets take through the network at the egress
292	   system, without a need to correlate/aggregate statistics from every
293	   router in the system.  Transit devices are off-loaded from the task
294	   of gathering packet statistics.
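
   The egress-side aggregation described above can be sketched in a few
   lines of Python.  The sketch assumes, for illustration only, that the
   egress system has already extracted each packet's recorded path as a
   tuple of node identifiers; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Path = Tuple[int, ...]  # ordered node identifiers recorded in one packet


def path_distribution(recorded_paths: Iterable[Path]) -> Counter:
    """Count packets per distinct recorded path, as seen at the egress."""
    return Counter(recorded_paths)


def link_load(recorded_paths: Iterable[Path]) -> Counter:
    """Count packets per directed link (pair of consecutive nodes)."""
    load = Counter()
    for path in recorded_paths:
        load.update(zip(path, path[1:]))
    return load


# Four packets from ingress node 1 to egress node 4 over two ECMP paths.
packets = [(1, 2, 4), (1, 3, 4), (1, 2, 4), (1, 2, 4)]
per_path = path_distribution(packets)
per_link = link_load(packets)
```

   Both counters are built from data carried in the packets themselves,
   so no statistics need to be polled from the transit nodes 2 and 3.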

296	3.6.  Frame Replication/Elimination Decision for Bi-casting/Active-
297	      active Networks

299	   Bandwidth- and power-constrained, time-sensitive, or loss-intolerant
300	   networks (e.g., networks for industry automation/control, health
301	   care) require efficient OAM methods to decide when to replicate
302	   packets to a secondary path in order to keep the loss/error-rate for
303	   the receiver at a tolerable level - and also when to stop replication
304	   and eliminate the redundant flow.  Many IoT networks are time
305	   sensitive and cannot leverage automatic retransmission requests (ARQ)
306	   to cope with transmission errors or lost packets.  Transmitting the
307	   data over multiple disparate paths (often called bi-casting or live-
308	   live) is a method used to reduce the error rate observed by the
309	   receiver.  Time-sensitive networking (TSN) receives a lot of
310	   attention from the manufacturing industry, as shown by various
311	   standardization activities and industry forums being formed (see,
312	   e.g., IETF 6TiSCH, IEEE P802.1CB, AVnu).

314	3.7.  Proof of Transit

316	   Several deployments use traffic engineering, policy routing, segment
317	   routing or Service Function Chaining (SFC) [RFC7665] to steer packets
318	   through a specific set of nodes.  In certain cases regulatory
319	   obligations or a compliance policy require proof that all packets
320	   that are supposed to follow a specific path are indeed being
321	   forwarded across the exact set of nodes specified.  If a packet flow
322	   is supposed to go through a series of service functions or network
323	   nodes, it has to be proven that all packets of the flow actually went
324	   through the service chain or collection of nodes specified by the
325	   policy.  In case the packets of a flow weren't appropriately
326	   processed, a verification device would be required to identify the
327	   policy violation and take the actions the policy calls for (e.g.,
328	   drop or redirect the packet, send an alert, etc.).
329	   In today's deployments, the proof that a packet traversed a
330	   particular service chain is typically delivered in an indirect way:
331	   Service appliances and network forwarding are in different trust
332	   domains.  Physical hand-off-points are defined between these trust
333	   domains (i.e., physical interfaces).  Or in other terms, in the
334	   "network forwarding domain" things are wired up in a way that traffic
335	   is delivered to the ingress interface of a service appliance and
336	   received back from an egress interface of a service appliance.  This
337	   "wiring" is verified and trusted.  The evolution to Network Function
338	   Virtualization (NFV) and modern service chaining concepts (using
339	   technologies such as LISP, NSH, Segment Routing, etc.) blurs the line
340	   between the different trust domains, because the hand-off-points are
341	   no longer clearly defined physical interfaces, but are virtual
342	   interfaces.  For that very reason, network operators require
343	   that different trust layers not be mixed in the same device.  For
344	   an NFV scenario a different proof is required.  Offering a proof that
345	   a packet traversed a specific set of service functions would allow
346	   network operators to move away from the above described indirect
347	   methods of proving that a service chain is in place for a particular
348	   application.

350	   Deployed service chains without the presence of a "proof of transit"
351	   mechanism are typically operated as fail-open systems: The packets
352	   that arrive at the end of a service chain are processed.  Adding
353	   "proof of transit" capabilities to a service chain allows an operator
354	   to turn a fail-open system into a fail-close system, i.e., packets
355	   that did not properly traverse the service chain can be blocked.

357	   A solution approach could be based on OAM data which is added to
358	   every packet for achieving Proof Of Transit.  The OAM data is updated
359	   at every hop and is used to verify whether a packet traversed all
360	   required nodes.  When the verifier receives each packet, it can
361	   validate whether the packet traversed the service chain correctly.
362	   The detailed mechanisms used for path verification along with the
363	   procedures applied to the OAM data carried in the packet for path
364	   verification are beyond the scope of this document.  Details are
365	   addressed in [draft-brockners-proof-of-transit].  In this document
366	   the term "proof" refers to a discrete set of bits that represents an
367	   integer or string carried as OAM data.  The OAM data is used to
368	   verify whether a packet traversed the nodes it is supposed to
369	   traverse.
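
   To make the "updated at every hop, checked by a verifier" idea
   concrete, the following Python sketch chains a keyed hash through the
   required nodes.  This is an illustrative stand-in only: it is not the
   mechanism of the referenced proof-of-transit draft, and the key
   names, the per-packet nonce, and the assumption that the verifier
   knows every node key are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def update_proof(proof: bytes, node_key: bytes, packet_nonce: bytes) -> bytes:
    """Fold one node's secret into the running proof carried in the packet."""
    return hmac.new(node_key, proof + packet_nonce, hashlib.sha256).digest()


def expected_proof(keys: Iterable[bytes], packet_nonce: bytes) -> bytes:
    """Verifier side: recompute the proof for the required node sequence."""
    proof = b""
    for key in keys:
        proof = update_proof(proof, key, packet_nonce)
    return proof


def verify(proof: bytes, keys: Iterable[bytes], packet_nonce: bytes) -> bool:
    """Constant-time comparison of the carried proof with the expected one."""
    return hmac.compare_digest(proof, expected_proof(keys, packet_nonce))


# Hypothetical per-node secrets and required service chain.
node_keys: Dict[str, bytes] = {"fw": b"k-fw", "ids": b"k-ids", "lb": b"k-lb"}
required = ["fw", "ids", "lb"]
nonce = b"packet-0001"

# Packet that visits every required node in order.
good = b""
for name in required:
    good = update_proof(good, node_keys[name], nonce)

# Packet that bypasses the "ids" node.
bad = b""
for name in ["fw", "lb"]:
    bad = update_proof(bad, node_keys[name], nonce)

chain_keys = [node_keys[n] for n in required]
good_ok = verify(good, chain_keys, nonce)
bad_ok = verify(bad, chain_keys, nonce)
```

   A verifier acting as a fail-close point would forward the packet only
   when the check succeeds, as for "good" above, and block it otherwise.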

371	3.8.  Use Cases

373	   In-band OAM could be leveraged for several use cases, including:

375	   o  Traffic Matrix: Derive the network traffic matrix: Traffic for a
376	      given time interval between any two edge nodes of a given domain.
377	      Could be performed for all traffic or per QoS-class.

379	   o  Flow Debugging: Discover which path(s) a particular set of traffic
380	      (identified by an n-tuple) takes in the network.  Such a procedure
381	      is particularly useful in case traffic is balanced across multiple
382	      paths, like with link aggregation (LACP) or equal cost multi-
383	      pathing (ECMP).

385	   o  Loss Statistics per Path: Retrieve loss statistics per flow and
386	      path in the network.

388	   o  Path Heat Maps: Discover highly utilized links in the network.

390	   o  Trend Analysis on Traffic Patterns: Analyze if (and if so how) the
391	      forwarding path for a specific set of traffic changes over time
392	      (can give hints to routing issues, unstable links etc.).

394	   o  Network Delay Distribution: Show delay distribution across network
395	      by node or links.  If enabled per application or for a specific
396	      flow then display the path taken along with the delay incurred at
397	      every hop.

399	   o  SLA Verification: Verify that the actual traffic conforms to a
400	      negotiated service level agreement (SLA), e.g., for packet drop
401	      rates or delay/jitter.

403	   o  Low-power Networks: Include application level OAM information
404	      (e.g., battery charge level, cache or buffer fill level) into data
405	      traffic to avoid sending extra OAM traffic, which incurs an extra
406	      cost on the devices.  Using the battery charge level as example,
407	      one could avoid sending extra OAM packets just to communicate
408	      battery health, and as such would save battery on sensors.

410	   o  Path Verification or Service Function Path Verification: Proof and
411	      verification of packets traversing check points in the network,
412	      where check points can be nodes in the network or service
413	      functions.

415	   o  Geo-location Policy: Network policy implemented based on which
416	      path packets took.  Example: Only if packets originated and stayed
417	      within the trading-floor department, access to specific
418	      applications or servers is granted.
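
   As one example from the list above, the "Network Delay Distribution"
   use case can be sketched as follows.  The sketch assumes,
   hypothetically, that each hop records a (node identifier, timestamp)
   pair and that node clocks are synchronized closely enough for the
   differences to be meaningful.

```python
from typing import List, Tuple

Stamp = Tuple[int, int]  # (node_id, timestamp in nanoseconds), hypothetical


def per_hop_delays(stamps: List[Stamp]) -> List[Tuple[int, int, int]]:
    """Return (from_node, to_node, delay_ns) for consecutive recorded hops.

    Assumes the nodes' clocks are synchronized closely enough for the
    timestamp differences to be meaningful.
    """
    return [
        (a_id, b_id, b_ts - a_ts)
        for (a_id, a_ts), (b_id, b_ts) in zip(stamps, stamps[1:])
    ]


# Timestamps recorded by nodes 1, 2 and 4 for a single packet.
stamps = [(1, 1_000), (2, 1_450), (4, 2_900)]
delays = per_hop_delays(stamps)
slowest = max(delays, key=lambda d: d[2])
```

   Collecting such tuples over many packets yields the delay
   distribution per link or per node pair.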

420	4.  Considerations for In-band OAM

422	   The implementation of an in-band OAM mechanism needs to take several
423	   considerations into account, including administrative boundaries, how
424	   information is recorded, Maximum Transmission Unit (MTU), Path MTU
425	   discovery and packet size, etc.

427	4.1.  Type of information to be recorded

429	   The information gathered for in-band OAM can be categorized into
430	   three main categories: Information with a per-hop scope, such as path
431	   tracing; information which applies to a specific set of nodes, such
432	   as path or service chain verification; information which only applies
433	   to the edges of a domain, such as sequence numbers.

435	   o  "edge to edge": Information that needs to be shared between
436	      network edges (the "edge" of a network could either be a host or a
437	      domain edge device).  For example, packet and octet counts of data
438	      entering and leaving a well-defined domain help in building a
439	      traffic matrix, and sequence numbers (also called "path packet
440	      counters") allow packet loss to be detected for a flow.

443	   o  "selected hops": Information that applies to a specific set of
444	      nodes only.  In case of path verification, only the nodes which
445	      are "check points" are required to interpret and update the
446	      information in the packet.

448	   o  "per hop": Information that is gathered at every hop along the
449	      path a packet traverses within an administrative domain:

451	      *  Hop-by-hop information, e.g., nodes visited for path tracing or
452	         timestamps at each hop to find delays along the path.

454	      *  Statistics collection at each hop to optimize communication in
455	         resource-constrained networks, e.g., battery, CPU, or memory
456	         status of each node piggybacked in a data packet is useful in
457	         low-power lossy networks where network nodes are mostly asleep
458	         and communication is expensive.
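
   The edge-to-edge sequence numbers mentioned above lend themselves to
   a simple loss check at the receiving edge.  The following Python
   sketch is illustrative only; it assumes that sequence numbers
   increase by one per packet of a flow and do not wrap, and it treats
   any gap in the observed range as loss.

```python
from typing import Iterable, Set, Tuple


def loss_report(received: Iterable[int]) -> Tuple[Set[int], int]:
    """Return (missing sequence numbers, number of duplicates) for one flow.

    Assumes sequence numbers increase by one per packet and do not wrap;
    only gaps between the lowest and highest observed value are reported.
    """
    seen: Set[int] = set()
    duplicates = 0
    for seq in received:
        if seq in seen:
            duplicates += 1
        seen.add(seq)
    if not seen:
        return set(), 0
    missing = set(range(min(seen), max(seen) + 1)) - seen
    return missing, duplicates


# Packets 12, 14 and 15 never arrived; packet 13 arrived twice.
missing, duplicates = loss_report([10, 11, 13, 13, 16])
```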

460	4.2.  MTU and packet size

462	   The data recorded at every hop may lead to the packet size exceeding
463	   the Maximum Transmission Unit (MTU).  Depending on the transport
464	   protocol used, the MTU is a configuration parameter or the Path MTU
465	   (PMTU) is discovered dynamically.  Example: IPv6 recommends PMTU
466	   discovery before data packets are sent to prevent packet
467	   fragmentation, and requires every link to support an MTU of at least
468	   1280 octets.  A detailed discussion of the implications of oversized
469	   IPv6 header chains is found in [RFC7112].

471	   The Path MTU restricts the amount of data that can be recorded for
472	   OAM purposes within a data packet.  The total size of data to be
473	   recorded needs to be preset to avoid packet size exceeding the MTU.

475	   It is recommended to pre-calculate this size and to configure network
476	   devices to limit the in-band OAM data that is attached to a packet.
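
   The pre-calculation can be as simple as the following Python sketch.
   All parameter names and the example sizes (a fixed in-band OAM
   overhead of 8 octets and 4 octets per hop) are assumptions made for
   illustration; this document does not define any encoding sizes.

```python
def max_recordable_hops(path_mtu: int, max_data_packet: int,
                        fixed_oam_overhead: int, bytes_per_hop: int) -> int:
    """Number of per-hop entries that fit without exceeding the path MTU.

    path_mtu:           path MTU in octets
    max_data_packet:    largest data packet (headers included) before OAM
    fixed_oam_overhead: octets of in-band OAM header independent of hop count
    bytes_per_hop:      octets added by each recording node
    """
    if bytes_per_hop <= 0:
        raise ValueError("bytes_per_hop must be positive")
    budget = path_mtu - max_data_packet - fixed_oam_overhead
    return max(0, budget // bytes_per_hop)


# 1500-octet path MTU, data packets limited to 1400 octets at the ingress.
hops = max_recordable_hops(path_mtu=1500, max_data_packet=1400,
                           fixed_oam_overhead=8, bytes_per_hop=4)

# No room at all when data packets already fill the path MTU.
no_room = max_recordable_hops(path_mtu=1280, max_data_packet=1280,
                              fixed_oam_overhead=8, bytes_per_hop=4)
```

   With these example numbers, 92 octets remain for per-hop data, which
   is enough for 23 hops.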

478	4.3.  Administrative boundaries

480	   There are several challenges in enabling in-band OAM in the public
481	   Internet as well as in corporate/enterprise networks across
482	   administrative domains, which include but are not limited to:

484	   o  Depending on the deployment, the data fields that in-band OAM
485	      requires as part of a specific transport protocol may not be
486	      supported across administrative boundaries.

488	   o  Current OAM implementations are often done in the slow path, i.e.,
489	      OAM packets are punted to the router's CPU for processing.  This
490	      leads to performance and scaling issues and exposes routers to
491	      attacks such as Denial of Service (DoS) attacks.

493	   o  Discovery of network topology and details of the network devices
494	      across administrative boundaries may open up attack vectors
495	      compromising network security.

497	   o  Specifically on IPv6: At administrative boundaries, IPv6 packets
498	      with extension headers are often dropped, for several reasons
499	      described in [RFC7872].

501	   The following considerations will be discussed in a future version of
502	   this document: a packet may be dropped due to the presence of in-band
503	   OAM data; alternatively, if the policy failure is treated as feature
504	   disablement, so that further recording stops but the packet itself is
505	   not dropped, every node on the path may need to make this policy
506	   decision.

508	4.4.  Selective enablement

510	   Depending on deployment, in-band OAM could be used for all or for
511	   only a subset of the overall traffic.  While it might be desirable to
512	   apply in-band OAM to all traffic and then selectively use the data
513	   gathered in case needed, it might not always be feasible.  Depending
514	   on the forwarding infrastructure used, in-band OAM can have an impact
515	   on forwarding performance.  The SPUD prototype for example uses the
516	   notion of "pipes" to describe the portion of the traffic that could
517	   be subject to in-path inspection.  Mechanisms to decide which traffic
518	   would be subject to in-band OAM are outside the scope of this
519	   document.

521	4.5.  Optimization of node and interface identifiers

523	   Since packets have a finite maximum size, the data recording or
524	   carrying capacity of one packet in which the in-band OAM meta data is
525	   present is limited.  In-band OAM should use its own dedicated
526	   namespace (confined to the domain in-band OAM operates in) to
527	   represent node and interface IDs to save space in the header.
528	   Generic representations of node and interface identifiers which are
529	   globally unique (such as a UUID) would consume significantly more
530	   bits of in-band OAM data.
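
   The space saving can be estimated with simple arithmetic, as in the
   following Python sketch.  The domain size of 1000 nodes and the path
   length of 10 hops are arbitrary example values, and real encodings
   would likely round identifier widths up to byte or word boundaries.

```python
UUID_BITS = 128  # a UUID is 128 bits long


def id_bits(count: int) -> int:
    """Minimum number of bits needed for `count` distinct identifiers."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return max(1, (count - 1).bit_length())


nodes_in_domain = 1000   # example value
hops_recorded = 10       # example value

local_bits = hops_recorded * id_bits(nodes_in_domain)
global_bits = hops_recorded * UUID_BITS
```

   For this example a domain-local namespace needs 10 bits per node, or
   100 bits for the whole path, compared to 1280 bits when a UUID is
   recorded at every hop.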

532	4.6.  Loop communication path (IPv6-specifics)

534	   When recorded data is required to be analyzed on a source node that
535	   issues a packet and inserts in-band OAM data, the recorded data needs
536	   to be carried back to the source node.

538	   One way to carry the in-band OAM data back to the source is to
539	   utilize an ICMP Echo Request/Reply (ping) or ICMPv6 Echo Request/
540	   Reply (ping6) mechanism.  In order to run the in-band OAM mechanism
541	   appropriately on the ping/ping6 mechanism, the following two
542	   operations should be implemented by the ping/ping6 target node:

544	   1.  All of the in-band OAM fields would be copied from an Echo
545	       Request message to an Echo Reply message.

547	   2.  The Hop Limit field of the IPv6 header of these messages would be
548	       copied as a continuous sequence.  Further considerations are
549	       addressed in a future version of this document.

551	5.  Requirements for In-band OAM Data Types

553	   The above discussed use cases require different types of in-band OAM
554	   data.  This section details requirements for in-band OAM derived from
555	   the discussion above.

557	5.1.  Generic Requirements

559	   REQ-G1:  Classification: It should be possible to enable in-band OAM
560	            on a selected set of traffic.  The selected set of traffic
561	            can also be all traffic.

563	   REQ-G2:  Scope: If in-band OAM is used only within a specific domain,
564	            provisions need to be put in place to ensure that in-band
565	            OAM data stays within the specific domain only.

567	   REQ-G3:  Transport independence: Data formats for in-band OAM shall
568	            be defined in a transport independent way.  In-band OAM
569	            applies to a variety of transport protocols.  Encapsulations
570	            should be defined that specify how the generic data formats
571	            are carried by a specific protocol.

573	   REQ-G4:  Layering: It should be possible to have in-band OAM
574	            information for different transport protocol layers be
575	            present in several fields within a single packet.  This
576	            could for example be the case when tunnels are employed and
577	            in-band OAM information is to be gathered for both the
578	            underlay as well as the overlay network.

580	   REQ-G5:  MTU size: With in-band OAM information added, packets should
581	            not become larger than the path MTU.

583	   REQ-G6:  Data Structure Reusability: The data types and data formats
584	            defined and used for in-band OAM ought to be reusable for
585	            out-of-band OAM telemetry as well.
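As an illustration of REQ-G5, a node that adds in-band OAM data could check the remaining headroom before inserting anything. This is a minimal sketch; the function names are hypothetical, and how a node learns the path MTU is not addressed here.

```python
def max_oam_bytes(path_mtu, packet_len):
    """Bytes of in-band OAM data that can be added without exceeding
    the path MTU (zero if the packet is already at or above it)."""
    return max(0, path_mtu - packet_len)


def can_add_oam(path_mtu, packet_len, oam_len):
    """True if adding oam_len bytes keeps the packet within the path MTU."""
    return oam_len <= max_oam_bytes(path_mtu, packet_len)


print(can_add_oam(1500, 1400, 100))
print(can_add_oam(1500, 1480, 24))
```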

587	5.2.  In-band OAM Data with Per-hop Scope

589	   REQ-H1:  Missing nodes detection: Data shall be present that allows a
590	            node to detect whether all nodes that should participate in
591	            in-band OAM operations have indeed participated.

593	   REQ-H2:  Node, instance or device identifier: Data shall be present
594	            that allows one to retrieve the identity of the entity
595	            reporting telemetry information.  The entity can be a
596	            device, or a subsystem/component within a device.  The
597	            latter will allow for packet tracing within a device in much
598	            the same way as between devices.

600	   REQ-H3:  Ingress interface identifier: Data shall be present that
601	            allows the identification of the interface a particular
602	            packet was received from.  The interface can be a logical or
603	            physical entity.

605	   REQ-H4:  Egress interface identifier: Data shall be present that
606	            allows the identification of the interface a particular
607	            packet was forwarded to.  The interface can be a logical or
608	            physical entity.

610	   REQ-H5:  Time-related requirements

612	            REQ-H5.1:  Delay: Data shall be present that allows one to
613	                       retrieve the delay between two or more points of
614	                       interest within the system.  Those points can be
615	                       within the same device or on different devices.

617	            REQ-H5.2:  Jitter: Data shall be present that allows one to
618	                       retrieve the jitter between two or more points of
619	                       interest within the system.  Those points can be
620	                       within the same device or on different devices.

622	            REQ-H5.3:  Wall-clock time: Data shall be present that
623	                       allows one to retrieve the wall-clock time at
624	                       which a packet visited a particular point of
625	                       interest in the system.

626	            REQ-H5.4:  Time precision: The precision of the time related
627	                       data should be configurable.  Use-case dependent,
628	                       the required precision could e.g., be nano-
629	                       seconds, micro-seconds, milli-seconds, or
630	                       seconds.

632	   REQ-H6:  Generic data records (e.g., GPS/Geo-location
633	            information): It should be possible to add user-defined OAM
634	            data at select hops to the packet.  The semantics of the
635	            data are defined by the user.
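The time-related requirements (REQ-H5.1, REQ-H5.2, REQ-H5.4) can be sketched as follows. The sketch assumes that each point of interest records a nanosecond timestamp and that clocks at different points are synchronized. It takes jitter to mean the absolute difference between the delays of consecutive packets, which is one common definition and not one mandated by this document. All names are hypothetical.

```python
# Supported precisions, expressed in nanoseconds per unit.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def delays_between_points(timestamps_ns):
    """Delay between each pair of consecutive points of interest
    visited by one packet."""
    return [b - a for a, b in zip(timestamps_ns, timestamps_ns[1:])]


def delay_variation(delays_ns):
    """Jitter as the absolute difference between the delays observed
    for consecutive packets over the same pair of points."""
    return [abs(b - a) for a, b in zip(delays_ns, delays_ns[1:])]


def to_precision(value_ns, unit):
    """Reduce a nanosecond value to the configured precision."""
    return value_ns // NS_PER_UNIT[unit]


print(delays_between_points([1000, 1500, 2300]))
print(delay_variation([500, 800, 600]))
print(to_precision(1_500_000, "us"))
```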

637	5.3.  In-band OAM with Selected Hop Scope

639	   REQ-S1:  Proof of transit: Data shall be present which allows one to
640	            securely prove that a packet has visited one or several
641	            particular points of interest (i.e., a particular set of
642	            nodes).

644	            REQ-S1.1:  In case "Shamir's secret sharing scheme" is used
645	                       for proof of transit, two data records, "random"
646	                       and "cumulative" shall be present.  The number of
647	                       bits used for "random" and "cumulative" data
648	                       records can vary between deployments and should
649	                       thus be configurable.

651	            REQ-S1.2:  Enable a fail-open service chaining system to be
652	                       converted into a fail-closed service chaining
653	                       system.
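The role of the "random" and "cumulative" records in REQ-S1.1 can be illustrated with a sketch based on Shamir's secret sharing. The construction is an assumption made for illustration, not a format defined by this document: a secret polynomial whose constant term is the secret, a public polynomial whose constant term is the per-packet "random" value, and Lagrange constants pre-computed per node. The prime, coefficients, and function names are hypothetical.

```python
PRIME = 2**61 - 1  # hypothetical field size; bit widths are configurable


def eval_poly(coeffs, x, p=PRIME):
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def lagrange_constants(xs, p=PRIME):
    """Lagrange basis values at x = 0, one per node point in xs."""
    constants = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = (num * -xj) % p
                den = (den * (xi - xj)) % p
        constants.append(num * pow(den, -1, p) % p)
    return constants


def node_update(cumulative, rnd, x, share, lpc, public_coeffs, p=PRIME):
    """Update the "cumulative" record at one node. 'share' is the node's
    point on the secret polynomial; 'rnd' is the packet's "random" record."""
    public_value = eval_poly([rnd] + public_coeffs, x, p)
    return (cumulative + (share + public_value) * lpc) % p


def verify(cumulative, rnd, secret, p=PRIME):
    """The verifier accepts only if every node contributed its share."""
    return cumulative == (secret + rnd) % p


# Example setup with three nodes (all values are illustrative).
secret_coeffs = [10, 3, 5]   # constant term 10 is the secret
public_coeffs = [7, 2]       # non-constant terms of the public polynomial
xs = [1, 2, 3]
shares = [eval_poly(secret_coeffs, x) for x in xs]
lpcs = lagrange_constants(xs)

rnd, cumulative = 42, 0
for x, share, lpc in zip(xs, shares, lpcs):
    cumulative = node_update(cumulative, rnd, x, share, lpc, public_coeffs)
print(verify(cumulative, rnd, secret_coeffs[0]))
```

If any of the three nodes is bypassed, its term is missing from the sum and the check at the verifier fails.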

655	5.4.  In-band OAM with End-to-end Scope

657	   REQ-E1:  Sequence numbering:

659	            REQ-E1.1:  Reordering detection: It should be possible to
660	                       detect whether packets have been reordered while
661	                       traversing an in-band OAM domain.

663	            REQ-E1.2:  Duplicates detection: It should be possible to
664	                       detect whether packets have been duplicated while
665	                       traversing an in-band OAM domain.

667	            REQ-E1.3:  Detection of packet drops: It should be possible
668	                       to detect whether packets have been dropped while
669	                       traversing an in-band OAM domain.
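A minimal sketch of how a receiving node could use sequence numbers to cover REQ-E1.1 through REQ-E1.3 is shown below. It assumes monotonically increasing sequence numbers without wrap-around, and it can only report drops between the lowest and highest sequence numbers seen. The function name and result layout are hypothetical.

```python
def analyze_sequence(received):
    """Classify a list of received sequence numbers.

    Returns duplicates (seen more than once), reordered packets (arrived
    after a higher sequence number), and missing numbers (gaps between
    the lowest and highest sequence number seen).
    """
    seen = set()
    highest = None
    duplicates, reordered = [], []
    for seq in received:
        if seq in seen:
            duplicates.append(seq)
            continue
        if highest is not None and seq < highest:
            reordered.append(seq)
        seen.add(seq)
        highest = seq if highest is None else max(highest, seq)
    missing = []
    if seen:
        missing = sorted(set(range(min(seen), max(seen) + 1)) - seen)
    return {"duplicates": duplicates, "reordered": reordered,
            "missing": missing}


print(analyze_sequence([1, 2, 4, 3, 3, 6]))
```

A gap reported as missing may later be filled by a reordered packet, so a real implementation would likely wait for some interval before declaring a drop.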

671	6.  Security Considerations and Requirements

673	   General Security considerations will be addressed in a later version
674	   of this document.  Security considerations for Proof of Transit alone
675	   are discussed below.

677	6.1.  Proof of Transit

679	   Threat Model: Attacks on the deployments could be due to malicious
680	   administrators or accidental misconfigurations resulting in bypassing
681	   of certain nodes.  The solution approach should meet the following
682	   requirements:

684	   REQ-SEC1:  Sound Proof of Transit: A valid and verifiable proof that
685	              the packet definitively traversed through all the nodes as
686	              expected.  Probabilistic methods to achieve this should be
687	              avoided, as the same could be exploited by an attacker.

689	   REQ-SEC2:  Tampering of meta data: An active attacker should not be
690	              able to insert, modify, or delete meta data in whole or
691	              in part and bypass a few (or all) nodes.  Any deviation
692	              from the expected path should be accurately determined.

694	   REQ-SEC3:  Replay Attacks: An attacker (active/passive) should not be
695	              able to reuse the proof of transit bits in the packet by
696	              observing the OAM data in the packet, packet
697	              characteristics (like IP addresses, octets transferred,
698	              timestamps) or even the proof bits themselves.  The
699	              solution approach should consider usage of these
700	              parameters for deriving any secrets cautiously.
701	              Mitigating replay attacks beyond a window of longer
702	              duration could be intractable to achieve with a fixed
703	              number of bits allocated for proof.

705	   REQ-SEC4:  Recycle Secrets: Any configuration of the secrets (like
706	              cryptographic keys, initialisation vectors etc.) either in
707	              the controller or service functions should be
708	              reconfigurable.  The solution approach should enable
709	              controls, API calls etc. needed in order to perform such
710	              recycling.  It is desirable to provide recommendations on
711	              the duration of rotation cycles needed for the secure
712	              functioning of the overall system.

714	   REQ-SEC5:  Secret storage and distribution: Secrets should be shared
715	              with the devices over secure channels.  Methods should be
716	              put in place so that secrets cannot be retrieved by
717	              unauthorized personnel from the devices.

719	7.  IANA Considerations

721	   [RFC Editor: please remove this section prior to publication.]

723	   This document has no IANA actions.

725	8.  Acknowledgements

727	   The authors would like to thank Eric Vyncke, Nalini Elkins, Srihari
728	   Raghavan, Ranganathan T S, Karthik Babu Harichandra Babu, Akshaya
729	   Nadahalli, and Andrew Yourtchenko for the comments and advice.  This
730	   document leverages and builds on top of several concepts described in
731	   [draft-kitamura-ipv6-record-route].  The authors would like to
732	   acknowledge the work done by the author Hiroshi Kitamura and people
733	   involved in writing it.

735	9.  Informative References

737	   [draft-brockners-proof-of-transit]
738	              Brockners, F., Bhandari, S., and S. Dara, "Proof of
739	              transit", July 2016.

741	   [draft-kitamura-ipv6-record-route]
742	              Kitamura, H., "Record Route for IPv6 (PR6), Hop-by-Hop
743	              Option Extension", November 2000.

745	   [I-D.brockners-lisp-sr]
746	              Brockners, F., Bhandari, S., Maino, F., and D. Lewis,
747	              "LISP Extensions for Segment Routing", draft-brockners-
748	              lisp-sr-01 (work in progress), February 2014.

750	   [I-D.hildebrand-spud-prototype]
751	              Hildebrand, J. and B. Trammell, "Substrate Protocol for
752	              User Datagrams (SPUD) Prototype", draft-hildebrand-spud-
753	              prototype-03 (work in progress), March 2015.

755	   [I-D.ietf-spring-segment-routing]
756	              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
757	              and R. Shakir, "Segment Routing Architecture", draft-ietf-
758	              spring-segment-routing-09 (work in progress), July 2016.

760	   [I-D.lapukhov-dataplane-probe]
761	              Lapukhov, P. and r. remy@barefootnetworks.com, "Data-plane
762	              probe for in-band telemetry collection", draft-lapukhov-
763	              dataplane-probe-01 (work in progress), June 2016.

765	   [P4]       Kim, , "P4: In-band Network Telemetry (INT)", September
766	              2015.

768	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
769	              DOI 10.17487/RFC0791, September 1981,
770	              <http://www.rfc-editor.org/info/rfc791>.

772	   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
773	              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
774	              DOI 10.17487/RFC4884, April 2007,
775	              <http://www.rfc-editor.org/info/rfc4884>.

777	   [RFC4950]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro, "ICMP
778	              Extensions for Multiprotocol Label Switching", RFC 4950,
779	              DOI 10.17487/RFC4950, August 2007,
780	              <http://www.rfc-editor.org/info/rfc4950>.

782	   [RFC5837]  Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen,
783	              N., and JR. Rivers, "Extending ICMP for Interface and
784	              Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837,
785	              April 2010, <http://www.rfc-editor.org/info/rfc5837>.

787	   [RFC7112]  Gont, F., Manral, V., and R. Bonica, "Implications of
788	              Oversized IPv6 Header Chains", RFC 7112,
789	              DOI 10.17487/RFC7112, January 2014,
790	              <http://www.rfc-editor.org/info/rfc7112>.

792	   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
793	              Weingarten, "An Overview of Operations, Administration,
794	              and Maintenance (OAM) Tools", RFC 7276,
795	              DOI 10.17487/RFC7276, June 2014,
796	              <http://www.rfc-editor.org/info/rfc7276>.

798	   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
799	              Chaining (SFC) Architecture", RFC 7665,
800	              DOI 10.17487/RFC7665, October 2015,
801	              <http://www.rfc-editor.org/info/rfc7665>.

803	   [RFC7872]  Gont, F., Linkova, J., Chown, T., and W. Liu,
804	              "Observations on the Dropping of Packets with IPv6
805	              Extension Headers in the Real World", RFC 7872,
806	              DOI 10.17487/RFC7872, June 2016,
807	              <http://www.rfc-editor.org/info/rfc7872>.

809	Authors' Addresses

811	   Frank Brockners
812	   Cisco Systems, Inc.
813	   Hansaallee 249, 3rd Floor
814	   DUESSELDORF, NORDRHEIN-WESTFALEN  40549
815	   Germany

817	   Email: fbrockne@cisco.com

819	   Shwetha Bhandari
820	   Cisco Systems, Inc.
821	   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
822	   Bangalore, KARNATAKA 560 087
823	   India

825	   Email: shwethab@cisco.com

827	   Sashank Dara
828	   Cisco Systems, Inc.
829	   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
830	   Bangalore, KARNATAKA 560 087
831	   India

833	   Email: sadara@cisco.com

835	   Carlos Pignataro
836	   Cisco Systems, Inc.
837	   7200-11 Kit Creek Road
838	   Research Triangle Park, NC  27709
839	   United States

841	   Email: cpignata@cisco.com

843	   Hannes Gredler
844	   RtBrick Inc.

846	   Email: hannes@rtbrick.com

848	   John Leddy
849	   Comcast

851	   Email: John_Leddy@cable.comcast.com
852	   Stephen Youell
853	   JP Morgan Chase
854	   25 Bank Street
855	   London  E14 5JP
856	   United Kingdom

858	   Email: stephen.youell@jpmorgan.com