idnits 2.17.1 

draft-brockners-inband-oam-requirements-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 8, 2016) is 2849 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-15) exists of
     draft-ietf-spring-segment-routing-09


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                       F. Brockners
3	Internet-Draft                                               S. Bhandari
4	Intended status: Informational                                   S. Dara
5	Expires: January 9, 2017                                    C. Pignataro
6	                                                                   Cisco
7	                                                              H. Gredler
8	                                                            RtBrick Inc.
9	                                                            July 8, 2016

11	                      Requirements for In-band OAM
12	               draft-brockners-inband-oam-requirements-00

14	Abstract

16	   This document discusses the motivation and requirements for including
17	   specific operational and telemetry information into data packets
18	   while the data packet traverses a path between two points in the
19	   network.  This method is referred to as "in-band" Operations,
20	   Administration, and Maintenance (OAM), given that the OAM information
21	   is carried with the data packets as opposed to in "out-of-band"
22	   packets dedicated to OAM.  In-band OAM complements other OAM
23	   mechanisms which use dedicated probe packets to convey OAM
24	   information.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on January 9, 2017.

43	Copyright Notice

45	   Copyright (c) 2016 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	   2.  Conventions
62	   3.  Motivation for In-band OAM
63	     3.1.  Path Congruency Issues with Dedicated OAM Packets
64	     3.2.  Results Sent to a System Other Than the Sender
65	     3.3.  Overlay and Underlay Correlation
66	     3.4.  SLA Verification
67	     3.5.  Analytics and Diagnostics
68	     3.6.  Frame Replication/Elimination Decision for Bi-casting
69	           /Active-active Networks
70	     3.7.  Proof of Transit
71	     3.8.  Use Cases
72	   4.  Considerations for In-band OAM
73	     4.1.  Type of Information to Be Recorded
74	     4.2.  MTU and Packet Size
75	     4.3.  Administrative Boundaries
76	     4.4.  Selective Enablement
77	     4.5.  Optimization of Node and Interface Identifiers
78	     4.6.  Loop Communication Path (IPv6-specifics)
79	   5.  Requirements for In-band OAM Data Types
80	     5.1.  Generic Requirements
81	     5.2.  In-band OAM Data with Per-hop Scope
82	     5.3.  In-band OAM with Selected Hop Scope
83	     5.4.  In-band OAM with End-to-end Scope
84	   6.  Security Considerations and Requirements
85	     6.1.  Proof of Transit
86	   7.  IANA Considerations
87	   8.  Acknowledgements
88	   9.  Informative References
89	   Authors' Addresses

91	1.  Introduction

93	   This document discusses requirements for "in-band" Operations,
94	   Administration, and Maintenance (OAM) mechanisms.  "In-band" OAM
95	   means to record OAM and telemetry information within the data packet
96	   while the data packet traverses a network or a particular network
97	   domain.  The term "in-band" refers to the fact that the OAM and
98	   telemetry data is carried within data packets rather than being sent
99	   within packets specifically dedicated to OAM.  In-band OAM
100	   mechanisms, which are sometimes also referred to as embedded network
101	   telemetry, are a current topic of discussion.  In-band network
102	   telemetry has been defined for P4 [P4].  The SPUD prototype
103	   [I-D.hildebrand-spud-prototype] uses a similar logic that allows
104	   network devices on the path between endpoints to participate
105	   explicitly in the tube outside the end-to-end context.  Even the IPv4
106	   route-record option defined in [RFC0791] can be considered an in-band
107	   OAM mechanism.  In-band OAM complements "out-of-band" mechanisms such
108	   as ping or traceroute, or more recent active probing mechanisms, as
109	   described in [I-D.lapukhov-dataplane-probe].  In-band OAM mechanisms
110	   can be leveraged where current out-of-band mechanisms do not apply or
111	   do not offer the desired characteristics.  Examples include proving
112	   that a certain set of traffic takes a pre-defined path, cases where
113	   strict path congruency is desired, checking service level agreements
114	   for the live data traffic, gathering detailed statistics on traffic
115	   distribution paths in networks that distribute traffic across
116	   multiple paths, and scenarios where probe traffic is potentially
117	   handled differently from regular data traffic by the network devices.
118	   [RFC7276] presents an overview of OAM tools.

120	   Compared to probably the most basic example of "in-band OAM" which is
121	   IPv4 route recording [RFC0791], an in-band OAM approach has the
122	   following capabilities:

124	   a.  A flexible data format to allow different types of information to
125	       be captured as part of an in-band OAM operation, including not
126	       only path tracing information, but additional operational and
127	       telemetry information such as timestamps, sequence numbers, or
128	       even generic data such as queue size, geo-location of the node
129	       that forwarded the packet, etc.

131	   b.  A data format to express node as well as link identifiers to
132	       record the path a packet takes with a fixed amount of added data.

134	   c.  The ability to detect whether any nodes were skipped while
135	       recording in-band OAM information (i.e., in-band OAM is not
136	       supported or not enabled on those nodes).

138	   d.  The ability to actively process information in the packet, for
139	       example to prove in a cryptographically secure way that a packet
140	       really took a pre-defined path using some traffic steering method
141	       such as service chaining or traffic engineering.

143	   e.  The ability to include OAM data beyond simple path information,
144	       such as timestamps or even generic data of a particular use case.

146	   f.  The ability to include OAM data in various different transport
147	       protocols.

149	2.  Conventions

151	   Abbreviations used in this document:

153	   ECMP:      Equal Cost Multi-Path

155	   MTU:       Maximum Transmission Unit

157	   NFV:       Network Function Virtualization

159	   OAM:       Operations, Administration, and Maintenance

161	   PMTU:      Path MTU

163	   SLA:       Service Level Agreement

165	   SFC:       Service Function Chain

167	   SR:        Segment Routing

169	   This document defines in-band Operations, Administration, and
170	   Maintenance (in-band OAM), as the subset in which OAM information is
171	   carried along with data packets.  This is as opposed to "out-of-band
172	   OAM", where specific packets are dedicated to carrying OAM
173	   information.

175	3.  Motivation for In-band OAM

177	   In several scenarios it is beneficial to make information about which
178	   path a packet took through the network available to the operator.
179	   This includes not only tasks such as debugging, troubleshooting,
180	   network planning, and network optimization, but also policy or
181	   service level agreement compliance checks.  This section discusses
182	   the motivation to introduce new methods for enhanced in-band network
183	   diagnostics.

185	3.1.  Path Congruency Issues with Dedicated OAM Packets

187	   Mechanisms which add tracing information to the regular data traffic,
188	   sometimes also referred to as "in-band" or "passive OAM", can
189	   complement active, probe-based mechanisms such as ping or traceroute,
190	   which are sometimes considered as "out-of-band", because the messages
191	   are transported independently from regular data traffic.  "In-band"
192	   mechanisms do not require extra packets to be sent and hence don't
193	   change the packet traffic mix within the network.  Traceroute and
194	   ping for example use ICMP messages: New packets are injected to get
195	   tracing information.  Those add to the number of messages in a
196	   network, which already might be highly loaded or suffering
197	   performance issues for a particular path or traffic type.

199	   Packet scheduling algorithms, especially for balancing traffic across
200	   equal cost paths or links, often leverage information contained
201	   within the packet, such as protocol number, IP-address or MAC-
202	   address.  Probe packets would thus either need to be sent from the
203	   exact same endpoints with the exact same parameters, or probe packets
204	   would need to be artificially constructed as "fake" packets and
205	   inserted along the path.  Both approaches are often not feasible from
206	   an operational perspective, be it that access to the end-system is
207	   not feasible, or that the diversity of parameters and associated
208	   probe packets to be created is simply too large.  An in-band
209	   mechanism is an alternative in those cases.
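
   The dependence of path selection on packet header fields can be
   illustrated with a minimal sketch.  The hash function, the field
   encoding, and the link names below are assumptions made for this
   example only; real forwarding hardware uses vendor-specific hash
   functions.

```python
import hashlib


def ecmp_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Pick one of several equal-cost next hops from a hash of the 5-tuple.

    Hypothetical model: SHA-256 stands in for a device-specific hash.
    """
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical set of equal-cost links.
NEXT_HOPS = ["link-A", "link-B", "link-C", "link-D"]

# A TCP data flow (protocol 6) between two hosts.
data_path = ecmp_next_hop("192.0.2.1", "198.51.100.7", 6, 49152, 443, NEXT_HOPS)

# An ICMP probe (protocol 1, no ports) between the same two hosts.
# Its hash input differs, so it is not guaranteed to use the same link.
probe_path = ecmp_next_hop("192.0.2.1", "198.51.100.7", 1, 0, 0, NEXT_HOPS)

print("data flow uses:", data_path)
print("ICMP probe uses:", probe_path)
```

   Because the probe's header fields differ from those of the data
   flow, the two hash inputs differ and the probe may be forwarded over
   a different link than the traffic it is meant to measure.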

211	   In-band mechanisms also don't suffer from implementations where
212	   probe traffic is handled differently (and potentially forwarded
213	   differently) by a router than regular data traffic.

215	3.2.  Results Sent to a System Other Than the Sender

217	   Traditional ping and traceroute tools return the OAM results to the
218	   sender of the probe.  Even when the ICMP messages that are used with
219	   these tools are enhanced, and additional telemetry is collected
220	   (e.g., ICMP Multi-Part [RFC4884] supporting MPLS information
221	   [RFC4950], Interface and Next-Hop Identification [RFC5837], etc.), it
222	   would be advantageous to separate the sending of an OAM probe from
223	   the receiving of the telemetry data.  In this context, it is desired
224	   to not assume there is a bidirectional working path.

226	3.3.  Overlay and Underlay Correlation

228	   Several network deployments leverage tunneling mechanisms to create
229	   overlay or service-layer networks.  Examples include VXLAN-GPE, GRE,
230	   or LISP.  One often observed attribute of overlay networks is that
231	   they do not offer the user of the overlay any insight into the
232	   underlay network.  This means that neither the path that a particular
233	   tunneled packet takes nor other operational details, such as the
234	   per-hop delay/jitter in the underlay, are visible to the user of the
235	   overlay network, giving rise to diagnosis and debugging challenges in
236	   case of connectivity or performance issues.  The scope of OAM tools
237	   like ping or traceroute is limited to either the overlay or the
238	   underlay which means that the user of the overlay has typically no
239	   access to OAM in the underlay, unless specific operational procedures
240	   are put in place.  With in-band OAM the operator of the underlay can
241	   offer details of the connectivity in the underlay to the user of the
242	   overlay.  The operator of the egress tunnel router could choose to
243	   share the recorded information about the path with the user of the
244	   overlay.

246	   Coupled with mechanisms such as Segment Routing (SR)
247	   [I-D.ietf-spring-segment-routing], overlay network and underlay
248	   network can be more tightly coupled: The user of the overlay has
249	   detailed diagnostic information available in case of failure
250	   conditions.  The user of the overlay can also use the path recording
251	   information as input to traffic steering or traffic engineering
252	   mechanisms, for example to achieve path symmetry for the traffic
253	   between two endpoints.  [I-D.brockners-lisp-sr] is an example of how
254	   these methods can be applied to LISP.

256	3.4.  SLA Verification

258	   In-band OAM can help users of an overlay-service to verify that
259	   negotiated SLAs for the real traffic are met by the underlay network
260	   provider.  Different from solutions which rely on active probes to
261	   test an SLA, in-band OAM based mechanisms avoid wrong interpretations
262	   and "cheating", which can happen if the probe traffic that is used to
263	   perform the SLA check is prioritized by the network provider of the
264	   underlay.

266	3.5.  Analytics and Diagnostics

268	   Network planners and operators benefit from knowledge of the actual
269	   traffic distribution in the network.  When deriving an overall
270	   network connectivity traffic matrix one typically needs to correlate
271	   data gathered from each individual device in the network.  If the
272	   path of a packet is recorded while the packet is forwarded, the
273	   entire path that a packet took through the network is available to
274	   the egress system.  This obviates the need to retrieve individual
275	   traffic statistics from every device in the network and correlate
276	   those statistics, or employ other mechanisms such as leveraging
277	   traffic engineering with null-bandwidth tunnels just to retrieve the
278	   appropriate statistics to generate the traffic matrix.

280	   In addition, with individual path tracing, information is available
281	   at packet level granularity, rather than only at aggregate level - as
282	   is usually the case with IPFIX-style methods which employ flow-
283	   filters at the network elements.  Data-center networks which use
284	   equal-cost multipath (ECMP) forwarding are one example where detailed
285	   statistics on flow distribution in the network are highly desired.
286	   If a network supports ECMP, one can create detailed statistics for
287	   the different paths packets take through the network at the egress
288	   system, without a need to correlate/aggregate statistics from every
289	   router in the system.  Transit devices are off-loaded from the task
290	   of gathering packet statistics.
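
   As a minimal sketch of how an egress system could derive such
   statistics from recorded paths alone, consider the following.  The
   node identifiers and the sample data are hypothetical, and the
   representation of a recorded path as a list of node IDs is an
   assumption of this example, not a format defined by this document.

```python
from collections import Counter


def path_statistics(recorded_paths):
    """Count how many packets took each distinct path."""
    return Counter(tuple(path) for path in recorded_paths)


def link_statistics(recorded_paths):
    """Count how many packets crossed each (node, next node) link."""
    links = Counter()
    for path in recorded_paths:
        for upstream, downstream in zip(path, path[1:]):
            links[(upstream, downstream)] += 1
    return links


# Hypothetical paths recorded in five packets, as seen at the egress.
packets = [[1, 4, 7], [1, 5, 7], [1, 4, 7], [1, 6, 7], [1, 4, 7]]

paths = path_statistics(packets)
links = link_statistics(packets)

print(paths[(1, 4, 7)])  # 3 packets took the path 1 -> 4 -> 7
print(links[(4, 7)])     # 3 packets crossed the link 4 -> 7
```

   All counting happens at the egress system; the transit nodes only
   add their identifier to the packet.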

292	3.6.  Frame Replication/Elimination Decision for Bi-casting/Active-
293	      active Networks

295	   Bandwidth- and power-constrained, time-sensitive, or loss-intolerant
296	   networks (e.g., networks for industry automation/control, health
297	   care) require efficient OAM methods to decide when to replicate
298	   packets to a secondary path in order to keep the loss/error-rate for
299	   the receiver at a tolerable level - and also when to stop replication
300	   and eliminate the redundant flow.  Many IoT networks are time
301	   sensitive and cannot leverage automatic retransmission requests (ARQ)
302	   to cope with transmission errors or lost packets.  Transmitting the
303	   data over multiple disparate paths (often called bi-casting or live-
304	   live) is a method used to reduce the error rate observed by the
305	   receiver.  Time-sensitive networking (TSN) receives a lot of
306	   attention from the manufacturing industry, as shown by various
307	   standardization activities and industry forums being formed (see,
308	   e.g., IETF 6TiSCH, IEEE P802.1CB, AVnu).

310	3.7.  Proof of Transit

312	   Several deployments use traffic engineering, policy routing, segment
313	   routing or Service Function Chaining (SFC) [RFC7665] to steer packets
314	   through a specific set of nodes.  In certain cases regulatory
315	   obligations or a compliance policy require proof that all packets
316	   that are supposed to follow a specific path are indeed being
317	   forwarded across the exact set of nodes specified.  If a packet flow
318	   is supposed to go through a series of service functions or network
319	   nodes, it has to be proven that all packets of the flow actually went
320	   through the service chain or collection of nodes specified by the
321	   policy.  In case the packets of a flow weren't appropriately
322	   processed, a verification device would be required to identify the
323	   policy violation and take actions (e.g., drop or redirect the
324	   packet, send an alert, etc.) corresponding to the policy.
325	   In today's deployments, the proof that a packet traversed a
326	   particular service chain is typically delivered in an indirect way:
327	   Service appliances and network forwarding are in different trust
328	   domains.  Physical hand-off-points are defined between these trust
329	   domains (i.e., physical interfaces).  Or in other terms, in the
330	   "network forwarding domain" things are wired up in a way that traffic
331	   is delivered to the ingress interface of a service appliance and
332	   received back from an egress interface of a service appliance.  This
333	   "wiring" is verified and trusted.  The evolution to Network Function
334	   Virtualization (NFV) and modern service chaining concepts (using
335	   technologies such as LISP, NSH, Segment Routing, etc.) blurs the line
336	   between the different trust domains, because the hand-off-points are
337	   no longer clearly defined physical interfaces, but are virtual
338	   interfaces.  For that very reason, network operators require that
339	   different trust layers not be mixed in the same device.  For
340	   an NFV scenario a different proof is required.  Offering a proof that
341	   a packet traversed a specific set of service functions would allow
342	   network operators to move away from the above described indirect
343	   methods of proving that a service chain is in place for a particular
344	   application.

346	   A solution approach could be based on OAM data which is added to
347	   every packet for achieving Proof Of Transit.  The OAM data is updated
348	   at every hop and is used to verify whether a packet traversed all
349	   required nodes.  When the verifier receives each packet, it can
350	   validate whether the packet traversed the service chain correctly.
351	   The detailed mechanisms used for path verification along with the
352	   procedures applied to the OAM data carried in the packet for path
353	   verification are beyond the scope of this document.  Details are
354	   addressed in [draft-brockners-proof-of-transit].  In this document
355	   the term "proof" refers to a discrete set of bits that represents an
356	   integer or string carried as OAM data.  The OAM data is used to
357	   verify whether a packet traversed the nodes it is supposed to
358	   traverse.
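
   The general idea of OAM data that is updated at every hop and
   checked by a verifier can be sketched as follows.  This is an
   illustration only and is not intended to represent the mechanism
   described in [draft-brockners-proof-of-transit].  The chained HMAC
   construction, the per-node secrets, and all names are assumptions of
   this example.

```python
import hashlib
import hmac


def update_proof(proof: bytes, node_secret: bytes) -> bytes:
    """Executed by each required node: fold the node's secret into the proof."""
    return hmac.new(node_secret, proof, hashlib.sha256).digest()


def expected_proof(nonce: bytes, secrets_in_order) -> bytes:
    """Executed by the verifier, which is assumed to know every node secret."""
    proof = nonce
    for secret in secrets_in_order:
        proof = update_proof(proof, secret)
    return proof


def verify(nonce: bytes, proof: bytes, secrets_in_order) -> bool:
    """Check whether the proof matches a traversal of all required nodes."""
    return hmac.compare_digest(proof, expected_proof(nonce, secrets_in_order))


# Hypothetical per-node secrets for a chain of three required nodes.
SECRETS = [b"node-1-secret", b"node-2-secret", b"node-3-secret"]
nonce = b"per-packet-nonce"

# Packet that traversed all three nodes in order.
good = nonce
for secret in SECRETS:
    good = update_proof(good, secret)

# Packet that bypassed the second node.
bad = update_proof(update_proof(nonce, SECRETS[0]), SECRETS[2])

print(verify(nonce, good, SECRETS))
print(verify(nonce, bad, SECRETS))
```

   In this sketch the "proof" is the fixed-size value carried in the
   packet; a packet that skips a required node arrives with a value the
   verifier cannot reproduce.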

360	3.8.  Use Cases

362	   In-band OAM could be leveraged for several use cases, including:

364	   o  Traffic Matrix: Derive the network traffic matrix: Traffic for a
365	      given time interval between any two edge nodes of a given domain.
366	      Could be performed for all traffic or per QoS-class.

368	   o  Flow Debugging: Discover which path(s) a particular set of traffic
369	      (identified by an n-tuple) takes in the network.  Such a procedure
370	      is particularly useful in case traffic is balanced across multiple
371	      paths, like with link aggregation (LACP) or equal cost multi-
372	      pathing (ECMP).

374	   o  Loss Statistics per Path: Retrieve loss statistics per flow and
375	      path in the network.

377	   o  Path Heat Maps: Discover highly utilized links in the network.

379	   o  Trend Analysis on Traffic Patterns: Analyze if (and if so how) the
380	      forwarding path for a specific set of traffic changes over time
381	      (can give hints to routing issues, unstable links etc.).

383	   o  Network Delay Distribution: Show delay distribution across network
384	      by node or links.  If enabled per application or for a specific
385	      flow then display the path taken along with the delay incurred at
386	      every hop.

388	   o  SLA Verification: Verify that a negotiated service level agreement
389	      (SLA), e.g., for packet drop rates or delay/jitter, is met by the
390	      actual traffic.

392	   o  Low-power Networks: Include application level OAM information
393	      (e.g., battery charge level, cache or buffer fill level) into data
394	      traffic to avoid sending extra OAM traffic which incurs an extra
395	      cost on the devices.  Using the battery charge level as example,
396	      one could avoid sending extra OAM packets just to communicate
397	      battery health, and as such would save battery on sensors.

399	   o  Path Verification or Service Function Path Verification: Proof and
400	      verification of packets traversing check points in the network,
401	      where check points can be nodes in the network or service
402	      functions.

404	   o  Geo-location Policy: Network policy implemented based on which
405	      path packets took.  Example: Only if packets originated and stayed
406	      within the trading-floor department, access to specific
407	      applications or servers is granted.

409	4.  Considerations for In-band OAM

411	   The implementation of an in-band OAM mechanism needs to take several
412	   considerations into account, including administrative boundaries, how
413	   information is recorded, Maximum Transmission Unit (MTU), Path MTU
414	   discovery and packet size, etc.

416	4.1.  Type of Information to Be Recorded

418	   The information gathered for in-band OAM can be categorized into
419	   three main categories: Information with a per-hop scope, such as path
420	   tracing; information which applies to a specific set of nodes, such
421	   as path or service chain verification; information which only applies
422	   to the edges of a domain, such as sequence numbers.

424	   o  "edge to edge": Information that needs to be shared between
425	      network edges (the "edge" of a network could either be a host or a
426	      domain edge device).  For example, packet and octet counts of
427	      data entering and leaving a well-defined domain are helpful in
428	      building a traffic matrix, and a sequence number (also called a
429	      "path packet counter") is useful for detecting packet loss within
430	      a flow.

432	   o  "selected hops": Information that applies to a specific set of
433	      nodes only.  In case of path verification, only the nodes which
434	      are "check points" are required to interpret and update the
435	      information in the packet.

437	   o  "per hop": Information that is gathered at every hop along the
438	      path a packet traverses within an administrative domain:

440	      *  Hop-by-hop information, e.g., nodes visited for path tracing,
441	         timestamps at each hop to find delays along the path.

443	      *  Statistics collection at each hop to optimize communication in
444	         resource-constrained networks, e.g., battery, CPU, or memory
445	         status of each node piggybacked on a data packet is useful in
446	         low-power lossy networks where network nodes are mostly asleep
447	         and communication is expensive.
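
   Two of the data types above lend themselves to a short sketch:
   edge-to-edge sequence numbers for loss detection, and per-hop
   timestamps for delay measurement.  The data layout, the time unit,
   and the sample values are assumptions of this example.  Sequence
   number wrap-around and reordering are deliberately ignored.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the lowest and highest seen."""
    seen = set(received)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def per_hop_delays(hop_records):
    """Compute the delay between consecutive hops.

    hop_records is a list of (node_id, timestamp) pairs in path order,
    with timestamps in a common unit (microseconds assumed here).
    """
    delays = {}
    for (node_a, t_a), (node_b, t_b) in zip(hop_records, hop_records[1:]):
        delays[(node_a, node_b)] = t_b - t_a
    return delays


# Edge-to-edge: sequence numbers observed at the egress edge.
print(missing_sequence_numbers([10, 11, 13, 14, 17]))  # [12, 15, 16]

# Per hop: (node ID, timestamp) recorded by each node on the path.
print(per_hop_delays([(1, 100), (4, 130), (7, 190)]))  # {(1, 4): 30, (4, 7): 60}
```

   The per-hop calculation assumes that the clocks of the nodes are
   synchronized closely enough for the differences to be meaningful.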

449	4.2.  MTU and Packet Size

451	   The data recorded at every hop may cause the packet size to exceed
452	   the Maximum Transmission Unit (MTU).  Depending on the transport
453	   protocol used, the MTU is set as a configuration parameter or the
454	   Path MTU (PMTU) is discovered dynamically.  Example: IPv6 recommends
455	   PMTU discovery before data packets are sent to prevent packet
456	   fragmentation.  It requires every link to support an MTU of at least
457	   1280 octets.  A detailed discussion of the implications of oversized
458	   IPv6 header chains is found in [RFC7112].

460	   The Path MTU restricts the amount of data that can be recorded for
461	   the purpose of OAM within a data packet.  The total size of data to
462	   be recorded needs to be preset to avoid the packet size exceeding the
463	   MTU.  It is recommended to pre-calculate this limit and configure
464	   network devices to limit the in-band OAM data attached to a packet.
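
   The pre-calculation can be as simple as the following sketch.  The
   function name and all numeric values (path MTU, packet size, fixed
   overhead, bytes recorded per hop) are hypothetical and chosen only
   to show the arithmetic.

```python
def max_recordable_hops(path_mtu, packet_size, fixed_overhead, bytes_per_hop):
    """Return how many hops can add in-band OAM data without exceeding the PMTU.

    path_mtu:       path MTU in octets
    packet_size:    size of the original packet in octets
    fixed_overhead: octets of in-band OAM data added once per packet
    bytes_per_hop:  octets of in-band OAM data added by each hop
    """
    budget = path_mtu - packet_size - fixed_overhead
    if budget < 0:
        return 0
    return budget // bytes_per_hop


# 1500 - 1400 - 8 = 92 octets remain; at 4 octets per hop that is 23 hops.
print(max_recordable_hops(1500, 1400, 8, 4))  # 23

# A packet that already fills the path MTU leaves no room for OAM data.
print(max_recordable_hops(1280, 1280, 8, 4))  # 0
```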

466	4.3.  Administrative Boundaries

468	   There are challenges in enabling in-band OAM in the public Internet
469	   across administrative domains:

471	   o  Depending on the deployment, the data fields that in-band OAM
472	      requires as part of a specific transport protocol may not be
473	      supported across administrative boundaries.

475	   o  Current OAM implementations are often done in the slow path, i.e.,
476	      OAM packets are punted to the router's CPU for processing.  This
477	      leads to performance and scaling issues and opens up routers to
478	      attacks such as Denial of Service (DoS) attacks.

480	   o  Discovery of network topology and details of the network devices
481	      across administrative boundaries may open up attack vectors
482	      compromising network security.

484	   o  Specifically on IPv6: At the administrative boundaries IPv6
485	      packets with extension headers are dropped for several reasons
486	      described in [RFC7872].

488	   The following considerations will be discussed in a future version of
489	   this document: If the packet is dropped due to the presence of the
490	   in-band OAM; If the policy failure is treated as feature disablement
491	   and any further recording is stopped but the packet itself is not
492	   dropped, it may lead every node in the path to make this policy
493	   decision.

495	4.4.  Selective Enablement

497	   Depending on the deployment, in-band OAM could be used for all
498	   traffic or only a subset of it.  While it might be desirable to
499	   apply in-band OAM to all traffic and then selectively use the data
500	   gathered in case needed, it might not always be feasible.  Depending
501	   on the forwarding infrastructure used, in-band OAM can have an impact
502	   on forwarding performance.  The SPUD prototype for example uses the
503	   notion of "pipes" to describe the portion of the traffic that could
504	   be subject to in-path inspection.  Mechanisms to decide which traffic
505	   would be subject to in-band OAM are outside the scope of this
506	   document.

508	4.5.  Optimization of Node and Interface Identifiers

510	   Since packets have a finite maximum size, the data recording or
511	   carrying capacity of one packet in which the in-band OAM metadata is
512	   present is limited.  In-band OAM should use its own dedicated
513	   namespace (confined to the domain in-band OAM operates in) to
514	   represent node and interface IDs to save space in the header.
515	   Generic representations of node and interface identifiers which are
516	   globally unique (such as a UUID) would consume significantly more
517	   bits of in-band OAM data.
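
   The space saving can be estimated with a short calculation.  The
   16-bit size of the domain-local identifier, the number of hops, and
   the number of identifiers recorded per hop are assumptions of this
   example; only the 128-bit size of a UUID is fixed.

```python
import uuid

# A UUID is 128 bits, i.e. 16 octets.
UUID_BYTES = len(uuid.UUID(int=0).bytes)

# Hypothetical domain-local identifier size: 16 bits, i.e. 2 octets.
SHORT_ID_BYTES = 2


def id_overhead(hops, ids_per_hop, id_bytes):
    """Octets needed to record ids_per_hop identifiers at each of hops nodes."""
    return hops * ids_per_hop * id_bytes


# Ten hops, each recording a node ID plus ingress and egress interface IDs.
print(id_overhead(10, 3, UUID_BYTES))      # 480 octets with UUIDs
print(id_overhead(10, 3, SHORT_ID_BYTES))  # 60 octets with 16-bit IDs
```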

519	4.6.  Loop Communication Path (IPv6-specifics)

521	   When recorded data is required to be analyzed on a source node that
522	   issues a packet and inserts in-band OAM data, the recorded data needs
523	   to be carried back to the source node.

525	   One way to carry the in-band OAM data back to the source is to
526	   utilize an ICMP Echo Request/Reply (ping) or ICMPv6 Echo Request/
527	   Reply (ping6) mechanism.  In order to run the in-band OAM mechanism
528	   appropriately on the ping/ping6 mechanism, the following two
529	   operations should be implemented by the ping/ping6 target node:

531	   1.  All of the in-band OAM fields would be copied from an Echo
532	       Request message to an Echo Reply message.

534	   2.  The Hop Limit field of the IPv6 header of these messages would be
535	       copied as a continuous sequence.  Further considerations will be
536	       addressed in a future version of this document.

538	5.  Requirements for In-band OAM Data Types

540	   The use cases discussed above require different types of in-band OAM
541	   data.  This section details requirements for in-band OAM derived from
542	   the discussion above.

544	5.1.  Generic Requirements

546	   REQ-G1:  Classification: It should be possible to enable in-band OAM
547	            on a selected set of traffic.  The selected set of traffic
548	            can also be all traffic.

550	   REQ-G2:  Scope: If in-band OAM is used only within a specific domain,
551	            provisions need to be put in place to ensure that in-band
552	            OAM data stays within the specific domain only.

554	   REQ-G3:  Transport independence: Data formats for in-band OAM shall
555	            be defined in a transport independent way.  In-band OAM
556	            applies to a variety of transport protocols.  Encapsulations
557	            should define how the generic data formats are carried by a
558	            specific protocol.

560	   REQ-G4:  Layering: It should be possible to have in-band OAM
561	            information for different transport protocol layers be
562	            present in several fields within a single packet.  This
563	            could for example be the case when tunnels are employed and
564	            in-band OAM information is to be gathered for both the
565	            underlay as well as the overlay network.

567	   REQ-G5:  MTU size: With in-band OAM information added, packets should
568	            not become larger than the path MTU.

570	   REQ-G6:  Data Structure Reusability: The data types and data formats
571	            defined and used for in-band OAM ought to be reusable for
572	            out-of-band OAM telemetry as well.
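
   The MTU constraint in REQ-G5 can be illustrated with a minimal
   sketch.  It assumes a hypothetical layout in which in-band OAM data
   consists of a fixed-size header plus one fixed-size record per hop;
   the function names and the byte counts in the usage lines are
   illustrative assumptions, not values defined by this document.

```python
def fits_path_mtu(packet_len: int, oam_len: int, path_mtu: int) -> bool:
    """Return True if the packet plus in-band OAM data stays within the path MTU."""
    return packet_len + oam_len <= path_mtu


def max_oam_hops(packet_len: int, path_mtu: int,
                 fixed_oam_len: int, per_hop_len: int) -> int:
    """Number of per-hop records that can be added without exceeding the path MTU.

    Assumes (hypothetically) a fixed OAM header of fixed_oam_len bytes
    followed by per_hop_len bytes for every hop that adds data.
    Returns 0 when not even the fixed header leaves room for a record.
    """
    room = path_mtu - packet_len - fixed_oam_len
    if room < 0:
        return 0
    return room // per_hop_len


# Illustrative numbers only: a 1400-byte packet on a 1500-byte path MTU,
# with an 8-byte OAM header and 12 bytes of data per hop.
hops = max_oam_hops(1400, 1500, fixed_oam_len=8, per_hop_len=12)
print(hops, fits_path_mtu(1400, 8 + hops * 12, 1500))
```

   An encapsulating node could use such a check to decide whether to
   add in-band OAM data to a given packet at all.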

574	5.2.  In-band OAM Data with Per-hop Scope

576	   REQ-H1:  Missing nodes detection: Data shall be present that allows a
577	            node to detect whether all nodes that should participate in
578	            in-band OAM operations have indeed participated.

580	   REQ-H2:  Node, instance or device identifier: Data shall be present
581	            that allows retrieving the identity of the entity reporting
582	            telemetry information.  The entity can be a device, or a
583	            subsystem/component within a device.  The latter will allow
584	            for packet tracing within a device in much the same way as
585	            between devices.

587	   REQ-H3:  Ingress interface identifier: Data shall be present that
588	            allows the identification of the interface a particular
589	            packet was received from.  The interface can be a logical or
590	            physical entity.

592	   REQ-H4:  Egress interface identifier: Data shall be present that
593	            allows the identification of the interface a particular
594	            packet was forwarded to.  The interface can be a logical or
595	            physical entity.

597	   REQ-H5:  Time-related requirements

599	            REQ-H5.1:  Delay: Data shall be present that allows
600	                       retrieving the delay between two or more points
601	                       of interest within the system.  Those points can
602	                       be within the same device or on different devices.

604	            REQ-H5.2:  Jitter: Data shall be present that allows
605	                       retrieving the jitter between two or more points
606	                       of interest within the system.  Those points can
607	                       be within the same device or on different devices.

609	            REQ-H5.3:  Wall-clock time: Data shall be present that
610	                       allows retrieving the wall-clock time at which a
611	                       packet visited a point of interest in the system.

613	            REQ-H5.4:  Time precision: The precision of the time related
614	                       data should be configurable.  Use-case dependent,
615	                       the required precision could e.g., be nano-
616	                       seconds, micro-seconds, milli-seconds, or
617	                       seconds.

619	   REQ-H6:  Generic data records (e.g., GPS/Geo-location
620	            information): It should be possible to add user-defined OAM
621	            data at select hops to the packet.  The semantics of the
622	            data are defined by the user.
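
   As an illustration of REQ-H5.1 and REQ-H5.2, the sketch below
   derives delay and jitter from per-hop timestamps.  It assumes that
   each hop records a (node identifier, timestamp in nanoseconds) pair
   and that the clocks of the nodes are synchronized; the record
   format and function names are hypothetical.  Jitter is computed
   here as the absolute difference between consecutive delay samples,
   which is one simple definition among several.

```python
def hop_delays(records):
    """Delay between consecutive points of interest for one packet.

    records: list of (node_id, timestamp_ns) tuples in path order.
    Returns a list of (from_node, to_node, delay_ns) tuples.
    """
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(records, records[1:])]


def delay_variation(delays_ns):
    """Absolute difference between consecutive delay samples.

    delays_ns: delays measured between the same two points for
    successive packets.
    """
    return [abs(b - a) for a, b in zip(delays_ns, delays_ns[1:])]


# One packet that visited nodes A, B and C.
print(hop_delays([("A", 1000), ("B", 1500), ("C", 2700)]))
# Delay between A and B observed for three successive packets.
print(delay_variation([500, 650, 600]))
```

   The same functions apply when the points of interest are
   subsystems within one device, as long as they record comparable
   timestamps.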

624	5.3.  In-band OAM with Selected Hop Scope

626	   REQ-S1:  Proof of transit: Data shall be present which allows
627	            securely proving that a packet has visited one or several
628	            particular points of interest (i.e., a particular set of
629	            nodes).

631	            REQ-S1.1:  In case "Shamir's secret sharing scheme" is used
632	                       for proof of transit, two data records, "random"
633	                       and "cumulative" shall be present.  The number of
634	                       bits used for "random" and "cumulative" data
635	                       records can vary between deployments and should
636	                       thus be configurable.
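
   The role of the "random" and "cumulative" data records can be
   sketched as follows.  This is one possible construction built on
   Shamir's secret sharing, not a definition from this document: a
   secret polynomial is shared among the nodes, a second polynomial
   with the per-packet "random" value as its constant term is
   evaluated at each hop, and each node adds its Lagrange-weighted
   contribution to "cumulative".  The prime modulus, the polynomial
   coefficients, and all function names are assumptions made for
   illustration.

```python
PRIME = 2**61 - 1  # hypothetical prime modulus; deployments would choose their own


def poly_eval(coeffs, x, p=PRIME):
    """Evaluate c0 + c1*x + c2*x^2 + ... modulo p (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result


def lagrange_constant(xs, i, p=PRIME):
    """Lagrange basis polynomial for point xs[i], evaluated at x = 0."""
    num, den = 1, 1
    for j, xj in enumerate(xs):
        if j != i:
            num = (num * -xj) % p
            den = (den * (xs[i] - xj)) % p
    return (num * pow(den, -1, p)) % p  # modular inverse, Python 3.8+


def provision(secret_poly, xs, p=PRIME):
    """Give each node its point, its share of the secret, and its constant."""
    return [
        {"x": x,
         "share": poly_eval(secret_poly, x, p),
         "lpc": lagrange_constant(xs, i, p)}
        for i, x in enumerate(xs)
    ]


def node_update(node, public_coeffs, rnd, cumulative, p=PRIME):
    """Per-hop step: add this node's contribution to "cumulative"."""
    per_packet = poly_eval([rnd] + public_coeffs, node["x"], p)
    return (cumulative + (node["share"] + per_packet) * node["lpc"]) % p


def verify(secret, rnd, cumulative, p=PRIME):
    """Verifier check: compare against the value expected when every node contributed."""
    return cumulative == (secret + rnd) % p


secret_poly = [1234, 5, 7]          # constant term 1234 is the secret
nodes = provision(secret_poly, xs=[1, 2, 3])
public_coeffs = [11, 13]            # non-constant terms of the second polynomial
rnd = 42                            # "random": would be fresh for every packet
cml = 0                             # "cumulative": starts at zero
for node in nodes:
    cml = node_update(node, public_coeffs, rnd, cml)
print(verify(1234, rnd, cml))
```

   In this sketch both records must be wide enough to hold a value
   modulo the chosen prime, which is why their size is tied to the
   deployment.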

638	5.4.  In-band OAM with End-to-end Scope

640	   REQ-E1:  Sequence numbering:

642	            REQ-E1.1:  Reordering detection: It should be possible to
643	                       detect whether packets have been reordered while
644	                       traversing an in-band OAM domain.

646	            REQ-E1.2:  Duplicates detection: It should be possible to
647	                       detect whether packets have been duplicated while
648	                       traversing an in-band OAM domain.

650	            REQ-E1.3:  Detection of packet drops: It should be possible
651	                       to detect whether packets have been dropped while
652	                       traversing an in-band OAM domain.
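
   The three detections in REQ-E1 can be sketched with a single pass
   over the sequence numbers seen at the egress of the in-band OAM
   domain.  The sketch assumes that the ingress node numbers packets
   consecutively, that the first and last numbers sent are known to
   the receiver, and that sequence numbers do not wrap; the function
   name and result format are hypothetical.

```python
def analyze_sequence(received, first_seq, last_seq):
    """Classify sequence-number anomalies.

    received: sequence numbers in arrival order.
    first_seq, last_seq: inclusive range of numbers the sender used.
    """
    seen = set()
    duplicates = []
    reordered = []
    highest = None
    for seq in received:
        if seq in seen:
            duplicates.append(seq)      # same number arrived again
            continue
        seen.add(seq)
        if highest is not None and seq < highest:
            reordered.append(seq)       # arrived after a higher number
        else:
            highest = seq
    # Anything in the sent range that never arrived counts as dropped.
    dropped = [s for s in range(first_seq, last_seq + 1) if s not in seen]
    return {"duplicates": duplicates, "reordered": reordered, "dropped": dropped}


print(analyze_sequence([1, 2, 4, 3, 3, 6], first_seq=1, last_seq=6))
```

   A streaming implementation would bound the memory used for seen
   numbers, for example with a sliding window.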

654	6.  Security Considerations and Requirements

656	   General Security considerations will be addressed in a later version
657	   of this document.  Security considerations for Proof of Transit alone
658	   are discussed below.

660	6.1.  Proof of Transit

662	   Threat Model: Attacks on the deployments could be due to malicious
663	   administrators or accidental misconfigurations resulting in bypassing
664	   of certain nodes.  The solution approach should meet the following
665	   requirements:

667	   REQ-SEC1:  Sound Proof of Transit: A valid and verifiable proof that
668	              the packet definitively traversed through all the nodes as
669	              expected.  Probabilistic methods to achieve this should be
670	              avoided, as the same could be exploited by an attacker.

672	   REQ-SEC2:  Tampering of meta data: An active attacker should not be
673	              able to insert or modify or delete meta data in whole or
674	              in part and bypass a few (or all) nodes.  Any deviation
675	              from the expected path should be accurately determined.

677	   REQ-SEC3:  Replay Attacks: An attacker (active/passive) should not be
678	              able to reuse the proof of transit bits in the packet by
679	              observing the OAM data in the packet, packet
680	              characteristics (like IP addresses, octets transferred,
681	              timestamps) or even the proof bits themselves.  The
682	              solution approach should consider usage of these
683	              parameters for deriving any secrets cautiously.
684	              Mitigating replay attacks beyond a window of longer
685	              duration could be intractable to achieve with a fixed
686	              number of bits allocated for proof.

688	   REQ-SEC4:  Recycle Secrets: Any configuration of the secrets (like
689	              cryptographic keys, initialisation vectors etc.) either in
690	              the controller or service functions should be
691	              reconfigurable.  The solution approach should enable
692	              controls, API calls etc. needed in order to perform such
693	              recycling.  It is desirable to provide recommendations on
694	              the duration of rotation cycles needed for the secure
695	              functioning of the overall system.

697	   REQ-SEC5:  Secret storage and distribution: Secrets should be shared
698	              with the devices over secure channels.  Methods should be
699	              put in place so that secrets cannot be retrieved by
700	              unauthorized personnel from the devices.

702	7.  IANA Considerations

704	   [RFC Editor: please remove this section prior to publication.]

706	   This document has no IANA actions.

708	8.  Acknowledgements

710	   The authors would like to thank Steve Youell, Eric Vyncke, Nalini
711	   Elkins, Srihari Raghavan, Ranganathan T S, Karthik Babu Harichandra
712	   Babu, Akshaya Nadahalli, and Andrew Yourtchenko for the comments and
713	   advice.  This document leverages and builds on top of several
714	   concepts described in [draft-kitamura-ipv6-record-route].  The
715	   authors would like to acknowledge the work done by the author Hiroshi
716	   Kitamura and people involved in writing it.

718	9.  Informative References

720	   [draft-brockners-proof-of-transit]
721	              Brockners, F., Bhandari, S., and S. Dara, "Proof of
722	              transit", July 2016.

724	   [draft-kitamura-ipv6-record-route]
725	              Kitamura, H., "Record Route for IPv6 (PR6), Hop-by-Hop
726	              Option Extension", November 2000.

728	   [I-D.brockners-lisp-sr]
729	              Brockners, F., Bhandari, S., Maino, F., and D. Lewis,
730	              "LISP Extensions for Segment Routing", draft-brockners-
731	              lisp-sr-01 (work in progress), February 2014.

733	   [I-D.hildebrand-spud-prototype]
734	              Hildebrand, J. and B. Trammell, "Substrate Protocol for
735	              User Datagrams (SPUD) Prototype", draft-hildebrand-spud-
736	              prototype-03 (work in progress), March 2015.

738	   [I-D.ietf-spring-segment-routing]
739	              Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
740	              and R. Shakir, "Segment Routing Architecture", draft-ietf-
741	              spring-segment-routing-09 (work in progress), July 2016.

743	   [I-D.lapukhov-dataplane-probe]
744	              Lapukhov, P. and r. remy@barefootnetworks.com, "Data-plane
745	              probe for in-band telemetry collection", draft-lapukhov-
746	              dataplane-probe-01 (work in progress), June 2016.

748	   [P4]       Kim, , "P4: In-band Network Telemetry (INT)", September
749	              2015.

751	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
752	              DOI 10.17487/RFC0791, September 1981,
753	              <http://www.rfc-editor.org/info/rfc791>.

755	   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
756	              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
757	              DOI 10.17487/RFC4884, April 2007,
758	              <http://www.rfc-editor.org/info/rfc4884>.

760	   [RFC4950]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro, "ICMP
761	              Extensions for Multiprotocol Label Switching", RFC 4950,
762	              DOI 10.17487/RFC4950, August 2007,
763	              <http://www.rfc-editor.org/info/rfc4950>.

765	   [RFC5837]  Atlas, A., Ed., Bonica, R., Ed., Pignataro, C., Ed., Shen,
766	              N., and JR. Rivers, "Extending ICMP for Interface and
767	              Next-Hop Identification", RFC 5837, DOI 10.17487/RFC5837,
768	              April 2010, <http://www.rfc-editor.org/info/rfc5837>.

770	   [RFC7112]  Gont, F., Manral, V., and R. Bonica, "Implications of
771	              Oversized IPv6 Header Chains", RFC 7112,
772	              DOI 10.17487/RFC7112, January 2014,
773	              <http://www.rfc-editor.org/info/rfc7112>.

775	   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
776	              Weingarten, "An Overview of Operations, Administration,
777	              and Maintenance (OAM) Tools", RFC 7276,
778	              DOI 10.17487/RFC7276, June 2014,
779	              <http://www.rfc-editor.org/info/rfc7276>.

781	   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
782	              Chaining (SFC) Architecture", RFC 7665,
783	              DOI 10.17487/RFC7665, October 2015,
784	              <http://www.rfc-editor.org/info/rfc7665>.

786	   [RFC7872]  Gont, F., Linkova, J., Chown, T., and W. Liu,
787	              "Observations on the Dropping of Packets with IPv6
788	              Extension Headers in the Real World", RFC 7872,
789	              DOI 10.17487/RFC7872, June 2016,
790	              <http://www.rfc-editor.org/info/rfc7872>.

792	Authors' Addresses

794	   Frank Brockners
795	   Cisco Systems, Inc.
796	   Hansaallee 249, 3rd Floor
797	   DUESSELDORF, NORDRHEIN-WESTFALEN  40549
798	   Germany

800	   Email: fbrockne@cisco.com

802	   Shwetha Bhandari
803	   Cisco Systems, Inc.
804	   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
805	   Bangalore, KARNATAKA 560 087
806	   India

808	   Email: shwethab@cisco.com

809	   Sashank Dara
810	   Cisco Systems, Inc.
811	   Cessna Business Park, Sarjapura Marathalli Outer Ring Road
812	   Bangalore, KARNATAKA 560 087
813	   India

815	   Email: sadara@cisco.com

817	   Carlos Pignataro
818	   Cisco Systems, Inc.
819	   7200-11 Kit Creek Road
820	   Research Triangle Park, NC  27709
821	   United States

823	   Email: cpignata@cisco.com

825	   Hannes Gredler
826	   RtBrick Inc.

828	   Email: hannes@rtbrick.com