<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->
<!ENTITY IOAMReq SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-brockners-inband-oam-requirements-02.xml">
<!ENTITY IOAMData SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-brockners-inband-oam-data-02.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation --> <!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="3"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="exp" docName="draft-song-ippm-ioam-scalability-01"
     ipr="trust200902">
  <front>
    <title abbrev="IOAM Scalability">
         On Scalability of In-situ OAM </title>

    <author fullname="Haoyu Song" initials="H." surname="Song" role="editor">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>2330 Central Expressway</street>

          <city>Santa Clara, 95050</city>

          <country>USA</country>
        </postal>

        <email>haoyu.song@huawei.com</email>
      </address>
    </author>

    <author fullname="Tianran Zhou" initials="T." surname="Zhou">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>156 Beiqing Road</street>

          <city>Beijing, 100095</city>

          <country>P.R. China</country>
        </postal>

        <email>zhoutianran@huawei.com</email>
      </address>
    </author>

    <date month="June" year="2017"/>

    <area>OPS &amp; Management</area>

    <workgroup>ippm</workgroup>

    <keyword>in-situ OAM, data extension, scalability</keyword>

    <abstract>
      <t>
	 This document describes several potential scalability issues in implementing in-situ OAM as currently specified, and proposes corresponding solutions and modifications to the in-situ OAM specification. Specifically, we extend in-situ OAM to support 
	 more standard tracing data types than are currently defined, and we add new features that relax the limitations imposed by the MTU, bandwidth, forwarding path length, 
	 and node processing capability. We provide use cases to motivate the proposal, and we base the changes on the current in-situ OAM header format specification.   
      </t>
    </abstract>
  </front>

  <middle>

    <section title="Introduction">
      <t>
      <xref target="I-D.brockners-inband-oam-requirements">In-situ OAM (iOAM)</xref> records OAM information within user packets while the packets traverse a network. 
      The data types and data formats for in-situ OAM data records have been defined in <xref target="I-D.brockners-inband-oam-data"></xref>. 
      We identify several scalability issues in implementing the
      current iOAM specification and propose solutions to them in this draft. 
      </t>
      </section>
      
      <section title="Motivation for Better iOAM Scalability">
      <section title="Support Data Type Extensions">
        <t>
		Currently, 11 data types and associated formats (counting the wide and short formats of the same data separately) are defined in <xref target="I-D.brockners-inband-oam-data"></xref>. 
		The presence of each data type is indicated by a bit in the 16-bit "OAM-Trace-Type" bitmap.    
		</t><t>
	  In the current specification, only five bits are left to identify new data types. 
	  Moreover, some data items are bundled together as a single unit to save bitmap 
          space and to pack data to an ideal size (e.g., the hop limit and the node ID are bundled, 
          and the ingress interface ID and the egress interface ID are bundled), even though an application may need only part of a bundle. 
          Last but not least, each data item must be 4-byte aligned for easier access, which in many cases wastes header space.   
        </t><t>
          Since data plane bandwidth, data plane packet processing, and management plane data handling are all scarce resources, the scheme should strive to be simple and precise. 
          The application should be able to control the exact type and format of the data it needs to collect and analyze. 
          It is conceivable that more types of data will be introduced in the future. However, the current scheme cannot accommodate them once all the bits in the bitmap are used up.    
        </t><t>
          Currently, bit 7 indicates the presence of variable-length opaque state snapshot data. 
          While this field can carry arbitrary data, 
          its content is difficult to standardize, and a separate schema is needed to decode it, which may degrade data plane performance.  
        </t>
	<section title="Motivating Use Cases">
		<t>
			When a flow traverses a series of middleboxes (e.g., firewalls, NATs, and load balancers), its identity (e.g., the 5-tuple) 
			is often altered, which causes the OAM system to lose track of the flow. In this case, we may want to copy some of the original 
			packet header fields into the iOAM header so that the original flow can be identified at any point in the network.  
	</t><t>
		In wireless, mobile, and optical network environments, some physical data associated with a flow (e.g., power, temperature, signal strength, GPS location) needs to be collected to monitor service performance. 
		</t><t>
		Both cases require new iOAM data types. More examples are listed in <xref target="uc1"></xref>.
	</t>
	</section>

      </section>
      
      <section title ="Cope with Packet Size Limitation">
        <t>
	  The total size of the iOAM data is limited by the MTU. When the number of requested data types is large and the forwarding path is long, there may not be enough
	  space in the iOAM header to hold the data. The current approach is to set an overflow indication and stop adding node data to the packet, which leads to loss of
	  information.
	</t><t>
	  Even if the header has enough space to hold the iOAM data, the overhead may be too large and consume too much bandwidth. 
	  For example, assuming a moderate 20 bytes of data per node, a 10-hop path needs 200 bytes to hold the data. 
	  This inflates a small 64-byte packet to more than four times its original size. Even for large packets (e.g., 1500 bytes),
	  the overhead (more than 10%) is not negligible.  
	  Therefore, we need to limit the iOAM data overhead without sacrificing the data collection capability. 
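	  <vspace blankLines="1"/>The arithmetic in this example can be reproduced with the short Python sketch below. The function name ioam_overhead is hypothetical, and, as in the text, only the node data list is counted; fixed iOAM header fields and any encapsulation overhead are ignored.
	  <figure><artwork type="python"><![CDATA[
```python
def ioam_overhead(per_node_bytes, hops, packet_bytes):
    """Node data list size as a fraction of the original packet size.

    Only the node data list is counted; fixed iOAM header fields and
    encapsulation overhead are ignored, as in the example in the text.
    """
    return per_node_bytes * hops / packet_bytes

# The example from the text: 20 bytes per node over a 10-hop path.
small = ioam_overhead(20, 10, 64)    # 200 bytes added to 64
large = ioam_overhead(20, 10, 1500)  # 200 bytes added to 1500
print(f"64-byte packet grows to {1 + small:.2f}x its original size")
print(f"1500-byte packet carries {large:.1%} extra bytes")
```
]]></artwork></figure>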
	  </t><t>
	  A related issue is packet loss. Packets can be dropped anywhere in a network for various reasons. 
	  If iOAM data can only be collected at the end of the path, we
	  lose all data carried by dropped packets and have no idea where the packets were dropped. This defeats the purpose of iOAM and wastes the work done by 
	  the iOAM-enabled nodes before the drop point. 
  </t>
  <section title="Motivating Use Cases">
		<t>
		Some use cases are described in <xref target="uc2"></xref>.
	</t>
</section>
      </section>
      
      <section title ="Adapt to Node Processing Capability">
	 <t>
           iOAM can designate a flow to carry the iOAM header and collect data along the flow's forwarding path. The flow can have arbitrary granularity. However, processing the data can be a heavy burden
	   for the network nodes, especially when some data needs to be calculated by the node (e.g., the transit delay). If the flow traffic is heavy, a node may not be able to keep up with the iOAM processing,
	   and performance issues such as long latency and packet drops may occur. 
	 </t><t>
	   Although OAM applications benefit from detailed information on every packet at every node, such information is in many cases repetitive and redundant. 
	   The large quantity of data also burdens the management plane, which needs to collect and stream the data for analytics. It is also possible that some nodes cannot 
	   provide the requested data at all, or are unwilling to provide some data due to security or privacy concerns.
	   A trade-off is therefore needed between the performance impact and the availability and completeness of the data. 
   </t>
   <section title="Motivating Use Cases">
	   <t>
		   To minimize the network impact, a network operator decides to collect iOAM data only for the first and last packets of a flow (e.g., TCP packets 
		   with the SYN, FIN, or RST flag set).  
	   </t><t>
	   A head node alternates between two iOAM headers, each requesting a subset of the iOAM data. Hence, each node on the flow path only needs to handle 
	   part of the data for any given packet, and the processing load is balanced without exhausting the network nodes. 
	   </t><t>
	   A node is temporarily under heavy traffic load and is in danger of dropping packets if it tries to satisfy all the iOAM data requests. In this case, 
	   it would rather deny some requests than drop user traffic.
		</t><t>
		More examples are listed in <xref target="uc3"></xref>.
	</t>
   </section>
      </section>

    </section>

    <section title="Scalable Data Type Extension">
       <t>
         Based on the observation in Section 2.1, we propose a data type encoding method that removes the current limitation and accommodates future data requirements.  
       </t>
       <section title ="Data Type Bitmap">    
	    <t>
	    A bitmap is a simple and efficient data structure for high-performance data plane implementations. The base bitmap size is kept at 16 bits. 
	    Each bit indicates a single type of data in a single format. The last bit in the bitmap (i.e., bit 15), if set, indicates 
	    the presence of the next data type bitmap, which is 32 bits long. In the second bitmap, bit 31 is again reserved to indicate a third bitmap, and so on. 
	    With each extra bitmap, 31 more data types can be defined. 	    
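	    <vspace blankLines="1"/>As an illustration only, the Python sketch below walks the chain of bitmaps in a header laid out as in <xref target="figure_1"></xref> and returns the requested data types together with the byte offset at which the Node Data List starts. The function name parse_trace_types and the global numbering of data types (Base bits 0-14 map to indices 0-14, and bit b of Extended OAM Trace Type k maps to index 15 + 31*(k-1) + b) are assumptions of this sketch, and bit 0 is taken to be the most significant bit, as in the header diagrams.
	    <figure><artwork type="python"><![CDATA[
```python
import struct

def parse_trace_types(header):
    """Return (requested data-type indices, offset of the Node Data List).

    Hypothetical numbering: Base bits 0-14 map to indices 0-14, and bit b
    of Extended OAM Trace Type k maps to 15 + 31 * (k - 1) + b.
    Bit 0 is the most significant bit of each field.
    """
    base, _length, _flags = struct.unpack_from("!HBB", header, 0)
    types = [b for b in range(15) if base & (1 << (15 - b))]
    more = base & 0x1            # bit 15: an extended bitmap follows
    offset, first_index = 4, 15
    while more:
        (ext,) = struct.unpack_from("!I", header, offset)
        types += [first_index + b for b in range(31) if ext & (1 << (31 - b))]
        more = ext & 0x1         # bit 31: another bitmap follows
        offset += 4
        first_index += 31
    return types, offset

# Base requests type 0 and chains on; Extended 1 requests index 15 and
# chains on; Extended 2 requests bit 30 (index 76) and ends the chain.
hdr = struct.pack("!HBBII", 0x8001, 0, 0, 0x80000001, 0x00000002)
types, data_offset = parse_trace_types(hdr)
```
]]></artwork></figure>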
	    </t><t>
	    <xref target="figure_1"></xref> shows an example of the in-situ OAM header format with two extended OAM trace type fields. Except for the OAM Trace Type fields, all other fields remain 
	    the same as defined in <xref target="I-D.brockners-inband-oam-data"></xref>.  
	    </t><t>
		<figure title="Extended OAM Trace Type Header Format" anchor="figure_1">  
	          <artwork>
                  
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Base OAM Trace Type      |1| Length Field  |    Flags      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Extended OAM Trace Type 1                    |1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Extended OAM Trace Type 2                    |0|   
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Node Data List []                            |
   |                                                               | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        	  </artwork>
         	</figure>
	    </t><t>
	    The Base OAM Trace Type is specified as the OAM Trace Type in <xref target="I-D.brockners-inband-oam-data"></xref>, except for the last bit, which is defined as follows:
	    </t><t>
<list style="symbols">
	<t> Bit 15: When set, indicates the presence of the next bitmap (Extended OAM Trace Type 1).</t>
</list>
</t><t>

The OAM trace type fields are labeled Base OAM Trace Type, Extended OAM Trace Type 1, Extended OAM Trace Type 2, and so on. The Base OAM Trace Type is always present.
If the application requests no data type in Extended OAM Trace Type n or beyond, then the last bit in the previous bitmap is set to 0 and these extended fields are not included in the header. 
On the other hand, to eliminate ambiguity, if the application requests any data type in Extended OAM Trace Type n, then Extended OAM Trace Types 1 to (n-1) must be included in the header, 
even if none of the data types in these bitmaps are requested (i.e., each such bitmap is all zeros except for its last bit, which is set to 1).
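<vspace blankLines="1"/>A head node could build the minimal set of bitmaps for a request as in the Python sketch below. The function name build_trace_types and the global numbering of data types (Base bits 0-14 map to indices 0-14, and bit b of Extended OAM Trace Type k maps to index 15 + 31*(k-1) + b) are hypothetical, and bit 0 is taken to be the most significant bit. Requesting index 76 forces an Extended OAM Trace Type 1 that is all zeros except for its last bit, which is the case described above.
<figure><artwork type="python"><![CDATA[
```python
def build_trace_types(requested):
    """Encode requested data-type indices as (base, [extended bitmaps]).

    Hypothetical numbering: indices 0-14 are Base bits 0-14, and bit b of
    Extended OAM Trace Type k is index 15 + 31 * (k - 1) + b. Bit 0 is the
    most significant bit. Only bitmaps up to the highest requested index
    are emitted, and every bitmap except the last has its final bit set.
    """
    n_ext = 0
    if requested:
        top = max(requested)
        n_ext = 0 if top < 15 else (top - 15) // 31 + 1
    base, exts = 0, [0] * n_ext
    for t in requested:
        if t < 15:
            base |= 1 << (15 - t)
        else:
            k, b = divmod(t - 15, 31)
            exts[k] |= 1 << (31 - b)
    if n_ext:
        base |= 0x1               # bit 15: Extended OAM Trace Type 1 follows
    for k in range(n_ext - 1):
        exts[k] |= 0x1            # bit 31: another bitmap follows
    return base, exts

# Index 0 and index 76 (bit 30 of Extended 2) are requested, so
# Extended 1 is present but empty apart from its last bit.
base, exts = build_trace_types([0, 76])
```
]]></artwork></figure>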

</t><t>

The data for each node is packed in the same order as the corresponding bits in the OAM Trace Type bitmaps, and the data of each node is padded to a multiple of 4 bytes. 
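<vspace blankLines="1"/>Packing and padding a single node's data can be sketched in Python as follows. The sketch assumes that every item is already encoded to bytes, that a hypothetical global data-type index gives the bitmap order (lower index first), and that padding bytes are zero; the padding value is not specified in this draft.
<figure><artwork type="python"><![CDATA[
```python
def pack_node_data(values):
    """Pack one node's data items and pad to a multiple of 4 bytes.

    values: dict mapping a hypothetical data-type index (ascending index
    = bitmap order) to that item's already-encoded bytes. Zero-valued
    padding is an assumption of this sketch.
    """
    data = b"".join(values[t] for t in sorted(values))
    return data + b"\x00" * (-len(data) % 4)

# Two 2-byte items and one 4-byte item: 8 bytes, so no padding needed.
node = pack_node_data({3: b"\x01\x02", 0: b"\xaa\xbb", 5: b"\x10\x20\x30\x40"})
```
]]></artwork></figure>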

</t>
    </section>

    <section title="Scalable Data Type Extension Use Cases" anchor="uc1">

	    <t> 
		    New types of data could be added and standardized, each requiring a new bit in the OAM Trace Type bitmaps. Some examples are listed here.

		    </t><t>
		    <list style="symbols">
			    <t> Metered flow bandwidth.</t>
			    <t> Time gap between two consecutive flow packets. </t>
			    <t> Remaining time budget to the packet delivery deadline. </t> 
			    <t> Buffer occupancy on the node. </t>
			    <t> Queue depth on each level of hierarchical QoS queues. </t>
			    <t> Packet jitter at the node.</t>
			    <t> Current packet IP addresses. </t>
			    <t> Current packet port numbers. </t>
			    <t> Other node statistics. </t>
		    </list>
         </t>

    </section>
   
    <section title ="Consideration for Data Packing">
	    <t>
		    The length of each data item must be a multiple of 2 bytes. However, allowing different data types to have different lengths, while storage-efficient, makes data 
		    alignment and packing difficult.  
	    </t>
	    <t>
		    If a maximum number of data types that can be carried per packet is defined, the offset of each data item in the node can be pre-calculated and carried in the iOAM header. 
		    This overhead can be justified by the overall space saving in the node data list. Otherwise, 
		    each data item's offset in the node must be calculated in each device, with the help of a table that stores the size of each data type. We can also arrange the bitmap to reflect the 
		    order in which data becomes available in the system (e.g., the bit for egress_if_id must come after the bit for ingress_if_id), so that in a pipeline-based system, the required data can be packed one
		    after another.
	    </t>
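	    <t>The table-driven offset calculation described above can be sketched in Python as follows. The size table TYPE_SIZE and its values are hypothetical placeholders (each entry is a multiple of 2 bytes, as required above), and data-type indices are assumed to be numbered in bitmap order. The same offsets could either be computed in each device or pre-calculated once and carried in the header.</t>
	    <figure><artwork type="python"><![CDATA[
```python
# Hypothetical per-type sizes in bytes; this draft does not assign them.
TYPE_SIZE = {0: 4, 1: 2, 2: 2, 3: 8, 4: 2}

def data_offsets(requested, size_table=TYPE_SIZE):
    """Return ({type: byte offset within the node}, padded node length).

    Requested types are laid out in bitmap order (ascending index) and
    the node is padded to a multiple of 4 bytes.
    """
    offsets, pos = {}, 0
    for t in sorted(requested):
        offsets[t] = pos
        pos += size_table[t]
    return offsets, pos + (-pos % 4)

offsets, node_len = data_offsets([4, 0, 1])
```
]]></artwork></figure>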
    </section>
    <section title ="Other Data Extension Possibilities">
	    <t>	    A bitmap is simple and supports parallel processing in hardware; however, it is not the only option for supporting data type extension. For example, 
	    cascaded TLVs can be used to support an arbitrary number of new data types. 
    </t>
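    <t>For comparison, the Python sketch below walks a sequence of cascaded TLVs. The TLV layout (a 1-byte Type, a 1-byte Length giving the value size in bytes, then the Value) is assumed for illustration only; this draft does not define a TLV encoding. Unlike the bitmap, the TLVs have to be parsed one after another, because the position of each element depends on the lengths of the elements before it.</t>
    <figure><artwork type="python"><![CDATA[
```python
def parse_tlvs(buf):
    """Walk cascaded TLVs: 1-byte Type, 1-byte Length (value size in
    bytes), then the Value. This layout is assumed for illustration."""
    items, pos = [], 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated TLV header")
        t, length = buf[pos], buf[pos + 1]
        value = bytes(buf[pos + 2 : pos + 2 + length])
        if len(value) != length:
            raise ValueError("truncated TLV value")
        items.append((t, value))
        pos += 2 + length          # next TLV starts after this value
    return items

# Two data items: type 1 with a 2-byte value, type 200 with none.
print(parse_tlvs(bytes([1, 2, 0xAA, 0xBB, 200, 0])))
```
]]></artwork></figure>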
    </section>

  </section>

  <section title = "Segment In-situ OAM">

       <t>
         Based on the observation in Section 2.2, we propose a method to limit the size of the node data list.  
       </t>
    <section title="Segment and Hops">
      <t>
	A hop is a node on a flow's forwarding path that is capable of processing iOAM data. A segment is a fixed number of consecutive hops on a flow's forwarding path.      
	When operating in the "per hop" mode, the edge node adds the segment size (SSize) and the remaining hops (RHop) to the iOAM header. Initially, RHop is equal to SSize. 
	At each hop, if RHop is not zero, the node adds its data to the node data list at the corresponding location and then decrements RHop by 1.  
	If RHop is equal to 0 when the packet is received, the node removes (in the incremental trace option) or clears (in the pre-allocated trace option) the iOAM node data list and 
	resets RHop to SSize. The node then adds its data to the node data list as if it were the edge node. 
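	<vspace blankLines="1"/>The per-hop rule can be sketched in Python as below. The header is modeled as a dictionary, and the export callback, which stands in for reporting a completed segment's data, is hypothetical. Simulating a 7-hop path with SSize = 3 exports two complete segments in transit and leaves a shorter final segment of one hop in the packet at the path end.
	<figure><artwork type="python"><![CDATA[
```python
def process_hop(hdr, node_data, export):
    """Apply the segment iOAM rule at one hop (sketch).

    hdr: dict with 'ssize', 'rhop' and 'data' (the node data list).
    export: hypothetical callback receiving a completed segment.
    """
    if hdr["rhop"] == 0:
        export(list(hdr["data"]))  # hand off a copy of the finished segment
        hdr["data"].clear()        # remove or clear the node data list
        hdr["rhop"] = hdr["ssize"]
    hdr["data"].append(node_data)  # add this hop's data
    hdr["rhop"] -= 1

# Edge node inserts the header with RHop = SSize = 3; hops 1..7 add data.
segments = []
hdr = {"ssize": 3, "rhop": 3, "data": []}
for node_id in range(1, 8):
    process_hop(hdr, node_id, segments.append)
```
]]></artwork></figure>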
      </t><t>
        <xref target="figure_2"></xref> shows the proposed segment iOAM header format. The last bit (bit 31) in the Flags field indicates that the current header is a segment iOAM header.
	In this case, the third byte of the first word (the Length Field in <xref target="figure_1"></xref>) is partitioned into two 4-bit pieces. The first piece holds the segment size and the second piece holds the remaining hops.
	This limits the maximum segment size to 15. 
      </t>

    <t><figure title="Segment iOAM Header Format" anchor="figure_2">  
          <artwork>
                  
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Base OAM Trace Type      |0| SSize | RHop  |    Flags    |1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Node Data List []                            |
   |                                                               | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       	  </artwork>
    </figure></t>
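    <t>The bit layout of the first word in <xref target="figure_2"></xref> can be checked with the Python sketch below. The function names are hypothetical, trace_type is the 15-bit value of Base OAM Trace Type bits 0-14, bit 0 is taken to be the most significant bit, and bit 15 is left at 0 as drawn in the figure.</t>
    <figure><artwork type="python"><![CDATA[
```python
import struct

def pack_segment_word(trace_type, ssize, rhop, flags):
    """Build the first 32-bit word of a segment iOAM header.

    trace_type: 15-bit value of Base OAM Trace Type bits 0-14.
    flags: 7-bit value of Flags bits 24-30; bit 31 (segment) is set.
    Bit 15 (next bitmap present) is left at 0, as drawn in the figure.
    """
    if not (0 <= ssize <= 15 and 0 <= rhop <= ssize):
        raise ValueError("SSize must fit in 4 bits and RHop must not exceed SSize")
    base = (trace_type & 0x7FFF) << 1        # bits 0-14, bit 15 = 0
    seg_byte = (ssize << 4) | rhop           # SSize in the high nibble
    flag_byte = ((flags & 0x7F) << 1) | 0x1  # bit 31: segment header
    return struct.pack("!HBB", base, seg_byte, flag_byte)

def unpack_segment_word(word):
    """Inverse of pack_segment_word: (trace_type, ssize, rhop, flags)."""
    base, seg_byte, flag_byte = struct.unpack("!HBB", word)
    if not flag_byte & 0x1:
        raise ValueError("bit 31 clear: not a segment iOAM header")
    return base >> 1, seg_byte >> 4, seg_byte & 0xF, flag_byte >> 1
```
]]></artwork></figure>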

    </section>

    <section title = "Considerations for Data Handling">
      <t>
	      At any hop where RHop is equal to 0 on receipt, the node data list is copied from the iOAM header before it is removed or cleared. The data can then be encapsulated and reported to the controller or the edge node, as configured. 
	      The encapsulation and reporting method is beyond the scope of this draft but should comply with the method used by the iOAM edge node.    
     </t><t>
              The last segment may contain fewer than SSize hops. This is not a problem, because the number of node data entries recorded in that segment is SSize - RHop.
      </t>
    </section>

    <section title = "Segment iOAM Use Cases" anchor="uc2">
	    <t> Segment iOAM is useful in the following example scenarios:</t><t>
	<list style="symbols">
          <t>
	    Segment iOAM can be used to detect in which segment a flow packet is dropped. If SSize is set to 1, the exact node at which the packet is dropped can be identified. 
	    The iOAM data collected before the drop point is also retained.
	  </t><t>
	  The path MTU allows at most k node data entries to be added to the list without fragmentation. Therefore, SSize is set to k, and at each hop where RHop is 0, the node data list is retrieved and sent
            in a standalone packet. 
	  </t><t>
            A flow contains mainly short packets and travels a long path. Keeping a large node data list in each packet would be inefficient, because much of the bandwidth would be spent on iOAM data rather than on payload. 
	    In this case, segment iOAM can be used to limit the ratio of iOAM data to flow packet payload. 
	  </t><t>
	  The network allows a budget of at most n bytes for iOAM data. There is a trade-off between the number of data types that can be collected and the number of hops over which data is collected. 
	  The segment size is therefore chosen to meet the application's data requirement (i.e., SSize * Node Data Size &lt;= n).
    	  </t>
	</list>
      </t>
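      <t>The last scenario reduces to simple arithmetic, sketched in Python below. The function name max_segment_size is hypothetical, and the budget n is assumed to cover only the node data list, not the fixed iOAM header fields.</t>
      <figure><artwork type="python"><![CDATA[
```python
def max_segment_size(budget_bytes, node_data_size):
    """Largest SSize such that SSize * node_data_size does not exceed
    budget_bytes, capped at 15 by the 4-bit SSize field."""
    if node_data_size <= 0:
        raise ValueError("node data size must be positive")
    ssize = min(15, budget_bytes // node_data_size)
    if ssize < 1:
        raise ValueError("budget cannot hold even one node's data")
    return ssize

# 20 bytes of data per node, as in the overhead example in Section 2.2.
print(max_segment_size(200, 20), max_segment_size(64, 20))
```
]]></artwork></figure>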
    </section>

    </section>

    <section title = "In-situ OAM Sampling and Data Validation">
       <t>
         Based on the observation in Section 2.3,    
	 the source edge node should be able to specify either a period or a probability for adding the iOAM header to packets of the selected flow. In this way, only a subset of the flow's packets
	       carry iOAM data, which reduces both the overall quantity of iOAM data and the processing workload of the network nodes. 
	       </t>
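	       <t>Both sampling modes can be sketched in Python as follows. The class names are hypothetical, and the per-flow packet counter and the software random number generator merely stand in for whatever mechanism an implementation actually uses to select packets.</t>
	       <figure><artwork type="python"><![CDATA[
```python
import random

class PeriodicSampler:
    """Add the iOAM header to every `period`-th packet of a flow."""

    def __init__(self, period):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period
        self.count = 0

    def should_add(self):
        self.count += 1
        if self.count == self.period:
            self.count = 0
            return True
        return False


class ProbabilisticSampler:
    """Add the iOAM header to each packet independently with probability p."""

    def __init__(self, p, rng=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def should_add(self):
        return self.rng.random() < self.p
```
]]></artwork></figure>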
	       
	       <section title ="Valid Node Bitmap and Valid Data Bitmap">       
        <t>
		Even an iOAM-capable node may not add data to the node data list as requested. In some cases, a node can be too busy to handle the data request, or 
		some of the requested data types are not available. Therefore, we propose to add two bitmaps, a Valid Node Bitmap and a Valid Data Bitmap, to the iOAM specification. </t>
	<t>
		The Valid Node Bitmap is inserted before the Node Data List, as shown in <xref target="figure_3"></xref>. Each bit in the bitmap corresponds to a hop on the packet's forwarding path, and the
		bits are listed in the same order as the hops on the path. The bitmap is initially cleared to all zeros. If a hop adds data to the Node Data List, the corresponding 
		bit in the Valid Node Bitmap is set to 1. The bit position for a hop can be calculated from the Length Field (e.g., in a segment iOAM header, the bit index is equal to SSize - RHop). The number of valid node data items in the Node Data 
		List is equal to the number of 1s in the Valid Node Bitmap.</t>	

    <t><figure title="iOAM Header Format with Valid Node Bitmap" anchor="figure_3">  
          <artwork>
                  
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Base OAM Trace Type      |0| Length Field  |    Flags      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Valid Node Bitmap                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Node Data List []                            |
   |                                                               | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       	  </artwork>
    </figure></t>
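    <t>The Python sketch below shows how hops might fill the Valid Node Bitmap and how a collector might read it. It assumes the 32-bit bitmap of <xref target="figure_3"></xref>, bit 0 (the most significant bit) for the first hop, and a segment iOAM header from which each hop derives its bit index as SSize - RHop before decrementing RHop; the function names are hypothetical. In the example, the third hop declines to add data.</t>
    <figure><artwork type="python"><![CDATA[
```python
def mark_valid(bitmap, hop_index):
    """Set a hop's bit in a 32-bit Valid Node Bitmap (bit 0 = MSB)."""
    if not 0 <= hop_index < 32:
        raise ValueError("hop index out of range for a 32-bit bitmap")
    return bitmap | (1 << (31 - hop_index))

def valid_hops(bitmap):
    """Indices of the hops that added node data, in path order."""
    return [i for i in range(32) if bitmap & (1 << (31 - i))]

# Four hops with SSize = 4; the hop that sees RHop = 2 adds no data.
ssize, bitmap = 4, 0
for rhop, adds_data in [(4, True), (3, True), (2, False), (1, True)]:
    if adds_data:
        bitmap = mark_valid(bitmap, ssize - rhop)
entries = bin(bitmap).count("1")   # valid items in the Node Data List
```
]]></artwork></figure>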

    <t>For each node data entry in the Node Data List,       	
	    a Valid Data Bitmap is added before the node data. 
	    The number of bits in the Valid Data Bitmap is equal to the number of 1s in the OAM Trace Type bitmaps (excluding the next-bitmap indicator bits). When a bit is set, the 
	 corresponding data is valid in the node; otherwise, the corresponding data is invalid, and the management plane should ignore it after the data is collected. 
       </t><t>
         The bitmap can be padded to two or four bytes, which allows up to 16 or 32 data types, respectively, to be included in a node.
       </t>
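       <t>The Valid Data Bitmap can be sketched in Python as below. The function names are hypothetical; requested lists the data types set in the OAM Trace Type bitmaps in bitmap order, bit 0 of the Valid Data Bitmap (its most significant bit) is assumed to correspond to the first requested type, and the bitmap is padded to 16 bits when at most 16 types are requested and to 32 bits otherwise.</t>
       <figure><artwork type="python"><![CDATA[
```python
def build_valid_data_bitmap(requested, provided):
    """Return (value, width_bits) for one node's Valid Data Bitmap.

    requested: data types set in the OAM Trace Type bitmaps, in bitmap
    order; provided: the subset this node actually filled in.
    Bit 0 (the most significant bit) maps to the first requested type.
    """
    if len(requested) > 32:
        raise ValueError("at most 32 data types per node")
    width = 16 if len(requested) <= 16 else 32
    value = 0
    for i, t in enumerate(requested):
        if t in provided:
            value |= 1 << (width - 1 - i)
    return value, width

def valid_types(requested, value, width):
    """Management-plane view: the requested types marked valid."""
    return [t for i, t in enumerate(requested) if value & (1 << (width - 1 - i))]

# Types 0, 3 and 15 were requested; this node could not provide type 3.
value, width = build_valid_data_bitmap([0, 3, 15], {0, 15})
```
]]></artwork></figure>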

       </section>
       <section title = "iOAM Sampling and Data Validation Use Cases" anchor="uc3">
	       <t>
	       We give some examples to show the usefulness of in-situ OAM sampling and data validation features. 
	       </t><t>
		       <list style="symbols">
			       <t> An application needs to track a flow's forwarding path and knows the path will not change frequently, so it sets a low sampling rate to periodically insert the iOAM header 
				      to request the node ID. </t> 
			      <t>  In a heterogeneous data plane, some nodes can provide data x but other nodes cannot. An application is nevertheless interested in collecting data x
				      where available. In this case, the iOAM header can still request data x, and the nodes that cannot provide the data simply invalidate it by clearing the 
				      corresponding bit in the Valid Data Bitmap.  
			    </t><t> 
                                   Multiple sampling rates and data request schemas can be defined for a flow based on application requirements and data properties, so a flow packet may carry 
				   no iOAM header or one of several different iOAM headers. A node therefore does not need to process all data all the time. 
		   </t><t>
		   For security reasons, a node decides not to participate in iOAM data collection. While it processes the other iOAM header fields as usual, it neither sets its 
		   bit in the Valid Node Bitmap nor adds node data to the Node Data List. </t> 
		       </list>
	       </t>
       </section>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t> 
        There are no security considerations beyond those already identified for the in-situ OAM protocol.  
      </t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Acknowledgments" title="Acknowledgments">
      <t>We would like to thank Frank Brockners and Carlos Pignataro for helpful comments and suggestions.</t>
    </section>
    
    <section anchor="Contributors" title="Contributors">
	    <t>This document was inspired by numerous discussions with James N. Guichard, who also provided significant comments and suggestions to help improve it.</t>
    </section>

  </middle>

  <back>
    <references title="Informative References">
      &IOAMReq;
      &IOAMData;
    </references>
  </back>
</rfc>
