<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="no"?>
<?rfc subcompact="no"?>
<?rfc authorship="yes"?>
<?rfc tocappendix="yes"?>

<rfc category="std" ipr="trust200902" docName="draft-theoleyre-raw-oam-support-01">


<front> 

    <title abbrev="OAM features for RAW">Operations, Administration and Maintenance (OAM) features for RAW</title>
   
 
 
	<author initials="F." surname="Theoleyre" fullname="Fabrice Theoleyre">
    	<organization>CNRS</organization>
     	<address>
        	<postal>
            	<street>Building B</street>
             	<street>300 boulevard Sebastien Brant - CS 10413</street>
             	<city>Illkirch - Strasbourg</city>
             	<code>67400</code>
             	<country>FRANCE</country>
         	</postal>
         	<phone>+33 368 85 45 33</phone>
            <email>theoleyre@unistra.fr</email>
            <uri>http://www.theoleyre.eu</uri>
     	</address>
 	</author>

    <author initials="G.Z." surname="Papadopoulos" fullname="Georgios Z. Papadopoulos">
      <organization>IMT Atlantique</organization>
      <address>         
        <postal>            
           <street>Office B00 - 102A</street>
           <street>2 Rue de la Ch&acirc;taigneraie</street>
           <city>Cesson-S&eacute;vign&eacute; - Rennes</city>
            <code>35510</code>
            <country>FRANCE</country>
         </postal>
         <phone>+33 299 12 70 04</phone>
    <email>georgios.papadopoulos@imt-atlantique.fr</email>
      </address>
    </author>


   <date/>

   <workgroup>RAW</workgroup>

   <abstract>
   <t>
   The wireless medium presents significant specific challenges to achieve
   properties similar to those of wired deterministic networks. At the same
   time, a number of use cases cannot be solved with wires and justify the
   extra effort of going wireless. 
   This document details the Operations, Administration, and Maintenance
   (OAM) features required to provide such properties over wireless segments.
   </t>
   </abstract>
   

</front>

<middle>



<!--section title="TEMPORARY EDITORIAL NOTES">

   <t>
		This document is an Internet Draft, so it is work-in-progress by nature.
   		It contains the following work-in-progress elements:
   </t>	
     		 
   <t>
   	<list style="symbols">
		<t>"TODO" statements are elements which have not yet been written by the authors 
		for some reason (lack of time, ongoing discussions with no clear consensus, etc).
		The statement does indicate that the text will be written at some time.</t>
   	</list>
   </t>
   		 
</section --> 



<section title="Introduction">
	<t>
		Reliable and Available Wireless (RAW) is an effort that extends DetNet 
		to approach end-to-end deterministic performance over a network that 
		includes scheduled wireless segments.
		The wireless and wired media are fundamentally different at the physical level.
		Enabling reliable and available wireless communications is thus even more
		challenging than it is in wired IP networks, because the numerous causes of 
		transmission loss come on top of the congestion losses and the delays 
		caused by overbooked shared resources.
		To provide quality of service along a multihop path that is composed 
		of wired and wireless hops, additional methods need to be considered 
		to cope with the potentially lossy wireless communication.
	</t>
	
	<t>
		Traceability belongs to Operations, Administration, and Maintenance (OAM), 
		which is the toolset for fault detection and isolation, and for performance 
		measurement.  
		More information on OAM tools can be found in <xref target="RFC7276"/>.
	</t>
   
    <t>
    	The main purpose of this document is to detail the requirements of the OAM 
    	features recommended to construct a predictable communication infrastructure 
    	on top of a collection of wireless segments. 
    	This document describes the benefits, problems, and trade-offs of using OAM in 
    	wireless networks to provide availability and predictability.
	</t>
    
   <t>
      	In this document, the term OAM is used as defined 
      	in <xref target="RFC6291"/>.
      	We expect to implement an OAM framework in RAW networks to maintain a real-time 
      	view of the network infrastructure, and of its ability to respect the Service Level 
      	Agreements (SLAs), such as delay and reliability, assigned to each data flow.
   </t>
 
 
	<section title="Terminology">
    	<t>
        	<list style="symbols">
            	<t>
               		OAM entity: a data flow to be controlled;
            	</t>
            
            	<t>
               		OAM end-devices: the source or destination of a data flow;
            	</t>
         
         		<t>
            		defect: a temporary change in the network characteristics (e.g., link 
            		quality degradation because of temporary external interference or a 
            		mobile obstacle);
         		</t>
         
         		<t>
            		fault: a definite change that may affect the network performance, 
            		e.g., a node runs out of energy.
            	</t>
         	</list>
    	</t>
	</section>
 
</section>





<!-- INTEREST OF OAM          -->
<section title="Needs for OAM in RAW">
	<t>
      RAW networks aim to make communications reliable and predictable on top of a
      wireless network infrastructure. 
      Most critical applications will define an SLA to be respected for the data flows they 
      generate.
      RAW considers network plane protocol elements such as OAM to improve the RAW operation 
      at the service and at the forwarding sub-layers.
	</t>


	<t>
      To respect strict guarantees, RAW relies on a Path Computation Element (PCE), which 
      is responsible for scheduling the transmissions in the deployed network.
      Thus, resources have to be provisioned a priori to handle any defect. 
      OAM represents the core of the over-provisioning process, and keeps the network 
      operational by updating the schedule dynamically.
	</t>


	<t>
      Fault-tolerance also assumes that multiple paths have to be provisioned
      so that an end-to-end circuit remains available whatever the conditions.
      OAM is in charge of controlling the replication/elimination processes.
	</t>


	<t>
      For energy-efficiency reasons, reserving dedicated out-of-band resources
      for OAM seems idealistic; only in-band solutions are considered here.
	</t>


	<t>
       RAW supports both proactive and on-demand troubleshooting.
	</t>
</section>





<!--  OPERATION          -->
<section title="Operation">
	<t>
    	OAM features will provide RAW with robust operation for both forwarding and routing 
    	purposes.
	</t>


    <section title="Connectivity Verification">
		<t>
			We need to verify that two endpoints are connected with each other.
         	Since we reserve resources along the path independently for each flow,
         	we must be able to verify that the path exists for a given flow label.
		</t>


		<t>
			The control and data packets may not follow the same path, and the
         	connectivity verification has to be triggered in-band without impacting
         	the data traffic. 
         	In particular, the control plane may work while the data plane is broken.
		</t>


		<t>
        	The probe (ping) packets must carry the same flow label as the data packets of 
        	the monitored flow, so that they follow the same reserved path.
        	<!-- GP: I do not understand this sentence-->
		</t>
	</section>
        

    <section title="Route Tracing">
		<t>
       		Ping and traceroute are two very common diagnostic tools. 
       		<!-- GP: we may check with Pascal, if he agrees, we may introduce the concepts of Bitmap-->
       		They help to identify the list of routers on a route. 
       		However, to be predictable, resources are reserved per flow in RAW. 
       		Thus, we need to define route tracing tools able to track the route of a 
       		specific flow.
		</t>


		<t>
        	Because the network has to be fault-tolerant, multipath can be considered, with 
        	multiple Maintenance Intermediate Endpoints for each hop in the path.
			Thus, all the possible paths between two maintenance endpoints should be 
			retrieved.
        </t>
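		<t>
			As a minimal sketch of this last requirement, the Python fragment below
			enumerates every loop-free path between two maintenance endpoints.
			The topology mirrors the one drawn in <xref target="fig_replication"/>;
			the LINKS table and the all_paths() helper are illustrative names
			introduced here, not part of any RAW specification. A real tool would
			have to discover the links with flow-labeled probes instead of reading
			them from a static table.
		</t>
		<figure>
			<artwork type="python"><![CDATA[
from typing import Dict, List

# Hypothetical multipath topology: successors of each node.
LINKS: Dict[str, List[str]] = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["E", "F"],
    "D": ["E", "F"],
    "E": ["R"],
    "F": ["R"],
    "R": [],
}


def all_paths(links, src, dst):
    """Return every loop-free path from src to dst (depth-first)."""
    paths = []
    stack = [[src]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == dst:
            paths.append(path)
            continue
        for nxt in links.get(node, []):
            if nxt not in path:  # never visit a node twice
                stack.append(path + [nxt])
    return sorted(paths)


paths = all_paths(LINKS, "S", "R")
print(len(paths))
print(paths[0])
]]></artwork>
		</figure>
		<t>
			For this topology, the function returns eight paths, the first one in
			sorted order being S, A, C, E, R.
		</t>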
    </section>

   
	<section title="Fault verification / detection">
		<t>
      		RAW expects to operate fault-tolerant networks. 
      		Thus, we need mechanisms able to detect faults before they impact the network 
      		performance.
		</t>


		<t>
     		The network has to detect when a fault has occurred, i.e., when the network 
     		has deviated from its expected behavior. 
     		While the network must report an alarm, the cause may not be identified 
     		precisely. 
     		For instance, the end-to-end reliability has decreased significantly, or a 
     		buffer overflow has occurred.
		</t>

  
   		<t>
   			<!-- GP: we may consider a meeting with Pascal or asking him to review the draft-->
   			<!-- GP: I think, we may get inputs about the statistics from Bitmaps that can be included in the ACKs-->
			We have to minimize the amount of statistics and measurements to exchange, for the following reasons:
      		<list style="symbols">
         	<t>
            	energy efficiency: low-power devices have to limit the volume of monitoring
            	information since every bit consumes energy.
         	</t>
         	<t>
            	bandwidth: wireless networks exhibit a bandwidth significantly lower than
            	wired, best-effort networks.
         	</t>
            <t>
                per-packet cost: it is often more expensive to send several packets than
                to combine them in a single link-layer frame.
               <!--FT: already said later (i.e. technique to exploit this characteristic)
               Thus, piggybacking is a very common technique oo insert additionnal control
                information if the packet size doesn't exceed the MTU.
               -->
            </t>
      		</list>
      
      		Thus, localized and centralized mechanisms have to be combined, and
      		additional control packets have to be triggered only after a fault is detected.
		</t>
	</section>


	<section title="Fault isolation / identification">
    	<t>
        	The network has to isolate and identify the cause of the fault. 
        	For instance, the quality of a specific link has decreased, requiring more 
        	retransmissions, or the level of external interference has locally increased.
		</t>
	</section>
</section>



<!--    ADMINISTRATION          -->
<section title="Administration">
   
   <t>
      To make proper decisions, the network has to expose a collection of metrics, including:
      
      <list style="symbols">
         <t>
            Packet losses: the time-window average and maximum values of the
            number of packet losses have to be measured. Many critical applications
            stop working if a few consecutive packets are dropped;
            </t>
         
         <t>
            Received Signal Strength Indicator (RSSI) is a very common metric
            in wireless to denote the link quality. The radio chipset is in charge
            of translating a received signal strength into a normalized
            quality indicator;
            </t>
         
         <t>
            Delay: the time elapsed between a packet generation / enqueuing
            and its reception by the next hop;
            </t>
         
         <t>
            Buffer occupancy: the number of packets present in the buffer,
            for each of the existing flows.
            </t>
         
      </list>
      </t>
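   <t>
      As an illustration of the packet loss metric, the sketch below keeps the
      outcome of the most recent packets of a flow and derives both the average
      loss ratio and the longest run of consecutive losses. The LossWindow class
      and its method names are hypothetical, and the window is counted in
      packets rather than in time units, which is a simplifying assumption.
   </t>
   <figure>
      <artwork type="python"><![CDATA[
from collections import deque


class LossWindow:
    """Loss statistics over the last `size` packets of one flow."""

    def __init__(self, size):
        # Oldest outcomes are discarded automatically.
        self.events = deque(maxlen=size)  # True = packet lost

    def record(self, lost):
        self.events.append(bool(lost))

    def loss_ratio(self):
        if not self.events:
            return 0.0
        return sum(self.events) / len(self.events)

    def max_burst(self):
        """Longest run of consecutive losses in the window."""
        longest = current = 0
        for lost in self.events:
            current = current + 1 if lost else 0
            longest = max(longest, current)
        return longest


w = LossWindow(size=8)
for lost in [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]:
    w.record(lost)
print(w.loss_ratio(), w.max_burst())
]]></artwork>
   </figure>
   <t>
      With a window of 8 packets, the trace above yields a loss ratio of 0.5
      and a longest burst of 3 consecutive losses.
   </t>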
   <t>
      These metrics should be collected:
      <list style="symbols">
         <t>
            per virtual circuit to measure the end-to-end performance for a given flow.
            Each of the paths has to be isolated in multipath strategies;
            </t>
         
         <t>per radio channel, to measure, e.g., the level of external interference,
            and to be able to apply counter-measures (e.g., blacklisting);
            </t>
         
         <t>
            per device, to detect a misbehaving node when it relays the packets of
            several flows.
            </t>
         </list>
   </t>
   
   
   <section title="Worst-case metrics">
      <t>
         RAW aims to enable real-time communications on top of a heterogeneous architecture.
         Since wireless networks are known to be lossy, RAW has to implement strategies to improve the reliability on top of unreliable links.
         Hybrid Automatic Repeat reQuest (ARQ) typically has to enable retransmissions
         based on the end-to-end reliability and latency requirements.
      </t>
      
      <t>
         To make correct decisions, the controller needs to know the distribution
         of packet losses for each flow, and for each hop of the paths.
         In other words, average end-to-end statistics are not enough.
         The collected statistics must allow the controller to predict the worst case.
         </t>
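      <t>
         The following sketch hints at why per-hop figures matter. It computes the
         end-to-end delivery probability of a path from the packet delivery ratio
         (PDR) and the retransmission budget of each hop. It assumes independent
         losses, which real wireless links rarely exhibit, and the function names
         and numeric values are illustrative only.
      </t>
      <figure>
         <artwork type="python"><![CDATA[
from math import prod


def hop_delivery(pdr, max_tx):
    """Probability that one hop delivers a packet within max_tx
    attempts, assuming independent losses (a simplification)."""
    return 1.0 - (1.0 - pdr) ** max_tx


def path_delivery(hops):
    """hops: list of (pdr, max_tx) pairs, one per hop."""
    return prod(hop_delivery(p, n) for p, n in hops)


# Two hypothetical 3-hop paths with the same mean per-hop PDR (0.8).
even = [(0.8, 2), (0.8, 2), (0.8, 2)]
skewed = [(0.95, 2), (0.95, 2), (0.5, 2)]
print(round(path_delivery(even), 4))
print(round(path_delivery(skewed), 4))
]]></artwork>
      </figure>
      <t>
         Both paths have the same mean per-hop PDR, yet the first one delivers
         about 88.5% of the packets and the second one about 74.6%: the weakest
         hop dominates, and an average would hide it.
      </t>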
      
   </section>
   
   <section title="Energy efficiency constraint">
      <t>
         RAW also targets low-power wireless networks, where energy represents a key
         constraint.
         Thus, we have to take care of the energy and bandwidth consumption.
         The following techniques aim to reduce the cost of such maintenance:
         <list>
            <t>
               piggybacking: some control information is inserted in the data packets
               if it does not cause fragmentation (i.e., the MTU is not exceeded).
               Information Elements represent a standardized way to handle such information;
               </t>
            
            <t>
               flags/fields: we have to set up flags in the monitored packets to be able
               to track the forwarding process accurately.
               A sequence number field may help to detect packet losses.
               Similarly, path inference tools such as <xref target="ipath"/> insert
               additional information in the headers to identify the path followed by a
               packet a posteriori.
               </t>
            </list>
         
         </t>
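      <t>
         The two techniques above can be sketched as follows. The first function
         piggybacks pending control elements only while they fit in the room left
         by the payload; the second one infers losses from gaps in the received
         sequence numbers. The element names, the lengths, and the 127-byte MTU
         are example values chosen for this illustration, and sequence number
         wrap-around is deliberately ignored.
      </t>
      <figure>
         <artwork type="python"><![CDATA[
def piggyback(payload_len, pending_ies, mtu):
    """Pick the control elements (name, length) that fit in the
    room left by the payload; the rest waits for a later packet."""
    room = mtu - payload_len
    carried, deferred = [], []
    for name, length in pending_ies:
        if length <= room:
            carried.append(name)
            room -= length
        else:
            deferred.append(name)
    return carried, deferred


def missing_seqnums(received):
    """Sequence numbers absent between the lowest and highest
    received values (no wrap-around handling in this sketch)."""
    seen = set(received)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1)
            if s not in seen]


# Hypothetical control elements: (name, length in bytes).
ies = [("loss_report", 6), ("rssi", 2), ("queue_len", 4)]
print(piggyback(payload_len=118, pending_ies=ies, mtu=127))
print(missing_seqnums([10, 11, 13, 16]))
]]></artwork>
      </figure>
      <t>
         In this example, 9 bytes remain after the payload: the first two
         elements are carried and the third one is deferred, so the packet is
         never fragmented. The receiver reports sequence numbers 12, 14 and 15 as
         missing.
      </t>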
      
      </section>
   
   </section>
   
   
   
   
   
   
   


<!--   MAINTENANCE          -->

<!-- speak about predictive maintenance? -->

   <section title="Maintenance">
      <t>
         RAW needs to implement a self-healing and self-optimization approach.
         The network must continuously monitor its own state
         to judge whether a reconfiguration is relevant, quantifying:
         <list>
            <t> the cost of the sub-optimality: resources may not be used
               optimally (e.g. a better path exists);
            </t>
            <t>
               the reconfiguration cost: the controller needs to trigger some
               reconfigurations.
               During this transient period, resources may be reserved twice, and control packets have to be transmitted.
            </t>
         </list>
         Thus, reconfiguration may only be triggered if the gain is significant.
         </t>
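      <t>
         A minimal sketch of this decision rule is given below. It compares the
         saving accumulated over a given horizon with the one-off reconfiguration
         cost. The cost unit (e.g., timeslots per period), the horizon, and the
         10% margin are assumptions made for the example; this document does not
         define them.
      </t>
      <figure>
         <artwork type="python"><![CDATA[
def should_reconfigure(current_cost, candidate_cost,
                       reconfig_cost, horizon, min_gain=0.1):
    """Reconfigure only if the saving over `horizon` periods
    exceeds the one-off cost by a relative margin `min_gain`."""
    saving = (current_cost - candidate_cost) * horizon
    return saving > reconfig_cost * (1.0 + min_gain)


# Hypothetical costs: 12 vs 10 units per period, 30 units to switch.
print(should_reconfigure(12, 10, 30, horizon=10))
print(should_reconfigure(12, 10, 30, horizon=20))
]]></artwork>
      </figure>
      <t>
         With these values, the reconfiguration is rejected over 10 periods
         (saving of 20 units for a cost of 30) and accepted over 20 periods
         (saving of 40 units).
      </t>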
      
     
      

      <section title="Multipath">
           <t>
               To be fault-tolerant, several paths can be reserved between two maintenance
               endpoints. 
               They must be node-disjoint, so that a path can be available at any time.
               </t>
           
           </section>
           
      <section title="Replication / Elimination">
          <t>
              When multiple paths are reserved between two maintenance endpoints,
              the endpoints may decide to replicate the packets to introduce redundancy, and
              thus to mitigate transmission errors and collisions. 
              For instance, in <xref target="fig_replication"/>, the source node S  
              transmits the packet to both of its parents, nodes A and B.              
              Each maintenance endpoint will decide to trigger the replication / 
              elimination process when a set of metrics crosses a threshold value.
          </t>
        
          <figure anchor="fig_replication" align="center"
            title="Packet Replication: S transmits the same data packet twice, to its default
            parent (DP), node A, and to its alternative parent (AP), node B.">
            <artwork align="center"><![CDATA[
 
               ===> (A) => (C) => (E) === 
             //        \\//   \\//       \\
   source (S)          //\\   //\\         (R) (root)
             \\       //  \\ //  \\      //
               ===> (B) => (D) => (F) ===

            ]]></artwork>
          </figure>
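          <t>
              The fragment below sketches both halves of the process: a trigger
              that switches replication on and off from a measured loss ratio, and
              an elimination filter that drops copies already received. The class
              names and threshold values are hypothetical. The use of two
              thresholds (hysteresis) is a design choice of this sketch, meant to
              avoid oscillations around a single threshold value.
          </t>
          <figure>
            <artwork type="python"><![CDATA[
class Replicator:
    """Turns replication on or off from a measured loss ratio.
    Both thresholds are illustrative values."""

    def __init__(self, on_above=0.10, off_below=0.05):
        self.on_above = on_above
        self.off_below = off_below
        self.active = False

    def update(self, loss_ratio):
        if loss_ratio > self.on_above:
            self.active = True
        elif loss_ratio < self.off_below:
            self.active = False
        return self.active


class Eliminator:
    """Drops copies already seen, keyed by (flow, sequence).
    The set grows without bound here; a real node would only
    remember a recent window of sequence numbers."""

    def __init__(self):
        self.seen = set()

    def accept(self, flow_id, seq):
        key = (flow_id, seq)
        if key in self.seen:
            return False  # duplicate copy: eliminate it
        self.seen.add(key)
        return True


r = Replicator()
print([r.update(x) for x in [0.02, 0.12, 0.07, 0.03]])
e = Eliminator()
print([e.accept("f1", s) for s in [1, 1, 2]])
]]></artwork>
          </figure>
          <t>
              In this run, replication starts when the loss ratio reaches 0.12,
              stays on at 0.07, and stops at 0.03. The second copy of sequence
              number 1 is eliminated.
          </t>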
          
	  </section>
         
      <section title="Resource Reservation">
          <t>
            Because the QoS criteria associated with a path may degrade, the network has
            to provision additional resources along the path. We need to provide
            mechanisms to patch a schedule (changing the channel offset, allocating more
            timeslots, changing the path, etc.).
            </t>
          
        </section>
      
      <section title="Soft transition after reconfiguration">
          <t>
              Since RAW expects to support real-time flows, we have to support soft reconfiguration,
              where the new resources are reserved before the old ones are released.
              Some mechanisms have to be proposed so that packets are forwarded through the
              new track only when the resources are ready to be used, while keeping
              the global state consistent (no packet re-ordering, duplication, etc.).
              </t>
          
        
              
        </section>
      </section>

   

      
      
</middle>

<back>
   
   <!--
    <references title="Normative References">
    </references>
    -->
   
   <references title="Informative References">
		<?rfc include="reference.RFC.6291"?> <!-- OAM definition -->
		<?rfc include="reference.RFC.7276"?> <!-- An Overview of Operations, Administration, and Maintenance (OAM) Tools -->
  
      
      <reference anchor="ipath" target="https://doi.org/10.1109/TNET.2014.2371459">
         <front>
            <title>iPath: path inference in wireless sensor networks.</title>
            <author initials="Y." surname="Gao" fullname="Y.Gao"></author>
            <author initials="W." surname="Dong" fullname="W. Dong"></author>
            <author initials="C." surname="Chen" fullname="C. Chen"></author>
            <author initials="J." surname="Bu" fullname="J. Bu"></author>
            <author initials="W." surname="Wu" fullname="W. Wu"></author>
            <author initials="X." surname="Liu" fullname="X. Liu"></author>
            <date year="2016"/>
            <!--
             IEEE/ACM Transactions on Networking (TON) archive
             Volume 24 Issue 1, February 2016
             Pages 517-528
             -->
         </front>
      </reference>
      
      
   </references>
   
   
   

</back>

</rfc>
