<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="no"?>
<?rfc subcompact="no"?>
<?rfc authorship="yes"?>
<?rfc tocappendix="yes"?>

<rfc category="std" ipr="trust200902" docName="draft-theoleyre-raw-oam-support-00">


<front> 

    <title abbrev="OAM features for RAW">Operations, Administration and Maintenance (OAM) features for RAW</title>
   
 
 
	<author initials="F." surname="Theoleyre" fullname="Fabrice Theoleyre">
    	<organization>CNRS</organization>
     	<address>
        	<postal>
            	<street>Building B</street>
             	<street>300 boulevard Sebastien Brant - CS 10413</street>
             	<city>Illkirch - Strasbourg</city>
             	<code>67400</code>
             	<country>FRANCE</country>
         	</postal>
         	<phone>+33 368 85 45 33</phone>
            <email>theoleyre@unistra.fr</email>
            <uri>http://www.theoleyre.eu</uri>
     	</address>
 	</author>

    <author initials="G.Z." surname="Papadopoulos" fullname="Georgios Z. Papadopoulos">
      <organization>IMT Atlantique</organization>
      <address>         
        <postal>            
           <street>Office B00 - 102A</street>
           <street>2 Rue de la Ch&acirc;taigneraie</street>
           <city>Cesson-S&eacute;vign&eacute; - Rennes</city>
            <code>35510</code>
            <country>FRANCE</country>
         </postal>
         <phone>+33 299 12 70 04</phone>
    <email>georgios.papadopoulos@imt-atlantique.fr</email>
      </address>
    </author>


   <date/>

   <workgroup>RAW</workgroup>

   <abstract>
   <t>
   The wireless medium presents significant specific challenges to achieve
   properties similar to those of wired deterministic networks. At the same
   time, a number of use cases cannot be solved with wires and justify the
   extra effort of going wireless.
   This document details the requirements for the Operations, Administration,
   and Maintenance (OAM) features needed to build a predictable communication
   infrastructure on top of wireless networks.
   </t>
   </abstract>
   

</front>

<middle>



<!--section title="TEMPORARY EDITORIAL NOTES">

   <t>
		This document is an Internet Draft, so it is work-in-progress by nature.
   		It contains the following work-in-progress elements:
   </t>	
     		 
   <t>
   	<list style="symbols">
		<t>"TODO" statements are elements which have not yet been written by the authors 
		for some reason (lack of time, ongoing discussions with no clear consensus, etc).
		The statement does indicate that the text will be written at some time.</t>
   	</list>
   </t>
   		 
</section --> 



<section title="Introduction">
	<t>
		RAW (Reliable and Available Wireless) is an effort to provide deterministic behavior over a network that includes a wireless physical layer.
		Making wireless communication reliable and available is even more challenging than it is with wires, due to the numerous causes of transmission loss that add to the congestion losses and the delays caused by overbooked shared resources.
		To provide quality of service along a multihop path that is composed of wired and wireless hops, additional methods need to be considered to cope with potentially lossy wireless communication.
	</t>
	
	<t>
		Traceability belongs to Operations, Administration, and Maintenance (OAM), which is the toolset for fault detection and isolation, and for performance measurement.
		An overview of OAM tools can be found in RFC 7276.
	</t>
   
    <t>
    	The main purpose of this document is to detail the requirements for the OAM features recommended to construct a predictable communication infrastructure on top of a collection of wireless networks.
      	In particular, we expect to provide packet loss evaluation, self-testing and automated adaptation to enable trade-offs between resilience and energy consumption.
	</t>

	<t>
    	This document describes the benefits, problems, and trade-offs for using OAM in wireless networks to provide availability and predictability.
	</t>
    
   <t>
      In this document, the term OAM will be used according to its definition specified in <xref target="RFC6291"/>.
      We expect to implement an OAM framework in RAW networks to maintain a real-time view of the network infrastructure and of its ability to respect the Service Level Agreements (delay, reliability) assigned to each data flow.
   </t>
 
 
	<section title="Terminology">
    	<t>
        	<list style="symbols">
            	<t>
               		OAM entity: a data flow to be controlled;
            	</t>
            
            	<t>
               		OAM end-devices: the source or destination of a data flow;
            	</t>
         
         		<t>
            		defect: a temporary change in the network characteristics (e.g., link quality degradation caused by temporary external interference or by a mobile obstacle);
         		</t>
         
         		<t>
            		fault: a permanent change that may affect the network performance (e.g., a node running out of energy).
            	</t>
         	</list>
    	</t>
	</section>
 
</section>



<section title="OAM to provision resources appropriately">
   <t>
      RAW networks aim to make communications predictable on top of a
      wireless network infrastructure. Most critical applications will define
      a Service Level Agreement (SLA) for the data flows they generate.
      Thus, the wireless networks have to be dimensioned to respect these SLAs.
      </t>
   
   <t>
      To respect strict guarantees, RAW relies on a Path Computation Element
      (PCE), which has to schedule the transmissions in the different wireless
      networks. Thus, resources have to be provisioned to handle any defect.
      OAM represents the core of this overprovisioning process, and keeps the
      network operational by updating the schedule dynamically.
      </t>
   
   <t>
      Fault tolerance also requires multiple paths to be provisioned, so
      that an end-to-end circuit keeps existing whatever the conditions.
      OAM is in charge of controlling the replication process.
      </t>
   
   
   <t>
      Since energy efficiency is required, reserving dedicated out-of-band
      resources for OAM seems unrealistic, and only in-band solutions are
      considered here.
      </t>
   
   
</section>


<section title="Operation">
   <t>
      RAW aims to operate fault-tolerant networks. Thus, we need mechanisms
      able to detect faults before they impact the network performance.
      </t>
   
  <t>
     We distinguish between the following two complementary mechanisms:
     
     <list style="symbols">
        <t>
           Detection: the network detects that a fault occurred, i.e., the network
           has deviated from its expected behavior. While the network must report
           an alarm, the cause may not be identified precisely. For instance, the
           end-to-end reliability has decreased significantly, or a buffer overflow
           has occurred;
           </t>
        <t>
           Identification: the network has isolated and identified the cause
           of the fault. For instance, the quality of a specific link has decreased,
           requiring more retransmissions, or the level of external interference
           has locally increased.
           </t>
      </list>
     </t>
  
  
   <t>
      This two-step process is required because RAW relies on wireless
      networks, in which the amount of statistics and measurements to
      exchange has to be minimized:
      <list style="symbols">
         <t>
            energy efficiency: low-power devices have to limit the volume of monitoring
            information since every bit consumes energy.
         </t>
         <t>
            bandwidth: wireless networks exhibit a bandwidth significantly lower than
            that of wired best-effort networks.
         </t>
      </list>
      
      Thus, localized and centralized mechanisms have to be combined, and
      additional control packets have to be triggered only after a fault has been detected.
         
      </t>
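   <t>
      As a non-normative illustration, the Python sketch below separates
      the two steps for a single flow. It assumes that the destination
      knows which packets of the flow it should receive (e.g., from the
      schedule), and that a hypothetical identify() callback starts the
      more expensive identification step (e.g., polling per-hop
      statistics). The class name, the sla_pdr and window parameters, and
      the callback are illustrative assumptions, not part of any RAW
      specification.
   </t>
   <figure><artwork><![CDATA[
```python
from collections import deque
from typing import Callable, Deque


class FlowMonitor:
    """Two-step monitoring of one flow (hypothetical sketch).

    Detection only uses what the destination already knows:
    whether each expected packet arrived. Identification costs
    extra control packets, so it is triggered only on an alarm.
    """

    def __init__(self, sla_pdr: float, window: int,
                 identify: Callable[[], None]) -> None:
        self.sla_pdr = sla_pdr    # assumed SLA delivery ratio
        self.window = window      # samples per detection window
        self.history: Deque[bool] = deque(maxlen=window)
        self.identify = identify  # hypothetical step-2 trigger
        self.alarms = 0

    def on_packet(self, delivered: bool) -> None:
        """Record one expected packet; raise an alarm if needed."""
        self.history.append(delivered)
        if len(self.history) < self.window:
            return  # not enough samples to judge yet
        ratio = sum(self.history) / self.window
        if ratio < self.sla_pdr:
            self.alarms += 1
            self.identify()       # e.g. poll per-hop statistics
            self.history.clear()  # start a fresh window
```
]]></artwork></figure>
   <t>
      In this sketch, no additional control packet is sent as long as the
      delivery ratio observed over the last window respects the SLA.
   </t>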
   
   </section>



<section title="Administration">
   
   <t>
      To make proper decisions, the network has to expose a collection of metrics, including:
      
      <list style="symbols">
         <t>
            Packet losses: both the average and the maximum number of packet
            losses over a time window have to be measured. Many critical
            applications stop working if a few consecutive packets are dropped;
            </t>
         
         <t>
            Received Signal Strength Indicator (RSSI): a very common metric
            in wireless networks to denote the link quality. The radio chipset is in charge
            of translating a received signal strength into a normalized
            quality indicator;
            </t>
         
         <t>
            Delay: the time elapsed between the generation (or enqueuing) of a
            packet and its reception by the next hop;
            </t>
         
         <t>
            Buffer occupancy: the number of packets present in the buffer,
            for each of the existing flows.
            </t>
         
      </list>
      </t>
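   <t>
      The sketch below illustrates why both the average and the worst
      case of packet losses are needed. It assumes a hypothetical per-flow
      trace in which each entry records whether an expected packet was
      delivered; the function name, the trace format, and the use of
      non-overlapping windows are assumptions made for illustration.
   </t>
   <figure><artwork><![CDATA[
```python
from typing import List, Tuple


def loss_metrics(delivered: List[bool],
                 window: int) -> Tuple[float, int]:
    """Return (average losses per window, longest loss burst).

    delivered[i] is True if the i-th expected packet of the flow
    was received (hypothetical trace format). Windows are
    non-overlapping blocks of `window` packets; a trailing
    partial window is ignored.
    """
    counts = [
        delivered[start:start + window].count(False)
        for start in range(0, len(delivered) - window + 1, window)
    ]
    average = sum(counts) / len(counts) if counts else 0.0
    longest = run = 0
    for ok in delivered:
        run = 0 if ok else run + 1   # length of the current burst
        longest = max(longest, run)
    return average, longest
```
]]></artwork></figure>
   <t>
      For instance, with a window of four packets, the traces
      [False, False, True, True] and [False, True, False, True] both give
      an average of two losses per window, but the longest burst is two
      consecutive losses for the first trace and only one for the second.
   </t>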
   <t>
      These metrics should be collected:
      <list style="symbols">
         <t>
            per virtual circuit, to measure the end-to-end performance of a given flow.
            With multipath strategies, each path has to be measured in isolation;
            </t>
         
         <t>per radio channel, to measure, e.g., the level of external interference,
            and to be able to apply countermeasures (e.g., blacklisting);
            </t>
         
         <t>
            per device, to detect a misbehaving node when it relays the packets of
            several flows.
            </t>
         </list>
   </t>
   
   
   <section title="Worst-case constraint">
      <t>
         RAW aims to enable real-time communications on top of a heterogeneous
         architecture. Since wireless networks are known to be lossy, RAW has
         to implement strategies to improve the reliability on top of unreliable
         links. Typically, Hybrid Automatic Repeat reQuest (HARQ) has to enable
         retransmissions based on the end-to-end reliability and latency
         requirements.
      </t>
      
      <t>
         To make correct decisions, the controller needs to know the distribution
         of packet losses for each flow, and for each hop of the paths. In other words,
         average end-to-end statistics are not enough: the collected statistics must
         allow the controller to predict the worst case.
         </t>
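      <t>
         The following sketch, a simplified model rather than a RAW
         mechanism, illustrates why averages are not enough to provision
         retransmissions. It assumes that every transmission attempt on a
         hop fails independently with a given probability, and that each
         hop allows the same number of retransmissions; the function names
         and the numbers used below are hypothetical.
      </t>
      <figure><artwork><![CDATA[
```python
from typing import List


def e2e_reliability(hop_loss: List[float], retries: int) -> float:
    """Probability that a packet crosses every hop of a path.

    Assumption: each attempt on hop h fails independently with
    probability hop_loss[h], and every hop allows `retries`
    retransmissions (so retries + 1 attempts in total).
    """
    reliability = 1.0
    for p in hop_loss:
        reliability *= 1.0 - p ** (retries + 1)
    return reliability


def min_retries(hop_loss: List[float], target: float,
                max_retries: int = 8) -> int:
    """Smallest per-hop retry budget meeting `target`, or -1."""
    for k in range(max_retries + 1):
        if e2e_reliability(hop_loss, k) >= target:
            return k
    return -1
```
]]></artwork></figure>
      <t>
         With three hops and an end-to-end target of 0.999, a per-hop loss
         probability of 0.1 (e.g., an average value) leads to 3
         retransmissions per hop, whereas a worst-case value of 0.3 requires
         6: provisioning from the average alone would not respect the SLA
         when the links degrade.
      </t>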
      
   </section>
   
   <section title="Energy efficiency constraint">
      <t>
         RAW also targets low-power wireless networks, where energy represents a key
         constraint. Thus, we have to take care of the energy and bandwidth consumed by OAM.
         The following techniques aim to reduce the cost of such maintenance:
         <list style="symbols">
            <t>
               piggybacking: some control information can be inserted in the data packets,
               provided it does not cause fragmentation (i.e., the MTU is not exceeded).
               Information Elements represent a standardized way to carry such information;
               </t>
            
            <t>
               flags/fields: flags have to be set in the monitored packets so that
               they can be monitored accurately. A sequence number field may help to
               detect packet losses. Similarly, path inference tools such as
               <xref target="ipath"/> insert additional information in the headers
               to identify, a posteriori, the path followed by a packet.
               </t>
            </list>
         
         </t>
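      <t>
         A minimal receiver-side sketch of sequence-number-based loss
         detection follows. It assumes a hypothetical 8-bit sequence number
         per flow, in-order delivery, and fewer than 256 consecutive losses;
         out-of-order packets are not handled. Duplicate copies (e.g.,
         produced by packet replication) are ignored.
      </t>
      <figure><artwork><![CDATA[
```python
from typing import Optional

SEQ_BITS = 8              # hypothetical sequence number width
SEQ_MOD = 1 << SEQ_BITS   # numbers wrap around after 255


class LossCounter:
    """Counts losses of one flow from sequence number gaps.

    Sketch only: assumes in-order delivery and fewer than
    SEQ_MOD consecutive losses, so that a gap is unambiguous.
    """

    def __init__(self) -> None:
        self.last: Optional[int] = None
        self.received = 0
        self.lost = 0

    def on_receive(self, seq: int) -> int:
        """Process one packet; return the losses just detected."""
        if self.last is None:          # first packet of the flow
            self.last = seq
            self.received = 1
            return 0
        gap = (seq - self.last) % SEQ_MOD
        if gap == 0:                   # duplicate copy: ignore it
            return 0
        self.last = seq
        self.received += 1
        self.lost += gap - 1           # skipped sequence numbers
        return gap - 1
```
]]></artwork></figure>
      <t>
         For example, receiving the sequence numbers 254, 255, 1, 2, and 5
         counts three lost packets (0, 3, and 4), the counter wrapping around
         after 255.
      </t>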
      
      </section>
   
   </section>
   
<!-- speak about predictive maintenance? -->

   <section title="Maintenance">
      <t>
         RAW needs to implement a self-healing and self-optimization approach.
         The network must continuously retrieve its state to judge the relevance
         of a reconfiguration, by quantifying:
         <list style="symbols">
            <t> the cost of the sub-optimality: resources may not be used
               optimally (e.g., a better path exists);
            </t>
            <t>
               the reconfiguration cost: the controller needs to trigger some
               reconfigurations. During the transient period, resources may be
               reserved twice, and control packets have to be transmitted.
            </t>
         </list>
         Thus, a reconfiguration should only be triggered if the gain is significant.
         </t>
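      <t>
         This decision rule can be sketched as follows. The cost model is a
         hypothetical simplification: the sub-optimality is expressed as a
         cost per time unit (e.g., energy or reserved radio resources), the
         reconfiguration as a one-off cost, and the horizon and margin
         parameters are illustrative assumptions.
      </t>
      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate reconfiguration (hypothetical cost model)."""
    current_rate: float  # cost per time unit of the current setup
    new_rate: float      # cost per time unit after reconfiguring
    one_off: float       # transient cost: double reservation,
                         # control packets, ...


def should_reconfigure(c: Candidate, horizon: float,
                       margin: float = 2.0) -> bool:
    """Return True only if the gain is significant.

    The gain is the sub-optimality removed, accumulated over
    `horizon` time units (e.g. the expected remaining lifetime of
    the flows). A `margin` above 1 adds hysteresis, so that small
    or noisy improvements do not trigger reconfigurations.
    """
    gain = (c.current_rate - c.new_rate) * horizon
    return gain > margin * c.one_off
```
]]></artwork></figure>
      <t>
         For instance, reducing the cost rate from 10 to 9 units with a
         one-off reconfiguration cost of 50 units is worth it over a horizon
         of 200 time units (a gain of 200, above the threshold of 100), but
         not over a horizon of 60.
      </t>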
      
      <t>
         Since RAW aims to support real-time flows, we have to rely on soft
         reconfiguration, where the new resources are reserved before the old ones
         are released. Mechanisms have to be proposed so that packets are forwarded
         through the new track only when its resources are ready to be used, while
         keeping the global state consistent (no packet reordering, duplication, etc.).
         </t>
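      <t>
         The ordering constraint of soft reconfiguration (make-before-break)
         can be sketched as a small state machine for one flow. The class,
         its method names, and the readiness confirmation are hypothetical;
         draining the packets still in flight on the old track and avoiding
         reordering at the destination are deliberately not modeled.
      </t>
      <figure><artwork><![CDATA[
```python
from enum import Enum


class Phase(Enum):
    OLD_ONLY = 1        # traffic on the old track only
    BOTH_RESERVED = 2   # new track reserved but not used yet
    SWITCHED = 3        # traffic on the new track, old still held
    NEW_ONLY = 4        # old resources released


class SoftReconfiguration:
    """Make-before-break ordering for one flow (hypothetical)."""

    def __init__(self) -> None:
        self.phase = Phase.OLD_ONLY
        self.new_ready = False

    def _expect(self, phase: Phase) -> None:
        if self.phase is not phase:
            raise RuntimeError(f"step not allowed in {self.phase}")

    def reserve_new(self) -> None:
        self._expect(Phase.OLD_ONLY)
        self.phase = Phase.BOTH_RESERVED   # resources held twice

    def confirm_ready(self) -> None:
        # e.g. every node of the new track acknowledged its resources
        self._expect(Phase.BOTH_RESERVED)
        self.new_ready = True

    def switch(self) -> None:
        self._expect(Phase.BOTH_RESERVED)
        if not self.new_ready:
            raise RuntimeError("new track not ready yet")
        self.phase = Phase.SWITCHED

    def release_old(self) -> None:
        self._expect(Phase.SWITCHED)       # never before the switch
        self.phase = Phase.NEW_ONLY
```
]]></artwork></figure>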
      <t>
         In particular, RAW has to support the following modifications:
         <list style="symbols">
            <t>
               patching a schedule, relocating some radio resources (radio channel,
               timeslots);
            </t>
            <t>
               resetting a device safely (e.g., for a firmware upgrade), the flows
               it relays being temporarily forwarded through alternative paths;
               </t>
            <t>
               switching to a better path (delay, reliability, energy consumption) once it has been identified.
               </t>
         </list>

         </t>
      </section>

   

      
      
</middle>

<back>
   
   <!--
    <references title="Normative References">
    </references>
    -->
   
   <references title="Informative References">
		<?rfc include="reference.RFC.6291"?> <!-- OAM definition -->
  
      
      <reference anchor="ipath" target="https://doi.org/10.1109/TNET.2014.2371459">
         <front>
            <title>iPath: path inference in wireless sensor networks.</title>
            <author initials="Y." surname="Gao" fullname="Y. Gao"></author>
            <author initials="W." surname="Dong" fullname="W. Dong"></author>
            <author initials="C." surname="Chen" fullname="C. Chen"></author>
            <author initials="J." surname="Bu" fullname="J. Bu"></author>
            <author initials="W." surname="Wu" fullname="W. Wu"></author>
            <author initials="X." surname="Liu" fullname="X. Liu"></author>
            <date year="2016"/>
            <!--
             IEEE/ACM Transactions on Networking (TON) archive
             Volume 24 Issue 1, February 2016
             Pages 517-528
             -->
         </front>
      </reference>
      
      
   </references>
   
   
   

</back>

</rfc>
