RAW                                                         F. Theoleyre
Internet-Draft                                                      CNRS
Intended status: Standards Track                         G. Papadopoulos
Expires: January 6, 2020                                  IMT Atlantique
                                                            July 5, 2019


   Operations, Administration and Maintenance (OAM) features for RAW
                   draft-theoleyre-raw-oam-support-00

Abstract

   The wireless medium presents significant specific challenges to
   achieve properties similar to those of wired deterministic networks.
   At the same time, a number of use cases cannot be solved with wires
   and justify the extra effort of going wireless.  This document
   presents some of these use cases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 6, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  OAM to provision appropriately the resources
   3.  Operation
   4.  Administration
     4.1.  Worst-case constraint
     4.2.  Energy efficiency constraint
   5.  Maintenance
   6.  Informative References
   Authors' Addresses

1.  Introduction

   RAW (Reliable and Available Wireless) is an effort to provide
   deterministic behavior over a network that includes a wireless
   physical layer.  Making wireless communication reliable and
   available is even more challenging than it is with wires, due to the
   numerous causes of transmission loss that add up to the congestion
   losses and the delays caused by overbooked shared resources.  To
   provide quality of service along a multihop path that is composed of
   wired and wireless hops, additional methods need to be considered to
   handle the potentially lossy wireless communication.

   Traceability belongs to Operations, Administration, and Maintenance
   (OAM), which is the toolset for fault detection and isolation, and
   for performance measurement.  More can be found on OAM Tools in .

   The main purpose of this document is to detail the requirements of
   the OAM features recommended to construct a predictable communication
   infrastructure on top of a collection of wireless networks.  In
   particular, we expect to provide packet loss evaluation, self-testing
   and automated adaptation to enable trade-offs between resilience and
   energy consumption.

   This document describes the benefits, problems, and trade-offs for
   using OAM in wireless networks to provide availability and
   predictability.

   In this document, the term OAM will be used according to its
   definition specified in [RFC6291].  We expect to implement an OAM
   framework in RAW networks to maintain a real-time view of the network
   infrastructure, and of its ability to respect the Service Level
   Agreements (delay, reliability) assigned to each data flow.

1.1.  Terminology

   o  OAM entity: a data flow to be controlled;

   o  OAM end-devices: the source or destination of a data flow;

   o  defect: a temporary change in the network characteristics (e.g.
      link quality degradation because of temporary external
      interference, a mobile obstacle);

   o  fault: a definite change which may affect the network performance,
      e.g. a node runs out of energy.

2.  OAM to provision appropriately the resources

   RAW networks expect to make the communications predictable on top of
   a wireless network infrastructure.  Most critical applications will
   define a Service Level Agreement (SLA) to respect for the data flows
   they generate.  Thus, the wireless networks have to be dimensioned
   to respect these SLAs.

   To respect strict guarantees, RAW relies on a Path Computation
   Element (PCE) which has to schedule the transmissions in the
   different wireless networks.  Thus, resources have to be provisioned
   to handle any defect.  OAM represents the core of the
   overprovisioning process, and keeps the network operational by
   updating the schedule dynamically.

   Fault-tolerance also assumes that multiple paths have to be
   provisioned so that an end-to-end circuit keeps on existing whatever
   the conditions.  OAM is in charge of controlling the replication/
   process.

   To be energy-efficient, reserving some dedicated out-of-band
   resources for OAM seems unrealistic, and only in-band solutions are
   considered here.

3.  Operation

   RAW expects to operate fault-tolerant networks.  Thus, we need
   mechanisms able to detect faults before they impact the network
   performance.

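   One way to detect a degradation before it impacts the network
   performance is to raise a warning while a flow is still above its
   SLA but close to it.  The following minimal sketch is an
   illustration only and is not specified by this document: the class
   name DeliveryMonitor and the parameters window, sla_pdr and margin
   are hypothetical, and the packet delivery ratio (PDR) over a sliding
   window is assumed to be the monitored SLA metric.

```python
from collections import deque


class DeliveryMonitor:
    """Sliding-window packet delivery ratio (PDR) monitor for one flow.

    Hypothetical helper: names and thresholds are assumptions made for
    this illustration only.
    """

    def __init__(self, window, sla_pdr, margin):
        # Only the last `window` outcomes are kept (True = delivered).
        self.outcomes = deque(maxlen=window)
        self.sla_pdr = sla_pdr  # minimum PDR required by the SLA
        self.margin = margin    # early-warning band above the SLA

    def record(self, delivered):
        """Store the outcome of one packet; the oldest one is dropped."""
        self.outcomes.append(bool(delivered))

    def pdr(self):
        """Return the delivery ratio over the current window."""
        if not self.outcomes:
            return 1.0  # nothing observed yet: assume a healthy flow
        return sum(self.outcomes) / len(self.outcomes)

    def state(self):
        """Classify the flow as 'ok', 'warning' or 'violation'."""
        ratio = self.pdr()
        if ratio < self.sla_pdr:
            return "violation"  # the SLA is already broken
        if ratio < self.sla_pdr + self.margin:
            return "warning"    # still compliant, but close to the SLA
        return "ok"
```

   A device would call record() for each packet of the flow and report
   an alarm only when state() leaves "ok", which keeps the volume of
   monitoring traffic low as long as the flow behaves as expected.
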
   We make a distinction between the two following complementary
   mechanisms:

   o  Detection: the network detects that a fault occurred, i.e. the
      network has deviated from its expected behavior.  While the
      network must report an alarm, the cause may not be identified
      precisely.  For instance, the end-to-end reliability has decreased
      significantly, or a buffer overflow occurs;

   o  Identification: the network has isolated and identified the cause
      of the fault.  For instance, the quality of a specific link has
      decreased, requiring more retransmissions, or the level of
      external interference has locally increased.

   This two-step process is required since RAW expects to rely on
   wireless networks.  Thus, we have to minimize the amount of
   statistics / measurements to exchange:

   o  energy efficiency: low-power devices have to limit the volume of
      monitoring information since every bit consumes energy;

   o  bandwidth: wireless networks exhibit a bandwidth significantly
      lower than wired, best-effort networks.

   Thus, localized and centralized mechanisms have to be combined, and
   additional control packets have to be triggered only after a fault
   detection.

4.  Administration

   To take proper decisions, the network has to expose a collection of
   metrics, including:

   o  Packet losses: the time-window average and maximum values of the
      number of packet losses have to be measured.  Many critical
      applications stop working if a few consecutive packets are
      dropped;

   o  Received Signal Strength Indicator (RSSI) is a very common metric
      in wireless networks to denote the link quality.
      The radio chipset is in charge of translating a received signal
      strength into a normalized quality indicator;

   o  Delay: the time elapsed between a packet generation / enqueuing
      and its reception by the next hop;

   o  Buffer occupancy: the number of packets present in the buffer, for
      each of the existing flows.

   These metrics should be collected:

   o  per virtual circuit, to measure the end-to-end performance for a
      given flow.  Each of the paths has to be isolated in multipath
      strategies;

   o  per radio channel, to measure e.g. the level of external
      interference, and to be able to apply counter-measures (e.g.
      blacklisting);

   o  per device, to detect a misbehaving node when it relays the
      packets of several flows.

4.1.  Worst-case constraint

   RAW aims to enable real-time communications on top of a
   heterogeneous architecture.  Since wireless networks are known to be
   lossy, RAW has to implement strategies to improve the reliability on
   top of unreliable links.  Hybrid Automatic Repeat reQuest (ARQ)
   typically has to enable retransmissions based on the end-to-end
   reliability and latency requirements.

   To take correct decisions, the controller needs to know the
   distribution of packet losses for each flow, and for each hop of the
   paths.  In other words, average end-to-end statistics are not enough.
   The statistics must allow the controller to predict the worst case.

4.2.  Energy efficiency constraint

   RAW also targets low-power wireless networks, where energy represents
   a key constraint.  Thus, we have to take care of the energy and
   bandwidth consumption.  The following techniques aim to reduce the
   cost of such maintenance:

   piggybacking:  some control information is inserted in the data
      packets if it does not cause the packet to be fragmented (i.e.
      the MTU is not exceeded).
      Information Elements represent a standardized way to handle such
      information;

   flags/fields:  we have to set up flags in the packets to be
      monitored, to be able to monitor them accurately.  A sequence
      number field may help to detect packet losses.  Similarly, path
      inference tools such as [ipath] insert additional information in
      the headers to identify the path followed by a packet a
      posteriori.

5.  Maintenance

   RAW needs to implement a self-healing and self-optimization approach.
   The network must continuously retrieve its own state, to judge the
   relevance of a reconfiguration, quantifying:

      the cost of the sub-optimality: resources may not be used
      optimally (e.g. a better path exists);

      the reconfiguration cost: the controller needs to trigger some
      reconfigurations.  During this transient period, resources may be
      reserved twice, and control packets have to be transmitted.

   Thus, reconfiguration may only be triggered if the gain is
   significant.

   Since RAW expects to support real-time flows, we have to support
   soft reconfiguration, where the new resources are reserved before the
   old ones are released.  Some mechanisms have to be proposed so that
   packets are forwarded through the new track only when the resources
   are ready to be used, while keeping the global state consistent (no
   packet re-ordering, duplication, etc.).

   In particular, RAW has to support the following modifications:

      patching a schedule, relocating some radio resources (radio
      channel, timeslots);

      a device can be reset (e.g. firmware upgrade) safely, all the
      flows being forwarded temporarily through alternative paths;

      a better path (delay, reliability, energy consumption) has been
      identified.

6.  Informative References

   [ipath]    Gao, Y., Dong, W., Chen, C., Bu, J., Wu, W., and X. Liu,
              "iPath: path inference in wireless sensor networks",
              2016.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011.

Authors' Addresses

   Fabrice Theoleyre
   CNRS
   Building B
   300 boulevard Sebastien Brant - CS 10413
   Illkirch - Strasbourg  67400
   FRANCE

   Phone: +33 368 85 45 33
   Email: theoleyre@unistra.fr
   URI:   http://www.theoleyre.eu

   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   Cesson-Sevigne - Rennes  35510
   FRANCE

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr