idnits 2.17.1

draft-theoleyre-raw-oam-support-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 3, 2019) is 1633 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

RAW                                                         F. Theoleyre
Internet-Draft                                                      CNRS
Intended status: Standards Track                         G. Papadopoulos
Expires: May 6, 2020                                      IMT Atlantique
                                                        November 3, 2019

   Operations, Administration and Maintenance (OAM) features for RAW
                   draft-theoleyre-raw-oam-support-01

Abstract

   The wireless medium presents significant specific challenges to
   achieve properties similar to those of wired deterministic networks.
   At the same time, a number of use cases cannot be solved with wires
   and justify the extra effort of going wireless.  This document
   presents some of these use-cases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 6, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Needs for OAM in RAW
   3.  Operation
     3.1.  Connectivity Verification
     3.2.  Route Tracing
     3.3.  Fault verification / detection
     3.4.  Fault isolation / identification
   4.  Administration
     4.1.  Worst-case metrics
     4.2.  Energy efficiency constraint
   5.  Maintenance
     5.1.  Multipath
     5.2.  Replication / Elimination
     5.3.  Resource Reservation
     5.4.  Soft transition after reconfiguration
   6.  Informative References
   Authors' Addresses

1.  Introduction

   Reliable and Available Wireless (RAW) is an effort that extends
   DetNet to approach end-to-end deterministic performance over a
   network that includes scheduled wireless segments.  The wireless and
   wired media are fundamentally different at the physical level.
   Enabling reliable and available wireless communications is thus even
   more challenging than in wired IP networks, because the numerous
   causes of transmission loss add to the congestion losses and the
   delays caused by overbooked shared resources.  To provide quality of
   service along a multihop path that is composed of wired and wireless
   hops, additional methods need to be considered to cope with
   potentially lossy wireless communication.

   Traceability belongs to Operations, Administration, and Maintenance
   (OAM), which is the toolset for fault detection and isolation, and
   for performance measurement.  More on OAM tools can be found in
   [RFC7276].

   The main purpose of this document is to detail the requirements of
   the OAM features recommended to construct a predictable communication
   infrastructure on top of a collection of wireless segments.  This
   document describes the benefits, problems, and trade-offs of using
   OAM in wireless networks to provide availability and predictability.

   In this document, the term OAM is used as defined in [RFC6291].  We
   expect to implement an OAM framework in RAW networks to maintain a
   real-time view of the network infrastructure and of its ability to
   respect the Service Level Agreements (SLAs), such as delay and
   reliability, assigned to each data flow.

1.1.  Terminology

   o  OAM entity: a data flow to be controlled;

   o  OAM end-devices: the source or destination of a data flow;

   o  defect: a temporary change in the network characteristics (e.g.
      link quality degradation because of temporary external
      interference or a mobile obstacle);

   o  fault: a definitive change which may affect the network
      performance (e.g. a node runs out of energy).

2.  Needs for OAM in RAW

   RAW networks aim to make communications reliable and predictable on
   top of a wireless network infrastructure.  Most critical
   applications will define an SLA to be respected for the data flows
   they generate.  RAW considers network plane protocol elements such
   as OAM to improve the RAW operation at the service and at the
   forwarding sub-layers.

   To respect strict guarantees, RAW relies on a Path Computation
   Element (PCE), which is responsible for scheduling the transmissions
   in the deployed network.  Thus, resources have to be provisioned a
   priori to handle any defect.
   OAM represents the core of the over-provisioning process, and keeps
   the network operational by updating the schedule dynamically.

   Fault tolerance also requires that multiple paths be provisioned so
   that an end-to-end circuit keeps existing whatever the conditions.
   OAM is in charge of controlling the replication/elimination
   processes.

   Since reserving dedicated out-of-band resources for OAM seems
   unrealistic when energy efficiency matters, only in-band solutions
   are considered here.

   RAW supports both proactive and on-demand troubleshooting.

3.  Operation

   OAM features will provide RAW with robust operation, both for
   forwarding and for routing purposes.

3.1.  Connectivity Verification

   We need to verify that two endpoints are connected with each other.
   Since we reserve resources along the path independently for each
   flow, we must be able to verify that the path exists for a given flow
   label.

   The control and data packets may not follow the same path, and the
   connectivity verification has to be triggered in-band without
   impacting the data traffic.  In particular, the control plane may
   work while the data plane is broken.

   The ping packets must be labeled in the same way as the data packets
   of the monitored flow.

3.2.  Route Tracing

   Ping and traceroute are two very common diagnostic tools.  They help
   to verify reachability and to identify the list of routers along the
   route.  However, to be predictable, resources are reserved per flow
   in RAW.  Thus, we need to define route tracing tools able to track
   the route for a specific flow.

   Because the network has to be fault-tolerant, multipath can be
   considered, with multiple Maintenance Intermediate Endpoints for each
   hop in the path.  Thus, all the possible paths between two
   maintenance endpoints should be retrieved.

3.3.  Fault verification / detection

   RAW expects to operate fault-tolerant networks.
   Thus, we need mechanisms able to detect faults before they impact
   the network performance.

   The network has to detect when a fault has occurred, i.e. when the
   network has deviated from its expected behavior.  While the network
   must report an alarm, the cause may not be identified precisely.
   For instance, the end-to-end reliability has decreased
   significantly, or a buffer overflow occurs.

   We have to minimize the amount of statistics / measurements to
   exchange:

   o  energy efficiency: low-power devices have to limit the volume of
      monitoring information since every bit consumes energy.

   o  bandwidth: wireless networks exhibit a bandwidth significantly
      lower than wired, best-effort networks.

   o  per-packet cost: it is often more expensive to send several
      packets than to combine them in a single link-layer frame.

   Thus, localized and centralized mechanisms have to be combined, and
   additional control packets have to be triggered only after a fault
   has been detected.

3.4.  Fault isolation / identification

   The network has to isolate and identify the cause of the fault.  For
   instance, the quality of a specific link has decreased, requiring
   more retransmissions, or the level of external interference has
   locally increased.

4.  Administration

   To make proper decisions, the network has to expose a collection of
   metrics, including:

   o  Packet losses: the time-window average and maximum values of the
      number of packet losses have to be measured.  Many critical
      applications stop working if a few consecutive packets are
      dropped;

   o  Received Signal Strength Indicator (RSSI) is a very common metric
      in wireless to denote the link quality.
      The radio chipset is in charge of translating a received signal
      strength into a normalized quality indicator;

   o  Delay: the time elapsed between a packet generation / enqueuing
      and its reception by the next hop;

   o  Buffer occupancy: the number of packets present in the buffer, for
      each of the existing flows.

   These metrics should be collected:

   o  per virtual circuit, to measure the end-to-end performance for a
      given flow.  Each of the paths has to be isolated in multipath
      strategies;

   o  per radio channel, to measure e.g. the level of external
      interference, and to be able to apply counter-measures (e.g.
      blacklisting);

   o  per device, to detect a misbehaving node when it relays the
      packets of several flows.

4.1.  Worst-case metrics

   RAW aims to enable real-time communications on top of a
   heterogeneous architecture.  Since wireless networks are known to be
   lossy, RAW has to implement strategies to improve the reliability on
   top of unreliable links.  Hybrid Automatic Repeat reQuest (ARQ)
   typically has to enable retransmissions based on the end-to-end
   reliability and latency requirements.

   To make correct decisions, the controller needs to know the
   distribution of packet losses for each flow, and for each hop of the
   paths.  In other words, average end-to-end statistics are not
   enough.  The collected metrics must allow the controller to predict
   the worst case.

4.2.  Energy efficiency constraint

   RAW also targets low-power wireless networks, where energy
   represents a key constraint.  Thus, we have to take care of the
   energy and bandwidth consumption.  The following techniques aim to
   reduce the cost of such maintenance:

   piggybacking:  some control information is inserted in the data
      packets if it does not fragment the packet (i.e. the MTU is not
      exceeded).
      Information Elements represent a standardized way to handle such
      information;

   flags/fields:  we have to set flags in the packets to be monitored,
      so that the forwarding process can be tracked accurately.  A
      sequence number field may help to detect packet losses.
      Similarly, path inference tools such as [ipath] insert additional
      information in the headers to identify a posteriori the path
      followed by a packet.

5.  Maintenance

   RAW needs to implement a self-healing and self-optimization approach.
   The network must continuously retrieve its own state to judge the
   relevance of a reconfiguration, quantifying:

      the cost of the sub-optimality: resources may not be used
      optimally (e.g. a better path exists);

      the reconfiguration cost: the controller needs to trigger some
      reconfigurations.  For this transient period, resources may be
      reserved twice, and control packets have to be transmitted.

   Thus, reconfiguration may only be triggered if the gain is
   significant.

5.1.  Multipath

   To be fault-tolerant, several paths can be reserved between two
   maintenance endpoints.  They must be node-disjoint, so that a path
   can be available at any time.

5.2.  Replication / Elimination

   When multiple paths are reserved between two maintenance endpoints,
   they may decide to replicate the packets to introduce redundancy, and
   thus to mitigate transmission errors and collisions.  For instance,
   in Figure 1, the source node S is transmitting the packet to both
   parents, nodes A and B.  Each maintenance endpoint will decide to
   trigger the replication / elimination process when a set of metrics
   crosses a threshold value.

               ===> (A) => (C) => (E) ===
             //        \\//   \\//       \\
   source (S)          //\\   //\\         (R) (root)
             \\       //  \\ //  \\      //
               ===> (B) => (D) => (F) ===

   Figure 1: Packet Replication: S transmits the same data packet twice,
                   to its DP (A) and to its AP (B).

5.3.  Resource Reservation

   Because the QoS criteria associated with a path may degrade, the
   network has to provision additional resources along the path.  We
   need to provide mechanisms to patch a schedule (changing the channel
   offset, allocating more timeslots, changing the path, etc.).

5.4.  Soft transition after reconfiguration

   Since RAW expects to support real-time flows, we have to support
   soft reconfiguration, where the new resources are reserved before
   the old ones are released.  Some mechanisms have to be proposed so
   that packets are forwarded through the new track only when the
   resources are ready to be used, while keeping the global state
   consistent (no packet reordering, duplication, etc.).

6.  Informative References

   [ipath]    Gao, Y., Dong, W., Chen, C., Bu, J., Wu, W., and X. Liu,
              "iPath: path inference in wireless sensor networks.",
              2016.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

Authors' Addresses

   Fabrice Theoleyre
   CNRS
   Building B
   300 boulevard Sebastien Brant - CS 10413
   Illkirch - Strasbourg  67400
   FRANCE

   Phone: +33 368 85 45 33
   Email: theoleyre@unistra.fr
   URI:   http://www.theoleyre.eu

   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   Cesson-Sevigne - Rennes  35510
   FRANCE

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr