RAW                                                         F. Theoleyre
Internet-Draft                                                      CNRS
Intended status: Informational                         G.Z. Papadopoulos
Expires: 6 September 2022                                 IMT Atlantique
                                                               G. Mirsky
                                                                Ericsson
                                                           CJ. Bernardos
                                                                    UC3M
                                                            5 March 2022


   Operations, Administration and Maintenance (OAM) features for RAW
                     draft-ietf-raw-oam-support-04

Abstract

   Some critical applications may use a wireless infrastructure. However, wireless networks exhibit a bandwidth several orders of magnitude lower than wired networks. Besides, wireless transmissions are lossy by nature; the probability that a packet cannot be decoded correctly by the receiver may be quite high.
   In these conditions, providing high reliability and a low delay is challenging. This document lists the requirements for the Operations, Administration, and Maintenance (OAM) features recommended to construct a predictable communication infrastructure on top of a collection of wireless segments. It describes the benefits, problems, and trade-offs of using OAM in wireless networks to achieve Service Level Objectives (SLOs).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 6 September 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Acronyms
     1.3.  Requirements Language
   2.  Role of OAM in RAW
     2.1.  Link concept and quality
     2.2.  Broadcast Transmissions
     2.3.  Complex Layer 2 Forwarding
     2.4.  End-to-end delay
   3.  Operation
     3.1.  Information Collection
     3.2.  Continuity Check
     3.3.  Connectivity Verification
     3.4.  Route Tracing
     3.5.  Fault Verification/detection
     3.6.  Fault Isolation/identification
   4.  Administration
     4.1.  Worst-case metrics
     4.2.  Efficient measurement retrieval (Passive OAM)
     4.3.  Reporting OAM packets to the source (Active OAM)
   5.  Maintenance
     5.1.  Soft transition after reconfiguration
     5.2.  Predictive maintenance
   6.  Requirements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Introduction

   Reliable and Available Wireless (RAW) is an effort that extends Deterministic Networking (DetNet) to approach end-to-end deterministic performance over a network that includes scheduled wireless segments. In wired networks, many approaches try to enable Quality of Service (QoS) by implementing traffic differentiation so that routers handle each type of packet differently. However, such differentiated treatment proved expensive for most applications.

   DetNet [RFC8655] aims to provide a bounded end-to-end latency on top of a network infrastructure comprising both Layer 2 bridged and Layer 3 routed segments. Its work encompasses the data plane, OAM, time synchronization, management, control, and security aspects.

   However, wireless networks create specific challenges. First of all, radio bandwidth is significantly lower than in wired networks. In these conditions, the volume of signaling messages has to be very limited. Even worse, wireless links are lossy: a Layer 2 transmission may or may not be decoded correctly by the receiver, depending on a broad set of parameters. Thus, providing high reliability through wireless segments is particularly challenging.

   Wired networks rely on the concept of _links_: all the devices attached to a link receive any transmission. The concept of a link in wireless networks is somewhat different from what many are used to in wireline networks. A receiver may or may not receive a transmission, depending on the presence of a colliding transmission, the radio channel's quality, and the external interference. Besides, a wireless transmission is broadcast by nature: any _neighboring_ device may be able to decode it. This document details the implications of these properties for the OAM features.

   Last but not least, radio links present volatile characteristics.
   If the wireless network uses an unlicensed band, packet losses are no longer temporally and spatially independent. Typically, links may exhibit very bursty losses, where several consecutive packets are dropped because of, e.g., temporary external interference. Thus, providing availability and reliability on top of the wireless infrastructure requires specific Layer 3 mechanisms to counteract these bursty losses.

   Operations, Administration, and Maintenance (OAM) tools are of primary importance for IP networks [RFC7276]. They define a toolset for fault detection, isolation, and performance measurement.

   The primary purpose of this document is to detail the specific requirements of the OAM features recommended to construct a predictable communication infrastructure on top of a collection of wireless segments. This document describes the benefits, problems, and trade-offs of using OAM in wireless networks to provide availability and predictability.

1.1.  Terminology

   In this document, the term OAM is used according to its definition in [RFC6291]. An OAM framework is expected to be implemented in RAW networks to maintain a real-time view of the network infrastructure and of its ability to respect the Service Level Objectives (SLOs), such as delay and reliability, assigned to each data flow.

   We reuse here the terminology of [I-D.ietf-detnet-oam-framework]:

   *  OAM entity: a data flow to be monitored for defects and/or whose performance metrics are measured;

   *  Test End Point (TEP): an OAM device crossed when entering or exiting the network. In RAW, it corresponds mostly to the source or destination of a data flow.
      OAM messages can be exchanged between two TEPs;

   *  Monitoring Endpoint (MonEP): an OAM system along the flow; a MonEP MAY respond to an OAM message generated by the TEP;

   *  control/management/data plane: the control and management planes are used to configure and control the network (long-term). On a per-node basis, the data plane applies rules and policies to each packet, for example, selecting the time-frequency block or the next hop on a packet-by-packet basis. Relative to a data flow, the control and/or management plane can be out-of-band;

   *  Active measurement methods (as defined in [RFC7799]) modify a normal data flow by inserting novel fields or by injecting specially constructed test packets [RFC2544]. For the quality of the information obtained with an active method, it is critical that the generated test packets be in-band with the monitored data flow. In other words, a test packet is required to cross the same network nodes and links and to receive the same Quality of Service (QoS) treatment as a data packet. Active methods may implement one of these two strategies:

      -  In-band: control information follows the same path as the data packets. In other words, a failure in the data plane may prevent the control information from reaching the destination (e.g., an end device or a controller).

      -  Out-of-band: control information is sent separately from the data packets. Thus, the behavior of control and data packets may differ;

   *  Passive measurement methods [RFC7799] infer information by observing unmodified existing flows.

   We also adopt the following terminology, which is particularly relevant for RAW segments.

   *  piggybacking vs. dedicated control packets: control information may be encapsulated in specific (dedicated) control packets. Alternatively, it may be piggybacked in existing data packets when the MTU is larger than the actual packet length.
      Piggybacking makes particular sense in wireless networks, as the cost of a transmission (bandwidth and energy) is not proportional to the packet size.

   *  route-over vs. mesh-under: a control packet is either forwarded directly to the Layer 3 next hop (mesh-under) or handled hop-by-hop by each router (route-over). While the latter option consumes more resources, it allows collecting additional intermediate information, which is particularly relevant in wireless networks.

   *  Defect: a temporary change in the network (e.g., a radio link that is broken due to a mobile obstacle);

   *  Fault: a permanent change that may affect the network performance, e.g., a node running out of energy.

   *  End-to-end delay: the time between the generation of a packet and its reception by the destination.

1.2.  Acronyms

   OAM     Operations, Administration, and Maintenance

   DetNet  Deterministic Networking

   PSE     Path Selection Engine [I-D.pthubert-raw-architecture]

   QoS     Quality of Service

   RAW     Reliable and Available Wireless

   SLO     Service Level Objective

   SNMP    Simple Network Management Protocol

   SDN     Software-Defined Network

1.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.  Role of OAM in RAW

   RAW networks aim to make communications reliable and predictable over a wireless network infrastructure. Most critical applications will define the SLO required for the data flows they generate. RAW considers network plane protocol elements such as OAM to improve the RAW operation at the service and forwarding sub-layers.
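
   As an illustration only, the Python sketch below shows how a per-flow SLO might be checked against measurements collected at a Test End Point. It derives the three metrics that Section 6 requires endpoints to expose (end-to-end PDR, worst-case latency, and Maximum Consecutive Failures) from a log of received sequence numbers. The `Slo` structure, its field names, and the helper functions are hypothetical; they do not correspond to any RAW or DetNet data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slo:
    """Hypothetical per-flow SLO; field names are illustrative only."""
    min_pdr: float                # minimum end-to-end Packet Delivery Ratio
    max_delay_ms: float           # bound on the end-to-end delay
    max_consecutive_losses: int   # tolerated Maximum Consecutive Failures


def flow_metrics(sent: int, delays_ms: Dict[int, float]) -> Dict[str, float]:
    """Compute PDR, worst-case delay and MCF for sequence numbers 0..sent-1.

    delays_ms maps each received sequence number to its end-to-end delay.
    """
    pdr = len(delays_ms) / sent if sent else 0.0
    # No reception at all: treat the worst-case delay as unbounded.
    worst = max(delays_ms.values(), default=float("inf"))
    run = mcf = 0
    for seq in range(sent):
        if seq in delays_ms:
            run = 0                # a reception ends the current loss burst
        else:
            run += 1
            mcf = max(mcf, run)    # longest burst of consecutive losses
    return {"pdr": pdr, "worst_delay_ms": worst, "mcf": mcf}


def violations(slo: Slo, m: Dict[str, float]) -> List[str]:
    """Return the names of the SLO components that are not met."""
    out = []
    if m["pdr"] < slo.min_pdr:
        out.append("pdr")
    if m["worst_delay_ms"] > slo.max_delay_ms:
        out.append("delay")
    if m["mcf"] > slo.max_consecutive_losses:
        out.append("mcf")
    return out
```

   For example, if sequence numbers 0 to 5 are sent and only 0, 1, 4, and 5 arrive, all within 50 ms, the sketch reports a PDR of 4/6 and an MCF of 2. An SLO requiring a PDR of at least 0.9 and tolerating a single consecutive loss is therefore violated on both counts, even though the delay bound is met.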

   To respect strict guarantees, RAW relies on the Path Selection Engine (PSE), as defined in [I-D.pthubert-raw-architecture], to monitor and maintain the L3 network. An L2 scheduler may be used to allocate transmission opportunities based on the radio link characteristics, the SLOs of the flows, and the number of packets to forward. The PSE exploits the L2 resources reserved by the scheduler and organizes the L3 paths to introduce redundancy and fault tolerance and to create backup paths. OAM represents the core of the pre-provisioning process by supervising the network. It maintains a global view of the network resources to detect defects, faults, over-provisioning, and anomalies.

   Fault tolerance also assumes that multiple paths must be provisioned so that an end-to-end circuit remains operational regardless of the conditions. The Packet Replication and Elimination Function (PREF) [I-D.thubert-bier-replication-elimination] on a node is typically controlled by the PSE. OAM mechanisms can be used to monitor that the Packet Replication, Elimination, and Ordering Functions (PREOF) are working correctly on a node and within the domain.

   To be energy-efficient, out-of-band OAM SHOULD only be used to report aggregated statistics (e.g., counters, histograms) from the nodes using, e.g., SNMP or NETCONF/RESTCONF with YANG-based data models. The out-of-band OAM flow MAY use a control and management channel dedicated to this purpose.

   RAW supports both proactive and on-demand troubleshooting. Proactive troubleshooting is necessary to detect anomalies, report defects, or reduce over-provisioning when it is not required. However, on-demand troubleshooting may also be required to identify the cause of a specific defect. Indeed, some specific faults may only be detected with a global, detailed view of the network, which is too expensive to acquire in the normal operating mode.

   The specific characteristics of RAW are discussed below.

2.1.  Link concept and quality

   In wireless networks, a _link_ does not exist physically. A device has a set of *neighbors* that correspond to all the devices that have a non-null probability of receiving its packets correctly. We distinguish between:

   *  a point-to-point (p2p) link, with one transmitter and one receiver. These links are used to transmit unicast packets.

   *  a point-to-multipoint (p2mp) link, which associates one transmitter with a collection of receivers. For instance, broadcast packets assume the existence of p2mp links to avoid duplicating a broadcast packet to reach each possible radio neighbor.

   In scheduled radio networks, p2mp and p2p links are commonly not scheduled simultaneously, to save energy and/or to reduce the number of collisions. More precisely, only a subset of the neighbors may wake up at a given instant.

   Anycast is used over p2mp links to improve reliability. A collection of receivers is scheduled to wake up simultaneously, so that the transmission fails only if none of the receivers can decode the packet.

   Each wireless link is associated with a link quality, often measured as the Packet Delivery Ratio (PDR), i.e., the probability that the receiver can decode the packet correctly. It is worth noting that this link quality depends on many criteria, such as the level of external interference, the presence of concurrent transmissions, or the radio channel state. Moreover, this link quality is time-variant. Consequently, a p2mp link is characterized by a collection of PDRs (one value per receiver). More sophisticated aggregated metrics exist for p2mp links, such as those studied in [anycast-property].

2.2.  Broadcast Transmissions

   In modern switched networks, a unicast transmission is delivered exclusively to its destination. Wireless networks are much closer to the traditional *shared access* networks.
   In practice, unicast and broadcast frames are handled similarly at the physical layer. The link layer is merely in charge of filtering the frames to discard irrelevant receptions (e.g., frames addressed to a different unicast MAC address).

   However, contrary to wired networks, we cannot ensure that a packet is received by *all* the devices attached to the Layer 2 segment. Reception depends on the radio channel state between the transmitter(s) and the receiver(s). In particular, concurrent transmissions may or may not be possible, depending on the radio conditions (e.g., whether the different transmitters use different radio channels or are sufficiently spatially separated).

2.3.  Complex Layer 2 Forwarding

   Multiple neighbors may receive a transmission. Thus, anycast Layer 2 forwarding helps to maximize reliability by assigning multiple receivers to a single transmission. That way, the packet is lost only if *none* of the receivers decode it. In practice, it has been shown that different neighbors may exhibit very different radio conditions, and that reception independence may hold for some of them [anycast-property].

2.4.  End-to-end delay

   In a wireless network, additional transmission opportunities are provisioned to accommodate packet losses. Thus, the end-to-end delay consists of:

   *  Transmission delay, which is fixed and depends mainly on the data rate and on the presence or absence of an acknowledgement.

   *  Residence time, which corresponds to the buffering delay and depends on the schedule. To account for retransmissions, the residence time is the time elapsed between the last reception from the previous hop (among all its retransmissions) and the emission of the last retransmission to the next hop.

3.  Operation

   OAM features will provide RAW with robust operation for both forwarding and routing purposes.

3.1.  Information Collection

   The model for exchanging information should be the same as for a DetNet network to ensure interoperability. YANG data models can typically fulfill this objective.

   However, RAW networks imply specific constraints (e.g., low bandwidth, packet losses, cost of medium access) that may require minimizing the volume of information to collect. Thus, Section 4.2 discusses different ways to collect information, i.e., to physically transfer the OAM information from the emitter to the receiver. This corresponds to passive OAM as defined in [RFC7799].

3.2.  Continuity Check

   Similarly to DetNet, we need to verify that the source and the destination are connected (i.e., at least one valid path exists).

3.3.  Connectivity Verification

   As in DetNet, we have to verify the absence of misconnection. We focus here on the RAW specificities.

   Because of the broadcast nature of radio transmissions, several receivers may be active at the same time to enable anycast Layer 2 forwarding. Thus, connectivity verification must test every combination of receivers. We also consider priority-based mechanisms for anycast forwarding, i.e., the receivers have different probabilities of forwarding a packet. To verify a delay SLO for a given flow, we must also consider all the possible combinations, leading to a probability distribution function for end-to-end transmissions. If this verification is implemented naively, the number of combinations to test may grow exponentially, which is too costly for wireless networks with low bandwidth.

3.4.  Route Tracing

   Wireless networks are broadcast by nature: a radio transmission can be decoded by any radio neighbor. In multihop wireless networks, several paths exist between two endpoints. In hub networks, a device may be covered by several Access Points (APs).
   The most efficient path or AP should be chosen, specifically with respect to reliability and delay.

   Thus, multipath routing and multi-attachment can be viewed as making the network more fault-tolerant. Even better, we can exploit the broadcast nature of wireless networks: we may have multiple Monitoring Endpoints (MonEPs) for each of these kinds of hop. While this may be reasonable in the multi-attachment case, the complexity quickly increases with the path length. Indeed, each Maintenance Intermediate Endpoint has several possible next hops in the forwarding plane. Thus, all the possible paths between two maintenance endpoints should be retrieved, which may quickly become intractable with a naive approach.

3.5.  Fault Verification/detection

   Wired networks tend to exhibit stable performance. By contrast, wireless networks are time-variant. We must consequently distinguish normal variations from malfunctions.

3.6.  Fault Isolation/identification

   The network has to isolate and identify the cause of a fault. While DetNet already expects to identify malfunctions, some problems are specific to wireless networks. We must consequently collect metrics and implement algorithms tailored to wireless networking.

   For instance, a decrease in link quality may be caused by several factors: external interference, obstacles, multipath fading, or mobility. It is fundamental to discriminate between these causes to make the right decision.

4.  Administration

   The RAW network has to expose a collection of metrics to help an operator make proper decisions, including:

   *  Packet losses: the time-window average and maximum values of the number of packet losses have to be measured.
      Many critical applications stop working if a few consecutive packets are dropped;

   *  Received Signal Strength Indicator (RSSI): a very common metric in wireless networks to denote the link quality. The radio chipset is in charge of translating a received signal strength into a normalized quality indicator;

   *  Delay: the time elapsed between the generation or enqueuing of a packet and its reception by the next hop;

   *  Buffer occupancy: the number of packets present in the buffer, for each of the existing flows;

   *  Battery lifetime: the expected remaining battery lifetime of the device. Since many RAW devices might be battery-powered, this is an important metric for an operator to make proper decisions;

   *  Mobility: if a device is known to be mobile, an operator might take this into account to make proper decisions.

   These metrics should be collected per device, virtual circuit, and path, as DetNet already does. However, in RAW, we have to deal with them at a finer granularity:

   *  per radio channel, to measure, e.g., the level of external interference and to be able to apply counter-measures (e.g., blacklisting);

   *  per physical radio technology / interface, if a device has multiple NICs;

   *  per link, to detect misbehaving links (asymmetrical links, fluctuating quality);

   *  per resource block: a collision in the schedule is particularly challenging to identify in radio networks with spectrum reuse. In particular, a collision may not be systematic (depending on the radio characteristics and the traffic profile).

4.1.  Worst-case metrics

   RAW inherits the same requirements as DetNet: we need to know the distribution of a collection of metrics. However, wireless networks are known to be highly variable. Changes may be frequent and may exhibit a periodic pattern. Collecting and analyzing this volume of measurements is challenging.
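
   As an illustration of such aggregation, the Python sketch below keeps, per reporting interval, a fixed-bucket histogram of delay samples together with the exact MIN and MAX values, so that a node can report a compressed distribution instead of every sample. This is a sketch under stated assumptions: the bucket edges are hypothetical operator choices, and the `DelayHistogram` class is not part of any standardized OAM data model. A quantile can only be returned as an upper bound at bucket resolution, which is the precision traded for a smaller report.

```python
import bisect
from typing import Dict, Optional, Sequence


class DelayHistogram:
    """Per-interval delay summary: bucket counts plus exact MIN and MAX.

    The bucket upper edges are hypothetical operator-chosen values.
    """

    def __init__(self, edges_ms: Sequence[float]):
        self.edges = sorted(edges_ms)
        # One bucket per edge, plus a final overflow bucket above the last edge.
        self.counts = [0] * (len(self.edges) + 1)
        self.n = 0
        self.min_ms: Optional[float] = None
        self.max_ms: Optional[float] = None

    def add(self, delay_ms: float) -> None:
        # bisect_left places a sample equal to an edge in that edge's bucket,
        # so bucket i holds samples in (edges[i-1], edges[i]].
        self.counts[bisect.bisect_left(self.edges, delay_ms)] += 1
        self.n += 1
        self.min_ms = delay_ms if self.min_ms is None else min(self.min_ms, delay_ms)
        self.max_ms = delay_ms if self.max_ms is None else max(self.max_ms, delay_ms)

    def quantile_upper_bound(self, q: float) -> Optional[float]:
        """Upper bound on the q-quantile (0 < q <= 1), at bucket resolution."""
        if self.n == 0:
            return None
        target = max(q * self.n, 1)   # at least one sample must lie below it
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                # The overflow bucket has no edge: fall back to the exact MAX.
                return self.edges[i] if i < len(self.edges) else self.max_ms
        return self.max_ms

    def report(self) -> Dict[str, object]:
        """What a node would send instead of the raw samples."""
        return {"n": self.n, "min": self.min_ms, "max": self.max_ms,
                "counts": list(self.counts)}
```

   With bucket edges of 5, 10, 20, and 50 ms, for instance, any number of samples collapses into five counters plus MIN and MAX. Since MAX is kept exactly, the worst case is not blurred by the bucketing.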

   Wireless networks are known to be lossy, and RAW has to implement strategies to improve reliability on top of unreliable links. Reliability is typically achieved through Automatic Repeat Request (ARQ) and Forward Error Correction (FEC). Since different flows do not have the same SLO, RAW must adjust ARQ and FEC based on the link and path characteristics.

4.2.  Efficient measurement retrieval (Passive OAM)

   We have to minimize the number of statistics / measurements to exchange, because of:

   *  energy efficiency: low-power devices have to limit the volume of monitoring information, since every bit consumes energy;

   *  bandwidth: wireless networks exhibit a bandwidth significantly lower than wired, best-effort networks;

   *  per-packet cost: it is often more expensive to send several packets than to combine them in a single link-layer frame.

   In conclusion, we have to take care of power and bandwidth consumption. The following techniques aim to reduce the cost of such maintenance:

   *  on-path collection: some control information is inserted into the data packets if this does not cause fragmentation (i.e., the MTU is not exceeded). Information Elements represent a standardized way to carry such information. IPv6 Hop-by-Hop Options headers may help to collect metrics all along the path;

   *  flags/fields: flags have to be set in the monitored packets to be able to monitor the forwarding process accurately. A sequence number field may help to detect packet losses. Similarly, path inference tools such as [ipath] insert additional information in the headers to identify a posteriori the path followed by a packet;

   *  hierarchical monitoring: localized and centralized mechanisms have to be combined.
      Typically, a local mechanism should continuously monitor a set of metrics and trigger remote OAM exchanges only when a fault is detected (but possibly not yet identified). For instance, local temporary defects must not trigger expensive OAM transmissions. Besides, the wireless segments often represent the weakest parts of a path: the volume of control information they produce has to be dimensioned accordingly.

   Several passive techniques can be combined. For instance, the DetNet forwarding sub-layer MAY combine In-band Network Telemetry (INT) with P4, In-situ OAM (IOAM), and iPath to compute and report different statistics along the track (e.g., number of link-layer retransmissions, link reliability).

4.3.  Reporting OAM packets to the source (Active OAM)

   The Test End Point collects measurements from the OAM probes received in the monitored track. However, the aggregated statistics must then be reported to the other Test End Point, which injected the probes. Unfortunately, the monitored track may be unidirectional. In this case, the statistics have to be reported out-of-band (e.g., through a dedicated control or management channel).

   It is worth noting that Active OAM and Passive OAM techniques are not mutually exclusive. In particular, Active OAM is useful when a statistic cannot be acquired accurately by passive means.

   Besides, Active OAM may also use piggybacking techniques: an OAM packet may be piggybacked in a frame if the MTU is sufficient. Indeed, increasing the number of transmissions may severely degrade the performance of radio networks, particularly with scheduled access and fixed timeslot durations. Thus, OAM packets may be buffered until another frame with sufficient space has to be transmitted to the same neighbor. In conclusion, active OAM packets may be out-of-band or in-band.

5.  Maintenance

   Maintenance features need to facilitate repairs and upgrades. In wireless networks, repairs are expected to occur much more frequently, since the link quality may be highly time-variant. Thus, maintenance represents a key feature for RAW.

5.1.  Soft transition after reconfiguration

   Because of the wireless medium, the link quality may fluctuate, and the network needs to reconfigure itself continuously. During this transient state, flows may be gradually redirected, consuming resources in different parts of the network. OAM has to distinguish between a metric that changed because of a legitimate network change (e.g., flow redirection) and an unexpected event (e.g., a fault).

5.2.  Predictive maintenance

   RAW needs to implement self-optimization features. While the network is configured to be fault-tolerant, a reconfiguration may be required to keep respecting long-term objectives. For instance, thanks to fault tolerance, the network keeps respecting the SLO after a node crashes, but a reconfiguration is required to handle future faults. In other words, the reconfiguration delay MUST be strictly smaller than the inter-fault time.

   The network must continuously retrieve its state to judge the relevance of a reconfiguration, quantifying:

   *  the cost of sub-optimality: resources may not be used optimally (e.g., a better path exists);

   *  the reconfiguration cost: the controller needs to trigger some reconfigurations. During this transient period, resources may be reserved twice, and control packets have to be transmitted.

   Thus, a reconfiguration should only be triggered if the gain is significant.

6.  Requirements

   This section lists requirements for OAM in a RAW domain:

   1.  Each Test and Monitoring Endpoint device MUST expose a list of available metrics per track.
       It MUST at least provide the end-to-end Packet Delivery Ratio, the end-to-end latency, and the Maximum Consecutive Failures (MCF).

   2.  PREOF MUST guarantee order preservation within the (sub)track.

   3.  OAM nodes MUST provide aggregated statistics to reduce the volume of measurement traffic. They MAY send a compressed distribution of measurements, or MIN / MAX values over a time interval.

   4.  Monitoring Endpoints SHOULD support route tracing with passive OAM techniques.

7.  IANA Considerations

   This document has no actionable requirements for IANA. This section can be removed before publication.

8.  Security Considerations

   This section will be expanded in future versions of the draft.

9.  Acknowledgments

   TBD

10.  Informative References

   [anycast-property]
              Teles Hermeto, R., Gallais, A., and F. Theoleyre, "Is Link-Layer Anycast Scheduling Relevant for IEEE 802.15.4-TSCH Networks?", 2019.

   [I-D.ietf-detnet-oam-framework]
              Mirsky, G., Theoleyre, F., Papadopoulos, G. Z., Bernardos, C. J., Varga, B., and J. Farkas, "Framework of Operations, Administration and Maintenance (OAM) for Deterministic Networking (DetNet)", Work in Progress, Internet-Draft, draft-ietf-detnet-oam-framework-05, 14 October 2021.

   [I-D.pthubert-raw-architecture]
              Thubert, P., Papadopoulos, G. Z., and L. Berger, "Reliable and Available Wireless Architecture/Framework", Work in Progress, Internet-Draft, draft-pthubert-raw-architecture-09, 7 July 2021.

   [I-D.thubert-bier-replication-elimination]
              Thubert, P., Eckert, T., Brodard, Z., and H. Jiang, "BIER-TE extensions for Packet Replication and Elimination Function (PREF) and OAM", Work in Progress, Internet-Draft, draft-thubert-bier-replication-elimination-03, 3 March 2018.

   [ipath]    Gao, Y., Dong, W., Chen, C., Bu, J., Wu, W., and X. Liu, "iPath: path inference in wireless sensor networks", 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu, D., and S. Mansfield, "Guidelines for the Use of the "OAM" Acronym in the IETF", BCP 161, RFC 6291, DOI 10.17487/RFC6291, June 2011.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. Weingarten, "An Overview of Operations, Administration, and Maintenance (OAM) Tools", RFC 7276, DOI 10.17487/RFC7276, June 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, May 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas, "Deterministic Networking Architecture", RFC 8655, DOI 10.17487/RFC8655, October 2019.

Authors' Addresses

   Fabrice Theoleyre
   CNRS
   Building B
   300 boulevard Sebastien Brant - CS 10413
   67400 Illkirch - Strasbourg
   France
   Phone: +33 368 85 45 33
   Email: fabrice.theoleyre@cnrs.fr
   URI:   http://www.theoleyre.eu

   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   35510 Cesson-Sevigne - Rennes
   France
   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr

   Greg Mirsky
   Ericsson
   Email: gregimirsky@gmail.com

   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   28911 Leganes, Madrid
   Spain
   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/