DetNet                                                      F. Theoleyre
Internet-Draft                                                      CNRS
Intended status: Standards Track                         G. Papadopoulos
Expires: April 28, 2021                                   IMT Atlantique
                                                               G. Mirsky
                                                               ZTE Corp.
                                                           CJ. Bernardos
                                                                    UC3M
                                                        October 25, 2020


  Operations, Administration and Maintenance (OAM) features for DetNet
                 draft-theoleyre-detnet-oam-support-00

Abstract

   Deterministic Networking (DetNet), as defined in RFC 8655, aims to
   provide bounded end-to-end latency on top of the network
   infrastructure, comprising both Layer 2 bridged and Layer 3 routed
   segments.
   This document's primary purpose is to detail the specific
   requirements of the Operations, Administration, and Maintenance (OAM)
   features recommended to maintain a deterministic network.  With the
   implementation of the OAM framework in DetNet, an operator will have
   a real-time view of the network infrastructure regarding the
   network's ability to respect the Service Level Objectives (SLOs),
   such as packet delay, delay variation, and packet loss ratio,
   assigned to each data flow.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. TEMPORARY EDITORIAL NOTES
   2. Introduction
     2.1. Terminology
     2.2. Acronyms
     2.3. Requirements Language
   3. Role of OAM in DetNet
   4. Operation
     4.1. Information Collection
     4.2. Continuity Check
     4.3. Connectivity Verification
     4.4. Route Tracing
     4.5. Fault Verification/detection
     4.6. Fault Isolation/identification
   5. Administration
     5.1. Collection of metrics
     5.2. Worst-case metrics
   6. Maintenance
     6.1. Replication / Elimination
     6.2. Resource Reservation
     6.3. Soft transition after reconfiguration
   7. IANA Considerations
   8. Security Considerations
   9. Acknowledgments
   10. Informative References
   Authors' Addresses

1. TEMPORARY EDITORIAL NOTES

   This document is an Internet-Draft, so it is work in progress by
   nature.  It contains the following work-in-progress elements:

   o  "TODO" statements are elements that have not yet been written by
      the authors for some reason (lack of time, ongoing discussions
      with no clear consensus, etc.).
      Such a statement indicates that the text will be written at a
      later time.

2. Introduction

   Deterministic Networking (DetNet) [RFC8655] aims to provide bounded
   end-to-end latency on top of the network infrastructure, comprising
   both Layer 2 bridged and Layer 3 routed segments.  The DetNet work
   encompasses the data plane, OAM, time synchronization, management,
   control, and security aspects.

   Operations, Administration, and Maintenance (OAM) tools are of
   primary importance for IP networks [RFC7276].  DetNet OAM should
   provide a toolset for fault detection, localization, and performance
   measurement.

   This document's primary purpose is to detail the specific
   requirements of the OAM features recommended to maintain a
   deterministic and reliable network.  Specifically, it investigates
   the requirements for a deterministic network that supports critical
   flows.

   In this document, the term OAM is used as defined in [RFC6291].
   DetNet expects to implement an OAM framework to maintain a real-time
   view of the network infrastructure and of its ability to respect the
   Service Level Objectives (SLOs), such as packet delay, delay
   variation, and packet loss ratio, assigned to each data flow.

2.1. Terminology

   The following terms are used throughout this document as defined
   below:

   o  OAM entity: a data flow to be monitored for defects and/or whose
      performance metrics are measured.

   o  Maintenance End Point (MEP): an OAM system traversed by a data
      flow when entering or exiting the network.  In DetNet, MEPs
      correspond to the source and destination of a data flow.  OAM
      messages can be exchanged between two MEPs.

   o  Maintenance Intermediate Endpoint (MIP): an OAM system along the
      flow; a MIP MAY respond to an OAM message generated by a MEP.

   o  Control and management plane: the control and management planes
      are used to configure and control the network over the long term.
      Relative to a data flow, the control and/or management plane can
      be out-of-band.

   o  Active measurement methods (as defined in [RFC7799]) modify a
      normal data flow by inserting new fields or by injecting specially
      constructed test packets [RFC2544].  The quality of the
      information obtained using an active method critically depends on
      the generated test packets being in-band with the monitored data
      flow.  In other words, a test packet is required to cross the same
      network nodes and links and to receive the same Quality of Service
      (QoS) treatment as a data packet.

   o  Passive measurement methods [RFC7799] infer information by
      observing unmodified existing flows.

   o  Hybrid measurement methods [RFC7799] combine elements of both
      active and passive measurement methods.

2.2. Acronyms

   OAM: Operations, Administration, and Maintenance

   DetNet: Deterministic Networking

   SLO: Service Level Objective

   QoS: Quality of Service

   SNMP: Simple Network Management Protocol

   SDN: Software-Defined Network

   TODO: this list needs to be made exhaustive as the document evolves.

2.3. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Role of OAM in DetNet

   DetNet networks are expected to provide communications with
   predictable, low packet delay and packet loss.  Most critical
   applications will define an SLO that their data flows require.

   To meet strict guarantees, DetNet can use an orchestrator able to
   monitor and maintain the network.
   Typically, a Software-Defined Network (SDN) controller places DetNet
   flows in the deployed network based on their SLOs.  Thus, resources
   have to be provisioned a priori for the regular operation of the
   network.  OAM is an essential element of network operation, and the
   resources that OAM itself requires need to be accounted for to keep
   the network operational.

   Fault tolerance also assumes that multiple paths could be provisioned
   so that an end-to-end circuit is maintained by adapting to the
   existing conditions.  The central controller/orchestrator typically
   controls the Packet Replication, Elimination, and Ordering Functions
   (PREOF) on a node.  OAM is expected to support monitoring and
   troubleshooting PREOF on a particular node and within the domain.

   Note that PREOF can also be controlled by a set of distributed
   controllers, in scenarios where DetNet solutions involve more than
   one central controller.

4. Operation

   OAM features will enable robust DetNet operation, both for forwarding
   and for routing purposes.

4.1. Information Collection

   Information about the state of the network can be collected using
   several mechanisms.  Some protocols, e.g., the Simple Network
   Management Protocol (SNMP), rely on queries sent by the collector.
   Other mechanisms, e.g., those based on YANG data models, generate
   notifications following a publish-subscribe method.  Either way,
   information about the state of the network is collected and sent to
   the controller.

   Methods of transporting OAM information can also be characterized
   relative to the path of the data: OAM information may be transported
   in-band or out-of-band with respect to the data flow.

4.2. Continuity Check

   Continuity check is used to monitor the continuity of a path, i.e.,
   to verify that there exists a way to deliver packets between two
   endpoints A and B.

4.3. Connectivity Verification

   In addition to the continuity check, DetNet solutions have to verify
   the connectivity.  This verification considers an additional
   constraint: the absence of misconnection.

   In particular, resources have to be reserved for a given flow, so
   that they are booked for its use without being impacted by other
   flows.  Similarly, the destination is expected not to receive packets
   from other flows through its interface.

   It is worth noting that the test and data packets MUST follow the
   same path, i.e., the connectivity verification has to be conducted
   in-band without impacting the data traffic.  Test packets MUST share
   fate with the monitored data traffic without introducing congestion
   in normal network conditions.

4.4. Route Tracing

   Ping and traceroute are two ubiquitous tools that help localize and
   characterize a failure in the network.  They help to identify a
   subset of the routers along the route.  However, for DetNet to be
   predictable, resources are reserved per flow.  Thus, DetNet needs to
   define route tracing tools able to track the route of a specific
   flow.

   Using multiple paths or links, i.e., Equal-Cost Multipath (ECMP), is
   NOT RECOMMENDED for DetNet with an IP data plane
   [I-D.ietf-detnet-ip].  As a result, OAM in an IP ECMP environment is
   outside the scope of this document.

4.5. Fault Verification/detection

   DetNet is expected to operate fault-tolerant networks.  Thus,
   mechanisms able to detect faults before they impact network
   performance are needed.

   The network has to detect when a fault has occurred, i.e., when the
   network has deviated from its expected behavior.  While the network
   must report an alarm, the cause may not be identified precisely.  For
   instance, the end-to-end reliability may have decreased
   significantly, or a buffer overflow may have occurred.

   DetNet OAM mechanisms SHOULD allow fault detection in real time.
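
   As an illustration only, the following Python sketch shows one way an
   egress MEP could detect, in real time, the two example faults above
   without identifying their cause.  It assumes that each packet carries
   a per-flow sequence number starting at 0 (such as the sequence
   numbers used by PREOF) and that nodes report how many packets of the
   flow were dropped because of a full buffer.  The class FlowMonitor,
   the window size, and the threshold values are hypothetical and are
   not defined by DetNet or by this document.

```python
from dataclasses import dataclass, field


@dataclass
class FlowMonitor:
    """Hypothetical per-flow fault detector at an egress MEP (sketch only).

    Assumes sequence numbers start at 0; wraparound is not handled.
    """

    window: int = 100                 # recent sequence numbers observed (assumed)
    min_delivery_ratio: float = 0.99  # threshold derived from the SLO (assumed)
    received: set = field(default_factory=set)
    highest_seq: int = -1

    def on_packet(self, seq: int) -> None:
        """Record the arrival of the packet carrying sequence number seq."""
        self.received.add(seq)  # a set counts replicated duplicates once
        self.highest_seq = max(self.highest_seq, seq)
        lowest_tracked = self.highest_seq - self.window + 1
        # Forget sequence numbers that slid out of the observation window.
        self.received = {s for s in self.received if s >= lowest_tracked}

    def delivery_ratio(self) -> float:
        """Fraction of the packets expected in the window that arrived."""
        if self.highest_seq < 0:
            return 1.0  # nothing expected yet
        expected = min(self.window, self.highest_seq + 1)
        return len(self.received) / expected

    def check(self, tail_drops: int) -> list:
        """Return the raised alarms; the cause of the fault is not identified.

        tail_drops: packets of this flow dropped on a full buffer since the
        last check, as reported by the nodes (hypothetical reporting).
        """
        alarms = []
        if self.delivery_ratio() < self.min_delivery_ratio:
            alarms.append("end-to-end reliability below SLO")
        if tail_drops > 0:
            alarms.append("buffer overflow")
        return alarms


if __name__ == "__main__":
    monitor = FlowMonitor(window=10, min_delivery_ratio=0.9)
    for seq in range(10):
        if seq not in (3, 7):  # simulate two lost packets
            monitor.on_packet(seq)
    print(monitor.delivery_ratio())     # 0.8
    print(monitor.check(tail_drops=0))  # ['end-to-end reliability below SLO']
```

   Such a detector is cheap to run at the MEP, but, as stated above, the
   alarm alone does not locate the fault; locating it is the purpose of
   fault isolation (Section 4.6).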

   DetNet OAM mechanisms MAY, when possible, predict faults based on
   current network conditions.  They MAY also identify and report the
   cause of the actual or predicted network failure.

4.6. Fault Isolation/identification

   At this stage, the network isolates and identifies the cause of the
   fault.  For instance, the replication process may not behave as
   expected at a specific intermediate router.

5. Administration

   The network SHOULD expose a collection of metrics to support an
   operator in making proper decisions, including:

   o  Queuing delay: the time elapsed between a packet being enqueued
      and its transmission to the next hop.

   o  Buffer occupancy: the number of packets present in the buffer, for
      each of the existing flows.

   These metrics SHOULD be collected at the following granularities:

   o  per virtual circuit, to measure the end-to-end performance of a
      given flow.  With multipath routing strategies, each of the paths
      has to be isolated.

   o  per path, to detect a misbehaving path when multiple paths are
      used.

   o  per device, to detect a misbehaving node when it relays the
      packets of several flows.

5.1. Collection of metrics

   DetNet OAM SHOULD optimize the number of statistics and measurements
   to be collected, as well as the collection frequency.  Distributed
   and centralized mechanisms MAY be used in combination.  Information
   characterizing the state of the network MAY be collected periodically
   or in an event-triggered manner.

5.2. Worst-case metrics

   DetNet aims to enable real-time communications on top of a
   heterogeneous multi-hop architecture.  To make correct decisions, the
   controller needs to know the distribution of packet losses and delays
   for each flow and for each hop of the paths.  In other words, average
   end-to-end statistics are not enough.  The collected information must
   be sufficient to allow the controller to predict the worst case.

6. Maintenance

   DetNet needs to implement a self-healing and self-optimization
   approach.
   The controller MUST be able to continuously retrieve the state of the
   network, to evaluate conditions and trends about the relevance of a
   reconfiguration, quantifying:

      the cost of sub-optimality: resources may not be used optimally
      (e.g., a better path exists).

      the reconfiguration cost: the controller needs to trigger some
      reconfigurations.  During this transient period, resources may be
      reserved twice, and control packets have to be transmitted.

   Thus, a reconfiguration is only worth triggering if the gain is
   significant.

6.1. Replication / Elimination

   When multiple paths are reserved between two maintenance endpoints,
   packet replication may be used to introduce redundancy and mitigate
   transmission errors and collisions.  For instance, in Figure 1, the
   source node S transmits the packet to both of its parents, nodes A
   and B.  Each maintenance endpoint decides to trigger the packet
   replication, elimination, or ordering process when a set of metrics
   crosses a threshold value.

                ===> (A) => (C) => (E) ===
              //        \\//   \\//       \\
      source (S)        //\\   //\\        (R) (root)
              \\       //  \\ //  \\      //
                ===> (B) => (D) => (F) ===

     Figure 1: Packet Replication: S transmits the same data packet
                      twice, to DP (A) and AP (B).

6.2. Resource Reservation

   Because the QoS associated with a path may degrade, the network has
   to provision additional resources along the path.  Mechanisms are
   needed to patch the network configuration accordingly.

6.3. Soft transition after reconfiguration

   Since DetNet is expected to support real-time flows, DetNet OAM MUST
   support soft reconfiguration, where the new resources are reserved
   before the old ones are released.  Mechanisms have to be proposed so
   that packets are forwarded through the new track only when its
   resources are ready to be used, while keeping the global state
   consistent (no packet reordering, duplication, etc.).
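
   The make-before-break behavior required above can be sketched as
   follows.  This Python example is illustrative only: the Link class,
   the soft_reconfigure() function, and the modeling of a path as a list
   of links with a single reservable bandwidth value are hypothetical
   simplifications.  The actual switching of forwarding state and the
   prevention of reordering are not modeled.  New resources are reserved
   on every hop first; the flow moves to the new track only if all
   reservations succeed, and the old resources are released last.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical link with a fixed reservable bandwidth (arbitrary units)."""

    capacity: int
    reserved: int = 0

    def try_reserve(self, amount: int) -> bool:
        """Reserve amount if it fits; return whether it succeeded."""
        if self.reserved + amount > self.capacity:
            return False
        self.reserved += amount
        return True

    def release(self, amount: int) -> None:
        self.reserved -= amount


def soft_reconfigure(links: dict, old_path: list, new_path: list, rate: int) -> list:
    """Move a flow from old_path to new_path, make-before-break style.

    Returns the path the flow uses once the procedure ends.
    """
    done = []
    for link_id in new_path:
        if not links[link_id].try_reserve(rate):
            # Roll back the partial reservation; the flow stays on old_path.
            for prev in done:
                links[prev].release(rate)
            return old_path
        done.append(link_id)
    # Every hop of the new track is ready: forwarding would switch here.
    # Until the release below, the flow holds resources on both tracks.
    for link_id in old_path:
        links[link_id].release(rate)
    return new_path


if __name__ == "__main__":
    links = {link_id: Link(capacity=10)
             for link_id in [("S", "A"), ("A", "R"), ("S", "B"), ("B", "R")]}
    old = [("S", "A"), ("A", "R")]
    for link_id in old:
        links[link_id].try_reserve(4)
    print(soft_reconfigure(links, old, [("S", "B"), ("B", "R")], rate=4))
    # [('S', 'B'), ('B', 'R')]
```

   Links shared by both tracks are temporarily reserved twice in this
   sketch, which mirrors the reconfiguration cost discussed at the
   beginning of Section 6; if that double reservation does not fit, the
   flow simply remains on its current track.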

7. IANA Considerations

   This document has no actionable requirements for IANA.  This section
   can be removed before publication.

8. Security Considerations

   This section will be expanded in future versions of the draft.

9. Acknowledgments

   TBD

10. Informative References

   [I-D.ietf-detnet-ip]
              Varga, B., Farkas, J., Berger, L., Fedyk, D., and S.
              Bryant, "DetNet Data Plane: IP", draft-ietf-detnet-ip-07
              (work in progress), July 2020.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019.

Authors' Addresses

   Fabrice Theoleyre
   CNRS
   300 boulevard Sebastien Brant - CS 10413
   Illkirch - Strasbourg  67400
   FRANCE

   Phone: +33 368 85 45 33
   Email: theoleyre@unistra.fr
   URI:   http://www.theoleyre.eu


   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   Cesson-Sevigne - Rennes  35510
   FRANCE

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr


   Greg Mirsky
   ZTE Corp.

   Email: gregimirsky@gmail.com


   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   Leganes, Madrid  28911
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/