DetNet                                                         G. Mirsky
Internet-Draft                                                 ZTE Corp.
Intended status: Standards Track                            F. Theoleyre
Expires: 1 October 2021                                             CNRS
                                                       G.Z. Papadopoulos
                                                          IMT Atlantique
                                                           CJ. Bernardos
                                                                    UC3M
                                                           30 March 2021


   Framework of Operations, Administration and Maintenance (OAM) for
                   Deterministic Networking (DetNet)
                   draft-tpmb-detnet-oam-framework-01

Abstract

   Deterministic Networking (DetNet), as defined in RFC 8655, aims to
   provide bounded end-to-end latency on top of the network
   infrastructure, comprising both Layer 2 bridged and Layer 3 routed
   segments.  This document's primary purpose is to detail the specific
   Operations, Administration, and Maintenance (OAM) requirements for
   maintaining a deterministic network.  With the implementation of the
   OAM framework in DetNet, an operator will have a real-time view of
   the network infrastructure regarding the network's ability to
   respect the Service Level Objectives (SLOs), such as packet delay,
   delay variation, and packet loss ratio, assigned to each data flow.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 1 October 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Acronyms
     1.3.  Requirements Language
   2.  Role of OAM in DetNet
   3.  Operation
     3.1.  Information Collection
     3.2.  Continuity Check
     3.3.  Connectivity Verification
     3.4.  Route Tracing
     3.5.  Fault Verification/detection
     3.6.  Fault Isolation/identification
     3.7.  Use of Hybrid OAM in DetNet
   4.  Administration
     4.1.  Collection of metrics
     4.2.  Worst-case metrics
   5.  Maintenance
     5.1.  Replication / Elimination
     5.2.  Resource Reservation
     5.3.  Soft transition after reconfiguration
   6.  Requirements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Deterministic Networking (DetNet) [RFC8655] aims to provide bounded
   end-to-end latency on top of the network infrastructure, comprising
   both Layer 2 bridged and Layer 3 routed segments.  The DetNet
   Working Group's work encompasses the data plane, OAM, time
   synchronization, management, control, and security aspects.

   Operations, Administration, and Maintenance (OAM) tools are of
   primary importance for IP networks [RFC7276].  DetNet OAM should
   provide a toolset for fault detection, localization, and performance
   measurement.

   This document's primary purpose is to detail the specific
   requirements of the OAM features recommended to maintain a
   deterministic/reliable network.  Specifically, it investigates the
   requirements for a deterministic network supporting critical flows.

   In this document, the term OAM is used according to its definition
   specified in [RFC6291].  DetNet expects to implement an OAM framework
   to maintain a real-time view of the network infrastructure and its
   ability to respect the Service Level Objectives (SLOs), such as
   packet delay, delay variation, and packet loss ratio, assigned to
   each data flow.

   This document lists the functional requirements for OAM in a DetNet
   domain.  The list can further be used for a gap analysis of available
   OAM tools, to identify whether existing OAM tools need to be enhanced
   or new OAM tools are required to support proactive and on-demand
   path monitoring and service validation.

1.1.  Terminology

   The following terms are used throughout this document as defined
   below:

   *  OAM entity: a data flow to be monitored for defects and/or whose
      performance metrics are measured.

   *  Maintenance End Point (MEP): OAM systems traversed by a data flow
      when entering/exiting the network.  In DetNet, MEPs correspond to
      the source and destination of a data flow.  OAM messages can be
      exchanged between two MEPs.

   *  Maintenance Intermediate endPoint (MIP): an OAM system along the
      flow; a MIP MAY respond to an OAM message generated by a MEP.

   *  Control and management plane: the control and management planes
      are used to configure and control the network (long-term).
      Relative to a data flow, the control and/or management plane can
      be out-of-band.

   *  Active measurement methods (as defined in [RFC7799]) modify a
      normal data flow by inserting novel fields or by injecting
      specially constructed test packets (e.g., [RFC2544]).  It is
      critical for the quality of information obtained using an active
      method that generated test packets are in-band with the monitored
      data flow.  In other words, a test packet is required to cross the
      same network nodes and links and receive the same Quality of
      Service (QoS) treatment as a data packet.

   *  Passive measurement methods [RFC7799] infer information by
      observing unmodified existing flows.

   *  Hybrid measurement methods [RFC7799] combine elements of both
      active and passive measurement methods.

1.2.  Acronyms

   OAM: Operations, Administration, and Maintenance

   DetNet: Deterministic Networking

   SLO: Service Level Objective

   QoS: Quality of Service

   SNMP: Simple Network Management Protocol

   SDN: Software Defined Network

   We need an exhaustive list here, to be completed after the document
   has evolved.

1.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Role of OAM in DetNet

   DetNet networks are expected to provide communications with
   predictable low packet delay and packet loss.  Most critical
   applications will define an SLO required for the data flows they
   generate.

   To respect strict guarantees, DetNet can use an orchestrator able to
   monitor and maintain the network.  Typically, a Software-Defined
   Network (SDN) controller places DetNet flows in the deployed network
   based on their SLOs.  Thus, resources have to be provisioned a
   priori for the regular operation of the network.  OAM is an
   essential element of network operation, and the resources that OAM
   requires also need to be accounted for to keep the network
   operational.

   Fault tolerance also assumes that multiple paths could be provisioned
   so that an end-to-end circuit is maintained by adapting to the
   existing conditions.  The central controller/orchestrator typically
   controls the Packet Replication, Elimination, and Ordering Functions
   (PREOF) on a node.  OAM is expected to support monitoring and
   troubleshooting PREOF on a particular node and within the domain.

   Note that PREOF can also be controlled by a set of distributed
   controllers in scenarios where DetNet solutions involve more than a
   single central controller.

3.  Operation

   OAM features will enable robust DetNet operation for both forwarding
   and routing purposes.

3.1.  Information Collection

   Information about the state of the network can be collected using
   several mechanisms.
   Some protocols, e.g., the Simple Network Management Protocol (SNMP),
   send queries.  Others, e.g., YANG-based data models, generate
   notifications based on the publish-subscribe method.  In either case,
   information about the state of the network is collected and sent to
   the controller.

   Methods of transporting OAM information can also be characterized
   relative to the path of the data.  For instance, OAM information may
   be transported out-of-band or in-band with the data flow.

3.2.  Continuity Check

   Continuity check is used to monitor the continuity of a path, i.e.,
   that there exists a way to deliver the packets between two endpoints
   A and B.

3.3.  Connectivity Verification

   In addition to the Continuity Check, DetNet solutions have to verify
   the connectivity.  This verification considers additional
   constraints, i.e., the absence of misconnection.

   In particular, resources have to be reserved for a given flow, so
   that they are booked for its use without being impacted by other
   flows.  Similarly, connectivity verification confirms that the
   destination does not receive packets from other flows through its
   interface.

   It is worth noting that the test and data packets MUST follow the
   same path, i.e., the connectivity verification has to be conducted
   in-band without impacting the data traffic.  Test packets MUST share
   fate with the monitored data traffic without introducing congestion
   in normal network conditions.

3.4.  Route Tracing

   Ping and traceroute are two ubiquitous tools that help localize and
   characterize a failure in the network.  They help to identify a
   subset of the list of routers in the route.  However, to be
   predictable, resources are reserved per flow in DetNet.  Thus, DetNet
   needs to define route tracing tools able to track the route for a
   specific flow.

   For DetNet with the IP data plane, using multiple paths or links,
   i.e., Equal-Cost Multipath (ECMP), is NOT RECOMMENDED [RFC8939].
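   The in-band requirement of Section 3.3 helps explain why ECMP is
   problematic for OAM.  The following minimal Python sketch assumes a
   simple hash-based ECMP scheme in which a router picks a member link
   by hashing a packet's 5-tuple.  The FiveTuple type and the
   ecmp_member() helper are hypothetical; CRC-32 over a text encoding
   of the 5-tuple is only a stand-in for the implementation-specific
   hash and header fields a real router uses, and the addresses are
   taken from the RFC 5737 documentation prefixes.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """IP 5-tuple over which the (hypothetical) ECMP hash is computed."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


def ecmp_member(flow: FiveTuple, n_links: int) -> int:
    """Return the index of the ECMP member link selected for `flow`.

    Illustrative only: CRC-32 over a text encoding of the 5-tuple stands
    in for a router's implementation-specific hash function.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links


N_LINKS = 4
data_flow = FiveTuple("192.0.2.1", "198.51.100.7", 17, 5000, 6000)
data_link = ecmp_member(data_flow, N_LINKS)

# A probe carrying exactly the same 5-tuple always lands on the same link.
same_tuple_probe = FiveTuple(*data_flow)
assert ecmp_member(same_tuple_probe, N_LINKS) == data_link

# Probes that differ only in the source port can be hashed onto another
# member link, i.e., they would no longer be in-band with the data flow.
diverging_ports = [
    port for port in range(40000, 41000)
    if ecmp_member(data_flow._replace(sport=port), N_LINKS) != data_link
]
print(f"data flow on link {data_link}; "
      f"{len(diverging_ports)} of 1000 probe source ports map elsewhere")
```

   Under these assumptions, a test packet whose hashed header fields
   differ from those of the monitored flow is not guaranteed to follow
   the flow's path, which conflicts with the requirement that test
   packets share fate with the data traffic.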
   As a result, OAM in IP ECMP environments is outside the scope of
   this document.

3.5.  Fault Verification/detection

   DetNet expects to operate fault-tolerant networks.  Thus, mechanisms
   able to detect faults before they impact the network performance are
   needed.

   The network has to detect when a fault has occurred, i.e., when the
   network has deviated from its expected behavior.  While the network
   must report an alarm, the cause may not be identified precisely.
   For instance, the end-to-end reliability may have decreased
   significantly, or a buffer overflow may have occurred.

   DetNet OAM mechanisms SHOULD allow fault detection in real time.
   They MAY, when possible, predict faults based on current network
   conditions.  They MAY also identify and report the cause of the
   actual/predicted network failure.

3.6.  Fault Isolation/identification

   The network has to isolate and identify the cause of the fault.  For
   instance, the replication process may not behave as expected at a
   specific intermediate router.

3.7.  Use of Hybrid OAM in DetNet

   Hybrid OAM methods are used in performance monitoring and are defined
   in [RFC7799] as:

      Hybrid Methods are Methods of Measurement that use a combination
      of Active Methods and Passive Methods.

   A hybrid measurement method may produce metrics close to those of a
   passive method, but it still alters something in a data packet, even
   if only the value of a designated field in the packet encapsulation.
   One example of such a hybrid measurement method is the Alternate
   Marking method (AMM) described in [RFC8321].  One of the advantages
   of using AMM in a DetNet domain with the IP data plane is that the
   marking is applied to the data flow itself, thus ensuring that the
   measured metrics are directly applicable to the DetNet flow.

4.  Administration

   The network SHOULD expose a collection of metrics to support an
   operator in making proper decisions, including:

   *  Queuing Delay: the time elapsed between a packet being enqueued
      and its transmission to the next hop.

   *  Buffer occupancy: the number of packets present in the buffer for
      each of the existing flows.

   The following metrics SHOULD be collected:

   *  per virtual circuit, to measure the end-to-end performance of a
      given flow.  Each of the paths has to be isolated in multipath
      routing strategies.

   *  per path, to detect a misbehaving path when multiple paths are
      used.

   *  per device, to detect a misbehaving node when it relays the
      packets of several flows.

4.1.  Collection of metrics

   DetNet OAM SHOULD optimize the number of statistics/measurements to
   be collected and the frequency of their collection.  Distributed and
   centralized mechanisms MAY be used in combination.  Periodic and
   event-triggered collection of information characterizing the state
   of a network MAY be used.

4.2.  Worst-case metrics

   DetNet aims to enable real-time communications on top of a
   heterogeneous multi-hop architecture.  To make correct decisions, the
   controller needs to know the distribution of packet losses/delays for
   each flow and each hop of the paths.  In other words, average
   end-to-end statistics are not enough.  The collected information
   must be sufficient to allow the controller to predict the worst
   case.

5.  Maintenance

   DetNet needs to implement a self-healing and self-optimization
   approach.  The controller MUST be able to continuously retrieve the
   state of the network to evaluate conditions and trends regarding the
   relevance of a reconfiguration, quantifying:

      the cost of the sub-optimality: resources may not be used
      optimally (e.g., a better path exists).

      the reconfiguration cost: the controller needs to trigger some
      reconfigurations.
      For this transient period, resources may be reserved twice, and
      control packets have to be transmitted.

   Thus, a reconfiguration should be triggered only if the gain is
   significant.

5.1.  Replication / Elimination

   When multiple paths are reserved between two maintenance endpoints,
   packet replication may be used to introduce redundancy and alleviate
   transmission errors and collisions.  For instance, in Figure 1, the
   source node S transmits the packet to both parents, nodes A and B.
   Each maintenance endpoint will decide to trigger the packet
   replication, elimination, or ordering process when a set of metrics
   passes a threshold value.

                  ===> (A) => (C) => (E) ===
                //        \\//   \\//       \\
      source (S)          //\\   //\\        (R) (root)
                \\      //    \\ //  \\     //
                  ===> (B) => (D) => (F) ===

       Figure 1: Packet Replication: S transmits the same data packet
                     twice, to DP (A) and AP (B).

5.2.  Resource Reservation

   Because the QoS criteria associated with a path may degrade, the
   network has to provision additional resources along the path.
   Mechanisms to patch the network configuration need to be provided.

5.3.  Soft transition after reconfiguration

   Since DetNet expects to support real-time flows, DetNet OAM MUST
   support soft reconfiguration, where the new resources are reserved
   before the old ones are released.  Mechanisms have to be proposed so
   that packets are forwarded through the new track only when the
   resources are ready to be used, while keeping the global state
   consistent (no packet reordering, duplication, etc.).

6.  Requirements

   This section lists requirements for OAM in a DetNet domain with the
   MPLS data plane:

   1.   It MUST be possible to initiate a DetNet OAM session from any
        DetNet node towards one or more other DetNet nodes within a
        given domain.

   2.   It SHOULD be possible to initiate a DetNet OAM session from a
        centralized controller.

   3.   DetNet OAM MUST support proactive and on-demand OAM monitoring
        and measurement methods.

   4.   DetNet OAM packets MUST be in-band, i.e., follow precisely the
        same path as DetNet data plane traffic.

   5.   DetNet OAM MUST support unidirectional OAM methods, continuity
        check, connectivity verification, and performance measurement.

   6.   DetNet OAM MUST support bi-directional OAM methods.  Such OAM
        methods MAY combine in-band monitoring or measurement in the
        forward direction and out-of-band notification in the reverse
        direction, i.e., from the egress to the ingress end point of the
        OAM test session.

   7.   DetNet OAM MUST support proactive monitoring of DetNet node
        availability in the given DetNet domain.

   8.   DetNet OAM MUST support Path Maximum Transmission Unit
        discovery.

   9.   DetNet OAM MUST support Remote Defect Indication (RDI)
        notification to the DetNet node performing continuity checking.

   10.  DetNet OAM MUST support performance measurement methods.

   11.  DetNet OAM MAY support hybrid performance measurement methods.

   12.  DetNet OAM MUST support unidirectional performance measurement
        methods.  Calculated performance metrics MUST include, but are
        not limited to, throughput, packet loss, delay, and delay
        variation metrics.  [RFC6374] provides excellent details on
        performance measurement and performance metrics.

   13.  DetNet OAM MUST support a defect notification mechanism, such
        as Alarm Indication Signal (AIS).  Any DetNet node in the given
        DetNet domain MAY originate a defect notification addressed to
        any subset of nodes within the domain.

   14.  DetNet OAM MUST support methods to enable survivability of the
        DetNet domain.  These recovery methods MAY use protection
        switching and restoration.

   15.  DetNet OAM MUST support the discovery of the locations of the
        Packet Replication, Elimination, and Order preservation
        sub-functions in the domain.

   16.  DetNet OAM MUST support testing of the Packet Replication,
        Elimination, and Order preservation sub-functions in the domain.

   17.  DetNet OAM MUST support monitoring any subset of the paths
        traversed by the DetNet flow through the DetNet domain.

7.  IANA Considerations

   This document has no actionable requirements for IANA.  This section
   can be removed before publication.

8.  Security Considerations

   This document lists the OAM requirements for a DetNet domain and does
   not raise any security concerns or issues in addition to ones common
   to networking and those specific to DetNet discussed in
   [I-D.ietf-detnet-security].

9.  Acknowledgments

   TBD

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [I-D.ietf-detnet-security]
              Grossman, E., Mizrahi, T., and A. J. Hacker,
              "Deterministic Networking (DetNet) Security
              Considerations", Work in Progress, Internet-Draft,
              draft-ietf-detnet-security-16, 2 March 2021.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011.

   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
              Measurement for MPLS Networks", RFC 6374,
              DOI 10.17487/RFC6374, September 2011.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019.

   [RFC8939]  Varga, B., Ed., Farkas, J., Berger, L., Fedyk, D., and S.
              Bryant, "Deterministic Networking (DetNet) Data Plane:
              IP", RFC 8939, DOI 10.17487/RFC8939, November 2020.

Authors' Addresses

   Greg Mirsky
   ZTE Corp.

   Email: gregimirsky@gmail.com, gregory.mirsky@ztetx.com


   Fabrice Theoleyre
   CNRS
   300 boulevard Sebastien Brant - CS 10413
   67400 Illkirch - Strasbourg
   France

   Phone: +33 368 85 45 33
   Email: theoleyre@unistra.fr
   URI:   http://www.theoleyre.eu


   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Châtaigneraie
   35510 Cesson-Sévigné - Rennes
   France

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr


   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   28911 Leganes, Madrid
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/