DetNet                                                         G. Mirsky
Internet-Draft                                                 ZTE Corp.
Intended status: Standards Track                            F. Theoleyre
Expires: July 19, 2021                                              CNRS
                                                         G. Papadopoulos
                                                          IMT Atlantique
                                                           CJ. Bernardos
                                                                    UC3M
                                                        January 15, 2021


   Framework of Operations, Administration and Maintenance (OAM) for
                   Deterministic Networking (DetNet)
                   draft-tpmb-detnet-oam-framework-00

Abstract

   Deterministic Networking (DetNet), as defined in RFC 8655, aims to
   provide bounded end-to-end latency on top of network infrastructure
   comprising both Layer 2 bridged and Layer 3 routed segments.  This
   document's primary purpose is to detail the specific requirements
   for the Operations, Administration, and Maintenance (OAM) features
   recommended to maintain a deterministic network.  With the
   implementation of the OAM framework in DetNet, an operator will have
   a real-time view of the network infrastructure regarding the
   network's ability to respect the Service Level Objectives (SLOs),
   such as packet delay, delay variation, and packet loss ratio,
   assigned to each data flow.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 19, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Acronyms
     1.3.  Requirements Language
   2.  Role of OAM in DetNet
   3.  Operation
     3.1.  Information Collection
     3.2.  Continuity Check
     3.3.  Connectivity Verification
     3.4.  Route Tracing
     3.5.  Fault Verification/detection
     3.6.  Fault Isolation/identification
     3.7.  Use of Hybrid OAM in DetNet
   4.  Administration
     4.1.  Collection of metrics
     4.2.  Worst-case metrics
   5.  Maintenance
     5.1.  Replication / Elimination
     5.2.  Resource Reservation
     5.3.  Soft transition after reconfiguration
   6.  Requirements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Deterministic Networking (DetNet) architecture [RFC8655] aims to
   provide bounded end-to-end latency on top of network infrastructure
   comprising both Layer 2 bridged and Layer 3 routed segments.  The
   DetNet work encompasses the data plane, OAM, time synchronization,
   management, control, and security aspects.

   Operations, Administration, and Maintenance (OAM) tools are of
   primary importance for IP networks [RFC7276].  DetNet OAM should
   provide a toolset for fault detection, localization, and performance
   measurement.

   This document's primary purpose is to detail the specific
   requirements for the OAM features recommended to maintain a
   deterministic/reliable network.  Specifically, it investigates the
   requirements for a deterministic network that supports critical
   flows.

   In this document, the term OAM is used as defined in [RFC6291].
   DetNet expects to implement an OAM framework that maintains a
   real-time view of the network infrastructure and of its ability to
   respect the Service Level Objectives (SLOs), such as packet delay,
   delay variation, and packet loss ratio, assigned to each data flow.

   This document lists the functional requirements for OAM in a DetNet
   domain.  The list can further be used for a gap analysis of
   available OAM tools, to identify possible enhancements of existing
   tools or whether new OAM tools are required to support proactive and
   on-demand path monitoring and service validation.

1.1.  Terminology

   The following terms are used throughout this document as defined
   below:

   o  OAM entity: a data flow to be monitored for defects and/or whose
      performance metrics are to be measured.

   o  Maintenance End Point (MEP): an OAM system traversed by a data
      flow when entering or exiting the network.  In DetNet, MEPs
      correspond to the source and destination of a data flow.  OAM
      messages can be exchanged between two MEPs.

   o  Maintenance Intermediate endPoint (MIP): an OAM system along the
      flow; a MIP MAY respond to an OAM message generated by a MEP.

   o  Control and management plane: the control and management planes
      are used to configure and control the network (long-term).
      Relative to a data flow, the control and/or management plane can
      be out-of-band.

   o  Active measurement methods (as defined in [RFC7799]) modify a
      normal data flow by inserting novel fields or by injecting
      specially constructed test packets [RFC2544].  For the quality of
      the information obtained using an active method, it is critical
      that the generated test packets be in-band with the monitored
      data flow.  In other words, a test packet is required to cross the
      same network nodes and links and to receive the same Quality of
      Service (QoS) treatment as a data packet.

   o  Passive measurement methods [RFC7799] infer information by
      observing unmodified existing flows.

   o  Hybrid measurement methods [RFC7799] combine elements of both
      active and passive measurement methods.

1.2.  Acronyms

   OAM: Operations, Administration, and Maintenance

   DetNet: Deterministic Networking

   SLO: Service Level Objective

   QoS: Quality of Service

   SNMP: Simple Network Management Protocol

   SDN: Software-Defined Network

   We need an exhaustive list here, to be completed after the document
   has evolved.

1.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Role of OAM in DetNet

   DetNet networks are expected to provide communications with
   predictably low packet delay and packet loss.  Most critical
   applications will define an SLO required for the data flows they
   generate.

   To respect strict guarantees, DetNet can use an orchestrator able to
   monitor and maintain the network.  Typically, a Software-Defined
   Network (SDN) controller places DetNet flows in the deployed network
   based on their SLOs.  Thus, resources have to be provisioned a
   priori for the regular operation of the network.  OAM is an
   essential element of network operation, and the resources necessary
   for OAM need to be accounted for to keep the network operational.

   Fault tolerance also assumes that multiple paths could be provisioned
   so that an end-to-end circuit is maintained by adapting to the
   existing conditions.  The central controller/orchestrator typically
   controls the Packet Replication, Elimination, and Ordering Functions
   (PREOF) on a node.  OAM is expected to support monitoring and
   troubleshooting PREOF on a particular node and within the domain.

   Note that PREOF can also be controlled by a set of distributed
   controllers, in scenarios where a DetNet solution involves more than
   a single central controller.

3.  Operation

   OAM features will enable robust DetNet operation for both forwarding
   and routing purposes.

3.1.  Information Collection

   Information about the state of the network can be collected using
   several mechanisms.
   Some protocols, e.g., the Simple Network Management Protocol (SNMP),
   send queries.  Others, e.g., those based on YANG data models,
   generate notifications using the publish-subscribe method.  Either
   way, information about the state of the network is collected and
   sent to the controller.

   Methods of transporting OAM information can also be characterized
   relative to the path of the data: OAM information may be transported
   either out-of-band or in-band with the data flow.

3.2.  Continuity Check

   A continuity check is used to monitor the continuity of a path,
   i.e., that there exists a way to deliver packets between two
   endpoints A and B.

3.3.  Connectivity Verification

   In addition to the continuity check, DetNet solutions have to verify
   connectivity.  This verification considers additional constraints,
   i.e., the absence of misconnection.

   In particular, resources have to be reserved for a given flow so
   that they are booked for use without being impacted by other flows.
   Similarly, the destination is expected not to receive packets from
   other flows through its interface.

   It is worth noting that the test and data packets MUST follow the
   same path, i.e., the connectivity verification has to be conducted
   in-band without impacting the data traffic.  Test packets MUST share
   fate with the monitored data traffic without introducing congestion
   in normal network conditions.

3.4.  Route Tracing

   Ping and traceroute are two ubiquitous tools that help localize and
   characterize a failure in the network.  They help to identify a
   subset of the routers along the route.  However, to be predictable,
   DetNet reserves resources per flow.  Thus, DetNet needs to define
   route-tracing tools able to track the route of a specific flow.

   It is NOT RECOMMENDED for DetNet with an IP data plane to use
   multiple paths or links, i.e., Equal-Cost Multipath (ECMP) [RFC8939].
   As a result, OAM in an IP ECMP environment is outside the scope of
   this document.

3.5.  Fault Verification/detection

   DetNet expects to operate fault-tolerant networks.  Thus, mechanisms
   able to detect faults before they impact network performance are
   needed.

   The network has to detect when a fault has occurred, i.e., when the
   network has deviated from its expected behavior.  While the network
   must report an alarm, the cause may not be identified precisely; for
   instance, the end-to-end reliability may have decreased
   significantly, or a buffer overflow may have occurred.

   DetNet OAM mechanisms SHOULD allow fault detection in real time.
   They MAY, when possible, predict faults based on current network
   conditions.  They MAY also identify and report the cause of the
   actual/predicted network failure.

3.6.  Fault Isolation/identification

   The network has to isolate and identify the cause of a fault.  For
   instance, it identifies that the replication process at a specific
   intermediate router does not behave as expected.

3.7.  Use of Hybrid OAM in DetNet

   Hybrid OAM methods are used in performance monitoring and are
   defined in [RFC7799] as:

      Hybrid Methods are Methods of Measurement that use a combination
      of Active Methods and Passive Methods.

   A hybrid measurement method may produce metrics close to those of a
   passive method, but it still alters something in a data packet, even
   if that is only the value of a designated field in the packet
   encapsulation.  One example of such a hybrid measurement method is
   the Alternate-Marking Method (AMM) described in [RFC8321].  One
   advantage of using AMM in a DetNet domain with an IP data plane is
   that the marking is applied to the data flow itself, thus ensuring
   that the measured metrics are directly applicable to the DetNet
   flow.

4.  Administration

   The network SHOULD expose a collection of metrics to support an
   operator in making proper decisions, including:

   o  Queuing delay: the time elapsed between a packet being enqueued
      and its transmission to the next hop.

   o  Buffer occupancy: the number of packets present in the buffer,
      for each of the existing flows.

   These metrics SHOULD be collected:

   o  per virtual circuit, to measure the end-to-end performance for a
      given flow.  In multipath routing strategies, each of the paths
      has to be isolated.

   o  per path, to detect a misbehaving path when multiple paths are
      used.

   o  per device, to detect a misbehaving node when it relays the
      packets of several flows.

4.1.  Collection of metrics

   DetNet OAM SHOULD optimize the number of statistics/measurements to
   be collected and the frequency of their collection.  Distributed and
   centralized mechanisms MAY be used in combination.  Periodic and
   event-triggered collection of information characterizing the state
   of the network MAY be used.

4.2.  Worst-case metrics

   DetNet aims to enable real-time communications on top of a
   heterogeneous multi-hop architecture.  To make correct decisions, the
   controller needs to know the distribution of packet losses/delays for
   each flow and for each hop of its paths.  In other words, average
   end-to-end statistics are not enough.  The collected information must
   be sufficient to allow the controller to predict the worst case.

5.  Maintenance

   DetNet needs to implement a self-healing and self-optimization
   approach.  The controller MUST be able to continuously retrieve the
   state of the network and to evaluate conditions and trends regarding
   the relevance of a reconfiguration, quantifying:

      the cost of the sub-optimality: resources may not be used
      optimally (e.g., a better path exists).

      the reconfiguration cost: the controller needs to trigger some
      reconfigurations.
      For this transient period, resources may be reserved twice, and
      control packets have to be transmitted.

   Thus, a reconfiguration may only be triggered if the gain is
   significant.

5.1.  Replication / Elimination

   When multiple paths are reserved between two maintenance endpoints,
   packet replication may be used to introduce redundancy and alleviate
   transmission errors and collisions.  For instance, in Figure 1, the
   source node S transmits the packet to both parents, nodes A and B.
   Each maintenance endpoint decides to trigger the packet replication,
   elimination, or ordering process when a set of metrics passes a
   threshold value.

               ===> (A) => (C) => (E) ===
             //        \\//   \\//       \\
   source (S)          //\\   //\\         (R) (root)
             \\       //  \\ //  \\      //
               ===> (B) => (D) => (F) ===

     Figure 1: Packet Replication: S transmits the same data packet
                     twice, to DP (A) and AP (B)

5.2.  Resource Reservation

   Because the QoS criteria associated with a path may degrade, the
   network has to provision additional resources along the path.  We
   need to provide mechanisms to patch the network configuration.

5.3.  Soft transition after reconfiguration

   Since DetNet expects to support real-time flows, DetNet OAM MUST
   support soft reconfiguration, where the new resources are reserved
   before the old ones are released.  Mechanisms have to be defined so
   that packets are forwarded through the new track only when its
   resources are ready to be used, while keeping the global state
   consistent (no packet reordering, duplication, etc.).

6.  Requirements

   This section lists requirements for OAM in a DetNet domain with an
   MPLS data plane:

   1.   It MUST be possible to initiate a DetNet OAM session from any
        DetNet node toward another DetNet node(s) within a given
        domain.

   2.   It SHOULD be possible to initialize a DetNet OAM session from a
        centralized controller.

   3.   DetNet OAM MUST support proactive and on-demand OAM monitoring
        and measurement methods.

   4.   DetNet OAM packets MUST be in-band, i.e., follow precisely the
        same path as DetNet data plane traffic.

   5.   DetNet OAM MUST support unidirectional OAM methods, continuity
        check, connectivity verification, and performance measurement.

   6.   DetNet OAM MUST support bidirectional OAM methods.  Such OAM
        methods MAY combine in-band monitoring or measurement in the
        forward direction and out-of-band notification in the reverse
        direction, i.e., from the egress to the ingress end point of the
        OAM test session.

   7.   DetNet OAM MUST support proactive monitoring of DetNet node
        availability in the given DetNet domain.

   8.   DetNet OAM MUST support Path Maximum Transmission Unit
        discovery.

   9.   DetNet OAM MUST support Remote Defect Indication (RDI)
        notification to the DetNet node performing continuity checking.

   10.  DetNet OAM MUST support performance measurement methods.

   11.  DetNet OAM MAY support hybrid performance measurement methods.

   12.  DetNet OAM MUST support unidirectional performance measurement
        methods.  Calculated performance metrics MUST include, but are
        not limited to, throughput, packet loss, delay, and delay
        variation metrics.  [RFC6374] provides a detailed discussion of
        performance measurement and performance metrics.

   13.  DetNet OAM MUST support a defect notification mechanism, such as
        the Alarm Indication Signal.  Any DetNet node in the given DetNet
        domain MAY originate a defect notification addressed to any
        subset of nodes within the domain.

   14.  DetNet OAM MUST support methods to enable survivability of the
        DetNet domain.  These recovery methods MAY use protection
        switching and restoration.

   15.  DetNet OAM MUST support the discovery of the locations of the
        Packet Replication, Elimination, and Order preservation
        sub-functions in the domain.

   16.  DetNet OAM MUST support testing of the Packet Replication,
        Elimination, and Order preservation sub-functions in the
        domain.

   17.  DetNet OAM MUST support monitoring any subset of the paths
        traversed by a DetNet flow through the DetNet domain.

7.  IANA Considerations

   This document has no actionable requirements for IANA.  This section
   can be removed before publication.

8.  Security Considerations

   This document lists the OAM requirements for a DetNet domain and
   does not raise any security concerns or issues in addition to those
   common to networking and those specific to DetNet discussed in
   [I-D.ietf-detnet-security].

9.  Acknowledgments

   TBD

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [I-D.ietf-detnet-security]
              Grossman, E., Mizrahi, T., and A. Hacker, "Deterministic
              Networking (DetNet) Security Considerations",
              draft-ietf-detnet-security-13 (work in progress), December
              2020.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011.

   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
              Measurement for MPLS Networks", RFC 6374,
              DOI 10.17487/RFC6374, September 2011.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC8321]  Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
              L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate-Marking Method for Passive and Hybrid
              Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
              January 2018.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019.

   [RFC8939]  Varga, B., Ed., Farkas, J., Berger, L., Fedyk, D., and S.
              Bryant, "Deterministic Networking (DetNet) Data Plane:
              IP", RFC 8939, DOI 10.17487/RFC8939, November 2020.

Authors' Addresses

   Greg Mirsky
   ZTE Corp.

   Email: gregimirsky@gmail.com


   Fabrice Theoleyre
   CNRS
   300 boulevard Sebastien Brant - CS 10413
   Illkirch - Strasbourg 67400
   FRANCE

   Phone: +33 368 85 45 33
   Email: theoleyre@unistra.fr
   URI:   http://www.theoleyre.eu


   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   Cesson-Sevigne - Rennes 35510
   FRANCE

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr


   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   Leganes, Madrid 28911
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/