idnits 2.17.1

draft-mirsky-ippm-epm-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ------------------------------------------------------------------------

     No issues found here.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have
     RFC 2119 boilerplate text.

  -- The document date (24 October 2021) is 915 days in the past.  Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

------------------------------------------------------------------------

Network Working Group                                          G. Mirsky
Internet-Draft                                                J. Halpern
Intended status: Standards Track                                Ericsson
Expires: 27 April 2022                                            X. Min
                                                               ZTE Corp.
                                                                  L. Han
                                                            China Mobile
                                                         24 October 2021


       Error Performance Measurement in Packet-switched Networks
                        draft-mirsky-ippm-epm-04

Abstract

   This document describes the use of the error performance metric to
   characterize a packet-switched network's conformance to the
   pre-defined set of performance objectives.
   In this document, metrics that characterize error performance in a
   packet-switched network (PSN) are defined, as well as methods to
   measure and calculate them.  Also, the requirements for an active
   Operations, Administration, and Maintenance protocol to support the
   error performance measurement in PSN are discussed, and potential
   candidate protocols are analyzed.  All metrics and measurement
   methods are equally applicable to underlay and overlay networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
     2.1.  Terminology and Acronyms
     2.2.  Requirements Language
   3.  Error Performance Metrics
     3.1.  Measure Error Performance Metrics
     3.2.  Calculate Error Performance Metrics
   4.  Requirements to EPM
   5.  Active OAM Protocol for EPM
   6.  Availability of Anything-as-a-Service
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Operations, Administration, and Maintenance (OAM) is a collection of
   methods to detect, characterize, and localize failures in a network
   and to monitor the network's performance using various measurement
   methods.  Traditionally, the former set of OAM tools is identified as
   Fault Management (FM) OAM and the latter as Performance Monitoring
   (PM) OAM.  Some OAM protocols can be used for both groups of tasks,
   while some serve one particular group.  But regardless of how many
   OAM protocols are in use, network operators and network users are
   faced with multiple metrics that characterize the network conditions.
   This document describes a new component of packet-switched network
   (PSN) OAM.

   Error performance measurement (EPM) is a part of an OAM toolset that
   provides an operator with information related to network measurements
   for a unidirectional or a bidirectional connection between two
   systems.  In current technology, EPM has been defined only for data
   communication methods that have a constant bit-rate transmission
   [ITU.G.826] and not for PSN, where transmissions are statistically
   random.  Because a PSN is a statistically multiplexed network, a
   receiver node does not expect a packet to arrive from a sender node
   at a specific moment, much less from a particular sender.  That is
   what differentiates PSN from networks built on a constant bit-rate
   transmission, where a stream of bits between two nodes is always
   present, whether it represents data or not.  Such a stream provides
   the receiver with a predictable number of measurements in a series of
   measurement intervals.  In PSN, on-path OAM methods, i.e.,
   measurement methods that use data flow, cannot provide such
   predictability and thus cannot be used for EPM.  In PSN, EPM needs to
   use active OAM methods, as defined in [RFC7799].  This document
   identifies metrics that characterize PSN error performance and
   methods to measure and calculate them.  Also, the requirements for an
   active OAM protocol to support EPM in PSN are discussed, and
   potential candidate protocols are analyzed.

2.  Conventions used in this document

2.1.  Terminology and Acronyms

   OAM   Operations, Administration, and Maintenance

   EP    Error Performance

   EPM   Error Performance Measurement

   ES    Errored Second

   ESR   Errored Second Ratio

   SES   Severely Errored Second

   SESR  Severely Errored Second Ratio

   EFS   Error-Free Second

   PSN   Packet-switched Network

   FM    Fault Management

   PM    Performance Monitoring

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Error Performance Metrics

   When analyzing the error performance of a path between two nodes, we
   need to select a time interval as the unit of EPM.  In [ITU.G.826], a
   time interval of one second is used.  It is reasonable to use the
   same time interval for EPM for PSNs.  Further, for the purpose of
   EPM, each time interval, i.e., second, is classified as either an
   Errored Second (ES), a Severely Errored Second (SES), or an
   Error-Free Second (EFS).  These are defined as follows:

   *  An ES is a time interval during which at least one of the
      performance parameters degraded below its optimal level threshold
      or a defect was detected.

   *  An SES is a time interval during which at least one of the
      performance parameters degraded below its critical threshold or a
      defect was detected.

   *  Consequently, an EFS is a time interval during which all
      performance objectives are at or above their respective optimal
      levels, and no defect has been detected.

   The definition of a state of a defect in the network is also
   necessary for understanding the EPM.  In this document, the defect is
   interpreted as the state of inability to communicate between a
   particular set of nodes.  It is important to note that it is being
   defined as a state, and thus, it has conditions that define entry
   into it and exit out of it.  Also, the state of defect exists only in
   connection to the particular group of nodes in the network, not the
   network as a domain.

3.1.  Measure Error Performance Metrics

   The definitions of ES, SES, and EFS allow for characterization of the
   communication between two nodes relative to the level of required and
   acceptable performance and when performance degrades below the
   acceptable level.  In this document, the former condition is referred
   to as network availability and the latter as network unavailability.
   Based on the definitions, an SES is a one-second interval of network
   unavailability, while an ES or an EFS represents an interval of
   network availability.  But since network conditions are
   ever-changing, periods of network availability and unavailability
   need to be defined with a duration longer than a one-second interval
   to reduce the number of state changes while correctly reflecting the
   network condition.  The method to determine the state of the network
   in terms of EPM OAM is described below:

   *  If ten consecutive SES intervals have been detected, then the EPM
      state of the network is determined as unavailability, and that
      period of unavailability begins at the start of the first SES in
      the sequence of consecutive SES intervals.

   *  Similarly, ten consecutive non-SES intervals, i.e., either ES or
      EFS, indicate that the network is in the availability period,
      i.e., available.  The start of that period is at the beginning of
      the first non-SES interval.

   *  Resulting from these two definitions, a sequence of fewer than ten
      consecutive SES or non-SES intervals does not change the EPM state
      of the network.  For example, if the EPM state is determined as
      unavailability, a sequence of seven EFS intervals is not viewed as
      an availability period.

3.2.  Calculate Error Performance Metrics

   It is helpful to determine which period, EP-wise, the path is
   currently in.
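
   The period determination rules of Section 3.1 can be sketched in
   code.  The following is a minimal, non-normative illustration in
   Python.  The function name, the string labels for interval types,
   and the choice of the initial state are assumptions made for this
   sketch; the draft does not specify them.

```python
WINDOW = 10  # consecutive one-second intervals needed to change state


def classify_periods(intervals, initially_available=True):
    """Map each one-second interval to the EPM state of its period.

    intervals: iterable of "EFS", "ES", or "SES" labels (a hypothetical
    encoding).  Returns a list of booleans, True meaning availability.
    """
    # Assumption: the draft does not define the state before the first
    # ten-interval run, so the caller chooses it.
    available = initially_available
    states = []
    run = 0  # consecutive intervals that contradict the current state
    for kind in intervals:
        is_ses = kind == "SES"
        contradicts = is_ses if available else not is_ses
        run = run + 1 if contradicts else 0
        states.append(available)
        if run == WINDOW:
            # The new period starts at the first interval of the run,
            # so the last WINDOW entries are relabeled retroactively.
            available = not available
            states[-WINDOW:] = [available] * WINDOW
            run = 0
    return states
```

   Note that the state of the most recent intervals is provisional in
   this sketch: up to nine trailing intervals may be relabeled once the
   tenth consecutive interval of the same kind arrives.
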
   But because switching between periods requires ten consecutive
   one-second intervals, conditions that last for shorter intervals may
   not be adequately reflected.  Two additional EP OAM metrics can be
   used, and they are defined as follows:

   *  The errored second ratio (ESR) is the ratio of the number of ES to
      the total number of seconds in the availability periods during a
      fixed measurement interval.

   *  The severely errored second ratio (SESR) is the ratio of the
      number of SES to the total number of seconds in the availability
      periods during a fixed measurement interval.

4.  Requirements to EPM

   TBA

5.  Active OAM Protocol for EPM

   Digital communication methods characterized as constant bit-rate
   digital paths and connections allow measurement of the error
   performance without using an active OAM.  That is possible because a
   predictable flow of digital signals is expected at an egress system.
   That is not the case for packet-switched networks, which are based on
   the principle of statistical multiplexing of flows.  Statistical
   multiplexing usually improves the utilization of the communication
   network's resources, but it also makes the flow unpredictable for the
   egress system.  For that reason, an active OAM has to be used in
   measuring the error performance in a packet-switched network.  A
   combination of OAM protocols can provide the functionality necessary
   for EPM.  For example, Bidirectional Forwarding Detection (BFD)
   [RFC5880] can be used to monitor the continuity of a path between the
   ingress and egress systems, and STAMP [RFC8762] can be used to
   measure and calculate performance metrics that are used as Service
   Level Objectives.  But using two protocols and correlating the state
   of the network from them adds to the complexity of network operation.

6.  Availability of Anything-as-a-Service

   Anything as a service (XaaS) describes a general category of services
   related to cloud computing and remote access.
   These services include the vast number of products, tools, and
   technologies that are delivered to users as a service over the
   Internet.  In this document, the availability of XaaS is viewed as
   the ability to access the service over a period of time with
   pre-defined performance objectives.  Among the advantages of the XaaS
   model are:

   *  Improving the expense model by purchasing services from providers
      on a subscription basis rather than buying individual products
      (e.g., software, hardware, servers, security, infrastructure),
      installing them on-site, and then linking everything together to
      create networks.

   *  Speeding new apps and business processes by quickly adapting to
      changing market conditions with new applications or solutions.

   *  Shifting IT resources to specialized higher-value projects that
      use the core expertise of the company.

   But the XaaS model also has potential challenges:

   *  Possible downtime resulting from issues of Internet reliability,
      resilience, provisioning, and managing the infrastructure
      resources.

   *  Performance issues caused by depleted resources like bandwidth and
      computing power, inefficiencies of virtualized environments, and
      ongoing management and security of multi-cloud services.

   *  Complexity that impacts the enterprise IT team, which must
      continually learn the provided services.

   The framework and metrics of the EPM defined in Section 3 allow a
   provider of XaaS and their customers to quantify, measure, and
   monitor for conformance what is often regarded as ephemeral: the
   availability of the service to be delivered.  There are other
   definitions and methods of expressing availability.  For example,
   [HighAvailability-WP] uses the following equation:

      Availability Average = MTBF/(MTBF + MTTR),
      where:
         MTBF (Mean Time Between Failures) - mean time between
         individual component failures.
         For example, a hard drive malfunction or hypervisor reboot.
         MTTR (Mean Time To Repair) - refers to how long it takes to
         fix the broken component or for the application to come back
         online.

   While this approach estimates the expected availability of a XaaS,
   the EPM reflects near-real-time availability of a service as
   experienced by a user.  It also provides valuable data for more
   accurate and realistic MTBF and MTTR in the particular environment,
   and simplifies comparison of different solutions that may use
   redundant servers (web and database) and load balancers.

   In another field of communication, mobile voice and data services,
   the definition of service availability is understood as "the
   probability of successful service reception: a given area is declared
   "in-coverage" if the service in that area is available with a
   pre-specified minimum rate of success.  Service availability has the
   advantage of being more easily understandable for consumers and is
   expressed as a percentage of the number of attempts to access a given
   service." [BEREC-CP].  The definition of availability used in the EPM
   throughout this document is close to the one quoted above.  It might
   be considered an extension that allows regulators, operators, and
   consumers to compare not only the rate of successfully establishing a
   connection but also the quality of the connection during its
   lifetime.

7.  IANA Considerations

   TBA

8.  Security Considerations

   TBA

9.  Acknowledgments

   TBA

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [BEREC-CP] Body of European Regulators for Electronic Communications,
              "BEREC Common Position on information to consumers on
              mobile coverage", Common Approaches/Positions BoR (18)
              237, June 2018.

   [HighAvailability-WP]
              Avi Freedman, Server Central, "High Availability in Cloud
              and Dedicated Infrastructure".

   [ITU.G.826]
              ITU-T, "End-to-end error performance parameters and
              objectives for international, constant bit-rate digital
              paths and connections", ITU-T G.826, December 2002.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC8762]  Mirsky, G., Jun, G., Nydell, H., and R. Foote, "Simple
              Two-Way Active Measurement Protocol", RFC 8762,
              DOI 10.17487/RFC8762, March 2020.

Authors' Addresses

   Greg Mirsky
   Ericsson

   Email: gregimirsky@gmail.com


   Joel Halpern
   Ericsson

   Email: joel.halpern@ericsson.com


   Xiao Min
   ZTE Corp.

   Email: xiao.min2@zte.com.cn


   Liuyan Han
   China Mobile
   32 XuanWuMenXi Street
   Beijing
   100053
   China

   Email: hanliuyan@chinamobile.com