Network Working Group                                   G. Fioccola, Ed.
Internet-Draft                                                A. Capello
Intended status: Experimental                                M. Cociglio
Expires: April 28, 2018                                   L. Castaldelli
                                                          Telecom Italia
                                                                 M. Chen
                                                                L. Zheng
                                                     Huawei Technologies
                                                               G. Mirsky
                                                                     ZTE
                                                              T. Mizrahi
                                                                 Marvell
                                                        October 25, 2017


 Alternate Marking method for passive and hybrid performance monitoring
                      draft-ietf-ippm-alt-mark-13

Abstract

   This document describes a method to perform packet loss, delay and
   jitter measurements on live traffic.  This method is based on the
   Alternate Marking (Coloring) technique.  A report is provided in
   order to explain an example and show the method's applicability.
   This technique can be applied in various situations as detailed in
   this document and could be considered passive or hybrid depending on
   the application.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview of the method
   3.  Detailed description of the method
     3.1.  Packet loss measurement
       3.1.1.  Coloring the packets
       3.1.2.  Counting the packets
       3.1.3.  Collecting data and calculating packet loss
     3.2.  Timing aspects
     3.3.  One-way delay measurement
       3.3.1.  Single marking methodology
       3.3.2.  Double marking methodology
     3.4.  Delay variation measurement
   4.  Considerations
     4.1.  Synchronization
     4.2.  Data Correlation
     4.3.  Packet Re-ordering
   5.  Implementation and deployment
     5.1.  Report on the operational experiment at Telecom Italia
       5.1.1.  Metric transparency
     5.2.  IP flow performance measurement (IPFPM)
     5.3.  OAM Passive Performance Measurement
     5.4.  RFC6374 Use Case
     5.5.  Application to active performance measurement
   6.  Hybrid measurement
   7.  Summary
   8.  Compliance with RFC6390 guidelines
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Nowadays, most Service Providers' networks carry traffic with
   contents that are highly sensitive to packet loss [RFC7680], delay
   [RFC7679], and jitter [RFC3393].

   In view of this scenario, Service Providers need methodologies and
   tools to monitor and measure network performance with adequate
   accuracy, in order to constantly control the quality of experience
   perceived by their customers.  In addition, performance monitoring
   provides useful information for improving network management (e.g.
   isolation of network problems, troubleshooting, etc.).

   A lot of work related to OAM, which also includes performance
   monitoring techniques, has been done by Standards Developing
   Organizations (SDOs): [RFC7276] provides a good overview of existing
   OAM mechanisms defined in IETF, ITU-T and IEEE.  Considering IETF, a
   lot of work has been done on fault detection and connectivity
   verification, while less effort has been dedicated so far to
   performance monitoring.  The IPPM WG has defined standard metrics to
   measure network performance; however, the methods developed in this
   WG mainly focus on active measurement techniques.
   More recently, the MPLS WG has defined mechanisms for measuring
   packet loss, one-way and two-way delay, and delay variation in MPLS
   networks [RFC6374], but their applicability to passive measurements
   has some limitations, especially for pure connection-less networks.

   The lack of adequate tools to measure packet loss with the desired
   accuracy drove an effort to design a new method for the performance
   monitoring of live traffic that is easy to implement and deploy.
   The effort led to the method described in this document: basically,
   it is a passive performance monitoring technique, potentially
   applicable to any kind of packet-based traffic, including Ethernet,
   IP, and MPLS, both unicast and multicast.  The method addresses
   primarily packet loss measurement, but it can be easily extended to
   one-way delay and delay variation measurements as well.

   The method has been explicitly designed for passive measurements but
   it can also be used with active probes.  Passive measurements are
   usually more easily understood by customers and provide much better
   accuracy, especially for packet loss measurements.

   RFC 7799 [RFC7799] defines passive and hybrid methods of measurement.
   In particular, Passive Methods of Measurement are based solely on
   observations of an undisturbed and unmodified packet stream of
   interest; Hybrid Methods are Methods of Measurement that use a
   combination of Active Methods and Passive Methods.

   Taking these definitions into consideration, the Alternate Marking
   Method could be considered Hybrid or Passive depending on the case.
   If the marking field is obtained by changing existing field values
   of the packets (e.g. DSCP field), the technique is Hybrid.  If the
   marking field is dedicated, reserved and included in the protocol
   specification, the Alternate Marking technique can be considered
   Passive (e.g. RFC6374 Synonymous Flow Label or OAM Marking Bits in
   BIER Header).

   This document is organized as follows:

   o  Section 2 gives an overview of the method, including a comparison
      with different measurement strategies;

   o  Section 3 describes the method in detail;

   o  Section 4 reports considerations about synchronization, data
      correlation and packet re-ordering;

   o  Section 5 reports examples of implementation and deployment of the
      method.  Furthermore the operational experiment done at Telecom
      Italia is described;

   o  Section 6 introduces Hybrid measurement aspects;

   o  Section 7 summarizes some concluding remarks;

   o  Section 8 is about the compliance with RFC6390 guidelines;

   o  Section 9 includes some security aspects.

2.  Overview of the method

   In order to perform packet loss measurements on a live traffic flow,
   different approaches exist.  The most intuitive one consists in
   numbering the packets, so that each router that receives the flow can
   immediately detect a missing packet.  This approach, though very
   simple in theory, is not simple to achieve: it requires the insertion
   of a sequence number into each packet and the devices must be able to
   extract the number and check it in real time.  Such a task can be
   difficult to implement on live traffic: if UDP is used as the
   transport protocol, the sequence number is not available; on the
   other hand, if a higher layer sequence number (e.g. in the RTP
   header) is used, extracting that information from each packet and
   processing it in real time could overload the device.

   An alternate approach is to count the number of packets sent on one
   end, the number of packets received on the other end, and to compare
   the two values.
   This operation is much simpler to implement, but requires that the
   devices performing the measurement are in sync: in order to compare
   two counters it is required that they refer exactly to the same set
   of packets.  Since a flow is continuous and cannot be stopped when a
   counter has to be read, it could be difficult to determine exactly
   when to read the counter.  A possible solution to overcome this
   problem is to virtually split the flow into consecutive blocks by
   periodically inserting a delimiter so that each counter refers
   exactly to the same block of packets.  The delimiter could be, for
   example, a special packet inserted artificially into the flow.
   However, delimiting the flow using specific packets has some
   limitations.  First, it requires generating additional packets within
   the flow and requires the equipment to be able to process those
   packets.  In addition, the method is vulnerable to out-of-order
   reception of delimiting packets and, to a lesser extent, to their
   loss.

   The method proposed in this document follows the second approach, but
   it doesn't use additional packets to virtually split the flow into
   blocks.  Instead, it "marks" the packets so that the packets
   belonging to the same block will have the same color, whilst
   consecutive blocks will have different colors.  Each change of color
   represents a sort of auto-synchronization signal that guarantees the
   consistency of measurements taken by different devices along the path
   (see also [I-D.cociglio-mboned-multicast-pm] and
   [I-D.tempia-opsawg-p3m], where this technique was introduced).

   Figure 1 represents a very simple network and shows how the method
   can be used to measure packet loss on different network segments: by
   enabling the measurement on several interfaces along the path, it is
   possible to perform link monitoring, node monitoring or end-to-end
   monitoring.
   The method is flexible enough to measure packet loss on any segment
   of the network and can be used to isolate the faulty element.

                            Traffic flow
       ========================================================>
       +------+       +------+       +------+       +------+
   ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
       +------+       +------+       +------+       +------+
       .              .      .              .       .      .
       .              .      .              .       .      .
       .              <------>              <------->      .
       .          Node Packet Loss      Link Packet Loss   .
       .                                                   .
       <--------------------------------------------------->
                      End-to-End Packet loss

                   Figure 1: Available measurements

3.  Detailed description of the method

   This section describes in detail how the method operates.  Special
   emphasis is given to the measurement of packet loss, which represents
   the core application of the method, but applicability to delay and
   jitter measurements is also considered.

3.1.  Packet loss measurement

   The basic idea is to virtually split traffic flows into consecutive
   blocks: each block represents a measurable entity unambiguously
   recognizable by all network devices along the path.  By counting the
   number of packets in each block and comparing the values measured by
   different network devices along the path, it is possible to measure
   the packet loss that occurred in any single block between any two
   points.

   As discussed in the previous section, a simple way to create the
   blocks is to "color" the traffic (two colors are sufficient) so that
   packets belonging to different consecutive blocks will have different
   colors.  Whenever the color changes, the previous block terminates
   and the new one begins.  Hence, all the packets belonging to the same
   block will have the same color and packets of different consecutive
   blocks will have different colors.
   The number of packets in each block depends on the criterion used to
   create the blocks:

   o  if the color is switched after a fixed number of packets, then
      each block will contain the same number of packets (except for any
      losses);

   o  if the color is switched according to a fixed timer, then the
      number of packets may be different in each block depending on the
      packet rate.

   The following figure shows what a flow looks like when it is split
   into traffic blocks with colored packets.

   A: packet with A coloring
   B: packet with B coloring

          |           |           |           |           |
          |           |     Traffic flow      |           |
   ------------------------------------------------------------------->
   BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
   ------------------------------------------------------------------->
      ... |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
          |           |           |           |           |

                      Figure 2: Traffic coloring

   Figure 3 shows how the method can be used to measure link packet loss
   between two adjacent nodes.

   Referring to the figure, let's assume we want to monitor the packet
   loss on the link between two routers: router R1 and router R2.
   According to the method, the traffic is colored alternately with
   two different colors, A and B.  Whenever the color changes, the
   transition generates a sort of square-wave signal, as depicted in the
   following figure.

   Color A   ----------+           +-----------+           +----------
                       |           |           |           |
   Color B             +-----------+           +-----------+
              Block n      ...       Block 3     Block 2     Block 1
            <---------> <---------> <---------> <---------> <--------->

                                 Traffic flow
            ===========================================================>
   Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
            ===========================================================>

               Figure 3: Computation of link packet loss

   Traffic coloring could be done by R1 itself or by an upstream router.
   R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface:
   C(A)R1 counts the packets with color A and C(B)R1 counts those with
   color B.  As long as traffic is colored A, only counter C(A)R1 will
   be incremented, while C(B)R1 is not incremented; vice versa, when the
   traffic is colored B, only C(B)R1 is incremented.  C(A)R1 and
   C(B)R1 can be used as reference values to determine the packet loss
   from R1 to any other measurement point down the path.  Router R2,
   similarly, will need two counters on its ingress interface, C(A)R2
   and C(B)R2, to count the packets received on that interface and
   colored with color A and B respectively.  When an A block ends, it is
   possible to compare C(A)R1 and C(A)R2 and calculate the packet loss
   within the block; similarly, when the subsequent B block terminates,
   it is possible to compare C(B)R1 with C(B)R2, and so on for every
   subsequent block.

   Likewise, by using two counters on the R2 egress interface it is
   possible to count the packets sent out of that interface and use
   them as reference values to calculate the packet loss from R2 to any
   measurement point downstream of R2.

   Using a fixed timer for color switching offers better control over
   the method: the (time) length of the blocks can be chosen large
   enough to simplify the collection and the comparison of measures
   taken by different network devices.  It is preferable not to read the
   value of the counters immediately after the color switch: some
   packets could arrive out of order and increment the counter
   associated with the previous block (color), so it is worth waiting
   for some time.  A safe choice is to wait L/2 time units (where L is
   the duration of each block) after the color switch before reading
   the still counter of the previous color, so that the chance of
   reading a running counter instead of a still one is minimized.
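
   The counting behaviour and the delayed read described above can be
   sketched as follows.  This is only an illustration under stated
   assumptions, not part of the method's specification: the names
   ColorCounters, current_color and read_time are hypothetical, time
   starts at zero with an A block, and counters are reset after being
   read.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ColorCounters:
    """Pair of per-color packet counters for one interface (hypothetical)."""
    counts: Dict[str, int] = field(default_factory=lambda: {"A": 0, "B": 0})

    def on_packet(self, color: str) -> None:
        # Only the counter matching the packet's color is incremented.
        self.counts[color] += 1

    def read_and_reset(self, color: str) -> int:
        # Read the still counter of a finished block, then reset it so the
        # next block of the same color starts from zero.
        value = self.counts[color]
        self.counts[color] = 0
        return value


def current_color(t: float, block_len: float) -> str:
    # Assumption: fixed-timer switching, even-numbered periods colored A.
    return "A" if int(t // block_len) % 2 == 0 else "B"


def read_time(block_index: int, block_len: float) -> float:
    # Block k (0-based) ends at (k + 1) * L; its counter is read L/2 later.
    return (block_index + 1) * block_len + block_len / 2


L = 10.0
r1 = ColorCounters()
for t in (0.5, 3.0, 9.9, 10.1, 14.0):   # packet departure times
    r1.on_packet(current_color(t, L))

t_read = read_time(0, L)                # middle of the following B block
assert current_color(t_read, L) == "B"  # so the A counter is still
block0 = r1.read_and_reset("A")
```

   In this sketch the first block carries three packets, and its
   counter is read while the B counter is still running.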
   The drawback of a fixed timer is that the longer the duration of the
   block, the less frequently the measurement can be taken.

   The following table shows how the counters can be used to calculate
   the packet loss between R1 and R2.  The first column lists the
   sequence of traffic blocks while the other columns contain the
   counters of A-colored packets and B-colored packets for R1 and R2.
   In this example, we assume that the values of the counters are reset
   to zero whenever a block ends and its associated counter has been
   read: with this assumption, the table shows only relative values,
   that is the exact number of packets of each color within each block.
   If the values of the counters were not reset, the table would contain
   cumulative values, but the relative values could be determined simply
   by difference from the value of the previous block of the same color.

   The color is switched on the basis of a fixed timer (not shown in the
   table), so the number of packets in each block is different.

   +-------+--------+--------+--------+--------+------+
   | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
   +-------+--------+--------+--------+--------+------+
   | 1     | 375    | 0      | 375    | 0      | 0    |
   |       |        |        |        |        |      |
   | 2     | 0      | 388    | 0      | 388    | 0    |
   |       |        |        |        |        |      |
   | 3     | 382    | 0      | 381    | 0      | 1    |
   |       |        |        |        |        |      |
   | 4     | 0      | 377    | 0      | 374    | 3    |
   |       |        |        |        |        |      |
   | ...   | ...    | ...    | ...    | ...    | ...  |
   |       |        |        |        |        |      |
   | 2n    | 0      | 387    | 0      | 387    | 0    |
   |       |        |        |        |        |      |
   | 2n+1  | 379    | 0      | 377    | 0      | 2    |
   +-------+--------+--------+--------+--------+------+

     Table 1: Evaluation of counters for packet loss measurements

   During an A block (blocks 1, 3 and 2n+1), all the packets are
   A-colored, therefore the C(A) counters are incremented to the number
   seen on the interface, while C(B) counters are zero.
   Vice versa, during a B block (blocks 2, 4 and 2n), all the packets
   are B-colored: C(A) counters are zero, while C(B) counters are
   incremented.

   When a block ends (because of color switching) the relative counters
   stop incrementing and it is possible to read them, compare the values
   measured on routers R1 and R2 and calculate the packet loss within
   that block.

   For example, looking at the table above, during the first block
   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
   corresponds to the exact number of packets of the first block (no
   loss).  Also during the second block (B-colored) R1 and R2 counters
   have the same value (388), which corresponds to the number of packets
   of the second block (no loss).  During blocks three and four, R1 and
   R2 counters are different, meaning that some packets have been lost:
   in the example, one single packet (382-381) was lost during block
   three and three packets (377-374) were lost during block four.

   The method applied to R1 and R2 can be extended to any other router
   and applied to more complex networks, as long as the measurement is
   enabled on the path followed by the traffic flow(s) being observed.

   It's worth mentioning two different strategies that can be used when
   implementing the method:

   o  flow-based: the flow-based strategy is used when only a limited
      number of traffic flows need to be monitored.  According to this
      strategy, only a subset of the flows is colored.  Counters for
      packet loss measurements can be instantiated for each single flow,
      or for the set as a whole, depending on the desired granularity.
      A relevant problem with this approach is the necessity to know in
      advance the path followed by flows that are subject to
      measurement.  Path rerouting and traffic load-balancing increase
      the complexity of the issue, especially for unicast traffic.
      The problem is easier to solve for multicast traffic where load
      balancing is seldom used and static joins are frequently used to
      force traffic forwarding and replication.

   o  link-based: measurements are performed on all the traffic on a
      link by link basis.  The link could be a physical link or a
      logical link.  Counters could be instantiated for the traffic as a
      whole or for each traffic class (in case it is desired to monitor
      each class separately), but in the second case a pair of
      counters is needed for each class.

   As mentioned, the flow-based measurement requires the identification
   of the flow to be monitored and the discovery of the path followed by
   the selected flow.  It is possible to monitor a single flow or
   multiple flows grouped together, but in this case measurement is
   consistent only if all the flows in the group follow the same path.
   Moreover, if a measurement is performed by grouping many flows, it is
   not possible to determine exactly which flow was affected by packet
   loss.  In order to have measures per single flow it is necessary to
   configure counters for each specific flow.  Once the flow(s) to be
   monitored have been identified, it is necessary to configure the
   monitoring on the proper nodes.  Configuring the monitoring means
   configuring the rule to intercept the traffic and configuring the
   counters to count the packets.  To have just an end-to-end
   monitoring, it is sufficient to enable the monitoring on the first
   and the last hop routers of the path: the mechanism is completely
   transparent to intermediate nodes and independent from the path
   followed by traffic flows.  On the contrary, to monitor the flow on a
   hop-by-hop basis along its whole path it is necessary to enable the
   monitoring on every node from the source to the destination.  In case
   the exact path followed by the flow is not known a priori (i.e. the
   flow has multiple paths to reach the destination) it is necessary to
   enable the monitoring system on every path: counters on interfaces
   traversed by the flow will report packet count, counters on other
   interfaces will be null.

3.1.1.  Coloring the packets

   The coloring operation is fundamental in order to create packet
   blocks.  This implies choosing where to activate the coloring and how
   to color the packets.

   In case of flow-based measurements, it is desirable, in general, to
   have a single coloring node because it is easier to manage and
   doesn't raise any risk of conflict (consider the case where multiple
   nodes color the same flow).  Thus it is advantageous to color the
   flow as close as possible to the source.  In addition, coloring a
   flow close to the source allows an end-to-end measure if a
   measurement point is enabled on the last-hop router as well.
   Coloring in multiple nodes is also possible and the requirement is
   that the coloring must change periodically according to the timing
   considerations in Section 3.2, so that every node designated as a
   measurement point along the path should be able to unambiguously
   identify the colored packets.  Furthermore
   [I-D.fioccola-ippm-multipoint-alt-mark] generalizes the coloring for
   multipoint to multipoint flow.

   For link-based measurements, all traffic needs to be colored when
   transmitted on the link.  If the traffic had already been colored,
   then it has to be re-colored because the color must be consistent on
   the link.  This means that each hop along the path must (re-)color
   the traffic; the color is not required to be consistent along
   different links.

   Traffic coloring can be implemented by setting a specific bit in the
   packet header and changing the value of that bit periodically.  How
   to choose the marking field depends on the application and is out of
   scope here.
   However, some examples are reported in Section 5.

3.1.2.  Counting the packets

   For flow-based measurements, assuming that the coloring of the
   packets is performed only by the source node, the nodes between
   source and destination (included) have to count the colored packets
   that they receive and forward: this operation can be enabled on every
   router along the path or only on a subset, depending on which network
   segment is being monitored (a single link, a particular metro area,
   the backbone, the whole path).  Furthermore
   [I-D.fioccola-ippm-multipoint-alt-mark] generalizes the counting for
   multipoint to multipoint flow.

   Since the color switches periodically between two values, two
   counters (one for each value) are needed: one counter for packets
   with color A and one counter for packets with color B.  For each flow
   (or group of flows) being monitored and for every interface where the
   monitoring is active, a pair of counters is needed.  For example,
   in order to monitor separately 3 flows on a router with 4 interfaces
   involved, 24 counters are needed (2 counters for each of the 3 flows
   on each of the 4 interfaces).

   In case of link-based measurements the behaviour is similar except
   that coloring and counting operations are performed on a link by link
   basis at each endpoint of the link.

   Another important aspect to take into consideration is when to read
   the counters: in order to count the exact number of packets of a
   block the routers must perform this operation when that block has
   ended.  In other words, the counter for color A must be read when the
   current block has color B, in order to be sure that the value of the
   counter is stable.  This task can be accomplished in two ways.
   The general approach is to read the counters periodically, many
   times during a block duration, and to compare these successive
   readings: when the counter stops incrementing, the current block has
   ended and its value can be processed safely.
   Alternatively, if the coloring operation is performed on the basis of
   a fixed timer, it is possible to configure the reading of the
   counters according to that timer: for example, reading the counter
   for color A every period in the middle of the subsequent block with
   color B is a safe choice.  A sufficient margin should be considered
   between the end of a block and the reading of the counter, in order
   to take into account any out-of-order packets.

3.1.3.  Collecting data and calculating packet loss

   The nodes enabled to perform performance monitoring collect the value
   of the counters, but they are not able to directly use this
   information to measure packet loss, because they only have their own
   samples.  For this reason, an external Network Management System
   (NMS) can be used to collect and process data and to perform packet
   loss calculation.  The NMS compares the values of counters from
   different nodes and can calculate if some packets were lost (even a
   single packet) and also where packets were lost.

   The value of the counters needs to be transmitted to the NMS as soon
   as it has been read.  This can be accomplished by using SNMP or FTP
   and can be done in Push Mode or Polling Mode.  In the first case,
   each router periodically sends the information to the NMS; in the
   latter case it is the NMS that periodically polls routers to collect
   information.  In any case, the NMS has to collect all the relevant
   values from all the routers within one cycle of the timer.
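
   As an illustration only, the comparison performed by the NMS can be
   sketched as follows, using the values of blocks 1 to 4 from Table 1.
   The function names per_block_loss and relative_from_cumulative are
   hypothetical, and the transport used to collect the readings (SNMP,
   FTP, push or polling) is deliberately left out.

```python
from typing import List


def per_block_loss(upstream: List[int], downstream: List[int]) -> List[int]:
    """Packet loss per block: upstream count minus downstream count."""
    return [sent - received for sent, received in zip(upstream, downstream)]


def relative_from_cumulative(readings: List[int]) -> List[int]:
    """Turn cumulative readings of one color counter into per-block values.

    Needed only when counters are NOT reset after each read: the value of
    a block is the difference from the previous reading of the same color.
    """
    previous = 0
    relative = []
    for value in readings:
        relative.append(value - previous)
        previous = value
    return relative


# Blocks 1-4 of Table 1: counter of the block's own color on R1 and R2.
r1_counts = [375, 388, 382, 377]
r2_counts = [375, 388, 381, 374]
losses = per_block_loss(r1_counts, r2_counts)

# Same A-colored blocks (1 and 3) on R1, had the counter not been reset.
a_blocks_r1 = relative_from_cumulative([375, 757])
```

   With these inputs, losses holds one value per block (0, 0, 1 and 3
   packets), matching the Loss column of Table 1.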

   It would also be possible to use a protocol to exchange values of
   counters between the two endpoints in order to let them perform the
   packet loss calculation for each traffic direction.

   A possible approach for the performance measurement architecture is
   explained in [I-D.chen-ippm-coloring-based-ipfpm-framework], while
   [I-D.chen-ippm-ipfpm-report] introduces new information elements of
   IPFIX (RFC 7011 [RFC7011]).

3.2.  Timing aspects

   This document introduces two color switching methods: one is based
   on a fixed number of packets, the other is based on a fixed timer.
   The method based on a fixed timer is preferable because it is more
   deterministic, and it will be considered in the rest of the document.

   Considering the clock error between network devices R1 and R2, the
   devices must be synchronized to the same clock reference with an
   accuracy of +/- L/2 time units, where L is the time duration of the
   block, so that each colored packet can be assigned to the right
   batch by each router.  This is because the minimum time distance
   between two packets of the same color but belonging to different
   batches is L time units.

   In practice, there are also out-of-order packets at batch boundaries,
   strictly related to the delay between measurement points.  This means
   that, without considering clock error, we wait L/2 after color
   switching to be sure to take a still counter.

   In summary, we need to take into account two contributions: clock
   error between network devices and the interval we need to wait to
   avoid out-of-order packets because of network delay.

   The following figure explains both issues.

   ...BBBBBBBBB | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | BBBBBBBBB...
                |<======================================>|
                |                   L                    |
   ...=========>|<==================><==================>|<==========...
                |        L/2                 L/2         |
                |<===>|                            |<===>|
                   d  |                            |  d
                      |<==========================>|
                       available counting interval

                        Figure 4: Timing aspects

   It is assumed that all network devices are synchronized to a common
   reference time with an accuracy of +/- A/2.  Thus, the difference
   between the clock values of any two network devices is bounded by A.

   The guardband d is given by:

   d = A + D_max - D_min,

   where A is the clock accuracy, D_max is an upper bound on the network
   delay between the network devices, and D_min is a lower bound on the
   delay.

   The available counting interval is L - 2d, which must be > 0.

   The resulting condition, which is a requirement on the
   synchronization accuracy, is:

   d < L/2.

3.3.  One-way delay measurement

   The same principle used to measure packet loss can be applied also to
   one-way delay measurement.  There are three alternatives, as
   described hereinafter.

3.3.1.  Single marking methodology

   The alternation of colors can be used as a time reference to
   calculate the delay.  Whenever the color changes (that means that a
   new block has started) a network device can store the timestamp of
   the first packet of the new block; that timestamp can be compared
   with the timestamp of the same packet on a second router to compute
   packet delay.  Considering Figure 2, R1 stores a timestamp TS(A1)R1
   when it sends the first packet of block 1 (A-colored), a timestamp
   TS(B2)R1 when it sends the first packet of block 2 (B-colored) and so
   on for every other block.  R2 performs the same operation on the
   receiving side, recording TS(A1)R2, TS(B2)R2 and so on.  Since the
   timestamps refer to specific packets (the first packet of each block)
   we are sure that timestamps compared to compute delay refer to the
   same packets.
By comparing TS(A1)R1 with TS(A1)R2 (and similarly TS(B2)R1 with
TS(B2)R2, and so on), it is possible to measure the delay between R1
and R2. In order to have more measurements, it is possible to take
and store more timestamps, referring to other packets within each
block.

In order to coherently compare timestamps collected on different
routers, the clocks on the network nodes must be in sync.
Furthermore, a measurement is valid only if no packet loss occurs and
if packet misordering can be avoided; otherwise, the first packet of
a block on R1 could be different from the first packet of the same
block on R2 (e.g., if that packet is lost between R1 and R2 or it
arrives after the next one).

The following table shows how timestamps can be used to calculate the
delay between R1 and R2. The first column lists the sequence of
blocks, while the other columns contain the timestamps referring to
the first packet of each block on R1 and R2. The delay is computed
as the difference between timestamps. For the sake of simplicity,
all the values are expressed in milliseconds.

   +-------+---------+---------+---------+---------+-------------+
   | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
   +-------+---------+---------+---------+---------+-------------+
   |   1   |  12.483 |    -    |  15.591 |    -    |    3.108    |
   |   2   |    -    |  6.263  |    -    |  9.288  |    3.025    |
   |   3   |  27.556 |    -    |  30.512 |    -    |    2.956    |
   |   4   |    -    |  18.113 |    -    |  21.269 |    3.156    |
   |  ...  |   ...   |   ...   |   ...   |   ...   |     ...     |
   |   2n  |  77.463 |    -    |  80.501 |    -    |    3.038    |
   |  2n+1 |    -    |  24.333 |    -    |  27.433 |    3.100    |
   +-------+---------+---------+---------+---------+-------------+

       Table 2: Evaluation of timestamps for delay measurements

The first row shows the timestamps taken on R1 and R2, respectively,
and referring to the first packet of block 1 (which is A-colored).
The delay can be computed as the difference between the timestamp on
R2 and the timestamp on R1. Similarly, the second row shows the
timestamps (in milliseconds) taken on R1 and R2 and referring to the
first packet of block 2 (which is B-colored). By comparing
timestamps taken on different nodes in the network and referring to
the same packets (identified using the alternation of colors), it is
possible to measure delay on different network segments.

For the sake of simplicity, the above example provides a single
measurement per block, taking into account only the first packet of
each block. The number of measurements can easily be increased by
considering multiple packets in the block: for instance, a timestamp
could be taken every N packets, thus generating multiple delay
measurements. Taken to the limit, the delay could in principle be
measured for each packet by taking and comparing the corresponding
timestamps (possible, but impractical from an implementation point of
view).

3.3.1.1.  Mean delay

As mentioned before, the method described above for measuring the
delay is sensitive to out-of-order reception of packets. In order to
overcome this problem, a different approach has been considered: it
is based on the concept of mean delay. The mean delay is calculated
by considering the average arrival time of the packets within a
single block.
The network device locally stores a timestamp for each packet
received within a single block: by summing all the timestamps and
dividing by the total number of packets received, the average arrival
time for that block of packets can be calculated. By subtracting the
average arrival times of two adjacent devices, it is possible to
calculate the mean delay between those nodes. When computing the
mean delay, the measurement error could grow because the measurement
errors of many packets accumulate. This method is robust to
out-of-order packets and also to packet loss (only a small error is
introduced). Moreover, it greatly reduces the number of timestamps
(only one per block for each network device) that have to be
collected by the management system. On the other hand, it only gives
one measure for the duration of the block (e.g., 5 minutes), and it
doesn't give the minimum, maximum, and median delay values (RFC 6703
[RFC6703]). This limitation could be overcome by reducing the
duration of the block (e.g., from 5 minutes to a few seconds), which
requires a highly optimized implementation of the method.

By summing the mean delays of the two directions of a path, it is
also possible to measure the two-way mean delay (round-trip delay).

3.3.2.  Double marking methodology

The single marking methodology for one-way delay measurement is
sensitive to out-of-order reception of packets. The first approach
to overcome this problem has been described above and is based on the
concept of mean delay. The limitation of mean delay is that it
doesn't give information about the distribution of delay values for
the duration of the block. Additionally, it may be useful to have
not only the mean delay but also the minimum, maximum, and median
delay values and, in wider terms, to know more about the statistical
distribution of delay values.
So, in order to have more information about the delay and to overcome
out-of-order issues, a different approach can be introduced: it is
based on the double marking methodology.

Basically, the idea is to use the first marking to create the
alternate flow and, within this colored flow, a second marking to
select the packets for measuring delay/jitter. The first marking is
needed for packet loss and mean delay measurement. The second
marking creates a new set of marked packets that are fully identified
over the network, so that a network device can store the timestamps
of these packets; these timestamps can be compared with the
timestamps of the same packets on a second router to compute packet
delay values for each packet. The number of measurements can easily
be increased by changing the frequency of the second marking.
However, the frequency of the second marking must not be too high, in
order to avoid out-of-order issues. Between packets with the second
marking there should be a safety time gap (e.g., this gap could be,
at the minimum, the mean network delay calculated with the previous
methodology) to avoid out-of-order issues and also to have a number
of measurement packets that is rate independent. If a packet with
the second marking is lost, the delay measurement for the considered
block is corrupted and should be discarded.

Mean delay is calculated on all the packets of a sample and is a
simple computation to perform for the single marking method. In some
cases, the mean delay measure is not sufficient to characterize the
sample, and more statistics on the delay values are needed, e.g.,
percentiles, variance, and median delay values. The conventional
range (maximum-minimum) should be avoided for several reasons,
including the instability of the maximum delay due to the influence
of outliers.
RFC 5481 [RFC5481] Section 6.5 highlights how the 99.9th percentile
of delay and delay variation is more helpful to performance planners.
To overcome this drawback, the idea is to couple the mean delay
measure for the entire batch with the double marking method, where a
subset of batch packets is selected for extensive delay calculation
by using a second marking. In this way, it is possible to perform a
detailed analysis on these double marked packets. Please note that
there are classic algorithms for median and variance calculation, but
they are out of the scope of this document. The comparison between
the mean delay for the entire batch and the mean delay on these
double marked packets gives useful information, since it makes it
possible to understand whether the double marking measurements are
actually representative of the delay trends.

3.4.  Delay variation measurement

Similarly to one-way delay measurement (both for single marking and
double marking), the method can also be used to measure the
inter-arrival jitter. We refer to the definition in RFC 3393
[RFC3393]. The alternation of colors, for the single marking method,
can be used as a time reference to measure delay variations. In the
case of double marking, the time reference is given by the second
marked packets.

Considering the example depicted in Figure 2, R1 stores a timestamp
TS(A)R1 whenever it sends the first packet of a block and R2 stores a
timestamp TS(B)R2 whenever it receives the first packet of a block.
The inter-arrival jitter can easily be derived from one-way delay
measurement, by evaluating the delay variation of consecutive
samples.

The concept of mean delay can also be applied to delay variation, by
evaluating the average variation of the interval between consecutive
packets of the flow from R1 to R2.

4.  Considerations

This section highlights some considerations about the methodology.

4.1.  Synchronization

The Alternate Marking technique does not require strong
synchronization, especially for packet loss and two-way delay
measurement. Only one-way delay measurement requires network devices
to have synchronized clocks.

The color switching is the reference for all the network devices, and
the only requirement is that all network devices recognize the right
batch along the path.

If the length of the measurement period is L time units, then all
network devices must be synchronized to the same clock reference with
an accuracy of +/- L/2 time units (without considering network
delay). This level of accuracy guarantees that all network devices
consistently match the color bit to the correct block. For example,
if the color is toggled every second (L = 1 second), then clocks must
be synchronized with an accuracy of +/- 0.5 second to a common time
reference.

This synchronization requirement can be satisfied even with a
relatively inaccurate synchronization method. This is true for
packet loss and two-way delay measurement; for one-way delay
measurement, instead, clock synchronization must be accurate.

Therefore, a system that uses only packet loss and two-way delay
measurement does not require accurate synchronization. This is
because the value of the clocks of network devices does not affect
the computation of the two-way delay measurement.

4.2.  Data Correlation

Data Correlation is the mechanism used to compare counters and
timestamps for packet loss, delay, and delay variation calculation.
It can be performed in several ways, depending on the alternate
marking application and use case.

o  One possibility is a centralized solution that uses a Network
   Management System (NMS) to correlate data;

o  Another possibility is a protocol-based distributed solution,
   obtained by defining a new protocol or by extending existing
   protocols (e.g., RFC6374, TWAMP, OWAMP) in order to communicate
   the counters and timestamps between nodes.

The following paragraphs explain an example data correlation
mechanism that could be used independently of the adopted solution.

When data is collected on the upstream and downstream nodes (e.g.,
packet counts for packet loss measurement or timestamps for packet
delay measurement) and periodically reported to or pulled by other
nodes or an NMS, a data correlation mechanism SHOULD be in use to
help the nodes or the NMS tell whether any two or more packet counts
are related to the same block of markers, or whether any two
timestamps are related to the same marked packet.

The alternate marking method described in this document splits the
packets of the measured flow into different measurement blocks; in
addition, a Block Number (BN) could be assigned to each measurement
block. The BN is generated each time a node reads the data (packet
counts or timestamps) and is associated with each packet count and
timestamp reported to or pulled by other nodes or the NMS. The value
of the BN could be calculated as the modulo of the local time (when
the data are read) and the interval of the marking time period.

When the nodes or the NMS see, for example, the same BN associated
with two packet counts from an upstream and a downstream node,
respectively, they consider these two packet counts as corresponding
to the same block, i.e., as belonging to the same block of markers on
the upstream and downstream nodes. The assumption of this BN
mechanism is that the measurement nodes are time synchronized.
This requires the measurement nodes to have a certain time
synchronization capability (e.g., the Network Time Protocol (NTP)
RFC 5905 [RFC5905] or the IEEE 1588 Precision Time Protocol (PTP)
[IEEE-1588]). Synchronization aspects are further discussed in
Section 4.1.

4.3.  Packet Re-ordering

Due to ECMP, packet re-ordering is very common in IP networks. The
accuracy of marking-based PM, especially packet loss measurement, may
be affected by packet re-ordering. Consider the following example:

   Block   :    1    |    2    |    3    |    4    |    5    |...
   --------|---------|---------|---------|---------|---------|---
   Node R1 : AAAAAAA | BBBBBBB | AAAAAAA | BBBBBBB | AAAAAAA |...
   Node R2 : AAAAABB | AABBBBA | AAABAAA | BBBBBBA | ABAAABA |...

                    Figure 5: Packet Reordering

In Figure 5, the packet stream at Node R1 is not reordered and can be
safely assigned to interval blocks, but the packet stream at Node R2
is reordered. Looking at the packet with the marker "B" in block 3,
there is no safe way to tell whether the packet belongs to block 2 or
block 4.

In general, there is a need to assign packets with the marker "B" or
"A" to the right interval blocks. Most packet re-ordering occurs at
the edge of adjacent blocks, and it is easy to handle if the interval
of each block is sufficiently large. In that case, it can be assumed
that packets with a different marker belong to the block they are
closest to. If the interval is small, it is difficult and sometimes
impossible to determine to which block a packet belongs.

Choosing a proper interval is important, but how to choose it is out
of the scope of this document. An implementation SHOULD provide a
way to configure the interval and allow a certain degree of packet
re-ordering.

5.  Implementation and deployment

The methodology described in the previous sections can be applied in
various situations. Basically, the Alternate Marking technique could
be used in many cases for performance measurement. The only
requirement is to select and mark the flow to be monitored; in this
way, packets are batched by the sender, and each batch is alternately
marked so that it can be easily recognized by the receiver.

An example of implementation and deployment is explained in the next
section, just to clarify how the method can work.

5.1.  Report on the operational experiment at Telecom Italia

The method described in this document, also called PNPM (Packet
Network Performance Monitoring), was invented and engineered at
Telecom Italia. The methodology has been applied by leveraging
functions and tools available on IP routers, and it is currently
being used to monitor packet loss in some portions of Telecom
Italia's network. The application of the method to delay measurement
is currently being evaluated in Telecom Italia's labs. This section
describes how the features currently available on existing routing
platforms can be used to apply the method, in order to give an
example of implementation and deployment.

The current implementation in Telecom Italia uses the flow-based
strategy, as defined in Section 3. The link-based strategy could be
applied to a physical link or a logical link (e.g., an Ethernet VLAN
or an MPLS PW).

The method is applied in Telecom Italia's network to multicast IPTV
channels or other specific traffic flows with high QoS requirements
(i.e., Mobile Backhauling traffic implemented with a VPN MPLS).

The implementation of the method by a Service Provider needs to use
the router features.
With current router implementations, only QoS-related fields and
features offer the required flexibility to set bits in the packet
header. If a Service Provider only uses the three most significant
bits of the DSCP field (corresponding to IP Precedence) for QoS
classification and queuing, it is possible to use the two least
significant bits of the DSCP field (bit 0 and bit 1) to implement the
method without affecting QoS policies. One of the two bits (bit 0)
could be used to identify flows subject to traffic monitoring (set to
1 if the flow is under monitoring, otherwise set to 0), while the
second (bit 1) can be used for coloring the traffic (switching
between values 0 and 1, corresponding to colors A and B) and creating
the blocks.

In practice, coloring the traffic using the DSCP field can be
implemented by configuring, on the router output interface, an access
list that intercepts the flow(s) to be monitored and applies to them
a policy that sets the DSCP field accordingly. Since traffic
coloring has to be switched between the two values over time, the
policy needs to be modified periodically: an automatic script is used
to perform this task on the basis of a fixed timer.

In Telecom Italia's implementation, the timer is set to 5 minutes:
this value proved to be a good compromise between measurement
frequency and stability of the measurement (i.e., the possibility to
collect all the measures referring to the same block).

If traffic is colored using the DSCP field, an access list that
matches specific DSCP values can be used to count the packets of the
flow(s) being monitored. The access list is installed on all the
routers of the path. In addition, network flow monitoring, such as
that provided by IPFIX (RFC 7011 [RFC7011]), can be used to recognize
the timestamps of the first/last packet of a batch.
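
The bit usage described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not
the router configuration used in the experiment: the constant and
function names are hypothetical, "bit 0" is assumed to be the least
significant bit of the 6-bit DSCP value, and color A is assumed to
correspond to value 0 of the color bit.

```python
from typing import Optional

MONITOR_BIT = 0b000001  # assumed position of "bit 0": 1 = flow under monitoring
COLOR_BIT = 0b000010    # assumed position of "bit 1": 0 = color A, 1 = color B
MARKING_MASK = MONITOR_BIT | COLOR_BIT


def mark(dscp: int, color: str) -> int:
    """Return the DSCP value of a monitored packet colored "A" or "B"."""
    if not 0 <= dscp <= 0b111111:
        raise ValueError("DSCP is a 6-bit field")
    if color not in ("A", "B"):
        raise ValueError("color must be 'A' or 'B'")
    marked = (dscp & ~MARKING_MASK) | MONITOR_BIT
    if color == "B":
        marked |= COLOR_BIT
    return marked


def classify(dscp: int) -> Optional[str]:
    """Return "A" or "B" for a monitored packet, or None if it is not monitored."""
    if not dscp & MONITOR_BIT:
        return None
    return "B" if dscp & COLOR_BIT else "A"


def unmark(dscp: int) -> int:
    """Clear both marking bits, leaving the other DSCP bits untouched."""
    return dscp & ~MARKING_MASK
```

A counter per color can then be kept by incrementing the counter
selected by classify() for every packet of the monitored flow.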

The counters are collected by an automatic script that sends them to
a Network Management System (NMS). The NMS is responsible for the
packet loss calculation, performed by comparing the values of
counters from the routers along the path of the flow(s). A 5-minute
timer for color switching is a safe choice for reading the counters
and is also coherent with the reporting window of the NMS.

Note that the use of the DSCP field for marking implies that the
method, in this case, works reliably only within a single management
and operation domain.

A flow to monitor can be defined by a set of selection rules (e.g.,
header fields) used to match a subset of the packets; in this way, it
is possible to control the number of involved nodes, the path
followed by the packets, and the size of the flows. As an example,
the Telecom Italia experiment considers a flow as all the packets
sharing the same source IP address or the same destination IP
address, depending on the direction.

Lastly, the Telecom Italia experiment scales up to 1000 flows
monitored together on a single router, while an implementation on
dedicated hardware can scale further.

5.1.1.  Metric transparency

Since a Service Provider application is described here, the method
can be applied to end-to-end services supplied to Customers. It is
therefore important to highlight that the method SHOULD be
transparent outside the Service Provider domain.

In Telecom Italia's implementation, the source node colors the
packets with a policy that is modified periodically via an automatic
script in order to alternate the DSCP field of the packets. The
nodes between source and destination (included) have to count, with
an access list, the colored packets that they receive and forward.
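
The timer-driven alternation and the choice of a safe reading time
can be sketched as follows. This is a minimal Python illustration,
not the script used in the experiment: the function names are
hypothetical, time is assumed to be expressed in seconds since an
epoch shared by all nodes, and even-numbered blocks are assumed to
carry color A. The guardband follows the formula of Section 3.2.

```python
def block_index(t: float, period: float) -> int:
    """Index of the block that contains time t (seconds since the shared epoch)."""
    return int(t // period)


def color_at(t: float, period: float) -> str:
    """Color in use at time t, assuming even-numbered blocks are colored A."""
    return "A" if block_index(t, period) % 2 == 0 else "B"


def safe_read_time(block: int, period: float) -> float:
    """Middle of the block that follows `block`, when its counter is no longer incrementing."""
    return (block + 1) * period + period / 2


def counting_interval(period: float, accuracy: float, d_max: float, d_min: float) -> float:
    """Available counting interval L - 2d, with guardband d = A + D_max - D_min."""
    d = accuracy + d_max - d_min
    if d >= period / 2:
        raise ValueError("guardband must be smaller than half the block duration")
    return period - 2 * d
```

For a 5-minute timer (period = 300 seconds), the counter of block 0
would be read 450 seconds after the epoch, in the middle of block 1.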

Moreover, the destination node has an important role: the colored
packets are intercepted, and a policy restores the DSCP field of all
the packets to the initial value. In this way, the metric is
transparent, because outside the section of the network under
monitoring the traffic flow is unchanged.

In such a case, thanks to this restoring technique, network elements
outside the Alternate Marking monitoring domain (e.g., the two
Provider Edge nodes of the Mobile Backhauling VPN MPLS) are totally
unaware that packets were marked. So this restoring technique makes
Alternate Marking completely transparent outside its monitoring
domain.

5.2.  IP flow performance measurement (IPFPM)

This application of the marking method is described in
[I-D.chen-ippm-coloring-based-ipfpm-framework]. As an example, that
document proposes to use the last reserved bit of the Flag field of
the IPv4 header for marking, while a solution for IPv6 could be to
leverage the IPv6 extension header for marking.

5.3.  OAM Passive Performance Measurement

In [I-D.ietf-bier-mpls-encapsulation], two OAM bits from the Bit
Index Explicit Replication (BIER) header are reserved for the passive
performance measurement marking method. [I-D.ietf-bier-pmmm-oam]
details the measurement for multicast service over a BIER domain.

In addition, the alternate marking method could also be used in a
Service Function Chaining (SFC) domain.

The application of the marking method to Network Virtualization
Overlays (NVO3) protocols is a work in progress (see
[I-D.ietf-nvo3-encap]).

5.4.  RFC6374 Use Case

RFC6374 [RFC6374] uses the LM packet as the packet accounting
demarcation point. Unfortunately, this gives rise to a number of
problems that may lead to significant packet accounting errors in
certain situations.
[I-D.ietf-mpls-flow-ident] discusses the desired capabilities for
MPLS flow identification in order to perform better in-band
performance monitoring of user data packets. One method of
accomplishing identification is Synonymous Flow Labels (SFL),
introduced in [I-D.bryant-mpls-sfl-framework], while
[I-D.ietf-mpls-rfc6374-sfl] describes RFC6374 performance
measurements with SFL.

5.5.  Application to active performance measurement

[I-D.fioccola-ippm-alt-mark-active] describes how to extend the
existing Active Measurement Protocol in order to implement the
alternate marking methodology.
[I-D.fioccola-ippm-rfc6812-alt-mark-ext] describes an extension to
the Cisco SLA Protocol Measurement-Type UDP-Measurement.

6.  Hybrid measurement

The method has been explicitly designed for passive measurements, but
it can also be used with active measurements. In order to have both
end-to-end measurements and intermediate measurements (hybrid
measurements), two end points can exchange artificial traffic flows
and apply alternate marking over these flows. In the intermediate
points, artificial traffic is managed in the same way as real traffic
and measured as specified before. So the application of the marking
method can also simplify the active measurement, as explained in
[I-D.fioccola-ippm-alt-mark-active].

7.  Summary

The advantages of the method described in this document are:

o  easy implementation: it can be implemented either by using
   features already available on major routing platforms, as
   described in Section 5.1, or by applying an optimized
   implementation of the method for both legacy and newest
   technologies;

o  low computational effort: the additional load on processing is
   negligible;

o  accurate packet loss measurement: single packet loss granularity
   is achieved with a passive measurement;

o  potential applicability to any kind of packet/frame-based
   traffic: Ethernet, IP, MPLS, etc., both unicast and multicast;

o  robustness: the method can tolerate out-of-order packets, and it
   is not based on "special" packets whose loss could have a
   negative impact;

o  flexibility: all the timestamp formats are allowed, because they
   are managed out-of-band. The format (the Network Time Protocol
   (NTP) RFC 5905 [RFC5905] or the IEEE 1588 Precision Time Protocol
   (PTP) [IEEE-1588]) depends on the required precision;

o  no interoperability issues: the features required to experiment
   with and test the method (as described in Section 5.1) are
   available on all current routing platforms. Either a centralized
   or a distributed solution can be used to harvest data from the
   routers.

The method doesn't raise any specific need for protocol extension,
but it could be further improved by means of some extensions to
existing protocols. Specifically, the use of DiffServ bits for
coloring the packets might not be a viable solution in some cases: a
standard method to color the packets for this specific application
could be beneficial.

8.  Compliance with RFC6390 guidelines

RFC6390 [RFC6390] defines a framework and a process for developing
Performance Metrics for protocols above and below the IP layer (such
as IP-based applications that operate over reliable or datagram
transport protocols).

This document doesn't aim to propose a new Performance Metric but a
new method of measurement for a few Performance Metrics that have
already been standardized. Nevertheless, it's worth applying the
[RFC6390] guidelines to the present document, in order to provide a
more complete and coherent description of the proposed method. We
used a subset of the Performance Metric Definition template defined
by [RFC6390].

o  Metric name and description: as already stated, this document
   doesn't propose any new Performance Metric. On the contrary, it
   describes a novel method for measuring packet loss [RFC7680]. The
   same concept, with small differences, can also be used to measure
   delay [RFC7679] and jitter [RFC3393]. The document mainly
   describes the applicability to packet loss measurement.

o  Method of Measurement or Calculation: according to the method
   described in the previous sections, the number of packets lost is
   calculated by subtracting the value of the counter on the
   destination node from the value of the counter on the source
   node. Both counters must refer to the same color. The
   calculation is performed when the value of the counters is in a
   steady state. The steady state is an intrinsic characteristic of
   the marking method counters, because the alternation of color
   makes the counter associated with each color stand still, one at
   a time, for the duration of a marking period.

o  Units of Measurement: the method calculates and reports the exact
   number of packets sent by the source node and not received by the
   destination node.

o  Measurement Points: the measurement can be performed between
   adjacent nodes, on a per-link basis, or along a multi-hop path,
   provided that the traffic under measurement follows that path. In
   the case of a multi-hop path, the measurements can be performed
   both end-to-end and hop-by-hop.

o  Measurement Timing: the method has a constraint on the frequency
   of measurements. This is detailed in Section 3.2, where it is
   specified that the marking period and the guardband interval are
   strictly related to each other in order to avoid out-of-order
   issues. That is because, in order to perform a measure, the
   counter must be in a steady state, and this happens when the
   traffic is being colored with the alternate color. As an example,
   in the Telecom Italia application of the method the time interval
   is set to 5 minutes, while other optimized implementations can
   also use a marking period of a few seconds.

o  Implementation: the Telecom Italia application of the method uses
   two encodings of the DSCP field to color the packets; this
   enables the use of policy configurations on the router to color
   the packets and to configure the counter for each color
   accordingly. The path followed by the traffic being measured
   should be known in advance, in order to configure the counters
   along the path and be able to compare the correct values.

o  Verification: both in the lab and in the operational network, the
   methodology has been tested for packet loss and delay
   measurements by using traffic generators together with precision
   test instruments and network emulators.

o  Use and Applications: the method can be used to measure packet
   loss with high precision on live traffic; moreover, by combining
   end-to-end and per-link measurements, the method is useful to
   pinpoint the single link that is experiencing loss events.

o  Reporting Model: the value of the counters has to be sent to a
   centralized management system that performs the calculations;
   such samples must contain a reference to the time interval they
   refer to, so that the management system can perform the correct
   correlation; the samples have to be sent while the corresponding
   counter is in a steady state (within a time interval), otherwise
   the value of the sample should be stored locally.

o  Dependencies: the values of the counters have to be correlated to
   the time interval they refer to; moreover, since the Telecom
   Italia application of the method is based on DSCP values, there
   are significant dependencies on the usage of the DSCP field: it
   must be possible to rely on unused DSCP values without affecting
   QoS-related configuration and behavior; moreover, the
   intermediate nodes must not change the value of the DSCP field,
   so as not to alter the measurement.

o  Organization of Results: the method of measurement produces
   singletons.

o  Parameters: currently, the main parameter of the method is the
   time interval used to alternate the colors and read the counters.

9.  Security Considerations

This document specifies a method to perform measurements in the
context of a Service Provider's network and has not been developed to
conduct Internet measurements, so it does not directly affect
Internet security or applications that run on the Internet. However,
implementations of this method must be mindful of security and
privacy concerns.

There are two types of security concerns: potential harm caused by
the measurements and potential harm to the measurements.

o  Harm caused by the measurement: the measurements described in
   this document are passive, so no new packets are injected into
   the network that could harm the network itself or data traffic.
      Nevertheless, the method implies modifications on the fly to the
      IP header of data packets: this must be performed in a way that
      doesn't alter the quality of service experienced by packets
      subject to measurements and that preserves the stability and
      performance of routers doing the measurements.  One of the main
      security threats in OAM protocols is network reconnaissance; an
      attacker can gather information about the network performance by
      passively eavesdropping on OAM messages.  The advantage of the
      methods described in this document is that the marking bits are
      the only information that is exchanged between the network
      devices.  Therefore, passive eavesdropping on data-plane traffic
      does not allow attackers to gain information about the network
      performance.

   o  Harm to the measurement: the measurements could be harmed by
      routers altering the marking of the packets or by an attacker
      injecting artificial traffic.  Authentication techniques, such as
      digital signatures, may be used where appropriate to guard against
      injected traffic attacks.  Since the measurement itself may be
      affected by routers (or other network devices) along the path of
      IP packets intentionally altering the value of marking bits of
      packets, as mentioned above, the mechanism specified in this
      document can be applied only in the context of a controlled
      domain; thus, the routers (or other network devices) are locally
      administered, and this type of attack can be avoided.  In
      addition, an attacker can't gain information about network
      performance from a single monitoring point; it must use
      synchronized monitoring points at multiple points on the path,
      because it has to do the same kind of measurement and
      aggregation that Service Providers using Alternate Marking must
      do.
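
   As a non-normative illustration of the aggregation mentioned above,
   the following Python sketch derives per-block packet loss from the
   counters read at two synchronized monitoring points.  The function
   name block_loss, the dictionary layout (marking-block index mapped
   to packet count), and the sample values are assumptions made for
   this example only; they are not defined by this document.

```python
from typing import Dict


def block_loss(upstream: Dict[int, int],
               downstream: Dict[int, int]) -> Dict[int, int]:
    """Return packets lost per marking block between two points.

    Hypothetical layout: each dict maps a marking-block index (the
    time interval a sample refers to) to the packet count read once
    the counter for that block is in a steady state.
    """
    # Only blocks reported by both points can be correlated; a block
    # seen at a single monitoring point yields no loss figure.
    common = sorted(upstream.keys() & downstream.keys())
    return {block: upstream[block] - downstream[block] for block in common}


# Made-up counters for three marking periods at two monitoring points.
up = {0: 1000, 1: 980, 2: 1010}
down = {0: 1000, 1: 975}
print(block_loss(up, down))  # prints {0: 0, 1: 5}
```

   Block 2 is absent from the result because only one point reported
   it, which mirrors the observation that counters from a single
   monitoring point do not reveal loss on their own.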

   The privacy concerns of network measurement are limited because the
   method only relies on information contained in the IP header without
   any release of user data.

   Delay attacks are another potential threat in the context of this
   document.  Delay measurement is performed using a specific packet in
   each block, marked by a dedicated color bit.  Therefore, a
   man-in-the-middle attacker can selectively induce synthetic delay
   only to delay-colored packets, causing systematic error in the delay
   measurements.  As discussed in previous sections, the methods
   described in this document rely on an underlying time synchronization
   protocol.  Thus, by attacking the time protocol, an attacker can
   potentially compromise the integrity of the measurement.  A detailed
   discussion about the threats against time protocols and how to
   mitigate them is presented in RFC 7384 [RFC7384].

10.  IANA Considerations

   There are no IANA actions required.

11.  Acknowledgements

   The previous IETF drafts about this technique were:
   [I-D.cociglio-mboned-multicast-pm] and [I-D.tempia-opsawg-p3m].

   The authors would like to thank Alberto Tempia Bonda, Domenico
   Laforgia, Daniele Accetta and Mario Bianchetti for their contribution
   to the definition and the implementation of the method.

12.  References

12.1.  Normative References

   [IEEE-1588]
              IEEE 1588-2008, "IEEE Standard for a Precision Clock
              Synchronization Protocol for Networked Measurement and
              Control Systems", July 2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
              Metric for IP Performance Metrics (IPPM)", RFC 3393,
              DOI 10.17487/RFC3393, November 2002.

   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W.
              Kasch, "Network Time Protocol Version 4: Protocol and
              Algorithms Specification", RFC 5905,
              DOI 10.17487/RFC5905, June 2010.

   [RFC7679]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
              Ed., "A One-Way Delay Metric for IP Performance Metrics
              (IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679, January
              2016.

   [RFC7680]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
              Ed., "A One-Way Loss Metric for IP Performance Metrics
              (IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, January
              2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

12.2.  Informative References

   [I-D.bryant-mpls-sfl-framework]
              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
              and G. Mirsky, "Synonymous Flow Label Framework",
              draft-bryant-mpls-sfl-framework-05 (work in progress),
              June 2017.

   [I-D.chen-ippm-coloring-based-ipfpm-framework]
              Chen, M., Zheng, L., Mirsky, G., Fioccola, G., and T.
              Mizrahi, "IP Flow Performance Measurement Framework",
              draft-chen-ippm-coloring-based-ipfpm-framework-06 (work in
              progress), March 2016.

   [I-D.chen-ippm-ipfpm-report]
              Chen, M., Zheng, L., and G. Mirsky, "IP Flow Performance
              Measurement Report", draft-chen-ippm-ipfpm-report-01 (work
              in progress), April 2016.

   [I-D.cociglio-mboned-multicast-pm]
              Cociglio, M., Capello, A., Bonda, A., and L. Castaldelli,
              "A method for IP multicast performance monitoring",
              draft-cociglio-mboned-multicast-pm-01 (work in progress),
              October 2010.

   [I-D.fioccola-ippm-alt-mark-active]
              Fioccola, G., Clemm, A., Bryant, S., Cociglio, M.,
              Chandramouli, M., and A. Capello, "Alternate Marking
              Extension to Active Measurement Protocol",
              draft-fioccola-ippm-alt-mark-active-01 (work in
              progress), March 2017.

   [I-D.fioccola-ippm-multipoint-alt-mark]
              Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
              "Multipoint Alternate Marking method for passive and
              hybrid performance monitoring",
              draft-fioccola-ippm-multipoint-alt-mark-00 (work in
              progress), June 2017.

   [I-D.fioccola-ippm-rfc6812-alt-mark-ext]
              Fioccola, G., Clemm, A., Cociglio, M., Chandramouli, M.,
              and A. Capello, "Alternate Marking Extension to Cisco SLA
              Protocol RFC6812",
              draft-fioccola-ippm-rfc6812-alt-mark-ext-01 (work in
              progress), March 2016.

   [I-D.ietf-bier-mpls-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J.,
              Aldrin, S., and I. Meilik, "Encapsulation for Bit Index
              Explicit Replication in MPLS and non-MPLS Networks",
              draft-ietf-bier-mpls-encapsulation-10 (work in progress),
              October 2017.

   [I-D.ietf-bier-pmmm-oam]
              Mirsky, G., Zheng, L., Chen, M., and G. Fioccola,
              "Performance Measurement (PM) with Marking Method in Bit
              Index Explicit Replication (BIER) Layer",
              draft-ietf-bier-pmmm-oam-03 (work in progress), October
              2017.

   [I-D.ietf-mpls-flow-ident]
              Bryant, S., Pignataro, C., Chen, M., Li, Z., and G.
              Mirsky, "MPLS Flow Identification Considerations",
              draft-ietf-mpls-flow-ident-05 (work in progress), July
              2017.

   [I-D.ietf-mpls-rfc6374-sfl]
              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
              Mirsky, G., and G. Fioccola, "RFC6374 Synonymous Flow
              Labels", draft-ietf-mpls-rfc6374-sfl-00 (work in
              progress), June 2017.

   [I-D.ietf-nvo3-encap]
              Boutros, S., Ganga, I., Garg, P., Manur, R., Mizrahi, T.,
              Mozes, D., and E. Nordmark, "NVO3 Encapsulation
              Considerations", draft-ietf-nvo3-encap-00 (work in
              progress), June 2017.

   [I-D.tempia-opsawg-p3m]
              Capello, A., Cociglio, M., Castaldelli, L., and A.
              Bonda, "A packet based method for passive performance
              monitoring", draft-tempia-opsawg-p3m-04 (work in
              progress), February 2014.

   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
              Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
              March 2009.

   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
              Measurement for MPLS Networks", RFC 6374,
              DOI 10.17487/RFC6374, September 2011.

   [RFC6390]  Clark, A. and B. Claise, "Guidelines for Considering New
              Performance Metric Development", BCP 170, RFC 6390,
              DOI 10.17487/RFC6390, October 2011.

   [RFC6703]  Morton, A., Ramachandran, G., and G. Maguluri, "Reporting
              IP Network Performance Metrics: Different Points of View",
              RFC 6703, DOI 10.17487/RFC6703, August 2012.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7384]  Mizrahi, T., "Security Requirements of Time Protocols in
              Packet Switched Networks", RFC 7384, DOI 10.17487/RFC7384,
              October 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

Authors' Addresses

   Giuseppe Fioccola (editor)
   Telecom Italia
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   Email: giuseppe.fioccola@telecomitalia.it


   Alessandro Capello
   Telecom Italia
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   Email: alessandro.capello@telecomitalia.it


   Mauro Cociglio
   Telecom Italia
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   Email: mauro.cociglio@telecomitalia.it


   Luca Castaldelli
   Telecom Italia
   Via Reiss Romoli, 274
   Torino  10148
   Italy

   Email: luca.castaldelli@telecomitalia.it


   Mach(Guoyi) Chen
   Huawei Technologies

   Email: mach.chen@huawei.com


   Lianshu Zheng
   Huawei Technologies

   Email: vero.zheng@huawei.com


   Greg Mirsky
   ZTE
   USA

   Email: gregimirsky@gmail.com


   Tal Mizrahi
   Marvell
   6 Hamada st.
   Yokneam
   Israel

   Email: talmi@marvell.com