Network Working Group                                   G. Fioccola, Ed.
Internet-Draft                                           A. Capello, Ed.
Intended status: Experimental                                M. Cociglio
Expires: December 28, 2017                                L. Castaldelli
                                                          Telecom Italia
                                                            M. Chen, Ed.
                                                           L. Zheng, Ed.
                                                     Huawei Technologies
                                                          G. Mirsky, Ed.
                                                                     ZTE
                                                         T. Mizrahi, Ed.
                                                                 Marvell
                                                           June 26, 2017


 Alternate Marking method for passive and hybrid performance monitoring
                      draft-ietf-ippm-alt-mark-05

Abstract

   This document describes a method to perform packet loss, delay and
   jitter measurements on live traffic.  This method is based on the
   Alternate Marking (Coloring) technique.  A report on the operational
   experiment done at Telecom Italia is presented in order to give an
   example and show the method's applicability.
   This technique can be applied in various situations, as detailed in
   this document, and could be considered passive or hybrid depending
   on the application.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 28, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview of the method
   3.  Detailed description of the method
     3.1.  Packet loss measurement
     3.2.  Timing aspects
     3.3.  One-way delay measurement
       3.3.1.  Single marking methodology
       3.3.2.  Double marking methodology
     3.4.  Delay variation measurement
   4.  Considerations
     4.1.  Synchronization
     4.2.  Data Correlation
     4.3.  Packet Re-ordering
   5.  Implementation and deployment
     5.1.  Report on the operational experiment at Telecom Italia
       5.1.1.  Coloring the packets
       5.1.2.  Counting the packets
       5.1.3.  Collecting data and calculating packet loss
       5.1.4.  Metric transparency
     5.2.  IP flow performance measurement (IPFPM)
     5.3.  OAM Passive Performance Measurement
     5.4.  RFC6374 Use Case
     5.5.  Application to active performance measurement
   6.  Hybrid measurement
   7.  Compliance with RFC6390 guidelines
   8.  Security Considerations
   9.  Conclusions
   10. IANA Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Nowadays, most of the traffic in Service Providers' networks carries
   content that is highly sensitive to packet loss [RFC2680], delay
   [RFC2679], and jitter [RFC3393].

   In view of this scenario, Service Providers need methodologies and
   tools to monitor and measure network performance with adequate
   accuracy, in order to constantly control the quality of experience
   perceived by their customers.  In addition, performance monitoring
   provides useful information for improving network management (e.g.
   isolation of network problems, troubleshooting, etc.).

   A lot of work related to OAM, which also includes performance
   monitoring techniques, has been done by Standards Developing
   Organizations (SDOs): [RFC7276] provides a good overview of existing
   OAM mechanisms defined in IETF, ITU-T and IEEE.  Considering IETF, a
   lot of work has been done on fault detection and connectivity
   verification, while less effort has been dedicated so far to
   performance monitoring.  The IPPM WG has defined standard metrics to
   measure network performance; however, the methods developed in this
   WG mainly focus on active measurement techniques.  More recently,
   the MPLS WG has defined mechanisms for measuring packet loss,
   one-way and two-way delay, and delay variation in MPLS networks
   [RFC6374], but their applicability to passive measurements has some
   limitations, especially for pure connection-less networks.

   The lack of adequate tools to measure packet loss with the desired
   accuracy drove an effort to design a new method for the performance
   monitoring of live traffic that is also easy to implement and
   deploy.  The effort led to the method described in this document:
   basically, it is a passive performance monitoring technique,
   potentially applicable to any kind of packet-based traffic,
   including Ethernet, IP, and MPLS, both unicast and multicast.
   The method addresses primarily packet loss measurement, but it can
   be easily extended to one-way delay and delay variation measurements
   as well.

   The method has been explicitly designed for passive measurements but
   it can also be used with active probes.  Passive measurements are
   usually more easily understood by customers and provide much better
   accuracy, especially for packet loss measurements.

   RFC 7799 [RFC7799] defines passive and hybrid methods of measurement.
   In particular, Passive Methods of Measurement are based solely on
   observations of an undisturbed and unmodified packet stream of
   interest; Hybrid Methods are Methods of Measurement that use a
   combination of Active Methods and Passive Methods.

   Taking these definitions into consideration, the Alternate Marking
   Method could be considered Hybrid or Passive depending on the case.
   If the marking field is obtained by changing existing field values
   of the packets (e.g. the DSCP field), the technique is Hybrid.  If
   the marking field is dedicated, reserved and included in the
   protocol specification, the Alternate Marking technique can be
   considered Passive (e.g. RFC6374 Synonymous Flow Label or OAM
   Marking Bits in BIER Header).

   This document is organized as follows:

   o  Section 2 gives an overview of the method, including a comparison
      with different measurement strategies;

   o  Section 3 describes the method in detail;

   o  Section 4 reports considerations about synchronization, data
      correlation and packet re-ordering;

   o  Section 5 reports examples of implementation and deployment of the
      method, and describes the operational experiment done at Telecom
      Italia;

   o  Section 6 introduces Hybrid measurement aspects;

   o  Section 7 discusses compliance with the RFC6390 guidelines;

   o  Section 8 includes some security aspects;

   o  Section 9 summarizes some concluding remarks.

2.  Overview of the method

   Different approaches exist to perform packet loss measurements on a
   live traffic flow.  The most intuitive one consists in numbering the
   packets, so that each router that receives the flow can immediately
   detect a missing packet.  This approach, though very simple in
   theory, is not simple to achieve: it requires the insertion of a
   sequence number into each packet, and the devices must be able to
   extract the number and check it in real time.  Such a task can be
   difficult to implement on live traffic: if UDP is used as the
   transport protocol, the sequence number is not available; on the
   other hand, if a higher layer sequence number (e.g. in the RTP
   header) is used, extracting that information from each packet and
   processing it in real time could overload the device.

   An alternate approach is to count the number of packets sent on one
   end, the number of packets received on the other end, and to compare
   the two values.  This operation is much simpler to implement, but
   requires that the devices performing the measurement are in sync: in
   order to compare two counters, they must refer exactly to the same
   set of packets.  Since a flow is continuous and cannot be stopped
   when a counter has to be read, it could be difficult to determine
   exactly when to read the counter.  A possible solution to overcome
   this problem is to virtually split the flow into consecutive blocks
   by periodically inserting a delimiter, so that each counter refers
   exactly to the same block of packets.  The delimiter could be, for
   example, a special packet inserted artificially into the flow.
   However, delimiting the flow using specific packets has some
   limitations.  First, it requires generating additional packets within
   the flow and requires the equipment to be able to process those
   packets.
   In addition, the method is vulnerable to out-of-order reception of
   delimiting packets and, to a lesser extent, to their loss.

   The method proposed in this document follows the second approach, but
   it doesn't use additional packets to virtually split the flow into
   blocks.  Instead, it "colors" the packets so that the packets
   belonging to the same block will have the same color, whilst
   consecutive blocks will have different colors.  Each change of color
   represents a sort of auto-synchronization signal that guarantees the
   consistency of measurements taken by different devices along the
   path.

   Figure 1 represents a very simple network and shows how the method
   can be used to measure packet loss on different network segments: by
   enabling the measurement on several interfaces along the path, it is
   possible to perform link monitoring, node monitoring or end-to-end
   monitoring.  The method is flexible enough to measure packet loss on
   any segment of the network and can be used to isolate the faulty
   element.

                                Traffic flow
          ========================================================>
            +------+       +------+       +------+       +------+
        ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
            +------+       +------+       +------+       +------+
            .              .      .              .       .      .
            .              .      .              .       .      .
            .              <------>              <------->      .
            .          Node Packet Loss      Link Packet Loss   .
            .                                                   .
            <--------------------------------------------------->
                           End-to-End Packet loss

                     Figure 1: Available measurements

3.  Detailed description of the method

   This section describes in detail how the method operates.  Special
   emphasis is given to the measurement of packet loss, which
   represents the core application of the method, but applicability to
   delay and jitter measurements is also considered.

3.1.  Packet loss measurement

   The basic idea is to virtually split traffic flows into consecutive
   blocks: each block represents a measurable entity unambiguously
   recognizable by all network devices along the path.  By counting the
   number of packets in each block and comparing the values measured by
   different network devices along the path, it is possible to measure
   the packet loss that occurred in any single block between any two
   points.

   As discussed in the previous section, a simple way to create the
   blocks is to "color" the traffic (two colors are sufficient) so that
   packets belonging to different consecutive blocks will have different
   colors.  Whenever the color changes, the previous block terminates
   and the new one begins.  Hence, all the packets belonging to the same
   block will have the same color and packets of different consecutive
   blocks will have different colors.  The number of packets in each
   block depends on the criterion used to create the blocks: if the
   color is switched after a fixed number of packets, then each block
   will contain the same number of packets (except for any losses); but
   if the color is switched according to a fixed timer, then the number
   of packets may be different in each block depending on the packet
   rate.

   Figure 2 shows what a flow looks like when it is split into traffic
   blocks with colored packets.

           A: packet with A coloring
           B: packet with B coloring

          |           |           |           |           |
          |           |     Traffic flow      |           |
   ------------------------------------------------------------------>
   BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
   ------------------------------------------------------------------>
    ...   |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
          |           |           |           |           |

                       Figure 2: Traffic coloring

   Figure 3 shows how the method can be used to measure link packet loss
   between two adjacent nodes.

   Referring to Figure 3, let's assume we want to monitor the packet
   loss on the link between two routers: router R1 and router R2.
   According to the method, the traffic is colored alternately with
   two different colors, A and B.  Whenever the color changes, the
   transition generates a sort of square-wave signal, as depicted in
   the figure.

   Color A   ----------+           +-----------+           +----------
                       |           |           |           |
   Color B             +-----------+           +-----------+
              Block n       ...       Block 3     Block 2     Block 1
            <---------> <---------> <---------> <---------> <--------->

                               Traffic flow
            ===========================================================>
   Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
            ===========================================================>

                 Figure 3: Computation of link packet loss

   Traffic coloring could be done by R1 itself or by an upstream router.
   R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface:
   C(A)R1 counts the packets with color A and C(B)R1 counts those with
   color B.  As long as traffic is colored A, only counter C(A)R1 is
   incremented, while C(B)R1 is not; vice versa, when the traffic is
   colored B, only C(B)R1 is incremented.  C(A)R1 and C(B)R1 can be
   used as reference values to determine the packet loss from R1 to any
   other measurement point downstream.  Router R2, similarly, needs two
   counters on its ingress interface, C(A)R2 and C(B)R2, to count the
   packets received on that interface and colored A and B respectively.
   When an A block ends, it is possible to compare C(A)R1 and C(A)R2
   and calculate the packet loss within the block; similarly, when the
   successive B block terminates, it is possible to compare C(B)R1 with
   C(B)R2, and so on for every successive block.
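
   As an illustration of the counter comparison described above, the
   following Python sketch computes the per-block loss between two
   measurement points.  It is not part of the method specification:
   the BlockCount record, the block_loss function and the sample
   values are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCount:
    """Counter value read for one block at one measurement point (hypothetical record)."""
    block: int    # block sequence number
    color: str    # "A" or "B"
    packets: int  # packets of that color counted during the block


def block_loss(upstream, downstream):
    """Return {block number: packets lost} for blocks reported by both points."""
    downstream_by_block = {count.block: count for count in downstream}
    loss = {}
    for up in upstream:
        down = downstream_by_block.get(up.block)
        if down is None or down.color != up.color:
            # Block missing downstream or colors disagree: no valid comparison.
            continue
        loss[up.block] = up.packets - down.packets
    return loss


# Illustrative values only: R1 egress counters and R2 ingress counters.
r1 = [BlockCount(1, "A", 375), BlockCount(2, "B", 388),
      BlockCount(3, "A", 382), BlockCount(4, "B", 377)]
r2 = [BlockCount(1, "A", 375), BlockCount(2, "B", 388),
      BlockCount(3, "A", 381), BlockCount(4, "B", 374)]

print(block_loss(r1, r2))  # {1: 0, 2: 0, 3: 1, 4: 3}
```

   In this sketch, a block reported by only one of the two points is
   skipped rather than treated as fully lost, since its counter may
   simply not have been collected yet.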

   Likewise, by using two counters on the R2 egress interface, it is
   possible to count the packets sent out of that interface and use
   them as reference values to calculate the packet loss from R2 to any
   measurement point downstream of R2.

   Using a fixed timer for color switching offers better control over
   the method: the (time) length of the blocks can be chosen large
   enough to simplify the collection and the comparison of measures
   taken by different network devices.  It is preferable not to read
   the counters immediately after the color switch: some packets could
   arrive out of order and increment the counter associated with the
   previous block (color), so it is worth waiting for some time.  A
   safe choice is to wait L/2 time units (where L is the duration of
   each block) after the color switch before reading the still counter
   of the previous color, so that the possibility of reading a running
   counter instead of a still one is minimized.  The drawback is that
   the longer the duration of the block, the less frequently the
   measurement can be taken.

   Table 1 shows how the counters can be used to calculate the packet
   loss between R1 and R2.  The first column lists the sequence of
   traffic blocks, while the other columns contain the counters of
   A-colored packets and B-colored packets for R1 and R2.  In this
   example, we assume that the values of the counters are reset to zero
   whenever a block ends and its associated counter has been read: with
   this assumption, the table shows only relative values, that is, the
   exact number of packets of each color within each block.  If the
   values of the counters were not reset, the table would contain
   cumulative values, but the relative values could be determined
   simply by difference from the value of the previous block of the
   same color.
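
   The difference rule for counters that are not reset can be sketched
   as follows.  This is only an illustration: the per_block_counts
   function and the readings are hypothetical, and the counter is
   assumed to start at zero and never wrap.

```python
def per_block_counts(cumulative):
    """Turn cumulative readings of one color counter into per-block counts.

    `cumulative` holds the counter value read after each block of that
    color has ended, oldest first. The counter is assumed to start at 0
    and never wrap.
    """
    counts = []
    previous = 0
    for value in cumulative:
        counts.append(value - previous)
        previous = value
    return counts


# Hypothetical readings of a never-reset C(A) counter after three A blocks.
readings = [375, 757, 1136]
print(per_block_counts(readings))  # [375, 382, 379]
```

   A real implementation would also have to handle counter wrap-around
   and device restarts, which this sketch deliberately ignores.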

   The color is switched on the basis of a fixed timer (not shown in the
   table), so the number of packets in each block is different.

   +-------+--------+--------+--------+--------+------+
   | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
   +-------+--------+--------+--------+--------+------+
   | 1     | 375    | 0      | 375    | 0      | 0    |
   |       |        |        |        |        |      |
   | 2     | 0      | 388    | 0      | 388    | 0    |
   |       |        |        |        |        |      |
   | 3     | 382    | 0      | 381    | 0      | 1    |
   |       |        |        |        |        |      |
   | 4     | 0      | 377    | 0      | 374    | 3    |
   |       |        |        |        |        |      |
   | ...   | ...    | ...    | ...    | ...    | ...  |
   |       |        |        |        |        |      |
   | 2n    | 0      | 387    | 0      | 387    | 0    |
   |       |        |        |        |        |      |
   | 2n+1  | 379    | 0      | 377    | 0      | 2    |
   +-------+--------+--------+--------+--------+------+

      Table 1: Evaluation of counters for packet loss measurements

   During an A block (blocks 1, 3 and 2n+1), all the packets are
   A-colored; therefore, the C(A) counters are incremented to the
   number of packets seen on the interface, while the C(B) counters are
   zero.  Vice versa, during a B block (blocks 2, 4 and 2n), all the
   packets are B-colored: the C(A) counters are zero, while the C(B)
   counters are incremented.

   When a block ends (because of color switching), the corresponding
   counters stop incrementing and it is possible to read them, compare
   the values measured on routers R1 and R2, and calculate the packet
   loss within that block.

   For example, looking at the table above, during the first block
   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
   corresponds to the exact number of packets of the first block (no
   loss).  Also during the second block (B-colored), the R1 and R2
   counters have the same value (388), which corresponds to the number
   of packets of the second block (no loss).
   During blocks three and four, the R1 and R2 counters are different,
   meaning that some packets have been lost: in the example, one single
   packet (382-381) was lost during block three and three packets
   (377-374) were lost during block four.

   The method applied to R1 and R2 can be extended to any other router
   and applied to more complex networks, as long as the measurement is
   enabled on the path followed by the traffic flow(s) being observed.

3.2.  Timing aspects

   This document introduces two color switching methods: one is based
   on a fixed number of packets, the other on a fixed timer.  The
   method based on a fixed timer is preferable because it is more
   deterministic, and it is the one considered in the rest of the
   document.

   To account for the clock error between network devices R1 and R2,
   they must be synchronized to the same clock reference with an
   accuracy of +/- L/2 time units, where L is the time duration of the
   block.  This way, each colored packet can be assigned to the right
   batch by each router.  This is because the minimum time distance
   between two packets of the same color but belonging to different
   batches is L time units.

   In practice, packets may also arrive out of order at batch
   boundaries, depending on the delay between measurement points.  This
   means that, without considering clock error, we wait L/2 after color
   switching to be sure to take a still counter.

   In summary, we need to take into account two contributions: the
   clock error between network devices and the interval we need to wait
   to avoid out-of-order packets caused by network delay.

   The following figure explains both issues.

   ...BBBBBBBBB | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | BBBBBBBBB...
                |<======================================>|
                |                   L                    |
   ...=========>|<==================><==================>|<==========...
                |        L/2                 L/2         |
                |<===>|                            |<===>|
                   d  |                            |  d
                      |<==========================>|
                       available counting interval

                         Figure 4: Timing aspects

   It is assumed that all network devices are synchronized to a common
   reference time with an accuracy of +/- A/2.  Thus, the difference
   between the clock values of any two network devices is bounded by A.

   The guardband d is given by:

   d = A + D_max - D_min,

   where A is the clock accuracy, D_max is an upper bound on the network
   delay between the network devices, and D_min is a lower bound on the
   delay.

   The available counting interval is L - 2d, which must be > 0.

   The condition that must be satisfied, and that is a requirement on
   the synchronization accuracy, is:

   d < L/2.

3.3.  One-way delay measurement

   The same principle used to measure packet loss can be applied also to
   one-way delay measurement.  There are three alternatives, as
   described hereinafter.

3.3.1.  Single marking methodology

   The alternation of colors can be used as a time reference to
   calculate the delay.  Whenever the color changes (which means that a
   new block has started), a network device can store the timestamp of
   the first packet of the new block; that timestamp can be compared
   with the timestamp of the same packet on a second router to compute
   packet delay.  Considering Figure 2, R1 stores a timestamp TS(A1)R1
   when it sends the first packet of block 1 (A-colored), a timestamp
   TS(B2)R1 when it sends the first packet of block 2 (B-colored) and so
   on for every other block.  R2 performs the same operation on the
   receiving side, recording TS(A1)R2, TS(B2)R2 and so on.  Since the
   timestamps refer to specific packets (the first packet of each
   block), we are sure that the timestamps compared to compute delay
   refer to the same packets.
   By comparing TS(A1)R1 with TS(A1)R2 (and similarly TS(B2)R1 with
   TS(B2)R2 and so on), it is possible to measure the delay between R1
   and R2.  In order to have more measurements, it is possible to take
   and store more timestamps, referring to other packets within each
   block.

   In order to coherently compare timestamps collected on different
   routers, the network nodes must be in sync.  Furthermore, a
   measurement is valid only if no packet loss occurs and if packet
   misordering can be avoided; otherwise, the first packet of a block on
   R1 could be different from the first packet of the same block on R2
   (e.g. if that packet is lost between R1 and R2 or it arrives after
   the next one).

   The following table shows how timestamps can be used to calculate the
   delay between R1 and R2.  The first column lists the sequence of
   blocks, while the other columns contain the timestamp referring to
   the first packet of each block on R1 and R2.  The delay is computed
   as the difference between timestamps.  For the sake of simplicity,
   all the values are expressed in milliseconds.

   +-------+---------+---------+---------+---------+-------------+
   | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
   +-------+---------+---------+---------+---------+-------------+
   | 1     | 12.483  | -       | 15.591  | -       | 3.108       |
   |       |         |         |         |         |             |
   | 2     | -       | 6.263   | -       | 9.288   | 3.025       |
   |       |         |         |         |         |             |
   | 3     | 27.556  | -       | 30.512  | -       | 2.956       |
   |       |         |         |         |         |             |
   | 4     | -       | 18.113  | -       | 21.269  | 3.156       |
   |       |         |         |         |         |             |
   | ...   | ...     | ...     | ...     | ...     | ...         |
   |       |         |         |         |         |             |
   | 2n    | 77.463  | -       | 80.501  | -       | 3.038       |
   |       |         |         |         |         |             |
   | 2n+1  | -       | 24.333  | -       | 27.433  | 3.100       |
   +-------+---------+---------+---------+---------+-------------+

        Table 2: Evaluation of timestamps for delay measurements

   The first row shows timestamps taken on R1 and R2 respectively and
   referring to the first packet of block 1 (which is A-colored).  Delay
   can be computed as the difference between the timestamp on R2 and
   the timestamp on R1.  Similarly, the second row shows timestamps (in
   milliseconds) taken on R1 and R2 and referring to the first packet of
   block 2 (which is B-colored).  By comparing timestamps taken on
   different nodes in the network and referring to the same packets
   (identified using the alternation of colors), it is possible to
   measure delay on different network segments.

   For the sake of simplicity, in the above example a single measurement
   is provided within a block, taking into account only the first packet
   of each block.  The number of measurements can be easily increased by
   considering multiple packets in the block: for instance, a timestamp
   could be taken every N packets, thus generating multiple delay
   measurements.  Taking this to the limit, in principle the delay could
   be measured for each packet, by taking and comparing the
   corresponding timestamps (possible but impractical from an
   implementation point of view).

3.3.1.1.  Mean delay

   As mentioned before, the method described above for measuring the
   delay is sensitive to out-of-order reception of packets.  In order
   to overcome this problem, a different approach has been considered:
   it is based on the concept of mean delay.  The mean delay is
   calculated by considering the average arrival time of the packets
   within a single block.
   The network device locally stores a timestamp for each packet
   received within a single block: by summing all the timestamps and
   dividing by the total number of packets received, the average
   arrival time for that block of packets can be calculated.  By
   subtracting the average arrival times of two adjacent devices, it is
   possible to calculate the mean delay between those nodes.  When
   computing the mean delay, the measurement error could grow by
   accumulating the measurement errors of many packets.  This method is
   robust to out-of-order packets and also to packet loss (only a small
   error is introduced).  Moreover, it greatly reduces the number of
   timestamps (only one per block for each network device) that have to
   be collected by the management system.  On the other hand, it only
   gives one measure for the duration of the block (e.g. 5 minutes),
   and it doesn't give the minimum, maximum and median delay values
   (RFC 6703 [RFC6703]).  This limitation could be overcome by reducing
   the duration of the block (e.g. from 5 minutes to a few seconds),
   which implies a highly optimized implementation of the method.

   By summing the mean delays of the two directions of a path, it is
   also possible to measure the two-way mean delay (round-trip delay).

3.3.2.  Double marking methodology

   The Single marking methodology for one-way delay measurement is
   sensitive to out-of-order reception of packets.  The first approach
   to overcome this problem is described above and is based on the
   concept of mean delay.  The limitation of mean delay is that it
   doesn't give information about the distribution of delay values over
   the duration of the block.  Additionally, it may be useful to have
   not only the mean delay but also the minimum, maximum and median
   delay values and, in wider terms, to know more about the statistical
   distribution of delay values.
   So, in order to have more information about the delay and to
   overcome out-of-order issues, a different approach can be
   introduced: it is based on the double marking methodology.

   Basically, the idea is to use the first marking to create the
   alternate flow and, within this colored flow, a second marking to
   select the packets for measuring delay/jitter.  The first marking is
   needed for packet loss and mean delay measurement.  The second
   marking creates a new set of marked packets that are fully identified
   over the network, so that a network device can store the timestamps
   of these packets; these timestamps can be compared with the
   timestamps of the same packets on a second router to compute packet
   delay values for each packet.  The number of measurements can be
   easily increased by changing the frequency of the second marking.
   However, the frequency of the second marking must not be too high,
   in order to avoid out-of-order issues.  Between packets with the
   second marking there should be a security time gap (e.g. this gap
   could be, at the minimum, the mean network delay calculated with the
   previous methodology) to avoid out-of-order issues and also to have
   a number of measurement packets that is rate independent.  If a
   second-marking packet is lost, the delay measurement for the
   considered block is corrupted and should be discarded.

   Mean delay is calculated on all the packets of a sample and is a
   simple computation to be performed for the single marking method.
   In some cases the mean delay measure is not sufficient to
   characterize the sample, and more statistics of delay extent data
   are needed, e.g. percentiles, variance and median delay values.  The
   conventional range (maximum-minimum) should be avoided for several
   reasons, including stability of the maximum delay due to the
   influence of outliers.
RFC 5481 [RFC5481] section 6.5 highlights how the 99.9th 614 percentile of delay and delay variation is more helpful to 615 performance planners. To overcome this drawback the idea is to 616 couple the mean delay measure for the entire batch with the double 617 marking method, where a subset of batch packets is selected for 618 extensive delay calculation by using a second marking. In this way 619 it is possible to perform a detailed analysis on these double marked 620 packets. Please note that there are classic algorithms for median 621 and variance calculation, but they are out of the scope of this document. 622 The comparison between the mean delay for the entire batch and the 623 mean delay on these double marked packets gives useful information 624 since it is possible to understand if the double marking measurements 625 are actually representative of the delay trends. 627 3.4. Delay variation measurement 629 Similarly to one-way delay measurement (both for single marking and 630 double marking), the method can also be used to measure the inter-arrival 631 jitter. We refer to the definition in RFC 3393 [RFC3393]. 632 The alternation of colors, for the single marking method, can be used as 633 a time reference to measure delay variations. In case of double 634 marking, the time reference is given by the second marked packets. 635 Considering the example depicted in Figure 2, R1 stores a timestamp 636 TS(A)R1 whenever it sends the first packet of a block and R2 stores a 637 timestamp TS(B)R2 whenever it receives the first packet of a block. 638 The inter-arrival jitter can be easily derived from one-way delay 639 measurement, by evaluating the delay variation of consecutive 640 samples. 642 The concept of mean delay can also be applied to delay variation, by 643 evaluating the average variation of the interval between consecutive 644 packets of the flow from R1 to R2. 646 4. Considerations 648 This section highlights some considerations about the methodology. 650 4.1.
Synchronization 652 The Alternate Marking technique does not require strong 653 synchronization, especially for packet loss and two-way delay 654 measurement. Only one-way delay measurement requires network devices 655 to have synchronized clocks. 657 The color switching is the reference for all the network devices, and 658 the only requirement to be achieved is that all network devices have 659 to recognize the right batch along the path. 661 If the length of the measurement period is L time units, then all 662 network devices must be synchronized to the same clock reference with 663 an accuracy of +/- L/2 time units (without considering network 664 delay). This level of accuracy guarantees that all network devices 665 consistently match the color bit to the correct block. For example, 666 if the color is toggled every second (L = 1 second), then clocks 667 must be synchronized with an accuracy of +/- 0.5 second to a common 668 time reference. 670 This synchronization requirement can be satisfied even with a 671 relatively inaccurate synchronization method. This is true for 672 packet loss and two-way delay measurement; for one-way delay 673 measurement, instead, clock synchronization must be accurate. 675 Therefore, a system that uses only packet loss and two-way delay 676 measurement does not require accurate synchronization. This is because the 677 value of the clocks of network devices does not affect the 678 computation of the two-way delay measurement. 680 4.2. Data Correlation 682 Data Correlation is the mechanism to compare counters and timestamps 683 for packet loss, delay and delay variation calculation. It could be 684 performed in several ways depending on the alternate marking 685 application and use case.
687 o A possibility is to use a centralized solution using a Network 688 Management System (NMS) to correlate data; 690 o Another possibility is to define a protocol based distributed 691 solution, by defining a new protocol or by extending the existing 692 protocols (e.g. RFC6374, TWAMP, OWAMP) in order to communicate 693 the counters and timestamps between nodes. 695 In the following paragraphs an example data correlation mechanism is 696 explained; it can be used independently of the adopted solution. 698 When data is collected on the upstream and downstream node, e.g., 699 packet counts for packet loss measurement or timestamps for packet 700 delay measurement, and periodically reported to or pulled by other 701 nodes or NMS, a certain data correlation mechanism SHOULD be in use 702 to help the nodes or NMS to tell whether any two or more packet 703 counts are related to the same block of markers, or any two 704 timestamps are related to the same marked packet. 706 The alternate marking method described in this document 707 splits the packets of the measured flow into different measurement 708 blocks; in addition, a Block Number (BN) could be assigned to each such 709 measurement block. The BN is generated each time a node reads the 710 data (packet counts or timestamps), and is associated with each 711 packet count and timestamp reported to or pulled by other nodes or 712 NMS. The value of BN could be calculated as the modulo of the local 713 time (when the data are read) and the interval of the marking time 714 period. 716 When the nodes or NMS see, for example, the same BN associated with two 717 packet counts from an upstream and a downstream node respectively, they 718 consider that these two packet counts correspond to the same 719 block, i.e. that these two packet counts belong to the same block of 720 markers from the upstream and downstream node. The assumption of 721 this BN mechanism is that the measurement nodes are time 722 synchronized.
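The BN-based matching described above can be illustrated with a minimal sketch. It assumes a hypothetical report format (a node name, a BN and a packet count); the names Report and correlate are illustrative only and are not defined by this document. The sketch pairs upstream and downstream packet counts that carry the same BN and leaves unmatched any block for which only one side has reported so far.

```python
from typing import Dict, List, NamedTuple, Tuple


class Report(NamedTuple):
    """Hypothetical counter report: which node, which block, how many packets."""
    node: str
    bn: int        # Block Number attached when the counter was read
    packets: int   # packet count for that block


def correlate(upstream: List[Report],
              downstream: List[Report]) -> Dict[int, Tuple[int, int]]:
    """Pair upstream and downstream counts that carry the same BN.

    Returns {bn: (upstream_packets, downstream_packets)} for every BN
    reported by both nodes; reports may arrive in any order.
    """
    down_by_bn = {r.bn: r.packets for r in downstream}
    matched: Dict[int, Tuple[int, int]] = {}
    for r in upstream:
        if r.bn in down_by_bn:
            matched[r.bn] = (r.packets, down_by_bn[r.bn])
    return matched


# Example data (invented for illustration): block 12 has not yet been
# reported by the downstream node, so it is not correlated.
up = [Report("R1", 10, 500), Report("R1", 11, 480), Report("R1", 12, 510)]
down = [Report("R2", 11, 478), Report("R2", 10, 500)]
pairs = correlate(up, down)
```

In this example the counts for blocks 10 and 11 are paired, while block 12 stays pending until the downstream report arrives.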
This requires the measurement nodes to have a certain 723 time synchronization capability (e.g., the Network Time Protocol 724 (NTP) [RFC5905], or the IEEE 1588 Precision Time Protocol (PTP) 725 [IEEE1588]). Synchronization aspects are further discussed in 726 Section 4.1. 728 4.3. Packet Re-ordering 730 Due to ECMP, packet re-ordering is very common in IP networks. The 731 accuracy of marking based PM, especially packet loss measurement, may 732 be affected by packet re-ordering. Take a look at the following 733 example:

735 Block   :    1    |    2    |    3    |    4    |    5    |...
736 --------|---------|---------|---------|---------|---------|---
737 Node R1 : AAAAAAA | BBBBBBB | AAAAAAA | BBBBBBB | AAAAAAA |...
738 Node R2 : AAAAABB | AABBBBA | AAABAAA | BBBBBBA | ABAAABA |...

740 Figure 5: Packet Reordering

742 In the following paragraphs an example of a data correlation mechanism 743 is explained; it can be used independently of the adopted solution. 745 Most packet re-ordering occurs at the edge of adjacent blocks, 746 and it is easy to handle if the interval of each block is 747 sufficiently large. Then, it can be assumed that the packets with 748 a different marker belong to the block that they are closest to. If 749 the interval is small, it is difficult and sometimes impossible to 750 determine to which block a packet belongs. In the example above, for the 751 packet with the marker "B" in block 3, there is no safe way to 752 tell whether the packet belongs to block 2 or block 4. 754 Choosing a proper interval is important, but how to choose it 755 is out of the scope of this document. An implementation 756 SHOULD provide a way to configure the interval and allow a certain 757 degree of packet re-ordering. 759 5. Implementation and deployment 761 The methodology described in the previous sections can be applied in 762 various situations. Basically, the Alternate Marking technique could be 763 used in many cases for performance measurement.
The only requirement 764 is to select and mark the flow to be monitored; in this way packets 765 are batched by the sender and each batch is alternately marked so 766 that it can be easily recognized by the receiver. 768 An example of implementation and deployment is explained in the next 769 section, just to clarify how the method can work. 771 5.1. Report on the operational experiment at Telecom Italia 773 The method described in this document, also called PNPM (Packet 774 Network Performance Monitoring), was invented and engineered at 775 Telecom Italia. 776 The methodology has been applied by leveraging functions 777 and tools available on IP routers and it is currently being used to 778 monitor packet loss in some portions of Telecom Italia's network. 779 The application of the method to delay measurement is currently being 780 evaluated in Telecom Italia's labs. This section describes how the 781 features currently available on existing routing platforms can be 782 used to apply the method, in order to give an example of 783 implementation and deployment. 785 The fundamental steps for this implementation of the method can be 786 summarized in the following items: 788 o coloring the packets; 790 o counting the packets; 792 o collecting data and calculating the packet loss; 794 o metric transparency. 796 Before going deeper into the implementation details, it's worth 797 mentioning two different strategies that can be used when 798 implementing the method: 800 o flow-based: the flow-based strategy is used when only a limited 801 number of traffic flows need to be monitored. This could be the 802 case, for example, of IPTV channels or other specific application 803 traffic with high QoS requirements (e.g. Mobile Backhauling 804 traffic). According to this strategy, only a subset of the flows 805 is colored.
Counters for packet loss measurements can be 806 instantiated for each single flow, or for the set as a whole, 807 depending on the desired granularity. A relevant problem with 808 this approach is the necessity to know in advance the path 809 followed by flows that are subject to measurement. Path rerouting 810 and traffic load-balancing increase the complexity of the issue, 811 especially for unicast traffic. The problem is easier to solve 812 for multicast traffic where load balancing is seldom used, 813 especially for IPTV traffic where static joins are frequently used 814 to force traffic forwarding and replication. Another application 815 is on Mobile Backhauling, implemented with an MPLS VPN in Telecom 816 Italia's network, where the monitoring is between the Provider 817 Edge nodes of the MPLS VPN. 819 o link-based: measurements are performed on all the traffic on a 820 link by link basis. The link could be a physical link or a 821 logical link (for instance an Ethernet VLAN or an MPLS PW). 822 Counters could be instantiated for the traffic as a whole or for 823 each traffic class (in case it is desired to monitor each class 824 separately), but in the second case a pair of counters is needed 825 for each class. 827 The current implementation in Telecom Italia uses the first strategy. 828 As mentioned, the flow-based measurement requires the identification 829 of the flow to be monitored and the discovery of the path followed by 830 the selected flow. It is possible to monitor a single flow or 831 multiple flows grouped together, but in this case measurement is 832 consistent only if all the flows in the group follow the same path. 833 Moreover, a Service Provider should be aware that, if a measurement 834 is performed by grouping many flows, it is not possible to determine 835 exactly which flow was affected by packet loss. In order to have 836 measures per single flow it is necessary to configure counters for 837 each specific flow.
Once the flow(s) to be monitored have been 838 identified, it is necessary to configure the monitoring on the proper 839 nodes. Configuring the monitoring means configuring the policy to 840 intercept the traffic and configuring the counters to count the 841 packets. To have only end-to-end monitoring, it is sufficient to 842 enable the monitoring on the first and the last hop routers of the 843 path: the mechanism is completely transparent to intermediate nodes 844 and independent from the path followed by traffic flows. On the 845 contrary, to monitor the flow on a hop-by-hop basis along its whole 846 path it is necessary to enable the monitoring on every node from the 847 source to the destination. In case the exact path followed by the 848 flow is not known a priori (i.e. the flow has multiple paths to reach 849 the destination) it is necessary to enable the monitoring system on 850 every path: counters on interfaces traversed by the flow will report 851 packet counts, while counters on other interfaces will be null. 853 5.1.1. Coloring the packets 855 The coloring operation is fundamental in order to create packet 856 blocks. This implies choosing where to activate the coloring and how 857 to color the packets. 859 In case of flow-based measurements, it is desirable, in general, to 860 have a single coloring node because it is easier to manage and 861 doesn't raise any risk of conflict (consider the case where two nodes 862 color the same flow). Thus it is advantageous to color the flow as 863 close as possible to the source. In addition, coloring a flow close 864 to the source allows an end-to-end measure if a measurement point is 865 enabled on the last-hop router as well. The only requirement is that 866 the coloring must change periodically and every node along the path 867 must be able to identify the colored packets unambiguously. For 868 link-based measurements, all traffic needs to be colored when 869 transmitted on the link.
If the traffic has already been colored, 870 then it has to be re-colored because the color must be consistent on 871 the link. This means that each hop along the path must (re-)color 872 the traffic; the color is not required to be consistent along 873 different links. 875 Traffic coloring can be implemented by setting a specific bit in the 876 packet header and changing the value of that bit periodically. With 877 current router implementations, only QoS related fields and features 878 offer the required flexibility to set bits in the packet header. In 879 case a Service Provider only uses the three most significant bits of 880 the DSCP field (corresponding to IP Precedence) for QoS 881 classification and queuing, it is possible to use the two least 882 significant bits of the DSCP field (bit 0 and bit 1) to implement the 883 method without affecting QoS policies. One of the two bits (bit 0) 884 could be used to identify flows subject to traffic monitoring (set to 885 1 if the flow is under monitoring, otherwise it is set to 0), while 886 the second (bit 1) can be used for coloring the traffic (switching 887 between values 0 and 1, corresponding to color A and B) and creating 888 the blocks. 890 In practice, coloring the traffic using the DSCP field can be 891 implemented by configuring on the router output interface an access 892 list that intercepts the flow(s) to be monitored and applies to them 893 a policy that sets the DSCP field accordingly. Since traffic 894 coloring has to be switched between the two values over time, the 895 policy needs to be modified periodically: an automatic script can be 896 used to perform this task on the basis of a fixed timer. In Telecom 897 Italia's implementation this timer is set to 5 minutes: this value 898 proved to be a good compromise between measurement frequency and 899 stability of the measurement (i.e. possibility to collect all the 900 measures referring to the same block). 902 5.1.2.
Counting the packets 904 Assuming that the coloring of the packets is performed only by the 905 source node, the nodes between source and destination (included) have 906 to count the colored packets that they receive and forward: this 907 operation can be enabled on every router along the path or only on a 908 subset, depending on which network segment is being monitored (a 909 single link, a particular metro area, the backbone, the whole path). 911 Since the color switches periodically between two values, two 912 counters (one for each value) are needed: one counter for packets 913 with color A and one counter for packets with color B. For each flow 914 (or group of flows) being monitored and for every interface where the 915 monitoring is active, a couple of counters is needed. For example, 916 in order to monitor separately 3 flows on a router with 4 interfaces 917 involved, 24 counters are needed (2 counters for each of the 3 flows 918 on each of the 4 interfaces). If traffic is colored using the DSCP 919 field, as in Telecom Italia's implementation, an access-list that 920 matches specific DSCP values can be used to count the packets of the 921 flow(s) being monitored. 923 In case of link-based measurements the behaviour is similar except 924 that coloring and counting operations are performed on a link by link 925 basis at each endpoint of the link. 927 Another important aspect to take into consideration is when to read 928 the counters: in order to count the exact number of packets of a 929 block the routers must perform this operation when that block has 930 ended: in other words, the counter for color A must be read when the 931 current block has color B, in order to be sure that the value of the 932 counter is stable. This task can be accomplished in two ways. 
The 933 general approach is to read the counters periodically, many 934 times during a block duration, and to compare these successive 935 readings: when the counter stops incrementing, it means that the current 936 block has ended and its value can be processed safely. 937 Alternatively, if the coloring operation is performed on the basis of 938 a fixed timer, it is possible to configure the reading of the 939 counters according to that timer: for example, if each block is 5 940 minutes long, reading the counter for color A every 10 minutes in the 941 middle of the subsequent block (with color B) is a safe choice. A 942 sufficient margin should be considered between the end of a block and 943 the reading of the counter, in order to take into account any out-of-order 944 packets. The choice of a 5-minute timer for color switching 945 was also inspired by these considerations. 947 5.1.3. Collecting data and calculating packet loss 949 The nodes enabled to perform performance monitoring collect the value 950 of the counters, but they are not able to directly use this 951 information to measure packet loss, because they only have their own 952 samples. For this reason, an external Network Management System 953 (NMS) is required to collect and process data and to perform packet 954 loss calculation. The NMS compares the values of counters from 955 different nodes and can calculate if some packets were lost (even a 956 single packet) and also where packets were lost. 958 The value of the counters needs to be transmitted to the NMS as soon 959 as it has been read. This can be accomplished by using SNMP or FTP 960 and can be done in Push Mode or Polling Mode. In the first case, 961 each router periodically sends the information to the NMS; in the 962 second case, the NMS periodically polls routers to collect 963 information. In any case, the NMS has to collect all the relevant 964 values from all the routers within one cycle of the timer (5 965 minutes).
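The comparison performed by the NMS can be sketched as follows. This is only an illustration under stated assumptions: the counter layout (a dictionary indexed by node name and block number), the function names and the sample values are hypothetical, and the counters are assumed to have been read once each block was stable. Loss on a segment is the upstream count minus the downstream count for the same block, which shows both whether and where packets were lost.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: counters[node][block] = packets counted by that node
# for that block, read after the block has ended (stable value).
Counters = Dict[str, Dict[int, int]]


def loss_per_segment(path: List[str], counters: Counters,
                     block: int) -> List[Tuple[str, str, int]]:
    """Return (upstream, downstream, lost_packets) for each adjacent pair."""
    results = []
    for upstream, downstream in zip(path, path[1:]):
        lost = counters[upstream][block] - counters[downstream][block]
        results.append((upstream, downstream, lost))
    return results


def end_to_end_loss(path: List[str], counters: Counters, block: int) -> int:
    """Loss between the first and the last monitored node of the path."""
    return counters[path[0]][block] - counters[path[-1]][block]


# Example values invented for illustration: three packets of block 7
# are lost between R2 and R3.
path = ["R1", "R2", "R3"]
counters = {
    "R1": {7: 1000},
    "R2": {7: 1000},
    "R3": {7: 997},
}
segments = loss_per_segment(path, counters, 7)
total = end_to_end_loss(path, counters, 7)
```

With these sample values the end-to-end loss is 3 packets, and the per-segment result attributes all of it to the R2-R3 segment.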
967 If link-based measurement is used, it would be possible to use a 968 protocol to exchange values of counters between the two endpoints in 969 order to let them perform the packet loss calculation for each 970 traffic direction. A similar approach could be complicated if 971 applied to a flow-based measurement. 973 5.1.4. Metric transparency 975 In Telecom Italia's implementation the source node colors the packets 976 with a policy that is modified periodically via an automatic script 977 in order to alternate the DSCP field of the packets. The nodes 978 between source and destination (inclusive) have to count with an 979 access-list the colored packets that they receive and forward. 981 Moreover the destination node has an important role: the colored 982 packets are intercepted and a policy restores the DSCP field 983 of all the packets to the initial value. In this way the metric is 984 transparent because outside the section of the network under 985 monitoring the traffic flow is unchanged. 987 In such a case, thanks to this restoring technique, network elements 988 outside the Alternate Marking monitoring domain (e.g. the two 989 Provider Edge nodes of the Mobile Backhauling MPLS VPN) are totally 990 unaware that packets were marked. So this restoring technique makes 991 Alternate Marking completely transparent outside its monitoring 992 domain. 994 5.2. IP flow performance measurement (IPFPM) 996 This application of the marking method is described in 997 [I-D.chen-ippm-coloring-based-ipfpm-framework]. 999 5.3. OAM Passive Performance Measurement 1001 In [I-D.ietf-bier-mpls-encapsulation] two OAM bits from the Bit Index 1002 Explicit Replication (BIER) Header are reserved for the passive 1003 performance measurement marking method. [I-D.ietf-bier-pmmm-oam] 1004 details the measurement for multicast service over a BIER domain.
1006 [I-D.mirsky-sfc-pmamm] describes how the alternate marking method can 1007 be used as the passive performance measurement method in a Service 1008 Function Chaining (SFC) domain. 1010 The application of the marking method to Network Virtualization 1011 Overlays (NVO3) protocols is a work in progress. 1013 5.4. RFC6374 Use Case 1015 RFC6374 [RFC6374] uses the LM packet as the packet accounting 1016 demarcation point. Unfortunately this gives rise to a number of 1017 problems that may lead to significant packet accounting errors in 1018 certain situations. [I-D.ietf-mpls-flow-ident] discusses the desired 1019 capabilities for MPLS flow identification in order to perform 1020 better in-band performance monitoring of user data packets. A method 1021 of accomplishing identification is Synonymous Flow Labels (SFL) 1022 introduced in [I-D.bryant-mpls-sfl-framework], while 1023 [I-D.ietf-mpls-rfc6374-sfl] describes RFC6374 performance 1024 measurements with SFL. 1026 5.5. Application to active performance measurement 1028 [I-D.fioccola-ippm-alt-mark-active] describes how to extend the 1029 existing Active Measurement Protocol, in order to implement the alternate 1030 marking methodology. [I-D.fioccola-ippm-rfc6812-alt-mark-ext] 1031 describes an extension to the Cisco SLA Protocol Measurement-Type 1032 UDP-Measurement. 1034 6. Hybrid measurement 1036 The method has been explicitly designed for passive measurements but 1037 it can also be used with active measurements. In order to have both 1038 end to end measurements and intermediate measurements (hybrid 1039 measurements) two end points can exchange artificial traffic flows 1040 and apply alternate marking over these flows. In the intermediate 1041 points artificial traffic is managed in the same way as real traffic 1042 and measured as specified before. So the application of the marking 1043 method can also simplify active measurement, as explained in 1044 [I-D.fioccola-ippm-alt-mark-active]. 1046 7.
Compliance with RFC6390 guidelines 1048 RFC6390 [RFC6390] defines a framework and a process for developing 1049 Performance Metrics for protocols above and below the IP layer (such 1050 as IP-based applications that operate over reliable or datagram 1051 transport protocols). 1053 This document doesn't aim to propose a new Performance Metric but a 1054 new method of measurement for a few Performance Metrics that have 1055 already been standardized. Nevertheless, it's worth applying 1056 [RFC6390] guidelines to the present document, in order to provide a 1057 more complete and coherent description of the proposed method. We 1058 used a subset of the Performance Metric Definition template defined 1059 by [RFC6390]. 1061 o Metric name and description: as already stated, this document 1062 doesn't propose any new Performance Metric. Rather, it 1063 describes a novel method for measuring packet loss [RFC2680]. The 1064 same concept, with small differences, can also be used to measure 1065 delay [RFC2679], and jitter [RFC3393]. The document mainly 1066 describes the applicability to packet loss measurement. 1068 o Method of Measurement or Calculation: according to the method 1069 described in the previous sections, the number of packets lost is 1070 calculated by subtracting the value of the counter on the destination 1071 node from the value of the counter on the source node. Both 1072 counters must refer to the same color. The calculation is 1073 performed when the value of the counters is in a steady state. 1075 o Units of Measurement: the method calculates and reports the exact 1076 number of packets sent by the source node and not received by the 1077 destination node. 1079 o Measurement Points: the measurement can be performed between 1080 adjacent nodes, on a per-link basis, or along a multi-hop path, 1081 provided that the traffic under measurement follows that path.
In 1082 case of a multi-hop path, the measurements can be performed both 1083 end-to-end and hop-by-hop. 1085 o Measurement Timing: the method has a constraint on the frequency 1086 of measurements. In order to perform a measurement, the counter must 1087 be in a steady state: this happens when the traffic is being 1088 colored with the alternate color; for example in the Telecom 1089 Italia application of the method the time interval is set to 5 1090 minutes. 1092 o Implementation: the Telecom Italia application of the method uses 1093 two encodings of the DSCP field to color the packets; this enables 1094 the use of policy configurations on the router to color the 1095 packets and accordingly configure the counter for each color. The 1096 path followed by traffic being measured should be known in advance 1097 in order to configure the counters along the path and be able to 1098 compare the correct values. 1100 o Use and Applications: the method can be used to measure packet 1101 loss with high precision on live traffic; moreover, by combining 1102 end-to-end and per-link measurements, the method is useful to 1103 pinpoint the single link that is experiencing loss events. 1105 o Reporting Model: the value of the counters has to be sent to a 1106 centralized management system that performs the calculations; such 1107 samples must contain a reference to the time interval they refer 1108 to, so that the management system can perform the correct 1109 correlation; the samples have to be sent while the corresponding 1110 counter is in a steady state (within a time interval), otherwise 1111 the value of the sample should be stored locally.
1113 o Dependencies: the values of the counters have to be correlated to 1114 the time interval they refer to; moreover, since the Telecom 1115 Italia application of the method is based on DSCP values, there 1116 are significant dependencies on the usage of the DSCP field: it 1117 must be possible to rely on unused DSCP values without affecting 1118 QoS-related configuration and behavior; moreover, the intermediate 1119 nodes must not change the value of the DSCP field, so as not to alter the 1120 measurement. 1122 o Organization of Results: the method of measurement produces 1123 singletons. 1125 o Parameters: currently, the main parameter of the method is the 1126 time interval used to alternate the colors and read the counters. 1128 8. Security Considerations 1130 This document specifies a method to perform measurements in the 1131 context of a Service Provider's network and has not been developed to 1132 conduct Internet measurements, so it does not directly affect 1133 Internet security or applications which run on the Internet. 1134 However, implementation of this method must be mindful of security 1135 and privacy concerns. 1137 There are two types of security concerns: potential harm caused by 1138 the measurements and potential harm to the measurements. Regarding 1139 the first point, the measurements described in this document 1140 are passive, so there are no packets injected into the network 1141 causing potential harm to the network itself and to data traffic. 1142 Nevertheless, the method implies modifications on the fly to the IP 1143 header of data packets: this must be performed in a way that doesn't 1144 alter the quality of service experienced by packets subject to 1145 measurements and that preserves the stability and performance of routers 1146 doing the measurements. The measurements themselves could be harmed 1147 by routers altering the marking of the packets, or by an attacker 1148 injecting artificial traffic.
Authentication techniques, such as 1149 digital signatures, may be used where appropriate to guard against 1150 injected traffic attacks. 1152 The privacy concerns of network measurement are limited because the 1153 method only relies on information contained in the IP header without 1154 any release of user data. 1156 The measurement itself may be affected by routers (or other network 1157 devices) along the path of IP packets intentionally altering the 1158 value of marking bits of packets. As mentioned above, the mechanism 1159 specified in this document applies only within the context of one Service 1160 Provider's network, and thus the routers (or other network devices) 1161 are locally administered and this type of attack can be avoided. 1163 One of the main security threats in OAM protocols is network 1164 reconnaissance; an attacker can gather information about the network 1165 performance by passively eavesdropping on OAM messages. The 1166 advantage of the methods described in this document is that the 1167 marking bits are the only information that is exchanged between the 1168 network devices. Therefore, passive eavesdropping on data plane 1169 traffic does not allow attackers to gain information about the 1170 network performance. 1172 Delay attacks are another potential threat in the context of this 1173 document. Delay measurement is performed using a specific packet in 1174 each block, marked by a dedicated color bit. Therefore, a man-in-the-middle 1175 attacker can selectively induce synthetic delay only to 1176 delay-colored packets, causing systematic error in the delay 1177 measurements. As discussed in previous sections, the methods 1178 described in this document rely on an underlying time synchronization 1179 protocol. Thus, by attacking the time protocol an attacker can 1180 potentially compromise the integrity of the measurement.
A detailed 1181 discussion about the threats against time protocols and how to 1182 mitigate them is presented in RFC 7384 [RFC7384]. 1184 9. Conclusions 1186 The advantages of the method described in this document are: 1188 o easy implementation: it can be implemented using features already 1189 available on major routing platforms; 1191 o low computational effort: the additional load on processing is 1192 negligible; 1194 o accurate packet loss measurement: single packet loss granularity 1195 is achieved with a passive measurement; 1197 o potential applicability to any kind of packet/frame-based 1198 traffic: Ethernet, IP, MPLS, etc., both unicast and multicast; 1200 o robustness: the method can tolerate out of order packets and it's 1201 not based on "special" packets whose loss could have a negative 1202 impact; 1204 o no interoperability issues: the features required to implement the 1205 method are available on all current routing platforms. 1207 The method doesn't raise any specific need for protocol extension, 1208 but it could be further improved by means of some extension to 1209 existing protocols. Specifically, the use of DiffServ bits for 1210 coloring the packets might not be a viable solution in some cases: a 1211 standard method to color the packets for this specific application 1212 could be beneficial. 1214 10. IANA Considerations 1216 There are no IANA actions required. 1218 11. Acknowledgements 1220 The previous IETF drafts about this technique were: 1221 [I-D.cociglio-mboned-multicast-pm] and [I-D.tempia-opsawg-p3m]. 1222 There are some references to this methodology in other IETF works 1223 (e.g. [I-D.ietf-mpls-flow-ident], [I-D.bryant-mpls-sfl-framework], 1224 [I-D.ietf-mpls-rfc6374-sfl], [I-D.ietf-bier-mpls-encapsulation], 1225 [I-D.ietf-bier-pmmm-oam], 1226 [I-D.chen-ippm-coloring-based-ipfpm-framework]).

   In addition, the authors would like to thank Alberto Tempia Bonda,
   Domenico Laforgia, Daniele Accetta, and Mario Bianchetti for their
   contribution to the definition and the implementation of the method.

12.  References

12.1.  Normative References

   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Delay Metric for IPPM", RFC 2679, DOI 10.17487/RFC2679,
              September 1999.

   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Packet Loss Metric for IPPM", RFC 2680,
              DOI 10.17487/RFC2680, September 1999.

   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
              Metric for IP Performance Metrics (IPPM)", RFC 3393,
              DOI 10.17487/RFC3393, November 2002.

12.2.  Informative References

   [I-D.bryant-mpls-sfl-framework]
              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
              and G. Mirsky, "Synonymous Flow Label Framework",
              draft-bryant-mpls-sfl-framework-04 (work in progress),
              April 2017.

   [I-D.chen-ippm-coloring-based-ipfpm-framework]
              Chen, M., Zheng, L., Mirsky, G., Fioccola, G., and T.
              Mizrahi, "IP Flow Performance Measurement Framework",
              draft-chen-ippm-coloring-based-ipfpm-framework-06 (work in
              progress), March 2016.

   [I-D.cociglio-mboned-multicast-pm]
              Cociglio, M., Capello, A., Bonda, A., and L. Castaldelli,
              "A method for IP multicast performance monitoring",
              draft-cociglio-mboned-multicast-pm-01 (work in progress),
              October 2010.

   [I-D.fioccola-ippm-alt-mark-active]
              Fioccola, G., Clemm, A., Bryant, S., Cociglio, M.,
              Chandramouli, M., and A. Capello, "Alternate Marking
              Extension to Active Measurement Protocol",
              draft-fioccola-ippm-alt-mark-active-01 (work in progress),
              March 2017.

   [I-D.fioccola-ippm-rfc6812-alt-mark-ext]
              Fioccola, G., Clemm, A., Cociglio, M., Chandramouli, M.,
              and A.
              Capello, "Alternate Marking Extension to Cisco SLA
              Protocol RFC6812",
              draft-fioccola-ippm-rfc6812-alt-mark-ext-01 (work in
              progress), March 2016.

   [I-D.ietf-bier-mpls-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J.,
              Aldrin, S., and I. Meilik, "Encapsulation for Bit Index
              Explicit Replication in MPLS and non-MPLS Networks",
              draft-ietf-bier-mpls-encapsulation-07 (work in progress),
              June 2017.

   [I-D.ietf-bier-pmmm-oam]
              Mirsky, G., Zheng, L., Chen, M., and G. Fioccola,
              "Performance Measurement (PM) with Marking Method in Bit
              Index Explicit Replication (BIER) Layer",
              draft-ietf-bier-pmmm-oam-01 (work in progress), January
              2017.

   [I-D.ietf-mpls-flow-ident]
              Bryant, S., Pignataro, C., Chen, M., Li, Z., and G.
              Mirsky, "MPLS Flow Identification Considerations",
              draft-ietf-mpls-flow-ident-04 (work in progress),
              February 2017.

   [I-D.ietf-mpls-rfc6374-sfl]
              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
              Mirsky, G., and G. Fioccola, "RFC6374 Synonymous Flow
              Labels", draft-ietf-mpls-rfc6374-sfl-00 (work in
              progress), June 2017.

   [I-D.mirsky-sfc-pmamm]
              Mirsky, G. and G. Fioccola, "Performance Measurement (PM)
              with Alternate Marking Method in Service Function Chaining
              (SFC) Domain", draft-mirsky-sfc-pmamm-00 (work in
              progress), April 2017.

   [I-D.tempia-opsawg-p3m]
              Capello, A., Cociglio, M., Castaldelli, L., and A. Bonda,
              "A packet based method for passive performance
              monitoring", draft-tempia-opsawg-p3m-04 (work in
              progress), February 2014.

   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
              Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
              March 2009.

   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
              Measurement for MPLS Networks", RFC 6374,
              DOI 10.17487/RFC6374, September 2011.

   [RFC6390]  Clark, A. and B.
              Claise, "Guidelines for Considering New Performance Metric
              Development", BCP 170, RFC 6390, DOI 10.17487/RFC6390,
              October 2011.

   [RFC6703]  Morton, A., Ramachandran, G., and G. Maguluri, "Reporting
              IP Network Performance Metrics: Different Points of View",
              RFC 6703, DOI 10.17487/RFC6703, August 2012.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7384]  Mizrahi, T., "Security Requirements of Time Protocols in
              Packet Switched Networks", RFC 7384, DOI 10.17487/RFC7384,
              October 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

Authors' Addresses

   Giuseppe Fioccola (editor)
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: giuseppe.fioccola@telecomitalia.it

   Alessandro Capello (editor)
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: alessandro.capello@telecomitalia.it

   Mauro Cociglio
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: mauro.cociglio@telecomitalia.it

   Luca Castaldelli
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: luca.castaldelli@telecomitalia.it

   Mach(Guoyi) Chen (editor)
   Huawei Technologies

   Email: mach.chen@huawei.com

   Lianshu Zheng (editor)
   Huawei Technologies

   Email: vero.zheng@huawei.com

   Greg Mirsky (editor)
   ZTE
   USA

   Email: gregimirsky@gmail.com

   Tal Mizrahi (editor)
   Marvell
   6 Hamada st.
   Yokneam
   Israel

   Email: talmi@marvell.com