idnits 2.17.1

draft-ietf-ippm-alt-mark-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 10, 2016) is 2869 days in the past.  Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 2679 (Obsoleted by RFC 7679)

  ** Obsolete normative reference: RFC 2680 (Obsoleted by RFC 7680)

  == Outdated reference: A later version (-04) exists of
     draft-bryant-mpls-rfc6374-sfl-00

  == Outdated reference: A later version (-05) exists of
     draft-bryant-mpls-sfl-framework-00

  == Outdated reference: A later version (-12) exists of
     draft-ietf-bier-mpls-encapsulation-04

  == Outdated reference: A later version (-07) exists of
     draft-ietf-mpls-flow-ident-00

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         A. Capello
Internet-Draft                                               M. Cociglio
Intended status: Experimental                                G. Fioccola
Expires: December 12, 2016                                L. Castaldelli
                                                          Telecom Italia
                                                         A. Tempia Bonda
                                                           June 10, 2016

       Alternate Marking method for passive performance monitoring
                      draft-ietf-ippm-alt-mark-00

Abstract

   This document describes a passive method to perform packet loss,
   delay and jitter measurements on live traffic.  This method is based
   on the Alternate Marking (Coloring) technique.  A report on the
   operational experiment carried out at Telecom Italia is included to
   give an example and show the applicability of the method.  This
   technique can be applied in various situations, as detailed in this
   document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 12, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview of the method
   3.  Detailed description of the method
     3.1.  Packet loss measurement
     3.2.  One-way delay measurement
       3.2.1.  Single marking methodology
       3.2.2.  Average delay
       3.2.3.  Double marking methodology
     3.3.  Delay variation measurement
   4.  Implementation and deployment
     4.1.  Report on the operational experiment at Telecom Italia
       4.1.1.  Coloring the packets
       4.1.2.  Counting the packets
       4.1.3.  Collecting data and calculating packet loss
       4.1.4.  Metric transparency
     4.2.  IP flow performance measurement (IPFPM)
     4.3.  Performance Measurement Marking Method in BIER Domain
     4.4.  RFC6374 Use Case
     4.5.  Application to active performance measurement
   5.  Hybrid measurement
   6.  Compliance with RFC6390 guidelines
   7.  Security Considerations
   8.  Conclusions
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   Nowadays, most of the traffic in Service Providers' networks carries
   real-time content.  Such content is highly sensitive to packet
   loss [RFC2680], while interactive content is sensitive to delay
   [RFC2679] and jitter [RFC3393].

   In view of this scenario, Service Providers need methodologies and
   tools to monitor and measure network performance with adequate
   accuracy, in order to constantly control the quality of experience
   perceived by their customers.  In addition, performance
   monitoring provides useful information for improving network
   management (e.g. isolation of network problems, troubleshooting,
   etc.).

   A lot of work related to OAM, which also includes performance
   monitoring techniques, has been done by Standards Developing
   Organizations: [RFC7276] provides a good overview of existing OAM
   mechanisms defined in IETF, ITU-T and IEEE.  Considering IETF, a lot
   of work has been done on fault detection and connectivity
   verification, while less effort has been dedicated so far to
   performance monitoring.  The IPPM WG has defined standard metrics to
   measure network performance; however, the methods developed in the WG
   mainly refer to active measurement techniques.  More recently, the
   MPLS WG has defined mechanisms for measuring packet loss, one-way and
   two-way delay, and delay variation in MPLS networks [RFC6374], but
   their applicability to passive measurements has some limitations,
   especially for pure connectionless networks.

   The lack of adequate tools to measure packet loss with the desired
   accuracy drove an effort to design a new method for the performance
   monitoring of live traffic, possibly easy to implement and deploy.
   The effort led to the method described in this document: basically,
   it is a passive performance monitoring technique, potentially
   applicable to any kind of packet-based traffic, including Ethernet,
   IP, and MPLS, both unicast and multicast.  The method primarily
   addresses packet loss measurement, but it can be easily extended to
   one-way delay and delay variation measurements as well.  It doesn't
   require any protocol extension or interaction with existing
   protocols, thus avoiding any interoperability issue.  Even if the
   method doesn't raise any specific need for standardization, it could
   be further improved by means of some extension to existing protocols,
   but this aspect is left for further study and is out of the scope
   of this document.

   The method has been explicitly designed for passive measurements but
   it can also be used with active probes.  Passive measurements are
   usually more easily understood by customers and provide much better
   accuracy, especially for packet loss measurements.

   The method described in this document, also called PNPM (Packet
   Network Performance Monitoring), was invented and engineered at
   Telecom Italia and is currently being used in Telecom Italia's
   network.  The previous IETF drafts about this technique were:
   [I-D.cociglio-mboned-multicast-pm] and [I-D.tempia-opsawg-p3m].
   There are some references to this methodology in other IETF works
   (e.g. [I-D.ietf-mpls-flow-ident], [I-D.bryant-mpls-sfl-framework],
   [I-D.bryant-mpls-rfc6374-sfl], [I-D.ietf-bier-mpls-encapsulation],
   [I-D.mirsky-bier-pmmm-oam],
   [I-D.chen-ippm-coloring-based-ipfpm-framework]).

   This document is organized as follows:

   o  Section 2 gives an overview of the method, including a comparison
      with different measurement strategies;

   o  Section 3 describes the method in detail;

   o  Section 4 reports examples of implementation and deployment of the
      method.
      Furthermore, the operational experiment done at Telecom
      Italia is described;

   o  Section 5 discusses hybrid measurement;

   o  Section 6 addresses compliance with the RFC6390 guidelines;

   o  Section 7 includes some considerations about security aspects;

   o  Section 8 finally summarizes some concluding remarks.

2.  Overview of the method

   Different approaches exist to perform packet loss measurements on a
   live traffic flow.  The most intuitive one consists of
   numbering the packets, so that each router that receives the flow can
   immediately detect a missing packet.  This approach, though very
   simple in theory, is not simple to achieve: it requires the insertion
   of a sequence number into each packet, and the devices must be able to
   extract the number and check it in real time.  Such a task can be
   difficult to implement on live traffic: if UDP is used as the
   transport protocol, the sequence number is not available; on the
   other hand, if a higher layer sequence number (e.g. in the RTP
   header) is used, extracting that information from each packet and
   processing it in real time could overload the device.

   An alternative approach is to count the number of packets sent on one
   end, the number of packets received on the other end, and to compare
   the two values.  This operation is much simpler to implement, but
   requires that the devices performing the measurement are in sync: in
   order to compare two counters, they must refer exactly
   to the same set of packets.  Since a flow is continuous and cannot be
   stopped when a counter has to be read, it could be difficult to
   determine exactly when to read the counter.  A possible solution to
   overcome this problem is to virtually split the flow into consecutive
   blocks by periodically inserting a delimiter, so that each counter
   refers exactly to the same block of packets.  The delimiter could be,
   for example, a special packet inserted artificially into the flow.
   However, delimiting the flow using specific packets has some
   limitations.  First, it requires generating additional packets within
   the flow and requires the equipment to be able to process those
   packets.  In addition, the method is vulnerable to out-of-order
   reception of delimiting packets and, to a lesser extent, to their
   loss.

   The method proposed in this document follows the second approach, but
   it doesn't use additional packets to virtually split the flow into
   blocks.  Instead, it "colors" the packets so that the packets
   belonging to the same block will have the same color, whilst
   consecutive blocks will have different colors.  Each change of color
   represents a sort of auto-synchronization signal that guarantees the
   consistency of measurements taken by different devices along the
   path.

   Figure 1 represents a very simple network and shows how the method
   can be used to measure packet loss on different network segments: by
   enabling the measurement on several interfaces along the path, it is
   possible to perform link monitoring, node monitoring or end-to-end
   monitoring.  The method is flexible enough to measure packet loss on
   any segment of the network and can be used to isolate the faulty
   element.

                             Traffic flow
       ========================================================>
       +------+       +------+       +------+       +------+
   ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
       +------+       +------+       +------+       +------+
       .              .      .              .       .      .
       .              .      .              .       .      .
       .              <------>              <------->      .
       .          Node Packet Loss      Link Packet Loss   .
       .                                                   .
       <--------------------------------------------------->
                      End-to-End Packet loss

                    Figure 1: Available measurements

3.  Detailed description of the method

   This section describes in detail how the method operates.
   Special
   emphasis is given to the measurement of packet loss, which represents
   the core application of the method, but applicability to delay and
   jitter measurements is also considered.

3.1.  Packet loss measurement

   The basic idea is to virtually split traffic flows into consecutive
   blocks: each block represents a measurable entity unambiguously
   recognizable by all network devices along the path.  By counting the
   number of packets in each block and comparing the values measured by
   different network devices along the path, it is possible to measure
   the packet loss that occurred in any single block between any two
   points.

   As discussed in the previous section, a simple way to create the
   blocks is to "color" the traffic (two colors are sufficient) so that
   packets belonging to different consecutive blocks will have different
   colors.  Whenever the color changes, the previous block terminates
   and the new one begins.  Hence, all the packets belonging to the same
   block will have the same color and packets of different consecutive
   blocks will have different colors.  The number of packets in each
   block depends on the criterion used to create the blocks: if the
   color is switched after a fixed number of packets, then each block
   will contain the same number of packets (except for any losses); but
   if the color is switched according to a fixed timer, then the number
   of packets may be different in each block depending on the packet
   rate.

   The following figure shows what a flow looks like when it is split
   into traffic blocks with colored packets.

   A: packet with A coloring
   B: packet with B coloring

          |           |           |           |           |
          |           |     Traffic flow      |           |
   ------------------------------------------------------------------>
   BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
   ------------------------------------------------------------------>
   ...    |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
          |           |           |           |           |

                       Figure 2: Traffic coloring

   Figure 3 shows how the method can be used to measure link packet loss
   between two adjacent nodes.

   Referring to the figure, let's assume we want to monitor the packet
   loss on the link between two routers: router R1 and router R2.
   According to the method, the traffic is colored alternately with
   two different colors, A and B.  Whenever the color changes, the
   transition generates a sort of square-wave signal, as depicted in the
   following figure.

   Color A  -----------+           +-----------+           +-----------
                       |           |           |           |
   Color B             +-----------+           +-----------+
              Block n       ...       Block 3     Block 2     Block 1
            <---------> <---------> <---------> <---------> <--------->

                              Traffic flow
            ===========================================================>
   Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
            ===========================================================>

               Figure 3: Computation of link packet loss

   Traffic coloring could be done by R1 itself or by an upstream router.
   R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface:
   C(A)R1 counts the packets with color A and C(B)R1 counts those with
   color B.  As long as traffic is colored A, only counter C(A)R1 is
   incremented, while C(B)R1 is not; vice versa, when the
   traffic is colored B, only C(B)R1 is incremented.  C(A)R1 and
   C(B)R1 can be used as reference values to determine the packet loss
   from R1 to any other measurement point down the path.  Router R2,
   similarly, will need two counters on its ingress interface, C(A)R2
   and C(B)R2, to count the packets received on that interface and
   colored with color A and B respectively.
   When an A block ends, it is
   possible to compare C(A)R1 and C(A)R2 and calculate the packet loss
   within the block; similarly, when the successive B block terminates,
   it is possible to compare C(B)R1 with C(B)R2, and so on for every
   successive block.

   Likewise, by using two counters on the R2 egress interface it is
   possible to count the packets sent out of that interface and use them
   as reference values to calculate the packet loss from R2 to any
   measurement point downstream of R2.

   Using a fixed timer for color switching offers better control over
   the method: the (time) length of the blocks can be chosen large
   enough to simplify the collection and the comparison of measures
   taken by different network devices.  It's preferable not to read the
   value of the counters immediately after the color switch: some
   packets could arrive out of order and increment the counter
   associated with the previous block (color), so it is worth waiting
   for some time.  A safe choice is to wait L/2 time units (where L is
   the duration of each block) after the color switch before reading the
   counter of the previous color, which by then should no longer be
   changing; this minimizes the chance of reading a counter that is
   still being incremented.  The drawback is that
   the longer the duration of the block, the less frequently
   measurements can be taken.

   The following table shows how the counters can be used to calculate
   the packet loss between R1 and R2.  The first column lists the
   sequence of traffic blocks while the other columns contain the
   counters of A-colored packets and B-colored packets for R1 and R2.
   In this example, we assume that the values of the counters are reset
   to zero whenever a block ends and its associated counter has been
   read: with this assumption, the table shows only relative values,
   that is the exact number of packets of each color within each block.
   If the values of the counters were not reset, the table would contain
   cumulative values, but the relative values could be determined simply
   by difference from the value of the previous block of the same color.

   The color is switched on the basis of a fixed timer (not shown in the
   table), so the number of packets in each block is different.

   +-------+--------+--------+--------+--------+------+
   | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
   +-------+--------+--------+--------+--------+------+
   | 1     | 375    | 0      | 375    | 0      | 0    |
   |       |        |        |        |        |      |
   | 2     | 0      | 388    | 0      | 388    | 0    |
   |       |        |        |        |        |      |
   | 3     | 382    | 0      | 381    | 0      | 1    |
   |       |        |        |        |        |      |
   | 4     | 0      | 377    | 0      | 374    | 3    |
   |       |        |        |        |        |      |
   | ...   | ...    | ...    | ...    | ...    | ...  |
   |       |        |        |        |        |      |
   | n     | 0      | 387    | 0      | 387    | 0    |
   |       |        |        |        |        |      |
   | n+1   | 379    | 0      | 377    | 0      | 2    |
   +-------+--------+--------+--------+--------+------+

      Table 1: Evaluation of counters for packet loss measurements

   During an A block (blocks 1, 3 and n+1), all the packets are
   A-colored, therefore the C(A) counters are incremented to the number
   seen on the interface, while C(B) counters are zero.  Vice versa,
   during a B block (blocks 2, 4 and n), all the packets are B-colored:
   C(A) counters are zero, while C(B) counters are incremented.

   When a block ends (because of color switching) the corresponding
   counters stop incrementing and it is possible to read them, compare
   the values measured on routers R1 and R2 and calculate the packet
   loss within that block.

   For example, looking at the table above, during the first block
   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
   corresponds to the exact number of packets of the first block (no
   loss).  Also during the second block (B-colored) R1 and R2 counters
   have the same value (388), which corresponds to the number of packets
   of the second block (no loss).
   During blocks three and four, R1 and
   R2 counters are different, meaning that some packets have been lost:
   in the example, one single packet (382-381) was lost during block
   three and three packets (377-374) were lost during block four.

   The clocks of R1 and R2 must agree to within +/- L/2 time units,
   where L is the time duration of the block.  In this way each colored
   packet can be assigned to the right block by each router.  This is
   because the minimum time distance between two packets of the same
   color but belonging to different blocks is L time units.

   The method applied to R1 and R2 can be extended to any other router
   and applied to more complex networks, as long as the measurement is
   enabled on the path followed by the traffic flow(s) being observed.

3.2.  One-way delay measurement

   The same principle used to measure packet loss can also be applied to
   one-way delay measurement.  There are three alternatives, as
   described below.

3.2.1.  Single marking methodology

   The alternation of colors can be used as a time reference to
   calculate the delay.  Whenever the color changes (meaning that a
   new block has started) a network device can store the timestamp of
   the first packet of the new block; that timestamp can be compared
   with the timestamp of the same packet on a second router to compute
   packet delay.  Considering Figure 4, R1 stores a timestamp TS(A1)R1
   when it sends the first packet of block 1 (A-colored), a timestamp
   TS(B2)R1 when it sends the first packet of block 2 (B-colored) and so
   on for every other block.  R2 performs the same operation on the
   receiving side, recording TS(A1)R2, TS(B2)R2 and so on.  Since the
   timestamps refer to specific packets (the first packet of each block)
   we are sure that timestamps compared to compute delay refer to the
   same packets.
   By comparing TS(A1)R1 with TS(A1)R2 (and similarly
   TS(B2)R1 with TS(B2)R2 and so on) it is possible to measure the delay
   between R1 and R2.  In order to have more measurements, it is
   possible to take and store more timestamps, referring to other
   packets within each block.

   In order to coherently compare timestamps collected on different
   routers, the network nodes must be in sync.  Furthermore, a
   measurement is valid only if no packet loss occurs and if packet
   misordering can be avoided, otherwise the first packet of a block on
   R1 could be different from the first packet of the same block on R2
   (e.g. if that packet is lost between R1 and R2 or it arrives after
   the next one).

   The following table shows how timestamps can be used to calculate the
   delay between R1 and R2.  The first column lists the sequence of
   blocks while the other columns contain the timestamp referring to the
   first packet of each block on R1 and R2.  The delay is computed as
   the difference between timestamps.  For the sake of simplicity, all
   the values are expressed in milliseconds.

   +-------+---------+---------+---------+---------+-------------+
   | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
   +-------+---------+---------+---------+---------+-------------+
   | 1     | 12.483  | -       | 15.591  | -       | 3.108       |
   |       |         |         |         |         |             |
   | 2     | -       | 6.263   | -       | 9.288   | 3.025       |
   |       |         |         |         |         |             |
   | 3     | 27.556  | -       | 30.512  | -       | 2.956       |
   |       |         |         |         |         |             |
   | 4     | -       | 18.113  | -       | 21.269  | 3.156       |
   |       |         |         |         |         |             |
   | ...   | ...     | ...     | ...     | ...     | ...         |
   |       |         |         |         |         |             |
   | n     | 77.463  | -       | 80.501  | -       | 3.038       |
   |       |         |         |         |         |             |
   | n+1   | -       | 24.333  | -       | 27.433  | 3.100       |
   +-------+---------+---------+---------+---------+-------------+

        Table 2: Evaluation of timestamps for delay measurements

   The first row shows timestamps taken on R1 and R2 respectively and
   referring to the first packet of block 1 (which is A-colored).  Delay
   can be computed as the difference between the timestamp on R2 and the
   timestamp on R1.  Similarly, the second row shows timestamps (in
   milliseconds) taken on R1 and R2 and referring to the first packet of
   block 2 (which is B-colored).  By comparing timestamps taken on
   different nodes in the network and referring to the same packets
   (identified using the alternation of colors) it is possible to
   measure delay on different network segments.

   For the sake of simplicity, in the above example a single measurement
   is provided within a block, taking into account only the first packet
   of each block.  The number of measurements can be easily increased by
   considering multiple packets in the block: for instance, a timestamp
   could be taken every N packets, thus generating multiple delay
   measurements.  Taking this to the limit, in principle the delay could
   be measured for each packet, by taking and comparing the
   corresponding timestamps (possible but impractical from an
   implementation point of view).

3.2.2.  Average delay

   As mentioned before, the method described above for measuring the
   delay is sensitive to out-of-order reception of packets.  In order to
   overcome this problem, a different approach has been considered: it
   is based on the concept of average delay.  The average delay is
   calculated by considering the average arrival time of the packets
   within a single block.
   The network device locally stores a timestamp
   for each packet received within a single block: summing all the
   timestamps and dividing by the total number of packets received, the
   average arrival time for that block of packets can be calculated.  By
   subtracting the average arrival times of two adjacent devices it is
   possible to calculate the average delay between those nodes.  This
   method is robust to out-of-order packets and also to packet loss
   (only a small error is introduced).  Moreover, it greatly reduces the
   number of timestamps (only one per block for each network device)
   that have to be collected by the management system.  On the other
   hand, it only gives one measure for the duration of the block (e.g. 5
   minutes), and it doesn't give the minimum, maximum and median delay
   values (RFC 6703 [RFC6703]).  This limitation could be overcome by
   reducing the duration of the block (e.g. from 5 minutes to a few
   seconds) by means of a highly optimized implementation of the
   method.

   By summing the average delays of the two directions of a path, it is
   also possible to measure the two-way delay (round-trip delay).

3.2.3.  Double marking methodology

   The single marking methodology for one-way delay measurement is
   sensitive to out-of-order reception of packets.  The first approach
   to overcome this problem is described above and is based on the
   concept of average delay.  But the limitation of average delay is
   that it doesn't give information about the distribution of delay
   values for the duration of the block.  Additionally, it may be useful
   to have not only the average delay but also the minimum and maximum
   delay values and, in wider terms, to know more about the statistical
   distribution of delay values.
   So, in order to have more information
   about the delay and to overcome out-of-order issues, a different
   approach can be introduced: it is based on the double marking
   methodology.

   Basically, the idea is to use the first marking to create the
   alternate flow and, within this colored flow, a second marking to
   select the packets for measuring delay/jitter.  The first marking is
   needed for packet loss and average delay measurement.  The second
   marking creates a new set of marked packets that are fully identified
   over the network, so that a network device can store the timestamps
   of these packets; these timestamps can be compared with the
   timestamps of the same packets on a second router to compute packet
   delay values for each packet.  The number of measurements can be
   easily increased by changing the frequency of the second marking.
   But the frequency of the second marking must not be too high, in
   order to avoid out-of-order issues.  Between packets with the second
   marking there should be a safety time gap (e.g. this gap could be,
   at the minimum, the average network delay calculated with the
   previous methodology) to avoid out-of-order issues and also to have a
   number of measurement packets that is rate independent.  If a second
   marking packet is lost, the delay measurement for the considered
   block is corrupted and should be discarded.

3.3.  Delay variation measurement

   Similarly to one-way delay measurement (both for single marking and
   double marking), the method can also be used to measure the
   inter-arrival jitter.  The alternation of colors can be used as a
   time reference to measure delay variations.  Considering the example
   depicted in Figure 4, R1 stores a timestamp TS(A)R1 whenever it sends
   the first packet of a block and R2 stores a timestamp TS(B)R2
   whenever it receives the first packet of a block.
   The inter-arrival
   jitter can be easily derived from one-way delay measurement, by
   evaluating the delay variation of consecutive samples.

   The concept of average delay can also be applied to delay variation,
   by evaluating the variation of the average interval between
   consecutive packets of the flow from R1 to R2.

4.  Implementation and deployment

   The methodology described in the previous sections can be applied in
   various situations.  Basically, the Alternate Marking technique could
   be used in many cases for performance measurement.  The only
   requirement is to select and mark the flow to be monitored; in this
   way packets are batched by the sender and each batch is alternately
   marked so that it can be easily recognized by the receiver.

   An example of implementation and deployment is explained in the next
   section, just to clarify how the method can work.

4.1.  Report on the operational experiment at Telecom Italia

   The methodology has been applied in Telecom Italia by leveraging
   functions and tools available on IP routers and it's currently being
   used to monitor packet loss in some portions of Telecom Italia's
   network.  The application of the method to delay measurement is
   currently being evaluated in Telecom Italia's labs.  This section
   describes how the features currently available on existing routing
   platforms can be used to apply the method, in order to give an
   example of implementation and deployment.

   The fundamental steps for this implementation of the method can be
   summarized in the following items:

   o  coloring the packets;

   o  counting the packets;

   o  collecting data and calculating the packet loss;

   o  metric transparency.
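
   As an illustration only, the counting and calculation steps listed
   above can be sketched in Python using the per-block counter values of
   Table 1.  The function name and the data layout are assumptions made
   for this sketch; they are not taken from the Telecom Italia
   implementation.

```python
# Illustrative sketch only: names and data layout are hypothetical.
def block_loss(upstream_counts, downstream_counts):
    """Return the packet loss of each block between two measurement points.

    Each argument is a list of (color, count) tuples, one per block, where
    count is the counter of that color read after the block has ended
    (counters are assumed to be reset after each read, as in Table 1).
    """
    losses = []
    for (c1, n1), (c2, n2) in zip(upstream_counts, downstream_counts):
        if c1 != c2:
            # The two points disagree on the block color: the readings
            # do not refer to the same block and cannot be compared.
            raise ValueError("blocks are not aligned: colors differ")
        losses.append(n1 - n2)
    return losses


# Blocks 1-4 of Table 1 (R1 egress counters vs. R2 ingress counters).
r1 = [("A", 375), ("B", 388), ("A", 382), ("B", 377)]
r2 = [("A", 375), ("B", 388), ("A", 381), ("B", 374)]
print(block_loss(r1, r2))  # [0, 0, 1, 3]
```

   The result matches the Loss column of Table 1 for blocks 1 to 4.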

   Before going deeper into the implementation details, it's worth
   mentioning two different strategies that can be used when
   implementing the method:

   o  flow-based: the flow-based strategy is used when only a limited
      number of traffic flows need to be monitored.  This could be the
      case, for example, of IPTV channels or other specific application
      traffic with high QoS requirements (e.g. Mobile Backhauling
      traffic).  According to this strategy, only a subset of the flows
      is colored.  Counters for packet loss measurements can be
      instantiated for each single flow, or for the set as a whole,
      depending on the desired granularity.  A relevant problem with
      this approach is the necessity to know in advance the path
      followed by flows that are subject to measurement.  Path rerouting
      and traffic load-balancing make the problem more complex,
      especially for unicast traffic.  The problem is easier to solve
      for multicast traffic where load balancing is seldom used,
      especially for IPTV traffic where static joins are frequently used
      to force traffic forwarding and replication.  Another application
      is Mobile Backhauling, implemented with an MPLS VPN in Telecom
      Italia's network; in this case the problem with unicast traffic is
      overcome by monitoring just the two Provider Edge nodes of the
      MPLS VPN.

   o  link-based: measurements are performed on all the traffic on a
      link-by-link basis.  The link could be a physical link or a
      logical link (for instance an Ethernet VLAN or an MPLS PW).
      Counters could be instantiated for the traffic as a whole or for
      each traffic class (in case it is desired to monitor each class
      separately), but in the second case a pair of counters is needed
      for each class.

   The current implementation in Telecom Italia uses the first strategy.
   As mentioned, the flow-based measurement requires the identification
   of the flow to be monitored and the discovery of the path followed
   by the selected flow.  It is possible to monitor a single flow or
   multiple flows grouped together, but in the latter case the
   measurement is consistent only if all the flows in the group follow
   the same path.  Moreover, a Service Provider should be aware that,
   if a measurement is performed by grouping many flows, it is not
   possible to determine exactly which flow was affected by packet
   loss.  In order to have measures per single flow, it is necessary to
   configure counters for each specific flow.  Once the flow(s) to be
   monitored have been identified, it is necessary to configure the
   monitoring on the proper nodes.  Configuring the monitoring means
   configuring the policy to intercept the traffic and configuring the
   counters to count the packets.  For end-to-end monitoring only, it
   is sufficient to enable the monitoring on the first-hop and last-hop
   routers of the path: the mechanism is completely transparent to
   intermediate nodes and independent of the path followed by traffic
   flows.  Conversely, to monitor the flow on a hop-by-hop basis along
   its whole path, it is necessary to enable the monitoring on every
   node from the source to the destination.  In case the exact path
   followed by the flow is not known a priori (e.g., the flow has
   multiple paths to reach the destination), it is necessary to enable
   the monitoring system on every path: counters on interfaces
   traversed by the flow will report the packet count, while counters
   on other interfaces will remain at zero.

4.1.1.  Coloring the packets

   The coloring operation is fundamental in order to create packet
   blocks.  This implies choosing where to activate the coloring and
   how to color the packets.

   In case of flow-based measurements, it is desirable, in general, to
   have a single coloring node because it is easier to manage and
   doesn't raise any risk of conflict (consider the case where two
   nodes color the same flow).  Thus it is necessary to color the flow
   as close as possible to the source.  In addition, coloring a flow
   close to the source allows an end-to-end measurement if a
   measurement point is enabled on the last-hop router as well.  The
   only requirement is that the coloring must change periodically and
   every node along the path must be able to identify the colored
   packets unambiguously.  For link-based measurements, all traffic
   needs to be colored when transmitted on the link.  If the traffic
   had already been colored, then it has to be re-colored because the
   color must be consistent on the link.  This means that each hop
   along the path must (re-)color the traffic; the color is not
   required to be consistent along different links.

   Traffic coloring can be implemented by setting a specific bit in the
   packet header and changing the value of that bit periodically.  With
   current router implementations, only QoS-related fields and features
   offer the required flexibility to set bits in the packet header.  In
   case a Service Provider only uses the three most significant bits of
   the DSCP field (corresponding to IP Precedence) for QoS
   classification and queuing, it is possible to use the two least
   significant bits of the DSCP field (bit 0 and bit 1) to implement
   the method without affecting QoS policies.  One of the two bits
   (bit 0) could be used to identify flows subject to traffic
   monitoring (set to 1 if the flow is under monitoring, otherwise it
   is set to 0), while the second (bit 1) can be used for coloring the
   traffic (switching between values 0 and 1, corresponding to color A
   and B) and creating the blocks.
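
   As a minimal sketch of the bit assignment described above, the
   following Python fragment sets the monitoring flag and the color in
   a DSCP value.  It assumes that "bit 0" denotes the least significant
   bit of the 6-bit DSCP value and "bit 1" the next one; the function
   names and the block-length parameter are hypothetical, chosen only
   for illustration, and do not reflect any router configuration
   syntax.

```python
# Assumption: "bit 0" is the least significant bit of the 6-bit DSCP
# value and "bit 1" is the next one. All names here are illustrative.
MONITOR_BIT = 0b000001  # bit 0: 1 = flow is under monitoring
COLOR_BIT = 0b000010    # bit 1: 0 = color A, 1 = color B


def mark_dscp(dscp: int, color: int) -> int:
    """Set the monitoring flag and the color bit; keep the other bits."""
    if not 0 <= dscp <= 0b111111:
        raise ValueError("DSCP is a 6-bit field")
    if color not in (0, 1):
        raise ValueError("color must be 0 (A) or 1 (B)")
    marked = (dscp & ~COLOR_BIT) | MONITOR_BIT
    return (marked | COLOR_BIT) if color else marked


def color_for_time(seconds: int, block_seconds: int) -> int:
    """Color of the block containing the given time; flips every block."""
    return (seconds // block_seconds) % 2
```

   For instance, mark_dscp(0b101000, 1) yields 0b101011: the three most
   significant bits are untouched, bit 0 flags the flow as monitored,
   and bit 1 carries color B.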

   In practice, coloring the traffic using the DSCP field can be
   implemented by configuring on the router output interface an access
   list that intercepts the flow(s) to be monitored and applies to them
   a policy that sets the DSCP field accordingly.  Since traffic
   coloring has to be switched between the two values over time, the
   policy needs to be modified periodically: an automatic script can be
   used to perform this task on the basis of a fixed timer.  In Telecom
   Italia's implementation this timer is set to 5 minutes: this value
   proved to be a good compromise between measurement frequency and
   stability of the measurement (i.e., the possibility to collect all
   the measures referring to the same block).

4.1.2.  Counting the packets

   Assuming that the coloring of the packets is performed only by the
   source node, the nodes between source and destination (inclusive)
   have to count the colored packets that they receive and forward:
   this operation can be enabled on every router along the path or only
   on a subset, depending on which network segment is being monitored
   (a single link, a particular metro area, the backbone, the whole
   path).

   Since the color switches periodically between two values, two
   counters (one for each value) are needed: one counter for packets
   with color A and one counter for packets with color B.  For each
   flow (or group of flows) being monitored and for every interface
   where the monitoring is active, a pair of counters is needed.  For
   example, in order to monitor 3 flows separately on a router with 4
   interfaces involved, 24 counters are needed (2 counters for each of
   the 3 flows on each of the 4 interfaces).  If traffic is colored
   using the DSCP field, as in Telecom Italia's implementation, an
   access list that matches specific DSCP values can be used to count
   the packets of the flow(s) being monitored.
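
   The counter arithmetic above can be illustrated with a short Python
   sketch.  The class and function names are hypothetical, and the
   in-memory table merely stands in for the per-interface router
   counters that an access list would drive.

```python
from itertools import product

COLORS = ("A", "B")  # one counter per color value


def counters_needed(num_flows: int, num_interfaces: int) -> int:
    """Two counters (A and B) per monitored flow per monitored interface."""
    return len(COLORS) * num_flows * num_interfaces


class ColorCounters:
    """Illustrative in-memory stand-in for a router's colored counters."""

    def __init__(self, flows, interfaces):
        # One zero-initialized counter per (flow, interface, color) triple.
        self.counts = {key: 0 for key in product(flows, interfaces, COLORS)}

    def count_packet(self, flow, interface, color):
        """Increment the counter matching the packet's flow, interface, color."""
        self.counts[(flow, interface, color)] += 1
```

   With 3 flows and 4 interfaces, counters_needed(3, 4) returns 24,
   matching the example in the text.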

   In case of link-based measurements the behaviour is similar, except
   that coloring and counting operations are performed on a
   link-by-link basis at each endpoint of the link.

   Another important aspect to take into consideration is when to read
   the counters: in order to count the exact number of packets of a
   block, the routers must perform this operation when that block has
   ended.  In other words, the counter for color A must be read when
   the current block has color B, in order to be sure that the value of
   the counter is stable.  This task can be accomplished in two ways.
   The general approach is to read the counters periodically, many
   times during a block duration, and to compare these successive
   readings: when a counter stops incrementing, the corresponding
   block has ended and its value can be processed safely.
   Alternatively, if the coloring operation is performed on the basis
   of a fixed timer, it is possible to configure the reading of the
   counters according to that timer: for example, if each block is 5
   minutes long, reading the counter for color A every 5 minutes in the
   middle of the subsequent block (with color B) is a safe choice.  A
   sufficient margin should be considered between the end of a block
   and the reading of the counter, in order to take into account any
   out-of-order packets.  The choice of a 5-minute timer for color
   switching was also inspired by these considerations.

4.1.3.  Collecting data and calculating packet loss

   The nodes enabled to perform performance monitoring collect the
   value of the counters, but they are not able to directly use this
   information to measure packet loss, because they only have their own
   samples.  For this reason, an external Network Management System
   (NMS) is required to collect and process data and to perform packet
   loss calculation.
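
   A minimal sketch of such a calculation follows, assuming the NMS has
   already collected, for one flow and one completed block of a given
   color, the counter value of each monitored node in path order.  The
   function names and the data layout are hypothetical.

```python
def segment_losses(readings):
    """Per-segment loss between consecutive measurement points.

    readings: list of (node, packet_count) pairs ordered from source to
    destination, all taken for the same flow, color and completed block.
    """
    return [
        (up, down, up_count - down_count)
        for (up, up_count), (down, down_count) in zip(readings, readings[1:])
    ]


def end_to_end_loss(readings):
    """Packets counted at the first node but not at the last one."""
    return readings[0][1] - readings[-1][1]
```

   With readings [("R1", 1000), ("R2", 1000), ("R3", 997)], where R3 is
   a hypothetical third node, the end-to-end loss is 3 packets and the
   per-segment result locates it between R2 and R3.
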
   The NMS compares the values of counters from different nodes and
   can calculate if some packets were lost (even a single packet) and
   also where packets were lost.

   The value of the counters needs to be transmitted to the NMS as soon
   as it has been read.  This can be accomplished by using SNMP or FTP
   and can be done in Push Mode or Polling Mode.  In the former case,
   each router periodically sends the information to the NMS; in the
   latter case, it is the NMS that periodically polls routers to
   collect information.  In any case, the NMS has to collect all the
   relevant values from all the routers within one cycle of the timer
   (5 minutes).

   If link-based measurement is used, it would be possible to use a
   protocol to exchange values of counters between the two endpoints in
   order to let them perform the packet loss calculation for each
   traffic direction.  A similar approach could be complicated if
   applied to a flow-based measurement.

4.1.4.  Metric transparency

   In Telecom Italia's implementation the source node colors the
   packets with a policy that is modified periodically via an automatic
   script in order to alternate the DSCP field of the packets.  The
   nodes between source and destination (inclusive) have to count, with
   an access list, the colored packets that they receive and forward.

   Moreover, the destination node has an important role: the colored
   packets are intercepted and a policy restores the DSCP field of all
   the packets to the initial value.  In this way the metric is
   transparent because outside the section of the network under
   monitoring the traffic flow is unchanged.

   In such a case, thanks to this restoring technique, network elements
   outside the Alternate Marking monitoring domain (e.g., the two
   Provider Edge nodes of the Mobile Backhauling MPLS VPN) are totally
   unaware that packets were marked.
   So this restoring technique makes Alternate Marking completely
   transparent outside its monitoring domain.

4.2.  IP flow performance measurement (IPFPM)

   This application of the marking method is described in
   [I-D.chen-ippm-coloring-based-ipfpm-framework].

4.3.  Performance Measurement Marking Method in BIER Domain

   In [I-D.ietf-bier-mpls-encapsulation] two OAM bits from the Bit
   Index Explicit Replication (BIER) Header are reserved for the
   passive performance measurement marking method.
   [I-D.mirsky-bier-pmmm-oam] details the measurement for multicast
   service over a BIER domain.

4.4.  RFC6374 Use Case

   RFC6374 [RFC6374] uses the LM packet as the packet accounting
   demarcation point.  Unfortunately, this gives rise to a number of
   problems that may lead to significant packet accounting errors in
   certain situations.  [I-D.ietf-mpls-flow-ident] discusses the
   desired capabilities for MPLS flow identification in order to
   perform a better in-band performance monitoring of user data
   packets.  A method of accomplishing identification is Synonymous
   Flow Labels (SFL), introduced in [I-D.bryant-mpls-sfl-framework],
   while [I-D.bryant-mpls-rfc6374-sfl] describes RFC6374 performance
   measurements with SFL.

4.5.  Application to active performance measurement

   [I-D.fioccola-ippm-rfc6812-alt-mark-ext] describes an extension to
   the Cisco SLA Protocol Measurement-Type UDP-Measurement, in order to
   implement the alternate marking methodology.

5.  Hybrid measurement

   The method has been explicitly designed for passive measurements,
   but it can also be used with active measurements.  In order to have
   both end-to-end measurements and intermediate measurements (hybrid
   measurements), two end points can exchange artificial traffic flows
   and apply alternate marking over these flows.
   In the intermediate points, artificial traffic is managed in the
   same way as real traffic and measured as specified before.

6.  Compliance with RFC6390 guidelines

   RFC6390 [RFC6390] defines a framework and a process for developing
   Performance Metrics for protocols above and below the IP layer (such
   as IP-based applications that operate over reliable or datagram
   transport protocols).

   This document doesn't aim to propose a new Performance Metric but a
   new method of measurement for a few Performance Metrics that have
   already been standardized.  Nevertheless, it's worth applying
   [RFC6390] guidelines to the present document, in order to provide a
   more complete and coherent description of the proposed method.  We
   used a subset of the Performance Metric Definition template defined
   by [RFC6390].

   o  Metric name and description: as already stated, this document
      doesn't propose any new Performance Metric.  On the contrary, it
      describes a novel method for measuring packet loss [RFC2680].
      The same concept, with small differences, can also be used to
      measure delay [RFC2679] and jitter [RFC3393].  The document
      mainly describes the applicability to packet loss measurement.

   o  Method of Measurement or Calculation: according to the method
      described in the previous sections, the number of packets lost
      is calculated by subtracting the value of the counter on the
      destination node from the value of the counter on the source
      node.  Both counters must refer to the same color.  The
      calculation is performed when the value of the counters is in a
      steady state.

   o  Units of Measurement: the method calculates and reports the
      exact number of packets sent by the source node and not received
      by the destination node.

   o  Measurement Points: the measurement can be performed between
      adjacent nodes, on a per-link basis, or along a multi-hop path,
      provided that the traffic under measurement follows that path.
      In case of a multi-hop path, the measurements can be performed
      both end-to-end and hop-by-hop.

   o  Measurement Timing: the method has a constraint on the frequency
      of measurements.  In order to perform a measurement, the counter
      must be in a steady state: this happens when the traffic is
      being colored with the alternate color; for example, in the
      Telecom Italia application of the method the time interval is
      set to 5 minutes.

   o  Implementation: the Telecom Italia application of the method
      uses two encodings of the DSCP field to color the packets; this
      enables the use of policy configurations on the router to color
      the packets and accordingly configure the counter for each
      color.  The path followed by traffic being measured should be
      known in advance in order to configure the counters along the
      path and be able to compare the correct values.

   o  Use and Applications: the method can be used to measure packet
      loss with high precision on live traffic; moreover, by combining
      end-to-end and per-link measurements, the method is useful to
      pinpoint the single link that is experiencing loss events.

   o  Reporting Model: the value of the counters has to be sent to a
      centralized management system that performs the calculations;
      such samples must contain a reference to the time interval they
      refer to, so that the management system can perform the correct
      correlation; the samples have to be sent while the corresponding
      counter is in a steady state (within a time interval), otherwise
      the value of the sample should be stored locally.

   o  Dependencies: the values of the counters have to be correlated to
      the time interval they refer to; moreover, since the Telecom
      Italia application of the method is based on DSCP values, there
      are significant dependencies on the usage of the DSCP field: it
      must be possible to rely on unused DSCP values without affecting
      QoS-related configuration and behavior; moreover, the
      intermediate nodes must not change the value of the DSCP field,
      so as not to alter the measurement.

   o  Organization of Results: the method of measurement produces
      singletons.

   o  Parameters: currently, the main parameter of the method is the
      time interval used to alternate the colors and read the
      counters.

7.  Security Considerations

   This document specifies a method to perform measurements in the
   context of a Service Provider's network and has not been developed
   to conduct Internet measurements, so it does not directly affect
   Internet security nor applications which run on the Internet.
   However, implementation of this method must be mindful of security
   and privacy concerns.

   There are two types of security concerns: potential harm caused by
   the measurements and potential harm to the measurements.  Regarding
   the first point, the measurements described in this document are
   passive, so there are no packets injected into the network causing
   potential harm to the network itself and to data traffic.
   Nevertheless, the method implies modifications on the fly to the IP
   header of data packets: this must be performed in a way that doesn't
   alter the quality of service experienced by packets subject to
   measurements and that preserves stability and performance of routers
   doing the measurements.  The measurements themselves could be harmed
   by routers altering the coloring of the packets, or by an attacker
   injecting artificial traffic.
   Authentication techniques, such as digital signatures, may be used
   where appropriate to guard against injected traffic attacks.

   The privacy concerns of network measurement are limited because the
   method only relies on information contained in the IP header without
   any release of user data.

8.  Conclusions

   The advantages of the method described in this document are:

   o  easy implementation: it can be implemented using features already
      available on major routing platforms;

   o  low computational effort: the additional load on processing is
      negligible;

   o  accurate packet loss measurement: single packet loss granularity
      is achieved with a passive measurement;

   o  potential applicability to any kind of packet-based or
      frame-based traffic: Ethernet, IP, MPLS, etc., both unicast and
      multicast;

   o  robustness: the method can tolerate out-of-order packets and it's
      not based on "special" packets whose loss could have a negative
      impact;

   o  no interoperability issues: the features required to implement
      the method are available on all current routing platforms.

   The method doesn't raise any specific need for standardization, but
   it could be further improved by means of some extension to existing
   protocols.  Specifically, the use of DiffServ bits for coloring the
   packets might not be a viable solution in some cases: a standard
   method to color the packets for this specific application could be
   beneficial.

9.  IANA Considerations

   There are no IANA actions required.

10.  Acknowledgements

   The authors would like to thank Domenico Laforgia, Daniele Accetta
   and Mario Bianchetti for their contribution to the definition and
   the implementation of the method.

11.  References

11.1.  Normative References

   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Delay Metric for IPPM", RFC 2679, DOI 10.17487/RFC2679,
              September 1999.

   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Packet Loss Metric for IPPM", RFC 2680,
              DOI 10.17487/RFC2680, September 1999.

   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay
              Variation Metric for IP Performance Metrics (IPPM)",
              RFC 3393, DOI 10.17487/RFC3393, November 2002.

11.2.  Informative References

   [I-D.bryant-mpls-rfc6374-sfl]
              Bryant, S., Swallow, G., Sivabalan, S., Mirsky, G., Chen,
              M., and Z. Li, "RFC6374 Synonymous Flow Labels",
              draft-bryant-mpls-rfc6374-sfl-00 (work in progress),
              October 2015.

   [I-D.bryant-mpls-sfl-framework]
              Bryant, S., Swallow, G., Sivabalan, S., Mirsky, G., Chen,
              M., and Z. Li, "Synonymous Flow Label Framework",
              draft-bryant-mpls-sfl-framework-00 (work in progress),
              October 2015.

   [I-D.chen-ippm-coloring-based-ipfpm-framework]
              Chen, M., Zheng, L., Mirsky, G., Fioccola, G., and T.
              Mizrahi, "IP Flow Performance Measurement Framework",
              draft-chen-ippm-coloring-based-ipfpm-framework-06 (work
              in progress), March 2016.

   [I-D.cociglio-mboned-multicast-pm]
              Cociglio, M., Capello, A., Bonda, A., and L. Castaldelli,
              "A method for IP multicast performance monitoring",
              draft-cociglio-mboned-multicast-pm-01 (work in progress),
              October 2010.

   [I-D.fioccola-ippm-rfc6812-alt-mark-ext]
              Fioccola, G., Clemm, A., Cociglio, M., Chandramouli, M.,
              and A. Capello, "Alternate Marking Extension to Cisco SLA
              Protocol RFC6812",
              draft-fioccola-ippm-rfc6812-alt-mark-ext-01 (work in
              progress), March 2016.

   [I-D.ietf-bier-mpls-encapsulation]
              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J., and
              S. Aldrin, "Encapsulation for Bit Index Explicit
              Replication in MPLS Networks",
              draft-ietf-bier-mpls-encapsulation-04 (work in progress),
              April 2016.

   [I-D.ietf-mpls-flow-ident]
              Bryant, S., Pignataro, C., Chen, M., Li, Z., and G.
              Mirsky, "MPLS Flow Identification Considerations",
              draft-ietf-mpls-flow-ident-00 (work in progress),
              December 2015.

   [I-D.mirsky-bier-pmmm-oam]
              Mirsky, G., Zheng, L., Chen, M., and G. Fioccola,
              "Performance Measurement (PM) with Marking Method in Bit
              Index Explicit Replication (BIER) Layer",
              draft-mirsky-bier-pmmm-oam-01 (work in progress), March
              2016.

   [I-D.tempia-opsawg-p3m]
              Capello, A., Cociglio, M., Castaldelli, L., and A. Bonda,
              "A packet based method for passive performance
              monitoring", draft-tempia-opsawg-p3m-04 (work in
              progress), February 2014.

   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
              Measurement for MPLS Networks", RFC 6374,
              DOI 10.17487/RFC6374, September 2011.

   [RFC6390]  Clark, A. and B. Claise, "Guidelines for Considering New
              Performance Metric Development", BCP 170, RFC 6390,
              DOI 10.17487/RFC6390, October 2011.

   [RFC6703]  Morton, A., Ramachandran, G., and G. Maguluri, "Reporting
              IP Network Performance Metrics: Different Points of
              View", RFC 6703, DOI 10.17487/RFC6703, August 2012.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

Authors' Addresses

   Alessandro Capello
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: alessandro.capello@telecomitalia.it


   Mauro Cociglio
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: mauro.cociglio@telecomitalia.it


   Giuseppe Fioccola
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: giuseppe.fioccola@telecomitalia.it


   Luca Castaldelli
   Telecom Italia
   Via Reiss Romoli, 274
   Torino 10148
   Italy

   Email: luca.castaldelli@telecomitalia.it


   Alberto Tempia Bonda

   Email: alberto.tempia@gmail.com