Network Working Group                                         A. Capello
Internet-Draft                                               M. Cociglio
Intended status: Experimental                             L. Castaldelli
Expires: August 17, 2014                                  Telecom Italia
                                                         A. Tempia Bonda
                                                       February 13, 2014


        A packet based method for passive performance monitoring
                     draft-tempia-opsawg-p3m-04.txt

Abstract

   This document describes a passive method to perform packet loss, delay and jitter measurements on live traffic.
   Implementation and deployment details are also explained in order to clarify how the tools and features currently available on existing routing platforms can be used to implement the method. This method has been invented and engineered in Telecom Italia and is currently being used in Telecom Italia's network.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 17, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

   1.  Introduction
   2.  Overview of the method
   3.  Detailed description of the method
     3.1.  Packet loss measurement
     3.2.  One-way delay measurement
       3.2.1.  Average delay
     3.3.  Delay variation measurement
   4.  Implementation and deployment
     4.1.  Coloring the packets
     4.2.  Counting the packets
     4.3.  Collecting data and calculating packet loss
   5.  Compliance with RFC6390 guidelines
   6.  Security Considerations
   7.  Conclusions
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Today, most of the traffic in Service Providers' networks carries multimedia content. Video content is highly sensitive to packet loss [RFC2680], while interactive content is sensitive to delay [RFC2679] and jitter [RFC3393].

   In this scenario, Service Providers need methodologies and tools to monitor and measure network performance with adequate accuracy, in order to constantly control the quality of experience perceived by their customers. Performance monitoring also provides useful information for improving network management (e.g., isolation of network problems and troubleshooting).

   Standards Developing Organizations have done a lot of work on OAM, including performance monitoring techniques: [I-D.ietf-opsawg-oam-overview] provides a good overview of existing OAM mechanisms defined in IETF, ITU-T and IEEE.
   Within the IETF, much work has been done on fault detection and connectivity verification, while less effort has so far been dedicated to performance monitoring. The IPPM WG has defined standard metrics to measure network performance; however, the methods developed in the WG mainly refer to active measurement techniques. More recently, the MPLS WG has defined mechanisms for measuring packet loss, one-way and two-way delay, and delay variation in MPLS networks [RFC6374], but their applicability to passive measurements has some limitations, especially for pure connectionless networks.

   The lack of adequate tools to measure packet loss with the desired accuracy drove an effort in Telecom Italia to design a new method for the performance monitoring of live traffic, one that would ideally be easy to implement and deploy. The effort led to the method described in this document: a passive performance monitoring technique, potentially applicable to any kind of packet-based traffic, including Ethernet, IP, and MPLS, both unicast and multicast. The method primarily addresses packet loss measurement, but it can easily be extended to one-way delay and delay variation measurements as well. It does not require any protocol extension or interaction with existing protocols, thus avoiding interoperability issues. Although the method does not raise any specific need for standardization, it could be further improved by extending existing protocols; this aspect is left for further study and is out of the scope of this document.

   The method has been explicitly designed for passive measurements, but it can also be used with active probes. Passive measurements are usually more easily understood by customers and provide much better accuracy, especially for packet loss measurements.

   The method described in this document has been invented and engineered in Telecom Italia and is currently being used in Telecom Italia's network.

   This document is organized as follows:

   o  Section 2 gives an overview of the method, including a comparison with alternative measurement strategies;

   o  Section 3 describes the method in detail;

   o  Section 4 discusses implementation and deployment considerations, with special regard to the choices adopted in Telecom Italia's own implementation;

   o  Section 5 discusses compliance with the guidelines of [RFC6390];

   o  Section 6 includes some considerations about security aspects;

   o  Section 7 summarizes some concluding remarks.

2.  Overview of the method

   Different approaches exist to perform packet loss measurements on a live traffic flow. The most intuitive one consists of numbering the packets, so that each router that receives the flow can immediately detect a missing packet. This approach, though very simple in theory, is not simple to achieve: it requires the insertion of a sequence number into each packet, and the devices must be able to extract the number and check it in real time. Such a task can be difficult to implement on live traffic: if UDP is used as the transport protocol, no sequence number is available; on the other hand, if a higher-layer sequence number (e.g., in the RTP header) is used, extracting that information from each packet and processing it in real time could overload the device.

   An alternative approach is to count the number of packets sent on one end and the number of packets received on the other end, and to compare the two values. This operation is much simpler to implement, but requires the devices performing the measurement to be in sync: two counters can only be compared if they refer to exactly the same set of packets.
   Since a flow is continuous and cannot be stopped when a counter has to be read, it can be difficult to determine exactly when to read the counter. A possible solution is to virtually split the flow into consecutive blocks by periodically inserting a delimiter, so that each counter refers to exactly the same block of packets. The delimiter could be, for example, a special packet artificially inserted into the flow. However, delimiting the flow using specific packets has some limitations. First, it requires generating additional packets within the flow and requires the equipment to be able to process those packets. In addition, this approach is vulnerable to out-of-order reception of delimiting packets and, to a lesser extent, to their loss.

   The method proposed in this document follows the second approach, but it does not use additional packets to virtually split the flow into blocks. Instead, it "colors" the packets so that packets belonging to the same block have the same color, whilst consecutive blocks have different colors. Each change of color acts as an auto-synchronization signal that guarantees the consistency of measurements taken by different devices along the path.

   Figure 1 represents a very simple network and shows how the method can be used to measure packet loss on different network segments: by enabling the measurement on several interfaces along the path, it is possible to perform link monitoring, node monitoring or end-to-end monitoring. The method is flexible enough to measure packet loss on any segment of the network and can be used to isolate the faulty element.

                            Traffic flow
      ========================================================>
          +------+       +------+       +------+       +------+
      ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
          +------+       +------+       +------+       +------+
          .              .      .              .       .      .
          .              .      .              .       .      .
          .              <------>              <------->      .
          .          Node Packet Loss      Link Packet Loss   .
          .                                                   .
          <--------------------------------------------------->
                         End-to-End Packet Loss

                    Figure 1: Available measurements

3.  Detailed description of the method

   This section describes the method in detail. Special emphasis is given to the measurement of packet loss, which represents the core application of the method, but applicability to delay and jitter measurements is also considered.

3.1.  Packet loss measurement

   The basic idea is to virtually split traffic flows into consecutive blocks: each block represents a measurable entity unambiguously recognizable by all network devices along the path. By counting the number of packets in each block and comparing the values measured by different network devices along the path, it is possible to measure the packet loss that occurred in any single block between any two points.

   As discussed in the previous section, a simple way to create the blocks is to "color" the traffic (two colors are sufficient) so that packets belonging to consecutive blocks have different colors. Whenever the color changes, the previous block terminates and the new one begins. Hence, all the packets belonging to the same block have the same color, and packets of consecutive blocks have different colors. The number of packets in each block depends on the criterion used to create the blocks: if the color is switched after a fixed number of packets, then each block contains the same number of packets (except for any losses); but if the color is switched according to a fixed timer, then the number of packets may differ from block to block, depending on the packet rate.

   The following figure shows what a flow looks like when it is split into traffic blocks with colored packets.

      A: packet with A coloring
      B: packet with B coloring

             |           |           |           |           |
             |           |     Traffic flow      |           |
      ---------------------------------------------------------------->
      BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
      ---------------------------------------------------------------->
      ...    |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
             |           |           |           |           |

                      Figure 2: Traffic coloring

   Figure 3 shows how the method can be used to measure link packet loss between two adjacent nodes.

   Referring to the figure, assume we want to monitor the packet loss on the link between two routers, R1 and R2. According to the method, the traffic is colored alternately with two different colors, A and B. Each color change is a transition of a sort of square-wave signal, as depicted in the following figure.

  Color A   -----------+           +-----------+           +-----------
                       |           |           |           |
  Color B              +-----------+           +-----------+
              Block n       ...       Block 3     Block 2     Block 1
            <---------> <---------> <---------> <---------> <--------->

                                   Traffic flow
  ====================================================================>
  Color ... AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
  ====================================================================>

        Figure 3: Application of the method to compute link packet loss

   Traffic coloring can be done by R1 itself or by an upstream router. R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface: C(A)R1 counts the packets with color A and C(B)R1 counts those with color B. As long as traffic is colored A, only counter C(A)R1 is incremented, while C(B)R1 is not; vice versa, when the traffic is colored B, only C(B)R1 is incremented. C(A)R1 and C(B)R1 can be used as reference values to determine the packet loss from R1 to any other measurement point downstream.
Router R2, 287 similarly, will need two counters on its ingress interface, C(A)R2 288 and C(B)R2, to count the packets received on that interface and 289 colored with color A and B respectively. When an A block ends, it is 290 possible to compare C(A)R1 and C(A)R2 and calculate the packet loss 291 within the block; similarly, when the successive B block terminates, 292 it is possible to compare C(B)R1 with C(B)R2, and so on for every 293 successive block. 295 Likewise, by using two counters on R2 egress interface it is possible 296 to count the packets sent out of R2 interface and use them as 297 reference values to calculate the packet loss from R2 to any 298 measurement point down R2. 300 Using a fixed timer for color switching offers a better control over 301 the method: the (time) length of the blocks can be chosen large 302 enough to simplify the collection and the comparison of measures 303 taken by different network devices. It's preferable to read the 304 value of the counters not immediately after the color switch: some 305 packets could arrive out of order and increment the counter 306 associated to the previous block (color), so it is worth waiting for 307 some seconds. The drawback is that the longer the duration of the 308 block, the less frequent the measurement can be taken. 310 The following table shows how the counters can be used to calculate 311 the packet loss between R1 and R2. The first column lists the 312 sequence of traffic blocks while the other columns contain the 313 counters of A-colored packets and B-colored packets for R1 and R2. 314 In this example, we assume that the values of the counters are reset 315 to zero whenever a block ends and its associated counter has been 316 read: with this assumption, the table shows only relative values, 317 that is the exact number of packets of each color within each block. 
   If the values of the counters were not reset, the table would contain cumulative values, but the relative values could be determined simply as the difference from the value of the previous block of the same color.

   The color is switched on the basis of a fixed timer (not shown in the table), so the number of packets in each block is different.

        +-------+--------+--------+--------+--------+------+
        | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
        +-------+--------+--------+--------+--------+------+
        |   1   |   375  |    0   |   375  |    0   |   0  |
        |       |        |        |        |        |      |
        |   2   |    0   |   388  |    0   |   388  |   0  |
        |       |        |        |        |        |      |
        |   3   |   382  |    0   |   381  |    0   |   1  |
        |       |        |        |        |        |      |
        |   4   |    0   |   377  |    0   |   374  |   3  |
        |       |        |        |        |        |      |
        |  ...  |   ...  |   ...  |   ...  |   ...  |  ... |
        |       |        |        |        |        |      |
        |   n   |    0   |   387  |    0   |   387  |   0  |
        |       |        |        |        |        |      |
        |  n+1  |   379  |    0   |   377  |    0   |   2  |
        +-------+--------+--------+--------+--------+------+

        Table 1: Evaluation of counters for packet loss measurements

   During an A block (blocks 1, 3 and n+1), all the packets are A-colored, therefore the C(A) counters are incremented up to the number of packets seen on the interface, while the C(B) counters are zero. Vice versa, during a B block (blocks 2, 4 and n), all the packets are B-colored: the C(A) counters are zero, while the C(B) counters are incremented.

   When a block ends (because of color switching), the corresponding counters stop incrementing and it is possible to read them, compare the values measured on routers R1 and R2, and calculate the packet loss within that block.

   For example, looking at the table above, during the first block (A-colored), C(A)R1 and C(A)R2 have the same value (375), which corresponds to the exact number of packets of the first block (no loss). Similarly, during the second block (B-colored), the R1 and R2 counters have the same value (388), which corresponds to the number of packets of the second block (no loss).
   During blocks three and four, the R1 and R2 counters differ, meaning that some packets have been lost: in the example, one single packet (382-381) was lost during block three and three packets (377-374) were lost during block four.

   The method applied to R1 and R2 can be extended to any other router and applied to more complex networks, as long as the measurement is enabled on the path followed by the traffic flow(s) being observed.

3.2.  One-way delay measurement

   The same principle used to measure packet loss can also be applied to one-way delay measurement: the alternation of colors can be used as a time reference to calculate the delay. Whenever the color changes (that is, a new block has started), a network device can store the timestamp of the first packet of the new block; that timestamp can be compared with the timestamp of the same packet on a second router to compute the packet delay. Considering Figure 3, R1 stores a timestamp TS(A1)R1 when it sends the first packet of block 1 (A-colored), a timestamp TS(B2)R1 when it sends the first packet of block 2 (B-colored), and so on for every other block. R2 performs the same operation on the receiving side, recording TS(A1)R2, TS(B2)R2 and so on. Since the timestamps refer to specific packets (the first packet of each block), we can be sure that the timestamps compared to compute the delay refer to the same packets. By comparing TS(A1)R1 with TS(A1)R2 (and similarly TS(B2)R1 with TS(B2)R2, and so on), it is possible to measure the delay between R1 and R2. To obtain more measurements, it is possible to take and store more timestamps, referring to other packets within each block.

   In order to coherently compare timestamps collected on different routers, the clocks of the network nodes must be synchronized.
   Furthermore, a measurement is valid only if no packet loss occurs and if packet misordering can be avoided; otherwise the first packet of a block on R1 could be different from the first packet of the same block on R2 (e.g., if that packet is lost between R1 and R2 or arrives after the next one).

   The following table shows how timestamps can be used to calculate the delay between R1 and R2. The first column lists the sequence of blocks, while the other columns contain the timestamp of the first packet of each block on R1 and R2. The delay is computed as the difference between timestamps. For the sake of simplicity, all the values are expressed in milliseconds.

        +-------+---------+---------+---------+---------+-------------+
        | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
        +-------+---------+---------+---------+---------+-------------+
        |   1   |  12.483 |    -    |  15.591 |    -    |    3.108    |
        |       |         |         |         |         |             |
        |   2   |    -    |   6.263 |    -    |   9.288 |    3.025    |
        |       |         |         |         |         |             |
        |   3   |  27.556 |    -    |  30.512 |    -    |    2.956    |
        |       |         |         |         |         |             |
        |   4   |    -    |  18.113 |    -    |  21.269 |    3.156    |
        |       |         |         |         |         |             |
        |  ...  |   ...   |   ...   |   ...   |   ...   |     ...     |
        |       |         |         |         |         |             |
        |   n   |  77.463 |    -    |  80.501 |    -    |    3.038    |
        |       |         |         |         |         |             |
        |  n+1  |    -    |  24.333 |    -    |  27.433 |    3.100    |
        +-------+---------+---------+---------+---------+-------------+

        Table 2: Evaluation of timestamps for delay measurements

   The first row shows the timestamps taken on R1 and R2 respectively for the first packet of block 1 (which is A-colored). The delay is computed as the difference between the timestamp on R2 and the timestamp on R1 (15.591 - 12.483 = 3.108 ms). Similarly, the second row shows the timestamps (in milliseconds) taken on R1 and R2 for the first packet of block 2 (which is B-colored).
   By comparing timestamps taken on different nodes in the network and referring to the same packets (identified using the alternation of colors), it is possible to measure the delay on different network segments.

   For the sake of simplicity, the above example provides a single measurement per block, taking into account only the first packet of each block. The number of measurements can easily be increased by considering multiple packets in the block: for instance, a timestamp could be taken every N packets, thus generating multiple delay measurements. Taken to the limit, the delay could in principle be measured for each packet, by taking and comparing the corresponding timestamps (possible, but impractical from an implementation point of view).

3.2.1.  Average delay

   As mentioned before, the method described above for measuring the delay is sensitive to out-of-order reception of packets. To overcome this problem, a different approach has been considered, based on the concept of average delay. The average delay is calculated by considering the average arrival time of the packets within a single block. The network device locally stores a timestamp for each packet received within a single block; by summing all the timestamps and dividing by the total number of packets received, it can calculate the average arrival time for that block of packets. By subtracting the average arrival times of two adjacent devices, it is possible to calculate the average delay between those nodes. This method is robust to out-of-order packets and also to packet loss (only a small error is introduced). Moreover, it greatly reduces the number of timestamps (only one per block for each network device) that have to be collected by the management system. On the other hand, it only gives one measure for the duration of the block (e.g.,
   5 minutes), and it does not give the minimum and maximum delay values. This limitation could be overcome by reducing the duration of the block (e.g., from 5 minutes to a few seconds) by means of a highly optimized implementation of the method.

   By summing the average delays of the two directions of a path, it is also possible to measure the two-way delay (round-trip delay).

3.3.  Delay variation measurement

   Similarly to one-way delay measurement, the method can also be used to measure the inter-arrival jitter. The alternation of colors can be used as a time reference to measure delay variations. Considering the example depicted in Figure 3, R1 stores a timestamp (TS(A)R1 or TS(B)R1) whenever it sends the first packet of a block, and R2 stores the corresponding timestamp (TS(A)R2 or TS(B)R2) whenever it receives the first packet of a block. The inter-arrival jitter can then easily be derived from the one-way delay measurements, by evaluating the delay variation of consecutive samples.

   The concept of average delay can also be applied to delay variation, by evaluating the variation of consecutive measures of the average delay.

4.  Implementation and deployment

   The methodology described in the previous sections has been implemented in Telecom Italia by leveraging functions and tools available on IP routers, and it is currently being used to monitor packet loss in some portions of Telecom Italia's network. The application of the method to delay measurement is currently being evaluated in Telecom Italia's labs.

   The fundamental steps for the implementation of the method can be summarized as follows:

   o  coloring the packets;

   o  counting the packets;

   o  collecting data and calculating the packet loss.
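
   The three steps listed above can be illustrated end to end with a small simulation. The following Python sketch is not Telecom Italia's implementation (which relies on router access lists, policies and an NMS); it assumes a hypothetical flow of timestamped packets, colors them with a fixed timer (color A for even timer periods and B for odd ones, an arbitrary convention), counts them per block at two hypothetical measurement points R1 and R2, and computes the per-block loss. The names `color_for`, `count_blocks` and `block_loss` are illustrative only.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

BLOCK_SECONDS = 300.0  # hypothetical coloring timer (5 minutes, as in Section 4.1)


def color_for(send_time: float, block_seconds: float = BLOCK_SECONDS) -> str:
    """Coloring: even-numbered timer periods get color A, odd ones color B."""
    period = int(send_time // block_seconds)
    return "A" if period % 2 == 0 else "B"


def count_blocks(colors: Sequence[str]) -> List[Tuple[str, int]]:
    """Counting: every run of same-colored packets is one block.

    Returns (color, packets) per block in arrival order, i.e. what the
    C(A)/C(B) counter pair yields when each counter is read after its
    block has ended. Assumes in-order delivery at block boundaries.
    """
    return [(color, sum(1 for _ in run)) for color, run in groupby(colors)]


def block_loss(up: List[Tuple[str, int]],
               down: List[Tuple[str, int]]) -> List[int]:
    """Calculation: loss in each block = upstream count - downstream count."""
    if len(up) != len(down):
        # e.g. a whole block was lost, so two same-color runs merged downstream
        raise ValueError("upstream and downstream saw a different number of blocks")
    losses = []
    for (color_up, n_up), (color_down, n_down) in zip(up, down):
        if color_up != color_down:
            raise ValueError("block sequences are misaligned")
        losses.append(n_up - n_down)
    return losses


if __name__ == "__main__":
    # Hypothetical flow: 10 packets/s for 20 minutes -> 4 blocks of 3000 packets.
    send_times = [i / 10 for i in range(12000)]
    colors_r1 = [color_for(t) for t in send_times]  # as seen at R1 egress
    lost = {3100, 9050, 9051, 9052}                 # hypothetical drops on the link
    colors_r2 = [c for i, c in enumerate(colors_r1) if i not in lost]  # R2 ingress
    r1, r2 = count_blocks(colors_r1), count_blocks(colors_r2)
    print(r1)                  # [('A', 3000), ('B', 3000), ('A', 3000), ('B', 3000)]
    print(block_loss(r1, r2))  # [0, 1, 0, 3]
```

   Identifying blocks by color runs mirrors the self-synchronizing property of the method: no clock is needed at R2 to tell where a block starts. A real deployment reads the counters on a timer instead, which also copes with a block that is lost entirely, a case the sketch simply rejects.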

   Before going deeper into the implementation details, it is worth mentioning two different strategies that can be used when implementing the method:

   o  flow-based: the flow-based strategy is used when only a limited number of traffic flows need to be monitored. This could be the case, for example, for IPTV channels or for the traffic of other specific applications with high QoS requirements. According to this strategy, only a subset of the flows is colored. Counters for packet loss measurements can be instantiated for each single flow, or for the set as a whole, depending on the desired granularity. A relevant problem with this approach is the need to know in advance the path followed by the flows subject to measurement. Path rerouting and traffic load balancing make the problem more complex, especially for unicast traffic. The problem is easier to solve for multicast traffic, where load balancing is seldom used, especially for IPTV traffic, where static joins are frequently used to force traffic forwarding and replication.

   o  link-based: measurements are performed on all the traffic on a link-by-link basis. The link could be a physical link or a logical link (for instance, an Ethernet VLAN or an MPLS PW). Counters could be instantiated for the traffic as a whole or for each traffic class (in case it is desired to monitor each class separately), but in the second case a pair of counters is needed for each class.

   The current implementation in Telecom Italia uses the flow-based strategy. As mentioned, flow-based measurement requires the identification of the flow to be monitored and the discovery of the path followed by the selected flow. It is possible to monitor a single flow or multiple flows grouped together, but in the latter case the measurement is consistent only if all the flows in the group follow the same path.
   Moreover, a Service Provider should be aware that, if a measurement is performed by grouping many flows, it is not possible to determine exactly which flow was affected by packet loss. To obtain per-flow measures, it is necessary to configure counters for each specific flow. Once the flow(s) to be monitored have been identified, it is necessary to configure the monitoring on the proper nodes. Configuring the monitoring means configuring the policy to intercept the traffic and configuring the counters to count the packets. For end-to-end monitoring only, it is sufficient to enable the monitoring on the first-hop and last-hop routers of the path: the mechanism is completely transparent to intermediate nodes and independent of the path followed by traffic flows. Conversely, to monitor the flow on a hop-by-hop basis along its whole path, it is necessary to enable the monitoring on every node from the source to the destination. In case the exact path followed by the flow is not known a priori (i.e., the flow has multiple paths to reach the destination), it is necessary to enable the monitoring on every path: counters on interfaces traversed by the flow will report a packet count, while counters on other interfaces will remain at zero.

4.1.  Coloring the packets

   The coloring operation is fundamental in order to create packet blocks. This implies choosing where to activate the coloring and how to color the packets.

   In the case of flow-based measurements, it is generally desirable to have a single coloring node, because it is easier to manage and does not raise any risk of conflict (consider the case where two nodes color the same flow). Thus the flow should be colored as close as possible to the source.
   In addition, coloring a flow close to the source allows an end-to-end measure if a measurement point is enabled on the last-hop router as well. The only requirement is that the coloring must change periodically and every node along the path must be able to unambiguously identify the colored packets. For link-based measurements, all traffic needs to be colored when transmitted on the link. If the traffic has already been colored, it has to be re-colored, because the color must be consistent on the link. This means that each hop along the path must (re-)color the traffic; the color is not required to be consistent across different links.

   Traffic coloring can be implemented by setting a specific bit in the packet header and changing the value of that bit periodically. With current router implementations, only QoS-related fields and features offer the required flexibility to explicitly set the value of some bits in the packet header from the Command Line Interface (CLI). If a Service Provider only uses the three most significant bits of the DSCP field (corresponding to IP Precedence) for QoS classification and queuing, it is possible to use the two least significant bits of the DSCP field (bit 0 and bit 1) to implement the method without affecting QoS policies. One of the two bits (bit 0) could be used to identify flows subject to traffic monitoring (set to 1 if the flow is being monitored, 0 otherwise), while the other (bit 1) can be used to color the traffic (switching between values 0 and 1, corresponding to colors A and B) and thus create the blocks.

   In practice, coloring the traffic using the DSCP field can be implemented by configuring, on the router output interface, an access list that intercepts the flow(s) to be monitored and applies to them a policy that sets the DSCP field accordingly.
Since traffic coloring has to be switched between the two values over time, the policy needs to be modified periodically: an automatic script can be used to perform this task on the basis of a fixed timer. In Telecom Italia's implementation this timer is set to 5 minutes: this value proved to be a good compromise between measurement frequency and stability of the measurement (i.e., the possibility to collect all the measures referring to the same block).

4.2. Counting the packets

Assuming that the coloring of the packets is performed only by the source node, the nodes between source and destination (inclusive) have to count the colored packets that they receive and forward: this operation can be enabled on every router along the path or only on a subset, depending on which network segment is being monitored (a single link, a particular metro area, the backbone, the whole path).

Since the color switches periodically between two values, two counters (one for each value) are needed: one counter for packets with color A and one counter for packets with color B. For each flow (or group of flows) being monitored and for every interface where the monitoring is active, a pair of counters is needed. For example, in order to monitor 3 flows separately on a router with 4 interfaces involved, 24 counters are needed (2 counters for each of the 3 flows on each of the 4 interfaces). If traffic is colored using the DSCP field, as in Telecom Italia's implementation, an access list that matches specific DSCP values can be used to count the packets of the flow(s) being monitored.

In case of link-based measurements, the behavior is similar, except that coloring and counting operations are performed on a link-by-link basis at each endpoint of the link.
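The DSCP-based coloring and per-color counting described in Sections 4.1 and 4.2 can be sketched as follows. This is a minimal illustration, not Telecom Italia's implementation (which relies on router access lists, policies and a script driven by a timer). It assumes that "bit 0" is the least significant bit of the 6-bit DSCP field, that the block period is 300 seconds, and that all function and class names are hypothetical.

```python
from collections import Counter

# Hypothetical constants; bit numbering assumes bit 0 = DSCP least significant bit.
BLOCK_SECONDS = 300   # assumed color-switching timer: 5 minutes
MONITOR_BIT = 0b01    # DSCP bit 0: 1 = flow under monitoring
COLOR_BIT = 0b10      # DSCP bit 1: 0 = color A, 1 = color B


def color_at(t: float) -> int:
    """Color in force at time t (seconds): 0 = A, 1 = B, alternating per block."""
    return int(t // BLOCK_SECONDS) % 2


def colored_dscp(dscp: int, color: int) -> int:
    """Keep the four upper DSCP bits (QoS) and rewrite bits 1..0 as color/monitor."""
    if not 0 <= dscp <= 63 or color not in (0, 1):
        raise ValueError("dscp must be 0..63 and color must be 0 or 1")
    return (dscp & 0b111100) | (color << 1) | MONITOR_BIT


def counters_needed(flows: int, interfaces: int) -> int:
    """Two counters (A and B) per monitored flow on each monitored interface."""
    return 2 * flows * interfaces


class ColorCounters:
    """Per-(flow, interface, color) packet counters kept by one node."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, flow: str, interface: str, dscp: int) -> None:
        # Only packets with the monitoring bit set are counted.
        if dscp & MONITOR_BIT:
            color = (dscp & COLOR_BIT) >> 1
            self.counts[(flow, interface, color)] += 1


if __name__ == "__main__":
    cs5 = 0b101000  # IP Precedence 5, two low DSCP bits free
    print(colored_dscp(cs5, 0), colored_dscp(cs5, 1))  # 41 43
    print(counters_needed(3, 4))  # 24, matching the 3-flow, 4-interface example
```

Masking rather than overwriting the whole DSCP value reflects the assumption, stated in Section 4.1, that only the upper bits carry QoS meaning in the Service Provider's network.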
Another important aspect to take into consideration is when to read the counters: in order to count the exact number of packets of a block, the routers must perform this operation when that block has ended. In other words, the counter for color A must be read when the current block has color B, in order to be sure that the value of the counter is stable. This task can be accomplished in two ways. The general approach is to read the counters periodically, many times during a block, and to compare these successive readings: when the counter stops incrementing, the current block has ended and its value can be processed safely. Alternatively, if the coloring operation is performed on the basis of a fixed timer, the reading of the counters can be scheduled according to that timer: for example, if each block is 5 minutes long, reading the counter of each block in the middle of the subsequent block (e.g., the counter for color A in the middle of the following color B block), i.e., one reading every 5 minutes alternating between the two counters, is a safe choice. A sufficient margin should be left between the end of a block and the reading of the counter, in order to take into account any out-of-order packets. The choice of a 5-minute timer for color switching was also suggested by these considerations.

4.3. Collecting data and calculating packet loss

The nodes enabled to perform performance monitoring collect the values of the counters, but they are not able to use this information directly to measure packet loss, because they only have their own samples. For this reason, an external Network Management System (NMS) is required to collect and process the data and to perform the packet loss calculation. The NMS compares the values of counters from different nodes and can determine whether some packets were lost (even a single packet) and also where they were lost.
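The counter-reading options of Section 4.2 and the NMS-side loss calculation described above can be sketched as follows. This is a minimal, hypothetical illustration: it assumes a 300-second block, that each value passed to the loss functions is the count of the same block (same color and time interval) at each measurement point, and that points are ordered from source to destination. None of these function names come from the draft.

```python
from typing import List, Sequence

BLOCK_SECONDS = 300  # assumed 5-minute block, as in the draft's example


def read_time(block_index: int) -> float:
    """Fixed-timer option: read block k's counter midway through block k+1.

    Block k spans [k*T, (k+1)*T); half a block of margin absorbs
    out-of-order packets.
    """
    return (block_index + 1) * BLOCK_SECONDS + BLOCK_SECONDS / 2


def block_is_closed(readings: Sequence[int], stable_reads: int = 2) -> bool:
    """General option: the block has ended once the counter stops increasing.

    `readings` are successive polls of the counter for the color that is
    NOT currently being transmitted; the last `stable_reads` must match.
    """
    if len(readings) < stable_reads:
        return False
    tail = readings[-stable_reads:]
    return all(value == tail[0] for value in tail)


def segment_losses(path_counts: Sequence[int]) -> List[int]:
    """Packets lost between consecutive measurement points (upstream - downstream)."""
    return [up - down for up, down in zip(path_counts, path_counts[1:])]


def end_to_end_loss(path_counts: Sequence[int]) -> int:
    """Source counter minus destination counter for the same block."""
    return path_counts[0] - path_counts[-1]


if __name__ == "__main__":
    counts = [1000, 1000, 997, 997]  # hypothetical counts for one block
    print(end_to_end_loss(counts))   # 3
    print(segment_losses(counts))    # [0, 3, 0]: loss located on the 2nd hop
    print(read_time(0))              # 450.0 (7.5 minutes after the start)
```

The per-segment losses always sum to the end-to-end loss, which is why combining hop-by-hop and end-to-end counters lets the NMS locate where packets were dropped.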
The value of the counters needs to be transmitted to the NMS as soon as it has been read. This can be accomplished by using SNMP or FTP, in push mode or polling mode. In the first case, each router periodically sends the information to the NMS; in the latter case, the NMS periodically polls the routers to collect the information. In either case, the NMS has to collect all the relevant values from all the routers within one cycle of the timer (5 minutes).

5. Compliance with RFC6390 guidelines

RFC6390 [RFC6390] defines a framework and a process for developing Performance Metrics for protocols above and below the IP layer (such as IP-based applications that operate over reliable or datagram transport protocols).

This document does not aim to propose a new Performance Metric but a new method of measurement for a few Performance Metrics that have already been standardized. Nevertheless, it is worth applying the RFC6390 guidelines to the present document, in order to provide a more complete and coherent description of the proposed method. We used a subset of the Performance Metric Definition template defined by RFC6390.

o Metric name and description: as already stated, this document does not propose any new Performance Metric. On the contrary, it describes a novel method for measuring packet loss [RFC2680]. The same concept, with small differences, can also be used to measure delay [RFC2679] and jitter [RFC3393]. The document mainly describes the applicability to packet loss measurement.

o Method of Measurement or Calculation: according to the method described in the previous sections, the number of packets lost is calculated by subtracting the value of the counter on the destination node from the value of the counter on the source node. Both counters must refer to the same color.
The calculation is performed when the values of the counters are in a steady state.

o Units of Measurement: the method calculates and reports the exact number of packets sent by the source node and not received by the destination node.

o Measurement Points: the measurement can be performed between adjacent nodes, on a per-link basis, or along a multi-hop path, provided that the traffic under measurement follows that path. In case of a multi-hop path, the measurements can be performed both end-to-end and hop-by-hop.

o Measurement Timing: the method has a constraint on the frequency of measurements. In order to perform a measurement, the counter must be in a steady state: this happens when the traffic is being colored with the alternate color; in the current implementation, the time interval is set to 5 minutes.

o Implementation: the current implementation of the method uses two encodings of the DSCP field to color the packets; this enables the use of policy configurations on the router to color the packets and to configure the counter for each color accordingly. The path followed by the traffic being measured should be known in advance, in order to configure the counters along the path and be able to compare the correct values.

o Use and Applications: the method can be used to measure packet loss with high precision (i.e., 10^-7) on live traffic; moreover, by combining end-to-end and per-link measurements, the method is useful to pinpoint the single link that is experiencing loss events.
o Reporting Model: the value of the counters has to be sent to a centralized management system that performs the calculations; such samples must contain a reference to the time interval they refer to, so that the management system can perform the correct correlation; the samples have to be sent while the corresponding counter is in a steady state (within a time interval); otherwise, the value of the sample should be stored locally.

o Dependencies: the values of the counters have to be correlated to the time interval they refer to; moreover, since the current implementation is based on DSCP values, there are significant dependencies on the usage of the DSCP field: it must be possible to rely on unused DSCP values without affecting QoS-related configuration and behavior; in addition, the intermediate nodes must not change the value of the DSCP field, so as not to alter the measurement.

o Organization of Results: the method of measurement produces singletons.

o Parameters: currently, the main parameter of the method is the time interval used to alternate the colors and read the counters.

6. Security Considerations

This document specifies a method to perform measurements in the context of a Service Provider's network and has not been developed to conduct Internet measurements, so it does not directly affect Internet security or applications that run on the Internet. However, implementations of this method must be mindful of security and privacy concerns.

There are two types of security concerns: potential harm caused by the measurements and potential harm to the measurements. Regarding the first point, the measurements described in this document are passive, so no packets are injected into the network that could cause harm to the network itself or to data traffic.
Nevertheless, the method implies on-the-fly modifications to the IP header of data packets: these must be performed in a way that does not alter the quality of service experienced by packets subject to measurement and that preserves the stability and performance of the routers doing the measurements. The measurements themselves could be harmed by routers altering the coloring of the packets, or by an attacker injecting artificial traffic. Authentication techniques, such as digital signatures, may be used where appropriate to guard against injected traffic attacks.

The privacy concerns of network measurement are limited, because the method only relies on information contained in the IP header, without any release of user data.

7. Conclusions

The advantages of the method described in this document are:

o easy implementation: it can be implemented using features already available on major routing platforms;

o low computational effort: the additional processing load is negligible;

o accurate packet loss measurement: single-packet loss granularity is achieved with a passive measurement;

o potential applicability to any kind of packet/frame-based traffic: Ethernet, IP, MPLS, etc., both unicast and multicast;

o robustness: the method can tolerate out-of-order packets and it is not based on "special" packets whose loss could have a negative impact;

o no interoperability issues: the features required to implement the method are available on all current routing platforms.

The method does not raise any specific need for standardization, but it could be further improved by means of some extensions to existing protocols. Specifically, the use of DiffServ bits for coloring the packets may not be a viable solution in some cases: a standard method to color the packets for this specific application could be beneficial.

8. IANA Considerations

There are no IANA actions required.

9. Acknowledgements

The authors would like to thank Domenico Laforgia, Daniele Accetta and Mario Bianchetti for their contribution to the definition and the implementation of the method.

10. References

10.1. Normative References

[RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way Delay Metric for IPPM", RFC 2679, September 1999.

[RFC2680] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way Packet Loss Metric for IPPM", RFC 2680, September 1999.

[RFC3393] Demichelis, C. and P. Chimento, "IP Packet Delay Variation Metric for IP Performance Metrics (IPPM)", RFC 3393, November 2002.

10.2. Informative References

[I-D.ietf-opsawg-oam-overview] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. Weingarten, "An Overview of Operations, Administration, and Maintenance (OAM) Tools", draft-ietf-opsawg-oam-overview-13 (work in progress), January 2014.

[RFC6374] Frost, D. and S. Bryant, "Packet Loss and Delay Measurement for MPLS Networks", RFC 6374, September 2011.

[RFC6390] Clark, A. and B. Claise, "Guidelines for Considering New Performance Metric Development", BCP 170, RFC 6390, October 2011.

Authors' Addresses

Alessandro Capello
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: alessandro.capello@telecomitalia.it

Mauro Cociglio
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: mauro.cociglio@telecomitalia.it

Luca Castaldelli
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: luca.castaldelli@telecomitalia.it

Alberto Tempia Bonda

Email: alberto.tempia@gmail.com