MBONED                                                       M. Cociglio
Internet-Draft                                                A. Capello
Intended status: Experimental                            A. Tempia Bonda
Expires: April 25, 2011                                   L. Castaldelli
                                                          Telecom Italia
                                                        October 22, 2010


            A method for IP multicast performance monitoring
               draft-cociglio-mboned-multicast-pm-01.txt

Abstract

   This document defines a method to accomplish performance monitoring
   measurements on live IP flows, including packet loss, one-way delay
   and jitter.  The proposed method is applicable to both unicast and
   multicast traffic, but only IP multicast streams are considered in
   this document.
   The method can be implemented using tools and features already
   available on IP routers and does not require any protocol extension.
   For this reason, it does not raise any interoperability issue.  The
   method could be further improved by means of some extension to
   existing protocols, but this aspect is left for further study and is
   out of the scope of this document.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Principle of the method
   4.  Characteristics of the method
   5.  Detailed description of the method
     5.1.  Packet Loss
     5.2.  One-way Delay
     5.3.  Inter-arrival jitter
   6.  Deployment considerations
     6.1.  Multicast Flow Identification
     6.2.  Path Discovery
     6.3.  Flow Marking
     6.4.  Monitoring Nodes
     6.5.  Management System
     6.6.  Scalability
     6.7.  Interoperability
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. Informative References
   Authors' Addresses

1.  Introduction

   The deployment of video services managed by Service Providers has
   had two main consequences:

   o  a widespread adoption of IP multicast to carry live TV channels

   o  a strong effort to guarantee a user experience comparable to
      traditional TV broadcasting services

   The second point implies a reinforced interest in performance
   monitoring techniques, including packet loss, delay and jitter
   measurements.
   As discussed in [I-D.bipi-mboned-ip-multicast-pm-requirement], these
   techniques should satisfy a few fundamental requirements:

   o  applicability to real traffic

   o  availability of packet loss, delay and jitter measurements

   o  possibility to have both end-to-end and segment-by-segment
      measures, in order to support fault localization

   o  scalability

   o  low interoperability issues

   Currently available tools do not meet all of these requirements,
   hence the opportunity to work on a new solution.

   The method described in the present document allows performing packet
   loss, delay and jitter measurements on real IP multicast streams, on
   an end-to-end or segment-by-segment basis.  In the basic proposal,
   there are no interoperability issues, since the method doesn't
   require any extension to existing protocols and can be implemented
   using tools already available on major routing platforms.

2.  Terminology

   Terminology used in this document:

   o  CB (Control Bit): bit used to "mark" traffic to be monitored

   o  Block: sequence of consecutive packets with the CB set to the same
      value

   o  MI (Marking Interval): duration of a block (it defines the
      frequency at which the CB is changed)

   o  PI (Polling Interval): it defines the frequency at which
      performance information is collected

   o  NMS: Network Management System

3.  Principle of the method

   In order to perform packet loss measurements on real traffic flows,
   it is generally required to include a sequence number in the packet
   header and to have equipment able to extract the sequence number and
   check in real time whether some packets are missing.  Such an
   approach can be difficult to implement on real traffic: if UDP is
   used as the transport protocol, no sequence number is available; on
   the other hand, if a higher-layer sequence number (e.g.
   in the RTP header) is used, extracting the information from the RTP
   header on every packet and performing the calculation in real time
   can put a heavy load on the equipment.

   The method proposed in this document is a simple and efficient way to
   measure packet loss on real traffic streams, without numbering
   packets or overloading network equipment.  The basic idea is to
   consider the traffic being measured as a sequence of blocks made of
   consecutive packets.  Blocks can be defined based on the number of
   packets (each block contains a configured fixed number of packets) or
   on their duration (e.g. blocks are 5 minutes long and the number of
   packets in each block varies with the flow rate).  In any case,
   blocks must be unambiguously recognizable on every node along the
   path: by counting on a node the number of packets in each block and
   comparing the values with those measured by a different router along
   the path, it is possible to measure packet loss (if any) between the
   two nodes.

   Figure 1 represents a simple multicast forwarding tree made of 6
   nodes and 3 receivers.

    +-----+                                      +--------+
    | SRC |                            +--------<>   R5   <>--- Recv 1
    +-----+                            |         +--------+
       |                               |
       |                               |
   +---<>---+      +--------+      +---<>---+     +--------+
   |   R1   <>----<>   R2   <>----<>   R3   <>---<>   R6   <>--- Recv 2
   +--------+      +---<>---+      +--------+     +--------+
                       |
                       |
                       |          +--------+
                       +---------<>   R4   <>--- Recv 3
                                  +--------+

   <>    Interface
   ----  Link

             Figure 1: Example of multicast forwarding tree

   Blocks of consecutive packets are identified using some information
   in the packet flow itself, for instance a field in the packet header
   that can assume two different values.  The first-hop router (R1 in
   Figure 1) sets this field and changes it periodically (e.g. every 5
   minutes or every 100000 packets), alternating the two values and
   creating a sequence of blocks.
   All the packets within a block have the field set to the same value
   and all the packets within the following block have the field set to
   the other value.  If blocks are defined on a time basis, the number
   of packets in each block is not fixed, but depends on the flow rate.
   However, since blocks are created on the first-hop router and not
   modified along the path, all the nodes should count the same number
   of packets within the same block (if no packet loss occurs).  By
   counting the number of packets in the block on each node and
   comparing those values, it is possible to detect any packet loss
   with the maximum precision (a single lost packet) and to identify
   where the loss occurred.

   In the remainder of this document, blocks are assumed to be defined
   on a time basis.

   The same approach can also be used to measure one-way delay and
   inter-arrival jitter.  In this case, the transition from a block to
   the following one is used as a time reference to calculate the delay
   between any two nodes in the network.  Time synchronization is
   required in order to have a consistent delay measurement.

   Inter-arrival jitter can be easily estimated from delay measures and
   does not necessarily require synchronization between the nodes.

4.  Characteristics of the method

   The method described in this document fulfills all the requirements
   described in [I-D.bipi-mboned-ip-multicast-pm-requirement].  In
   addition, it is characterized by the following advantages:

   o  easy implementation (use of features already available on major
      routing platforms)

   o  low computational effort

   o  highly precise packet loss measurement (single packet loss
      granularity)

   o  applicability to any kind of IP traffic (unicast and multicast)

   o  independence from the flow bit rate

   o  independence from higher level protocols (e.g. RTP, etc.) or
      video coding (e.g. MPEG, etc.)

   o  no interoperability issues

   Figure 2 represents a subtree of the multicast forwarding tree
   depicted in Figure 1 and shows how the method can be used to measure
   packet loss (or one-way delay and inter-arrival jitter) on different
   network segments.

    +-----+
    | SRC |
    +-----+
       |
       |
   +---<>---+      +--------+     +--------+      +--------+
   |   R1   <>----<>   R2   <>---<>   R3   <>----<>   R6   <>--- Recv 2
   +--------+      +--------+     +--------+      +--------+
   .               .        .              .      .        .
   .               .        .              .      .        .
   .               <-------->              <------>        .
   .            Node Packet Loss       Link Packet Loss    .
   .                                                       .
   <------------------------------------------------------->
                    End-to-End Packet loss

                    Figure 2: Available measurements

   By applying the method on different interfaces along the multicast
   distribution tree, it is possible to measure packet loss across a
   single link, across a node (e.g. due to queuing management) or end-
   to-end.  In general, it is possible to monitor any segment of the
   network.

5.  Detailed description of the method

   This section describes in more detail the application of the method
   for measuring packet loss, one-way delay and jitter in packet-
   switched networks.

5.1.  Packet Loss

   Figure 3 shows how the method described in this document can be used
   to measure the packet loss across a link between two adjacent nodes.
   For example, referring to Figure 1, we are interested in monitoring
   the packet loss on the link between R1 and R2.  According to the
   method briefly described in Section 3, since router R1 is the first-
   hop router, it is responsible for marking the field in the packet
   header.  As discussed before, a single bit is sufficient for this
   purpose: the bit used to mark the traffic is called the Control Bit
   (CB).  By alternating between 0 and 1 at each period, the Control Bit
   generates a sort of square-wave signal, and the original traffic flow
   is converted into a sequence of blocks.
   The semi-period T/2 of the square wave is called the Marking Interval
   (MI) and corresponds to the duration of each single block.  The
   action of "marking" the traffic (setting the Control Bit) can be
   executed on the ingress interface of R1.  On the egress interface of
   R1 two counters, named C(0)R1 and C(1)R1, count the number of packets
   with the CB set to 0 and 1 respectively.  As long as traffic is
   marked with 0, only counter C(0)R1 is incremented while C(1)R1
   doesn't change.  Counters C(0)R1 and C(1)R1 can be used as reference
   values to determine the packet loss from R1 to R2 (or to other nodes
   along the path toward the destination).

   Router R2, similarly, instantiates on its ingress interface two
   counters, C(0)R2 and C(1)R2, to count the number of packets received
   with the CB set to 0 and 1 respectively.  By comparing C(0)R1 with
   C(0)R2 and C(1)R1 with C(1)R2, and repeating this operation on every
   block, it is possible to detect the number of packets lost on the
   link between R1 and R2.

   Similarly, by placing two counters on the R2 egress interface and on
   every interface along the path, it is possible to determine packet
   loss on every network segment and therefore detect where packet
   losses occur.

                 T/2           T
               <------> <-------------->
                       +-------+       +-------+
                       |       |       |       |
               +-------+       +-------+       +-------
   Control Bit 0000000011111111000000001111111100000000

                Block   Block   Block   Block   Block
               <------><------><------><------><------>

             +---------+                            +---------+
   -------> <>   R1    <> -----------------------> <>   R2    <> --->
             +---------+                            +---------+

    Figure 3: Application of the method to compute link packet loss

   The method doesn't require any synchronization in the network, as the
   traffic flow implicitly carries the synchronization in the
   alternation of values of the Control Bit.
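
   The counter comparison described above can be sketched in a few
   lines of Python.  This is only an illustration under stated
   assumptions, not part of the method's specification: the function
   names, the rule used to decide that a block has ended (a counter
   unchanged between two consecutive polls while the other one has
   advanced) and the sample values are all hypothetical.  Counters are
   assumed to be cumulative and to be polled at least twice per block.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (C(0), C(1)) cumulative readings from one poll


def closed_blocks(samples: List[Sample]) -> List[Tuple[int, int]]:
    """Return (cb, packets) for every block that is known to have ended.

    Assumption: a counter is final when it is unchanged between two
    consecutive polls while the other counter has advanced.
    """
    last_final = [0, 0]  # cumulative value at the previous block end, per CB
    blocks = []
    for prev, cur in zip(samples, samples[1:]):
        for cb in (0, 1):
            other = 1 - cb
            stable = cur[cb] == prev[cb]
            other_moving = cur[other] > prev[other]
            if stable and other_moving and cur[cb] > last_final[cb]:
                blocks.append((cb, cur[cb] - last_final[cb]))
                last_final[cb] = cur[cb]
    return blocks


def link_loss(upstream: List[Sample], downstream: List[Sample]):
    """Pair the closed blocks of two nodes: (cb, sent, received, lost)."""
    result = []
    for (cb_up, sent), (cb_down, received) in zip(
        closed_blocks(upstream), closed_blocks(downstream)
    ):
        if cb_up != cb_down:
            raise ValueError("block sequences are not aligned")
        result.append((cb_up, sent, received, sent - received))
    return result


# Hypothetical polls (not taken from a real network).
r1_egress = [(0, 0), (60, 0), (100, 0), (100, 45), (100, 80), (135, 80), (180, 80)]
r2_ingress = [(0, 0), (58, 0), (100, 0), (100, 44), (100, 78), (133, 78), (176, 78)]

for cb, sent, received, lost in link_loss(r1_egress, r2_ingress):
    print(f"block CB={cb}: sent={sent} received={received} lost={lost}")
```

   With these hypothetical readings, the first block (CB=0) shows no
   loss and the second block (CB=1) shows 2 lost packets.  The two
   nodes do not need to be polled at the same instants, because only
   the final value of each counter is compared.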

   Table 1 shows an example of the use of router counters to calculate
   the packet loss between R1 and R2.  Time is expressed in minutes and
   counter values are assumed to be checked on each router every two
   minutes (it doesn't matter if R1 and R2 are not synchronized).  The
   Marking Interval is assumed to be 5 minutes, meaning that the CB
   changes every 5 minutes.

   The columns contain the values of C(0) and C(1) for both R1 and R2;
   in particular, the table shows the values they assume every 2
   minutes.  Counters increase according to the Control Bit: when CB is
   0, only C(0) increases and C(1) is still; when CB is 1, only C(1)
   increases and C(0) is still.  Packet loss calculation must be
   performed when a counter is stable, because it means that a block is
   terminated and the number of packets within that block can be
   counted exactly.

   +------+--------+--------+--------+--------+
   | Time | C(0)R1 | C(1)R1 | C(0)R2 | C(1)R2 |
   +------+--------+--------+--------+--------+
   |    0 |      0 |      0 |      0 |      0 |
   |    2 |    112 |      0 |    110 |      0 |
   |    4 |    234 |      0 |    237 |      0 |
   |    6 |    277 |    103 |    277 |    101 |
   |    8 |    277 |    212 |    277 |    210 |
   |   10 |    277 |    259 |    277 |    256 |
   |   12 |    403 |    262 |    401 |    261 |
   |   14 |    827 |    262 |    819 |    261 |
   +------+--------+--------+--------+--------+

      Table 1: Evaluation of counters for packet loss measurements

   For example, looking at Table 1, traffic is initially marked with
   CB=0 because only C(0)R1 and C(0)R2 increase, while C(1) counters are
   still.  At minute 6, C(1) counters have started moving while C(0)
   counters have stopped (in fact at minute 8 they have the same values
   they had at minute 6): it means that the block with CB=0 is
   terminated and the flow is now being marked with CB=1.
   Hence the value of C(0) counters gives the exact number of packets
   transmitted in that block.  Comparing C(0)R1 and C(0)R2 at minute 8,
   it is possible to verify whether any packet of the first block was
   lost on the link between R1 and R2 (in the case shown in the table,
   C(0)R1 = C(0)R2 = 277, meaning that no packets were lost).  At minute
   12, C(0) counters have started moving again while C(1) counters have
   stopped (at minute 14 they have the same values they had at minute
   12): it means that the block with CB=1 is terminated and the flow is
   being marked again with CB=0.  The value of C(1) counters gives the
   exact number of packets transmitted in the block just terminated.
   Comparing C(1)R1 and C(1)R2 at minute 14, it is possible to verify
   whether any packet of that block was lost (this time C(1)R1 = 262 and
   C(1)R2 = 261, meaning that 1 packet was lost).

   The same method can be applied to more complex networks, as long as
   the measurement is enabled on the path followed by the traffic flow
   being analyzed.

5.2.  One-way Delay

   The method to measure one-way delay builds directly on the packet
   loss method.  The event when the marking changes from 0 to 1 or vice
   versa is used as a time reference to calculate the delay.
   Considering again the example depicted in Figure 1, R1 records as an
   event every change in the marking, by storing a timestamp TS R1 every
   time it sends the first packet of a block.  R2 does the same,
   recording TS R2 every time it receives the first packet of a block.
   By comparing TS R1 and TS R2 it is possible to calculate the delay
   between R1 and R2.

   In order to coherently compare the timestamps collected on different
   routers, synchronization is required in the network.  Moreover, the
   measurement can be considered valid only if no packet loss occurred.
   If some packets are lost, it is possible that the first packet of a
   block on R1 is not the first packet of the same block on R2.

   In more detail, whenever an interface sends/receives the first
   packet of a block (that is, a packet with the Control Bit set to 0 or
   1, while previous packets were marked with the opposite value), a
   timestamp should be recorded.  By comparing timestamps recorded on
   different nodes in the network, it is possible to calculate the delay
   on each network segment.  As stated before, synchronization is
   required to get a reliable delay measurement.

   Table 2 considers the same example as Figure 1, but both packet loss
   and one-way delay are now measured.  Time is expressed in minutes,
   while timestamps are expressed in seconds with millisecond
   resolution (hours and minutes are omitted for simplicity).  Counters
   and timestamp values are assumed to be checked on each router every
   two minutes and the Marking Interval is assumed to be 5 minutes.
   Routers R1 and R2, besides incrementing counters C(0) and C(1), now
   also set a timestamp whenever the corresponding counter begins
   incrementing (i.e. the first packet is sent/received).

   +-------+-----+--------+-----+--------+-----+--------+-----+--------+
   | Time  | R1  | TS0 R1 | R1  | TS1 R1 | R2  | TS0 R2 | R2  | TS1 R2 |
   | (min) | C0  | (sec)  | C1  | (sec)  | C0  | (sec)  | C1  | (sec)  |
   +-------+-----+--------+-----+--------+-----+--------+-----+--------+
   |   0   |   0 |   -    |   0 |   -    |   0 |   -    |   0 |   -    |
   |   2   | 112 | 7.483  |   0 |   -    | 110 | 7.487  |   0 |   -    |
   |   4   | 234 |   -    |   0 |   -    | 237 |   -    |   0 |   -    |
   |   6   | 277 |   -    | 103 | 3.621  | 277 |   -    | 101 | 3.626  |
   |   8   | 277 |   -    | 212 |   -    | 277 |   -    | 210 |   -    |
   |  10   | 277 |   -    | 259 |   -    | 277 |   -    | 256 |   -    |
   |  12   | 403 | 5.752  | 262 |   -    | 401 | 5.757  | 262 |   -    |
   |  14   | 827 |   -    | 262 |   -    | 819 |   -    | 262 |   -    |
   +-------+-----+--------+-----+--------+-----+--------+-----+--------+

         Table 2: Evaluation of counters for delay measurements

   At minute 2, C(0) counters have started moving on both routers and
   the first timestamp (relative to the first packet with CB=0) is
   recorded: R1 timestamp is 7.483, R2 timestamp is 7.487.  Notice that
   those timestamps refer to the same packet because the first packet of
   the block is the same on both routers (if no packet loss has
   occurred): therefore they can be compared and, if we assume that R1
   and R2 are synchronized, they can be used to measure the delay
   between R1 and R2 (4 msec).  At minute 6 the marking has changed,
   C(0) counters have stopped and C(1) counters have started moving: it
   means that a new block with CB=1 has started, therefore R1 and R2
   record a new timestamp.  The new timestamp refers to the first packet
   of the block with CB=1 (which is the same packet on both routers).
   R1 timestamp is 3.621, R2 timestamp is 3.626; again, the two values
   are comparable and the delay is 5 msec.
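
   The delay calculation walked through above can be sketched as
   follows.  This is a minimal illustration, not a definitive
   implementation: the BlockRecord type and the function name are
   hypothetical, clocks are assumed to be synchronized, and both
   timestamps of a block are assumed to fall within the same minute
   (since hours and minutes are omitted).  The first two records use
   the values of Table 2; the third one is invented to show a block
   discarded because of packet loss.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class BlockRecord(NamedTuple):
    cb: int        # Control Bit value of the block
    ts_tx: str     # first-packet timestamp on the upstream node (seconds)
    ts_rx: str     # first-packet timestamp on the downstream node (seconds)
    pkts_tx: int   # packets counted upstream for the block
    pkts_rx: int   # packets counted downstream for the block


def one_way_delay_ms(block: BlockRecord) -> Optional[int]:
    """Return the delay in milliseconds, or None if the block saw loss.

    With packet loss the two timestamps may refer to different packets,
    so the sample is discarded.
    """
    if block.pkts_tx != block.pkts_rx:
        return None
    # Decimal avoids binary floating-point rounding on values like 7.487.
    delta = Decimal(block.ts_rx) - Decimal(block.ts_tx)
    return int(delta * 1000)


blocks = [
    BlockRecord(0, "7.483", "7.487", 277, 277),  # from Table 2
    BlockRecord(1, "3.621", "3.626", 262, 262),  # from Table 2
    BlockRecord(0, "1.204", "1.209", 120, 119),  # hypothetical lossy block
]

for block in blocks:
    print(block.cb, one_way_delay_ms(block))
```

   The first two blocks give 4 msec and 5 msec, matching the values
   derived above, while the hypothetical third block yields no delay
   sample because its packet counts differ.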

   It is possible to perform more than one delay measurement per period
   by taking not only the timestamp of the first packet of each block,
   but also the timestamp of other packets within the same block.  The
   only requirement is that the packets triggering timestamps are the
   same on every router along the path.

5.3.  Inter-arrival jitter

   Similarly to one-way delay measurement, the method to evaluate the
   inter-arrival jitter builds directly on the packet loss method.

   Again, the event when the marking changes from 0 to 1 or vice versa
   is used as a time reference to record timestamps: considering the
   example depicted in Figure 1, R1 stores a timestamp TS R1 every time
   it sends the first packet of a block and R2 records a timestamp TS R2
   every time it receives the first packet of a block.

   The inter-arrival jitter can be easily derived from one-way delay
   measurement.  For example, it is possible to evaluate the jitter by
   calculating the delay variation between two consecutive samples:
   considering the values shown in Table 2, since the measured delay is
   4 msec for the first sample and 5 msec for the second sample, the
   derived jitter is 1 msec.

   In this case, synchronization in the network is not strictly
   required, because any constant clock offset between the two nodes
   cancels out when two delay samples are subtracted.

6.  Deployment considerations

   This section describes some aspects that should be taken into account
   when the method is deployed in a real network.  For the sake of
   simplicity, we consider a network scenario where only packet loss is
   being measured, but all the considerations are valid and can be
   easily extended to one-way delay and inter-arrival jitter measurement
   as well.

6.1.  Multicast Flow Identification

   The first thing to do in order to monitor multicast traffic in a real
   network is to identify the flow to be monitored.
   The method described in this document is able to monitor a single
   multicast stream or multiple flows grouped together, but in the
   latter case measurement is consistent only if all the flows in the
   group follow the same path.  Moreover, a network operator must be
   aware that, if measurement is performed on many streams, it is not
   possible to determine exactly which flow was affected by packet loss
   (all the flows are considered as a single stream by the monitoring
   system).

6.2.  Path Discovery

   Once the multicast stream(s) to be monitored is identified, it is
   important to enable the monitoring system on the proper nodes.  For
   end-to-end monitoring only, it is sufficient to enable the
   monitoring system on the first and last-hop routers of the path: the
   mechanism is completely transparent to intermediate nodes and
   independent from the path followed by multicast streams.  On the
   contrary, to monitor the flow along its whole path and on every
   segment (every node and link), it is necessary to enable monitoring
   on every node from the source to the destination.  For this purpose
   it isn't strictly required to know the exact path followed by the
   flow.  If, for example, the flow has multiple paths to reach a
   destination, it is sufficient to enable the monitoring system on
   every path; a Management System will then process just the right
   information (or it will process all the counters, but some of them
   will be zero, meaning that the considered flow is not flowing
   through the corresponding interface).

6.3.  Flow Marking

   Once the multicast stream is identified and its path is known, it is
   necessary to "mark" the flow so as to create packet blocks.  This
   means choosing where to activate the marking and how to "mark"
   packets.

   Regarding the first point, it is desirable, in general, to have a
   single marking node because it is simpler to manage and doesn't raise
   any risk of conflict (consider the case where two nodes mark the same
   flow).  For this purpose it is necessary to mark the flow as close as
   possible to the multicast source, e.g. on the first router
   downstream of the multicast sources, where all the multicast streams
   can be marked.  In addition, marking a flow close to the source
   allows an end-to-end measurement if a measurement point is enabled on
   the last-hop router as well.  Theoretically, the flow could be marked
   before the first-hop router, directly by the sources: in this case
   the first-hop router just needs to count packets of each block and
   acts as an intermediate node.  The only requirement is that marking
   must change periodically and every node along the path must be able
   to identify marked packets unambiguously.

   On the contrary, if many marking nodes are required, it is important
   that each marking node marks different flows so as to avoid "marking
   conflicts" that would invalidate measurements.

   Regarding the second point, as described in Section 5.1, a field in
   the IP header could be sufficient for this purpose.  As an example,
   it is possible to use the two least significant bits of the DSCP
   field (bit 0 and bit 1).  One of them (bit 0) is always set to 1 and
   is used to identify the flow to be measured; the other one (bit 1) is
   changed periodically and assumes alternately values 0 and 1.  This
   way the traffic flow is transformed into a sequence of blocks where
   each block has all the packets with bit 1 of the DSCP field set to
   the same value (0 or 1).  Of course, marking can be based on the DSCP
   field only if differentiated packet scheduling is not based on that
   field and, for instance, is based only on IP Precedence bits.
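
   The DSCP-based marking just described can be illustrated with a few
   bit operations.  The sketch below is hypothetical: the function and
   constant names are invented, "bit 0" is assumed to be the least
   significant bit of the 6-bit DSCP value, and the marking is assumed
   to alternate on a fixed time basis starting with CB=0 at time zero.

```python
MONITOR_BIT = 0x01  # DSCP bit 0 (assumed LSB): flow is being measured
CB_BIT = 0x02       # DSCP bit 1: Control Bit, alternates every block
DSCP_MASK = 0x3F    # the DSCP field is 6 bits wide


def mark_dscp(dscp: int, cb: int) -> int:
    """Return the DSCP value with the monitor bit set and the CB applied."""
    if not 0 <= dscp <= DSCP_MASK:
        raise ValueError("DSCP must fit in 6 bits")
    if cb not in (0, 1):
        raise ValueError("CB must be 0 or 1")
    dscp |= MONITOR_BIT
    if cb:
        return dscp | CB_BIT
    return dscp & ~CB_BIT & DSCP_MASK


def classify(dscp: int):
    """Return the CB of a marked packet, or None if it is not monitored."""
    if not dscp & MONITOR_BIT:
        return None
    return (dscp & CB_BIT) >> 1


def cb_for_time(seconds: float, marking_interval: int = 300) -> int:
    """CB value in force at a given time, for a time-based block."""
    return int(seconds // marking_interval) % 2


marked = mark_dscp(0b101000, cb_for_time(310))
print(format(marked, "06b"), classify(marked))
```

   A counting node only needs the equivalent of classify(): packets
   returning 0 or 1 increment C(0) or C(1) respectively, and all other
   packets are ignored.  The four upper DSCP bits are left untouched,
   which is consistent with scheduling based on the IP Precedence bits
   only.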

   In practice, marking with the DSCP field can be performed by
   configuring on the first-hop router an access list that intercepts
   the flow(s) to be measured and a policy that sets the DSCP field
   accordingly.  Flows to be measured can be changed easily by modifying
   the access list.  Moreover, since traffic marking must change to
   create traffic blocks, it is necessary to change the policy
   periodically: this can be done, for example, using an automatic
   script that periodically modifies the configuration.

6.4.  Monitoring Nodes

   The operation of marking flows to be monitored can be accomplished by
   a single node, namely the first-hop router.  Intermediate nodes are
   not required to perform any particular operation except counting the
   marked packets they receive and forward: this operation can be
   enabled on every router along the multicast forwarding tree or just
   on a small subset, depending on which network segment is to be
   monitored (a single link, a particular metro area, the backbone, the
   whole path).

   The operation of counting packets on intermediate nodes is very
   simple and can be accomplished, e.g., by configuring an access list
   that intercepts packets belonging to the multicast group being
   monitored with certain DSCP values (those configured on the first-hop
   router and used to mark the flow).  This way only "marked" packets
   will be counted.  Since marking changes periodically between two
   values, two counters (one for each value) are needed for a single
   flow being monitored: one counter for packets with CB = 0 and one
   counter for packets with CB = 1.

   Marking and counting are two decoupled operations: it is possible to
   mark all the multicast flows on the source but monitor just one or a
   few flows, by enabling counters only for the intended streams.

6.5.
Management System

   Nodes enabled to perform performance monitoring collect counters
   relative to multicast flows, but they are not able to use this
   information to measure packet loss, because they only have local
   information and lack a global view of the network.  For this reason
   an external Network Management System (NMS) is required to collect
   and process data and to perform packet loss calculation.  The NMS
   compares values of counters from different nodes and is then able to
   determine whether some packets were lost (even a single packet) and
   also where packets were lost.

   Information collected by the routers (counter values) needs to be
   transferred to the NMS periodically.  This can be accomplished, e.g.,
   via FTP or TFTP and can be done in Push Mode or Polling Mode.  In the
   first case, each router periodically sends the information it
   collects to the NMS; in the latter case it is the NMS that
   periodically polls routers to collect information.  In any case, the
   Polling Interval (PI) should be compliant with the Shannon theorem:
   (PI < MI / 2).  This means that the Management System should collect,
   during every Marking Interval, at least two samples of each counter
   (in order to determine whether the counter is incrementing or is
   still within the considered interval).

6.6.  Scalability

   This section describes what is needed on a node to enable the
   performance measurement system, in order to assess its scalability.

   Regarding the marking, it is preferable to have a single marking node
   for the reasons explained in Section 6.3.  The marking can be easily
   performed on a single multicast flow as well as on the entire
   multicast traffic.  What is needed, for example, is a single policy
   that marks all the intended traffic with a specific DSCP value: this
   operation doesn't raise any scalability issue, since it is generally
   performed by routers for QoS purposes.

   Regarding the counting, two counters are needed for every flow (or
   group of flows) being monitored and for every interface where the
   monitoring system is activated.  For example, in order to monitor 3
   multicast flows on a router with 4 interfaces involved, 24 counters
   are needed (2 counters for each of the 3 flows on each of the 4
   interfaces).  If access lists are used to count packets, a single
   ACL can be used to count packets of many flows (access list entries
   will increase with the number of flows), but a different access list
   is required on every interface.

   The number of counters and access lists can easily increase with the
   number of flows and interfaces; however, monitoring is not required
   on every interface (it should be activated only on interfaces
   belonging to the multicast forwarding tree).  Besides, it can be
   sufficient to monitor a few flows to have a monitoring system that
   spans the whole network, because multicast flows follow the shortest
   path, which is usually the same for all the streams (except in case
   of multiple equal cost paths); flows using the same path are
   therefore likely to give similar performance results.

6.7.  Interoperability

   The method described in this document doesn't raise any
   interoperability issue, since it doesn't require any new protocol or
   any kind of interaction among nodes.  Traffic marking can be
   performed by a single node, while counting of packets is performed
   locally by each router and the correlation between counters is done
   by an external NMS.

   The only requirement is that every node should be able to identify
   marked flows, but, as explained in Sections 6.3 and 6.4, this can be
   accomplished using simple functionalities that do not have any
   interoperability issue and are already available on major routing
   platforms.

7.
Security Considerations

This document specifies a method to perform measurements in the
context of a Service Provider's network and has not been developed to
conduct Internet measurements, so it does not directly affect
Internet security or applications that run on the Internet. However,
implementations of this method must be mindful of security and
privacy concerns.

There are two types of security concerns: potential harm caused by
the measurements and potential harm to the measurements. As for the
first, the measurements described in this document are passive, so no
packets are injected into the network that could harm the network
itself or data traffic. Nevertheless, the method implies on-the-fly
modifications to the IP header of data packets: these must be
performed in a way that does not alter the quality of service
experienced by the packets subject to measurement and that preserves
the stability and performance of the routers doing the measurements.
As for the second, the measurements themselves could be harmed by
routers altering the marking of the packets, or by an attacker
injecting artificial traffic. Authentication techniques, such as
digital signatures, may be used where appropriate to guard against
injected traffic attacks.

The privacy concerns of network measurement are limited because the
method relies only on information contained in the IP header, without
any release of user data.

8. IANA Considerations

There are no IANA actions required.

9. Acknowledgements

The authors would like to thank Domenico Laforgia, Daniele Accetta
and Mario Bianchetti for their contribution to the definition and the
implementation of the method. The authors would also like to thank
Paolo Fasano and Matteo Cravero for their useful suggestions.

10.
Informative References

[I-D.bipi-mboned-ip-multicast-pm-requirement]
           Bianchetti, M., Picciano, G., Chen, M., and J. Qiu,
           "Requirements for IP multicast performance monitoring",
           draft-bipi-mboned-ip-multicast-pm-requirement-00 (work in
           progress), July 2009.

Authors' Addresses

Mauro Cociglio
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: mauro.cociglio@telecomitalia.it

Alessandro Capello
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: alessandro.capello@telecomitalia.it

Alberto Tempia Bonda
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: alberto.tempiabonda@telecomitalia.it

Luca Castaldelli
Telecom Italia
Via Reiss Romoli, 274
Torino 10148
Italy

Email: luca.castaldelli@telecomitalia.it