1 IP Performance Measurement Working Group V.Raisanen 2 Internet Draft Nokia 3 Document: G.Grotefeld 4 Category: Standards Track Motorola 5 A.Morton 6 AT&T Labs 8 Network performance measurement with periodic streams 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with 13 all provisions of Section 10 of RFC2026 [1]. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. Internet-Drafts are draft documents valid for a maximum of 19 six months and may be updated, replaced, or made obsolete by other 20 documents at any time. It is inappropriate to use Internet-Drafts as 21 reference material or to cite them other than as "work in progress."
23 The list of current Internet-Drafts can be accessed at 24 http://www.ietf.org/ietf/1id-abstracts.txt 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html. 29 1. Abstract 31 This memo describes a periodic sampling method and relevant metrics 32 for assessing the performance of IP networks. First, the memo 33 motivates periodic sampling and addresses the question of its value 34 as an alternative to Poisson sampling described in RFC 2330. The 35 benefits include applicability to active and passive measurements, 36 simulation of constant bit rate (CBR) traffic (typical of multimedia 37 communication, or nearly CBR, as found with voice activity 38 detection), and several instances where analysis can be simplified. 39 The sampling method avoids predictability by mandating random start 40 times and finite length tests. Following descriptions of the 41 sampling method and sample metric parameters, measurement methods 42 and errors are discussed. Finally, we give additional information on 43 periodic measurements including security considerations. 45 2. Conventions used in this document 47 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 48 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 49 document are to be interpreted as described in RFC 2119 [2]. 50 Although RFC 2119 was written with protocols in mind, the key words 51 are used in this document for similar reasons. They are used to 52 ensure the results of measurements from two different 53 implementations are comparable, and to note instances when an 54 implementation could perturb the network. 56 3. Introduction 58 This memo describes a sampling method and performance metrics 59 relevant to certain applications of IP networks. 
The original driver 60 for this work was Quality of Service of interactive periodic streams 61 such as multimedia conferencing over IP, but the idea of periodic 62 sampling and measurement has wider applicability. Interactive 63 multimedia traffic is used as an example below to illustrate the 64 concept. 66 Transmitting equal size packets (or mostly same-size packets) 67 through a network at regular intervals simulates a constant bit-rate 68 (CBR), or nearly CBR, multimedia bit stream. Hereafter, such packet streams 69 are called periodic streams. Cases of "mostly same-size packets" may 70 be found in applications that have multiple coding methods (e.g. 71 digitally coded comfort noise during silence gaps in speech). 73 In the following sections, a sampling methodology and metrics are 74 presented for periodic streams. The measurement results may be used 75 in derivative metrics such as average and maximum delays. The memo 76 seeks to formalize periodic stream measurements to achieve 77 comparable results between independent implementations. 79 3.1 Motivation 81 As noted in the IPPM framework RFC 2330 [3], a sample metric using 82 regularly spaced singleton tests has some limitations when 83 considered from a general measurement point of view: only part of 84 the network performance spectrum is sampled. However, some 85 applications also sample this limited performance spectrum and their 86 performance may be of critical interest. 88 Periodic sampling is useful for the following reasons: 90 * It is applicable to passive measurement, as well as active 91 measurement. 93 * An active measurement can be configured to match the 94 characteristics of media flows, and simplifies the estimation of 95 application performance. 97 * Measurements of many network impairments (e.g., delay variation, 98 consecutive loss, reordering) are sensitive to the sampling 99 frequency.
When the impairments themselves are time-varying (and 100 the variations are somewhat rare, yet important), a constant 101 sampling frequency simplifies analysis. 103 * Frequency Domain analysis is simplified when the samples are 104 equally spaced. 106 Simulation of CBR flows with periodic streams encourages dense 107 sampling of network performance, since typical multimedia flows have 108 10 to 100 packets in each second. Dense sampling permits the 109 characterization of network phenomena with short duration. 111 4. Periodic Sampling Methodology 113 The Framework RFC [3] points out the following potential problems 114 with Periodic Sampling: 116 1. The performance sampled may be synchronized with some other 117 periodic behavior, or the samples may be anticipated and the 118 results manipulated. Unpredictable sampling is preferred. 120 2. Active measurements can cause congestion, and periodic sampling 121 might drive congestion-aware senders into a synchronized state, 122 producing atypical results. 124 Poisson sampling produces an unbiased sample for the various IP 125 performance metrics, yet there are situations where alternative 126 sampling methods are advantageous (as discussed under Motivation). 128 We can prescribe periodic sampling methods that address the problems 129 listed above. Predictability and some forms of synchronization can 130 be mitigated through the use of random start times and limited 131 stream duration over a test interval. The periodic sampling 132 parameters produce bias, and judicious selection can produce a known 133 bias of interest. The total traffic generated by this or any 134 sampling method should be limited to avoid adverse effects on non- 135 test traffic (packet size, packet rate, and sample duration and 136 frequency should all be considered). 138 The configuration parameters of periodic sampling are: 140 + T, the beginning of a time interval where a periodic sample is 141 desired.
142 + dT, the duration of the interval for allowed sample start times. 143 + T0, a time that MUST be selected at random from the interval 144 [T, T+dT] to start generating packets and taking measurements. 145 + Tf, a time, greater than T0, for stopping generation of packets 146 for a sample (Tf may be relative to T0 if desired). 147 + incT, the nominal duration of inter-packet interval, first bit to 148 first bit. 150 T0 may be drawn from a uniform distribution, or T0 = T + Unif(0,dT). 151 Other distributions may also be appropriate. Start times in 152 successive time intervals MUST use an independent value drawn from 153 the distribution. In passive measurement, the arrival of user media 154 flows may have sufficient randomness, or a randomized start time of 155 the measurement during a flow may be needed to meet this 156 requirement. 158 When a mix of packet sizes is desired, passive measurements usually 159 possess the sequence and statistics of sizes in actual use, while 160 active measurements would need to reproduce the intended 161 distribution of sizes. 163 5. Sample metrics for periodic streams 165 The sample metric presented here is similar to the sample metric 166 Type-P-One-way-Delay-Poisson-Stream presented in RFC 2679[4]. 167 Singletons defined in [3] and [4] are applicable here. 169 5.1 Metric name 171 Type-P-One-way-Delay-Periodic-Stream 173 5.2 Metric parameters 175 5.2.1 Global metric parameters 177 These parameters apply in all the sub-sections that follow (5.2.2, 178 5.2.3, and 5.2.4). 180 Parameters that each Singleton usually includes: 181 + Src, the IP address of a host 182 + Dst, the IP address of a host 183 + IPV, the IP version (IPv4/IPv6) used in the measurement 184 + dTloss, a time interval, the maximum waiting time for a packet 185 before declaring it lost. 186 + packet size p(j), the desired number of bytes in the Type-P 187 packet, where j is the size index. 
189 Optional parameters: 190 + PktType, any additional qualifiers (transport address) 191 + Tcons, a time interval for consolidating parameters collected at 192 the measurement points. 194 While a number of applications will use one packet size (j = 1), 195 other applications may use packets of different sizes (j > 1). 196 Especially in cases of congestion, it may be useful to use packets 197 smaller than the maximum or predominant size of packets in the 198 periodic stream. 200 A topology where Src and Dst are separate from the measurement 201 points is assumed. 203 5.2.2 Parameters collected at the measurement point MP(Src) 205 Parameters that each Singleton usually includes: 207 + Tstamp(Src)[i], for each packet [i], the time of the packet as 208 measured at MP(Src) 210 Additional parameters: 211 + PktID(Src) [i], for each packet [i], a unique identification or 212 sequence number. 213 + PktSi(Src) [i], for each packet [i], the actual packet size. 215 Some applications may use packets of different sizes, either 216 because of application requirements or in response to IP 217 performance experienced. 219 5.2.3 Parameters collected at the measurement point MP(Dst) 221 + Tstamp(Dst)[i], for each packet [i], the time of the packet as 222 measured at MP(Dst) 223 + PktID(Dst) [i], for each packet [i], a unique identification or 224 sequence number. 225 + PktSi(Dst) [i], for each packet [i], the actual packet size. 227 Optional parameters: 228 + dTstop, a time interval, used to add to time Tf to determine when 229 to stop collecting metrics for a sample 230 + PktStatus [i], for each packet [i], the status of the packet 231 received. Possible status includes OK, packet header corrupt, 232 packet payload corrupt, duplicate, fragment. The criteria to 233 determine the status MUST be specified, if used. 
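As a non-normative illustration, the start-time randomization of section 4 (T0 = T + Unif(0,dT), packets every incT until Tf) can be sketched in Python; the function name and parameters below are invented for this example and are not part of the methodology:

```python
import random

def periodic_schedule(T, dT, Tf_offset, incT):
    """Sketch of section 4's sampling parameters (illustrative names):
    pick T0 at random from [T, T+dT], then generate nominal send
    times spaced incT apart until Tf (here given relative to T0,
    as the memo permits)."""
    T0 = T + random.uniform(0.0, dT)   # T0 = T + Unif(0, dT)
    Tf = T0 + Tf_offset
    times = []
    t = T0
    while t <= Tf:
        times.append(t)
        t += incT
    return times

# e.g. a voice-like stream, one packet every 20 ms over a 2 s sample:
sched = periodic_schedule(T=0.0, dT=1.0, Tf_offset=2.0, incT=0.020)
```

Each test interval would draw an independent T0, per the MUST in section 4; a real implementation would also bound the total traffic as discussed there.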
235 5.2.4 Sample Metrics resulting from combining parameters at MP(Src) and 236 MP(Dst) 238 Using the parameters above, a delay singleton would be calculated as 239 follows: 240 + Delay [i], for each packet [i], the time interval 241 Delay[i] = Tstamp(Dst)[i] - Tstamp(Src)[i] 243 For the following conditions, it will not be possible to 244 compute delay singletons: 246 Spurious: There will be no Tstamp(Src)[i] time 247 Not received: There will be no Tstamp(Dst)[i] 248 Corrupt packet header: There will be no Tstamp(Dst)[i] 249 Duplicate: Only the first non-corrupt copy of the packet 250 received at Dst should have Delay [i] computed. 252 A sample metric for average delay is as follows: 254 AveDelay = (1/N)Sum(from i=1 to N, Delay[i]) 255 assuming all packets i = 1 through N have valid singletons. 257 A delay variation [5] singleton can also be computed: 259 + IPDV[i], for each packet [i] except the first one, delay 260 variation between successive packets would be calculated as 262 IPDV[i] = Delay[i] - Delay[i-1] 264 IPDV[i] may be negative, zero, or positive. Delay singletons for 265 packets i and i-1 must be calculable or IPDV[i] is undefined. 267 An example metric for the IPDV sample is the range: 269 RangeIPDV = max(IPDV[]) - min(IPDV[]) 271 5.3 High level description of the procedure to collect a sample 273 Beginning on or after time T0, Type-P packets are generated by Src 274 and sent to Dst until time Tf is reached, with a nominal interval 275 between the first bit of successive packets of incT as measured at 276 MP(Src). incT may be nominal due to a number of reasons: variation 277 in packet generation at Src, clock issues (see section 5.6), etc. 278 MP(Src) records the parameters above only for packets with 279 timestamps between and including T0 and Tf having the required Src, 280 Dst, and any other qualifiers. MP(Dst) also records packets 281 with time stamps between T0 and (Tf + dTstop).
283 Optionally at a time Tf + Tcons (but eventually in all cases), the 284 data from MP(Src) and MP(Dst) are consolidated to derive the sample 285 metric results. To prevent stopping data collection too soon, 286 Tcons should be greater than or equal to dTstop. Conversely, to 287 keep data collection reasonably efficient, dTstop should be some 288 reasonable time interval (seconds/minutes/hours), even if dTloss is 289 infinite or extremely long. 291 5.4 Discussion 293 This sampling methodology is intended to quantify the delays and the 294 delay variation as experienced by multimedia streams of an 295 application. Due to the definitions of these metrics, packet 296 loss status is also recorded. The nominal interval between packets 297 assesses network performance variations on a specific time scale. 299 There are a number of factors that should be taken into account when 300 collecting a sample metric of Type-P-One-way-Delay-Periodic-Stream. 302 + The interval T0 to Tf should be specified to cover a long enough 303 time interval to represent a reasonable use of the application 304 under test, yet not excessively long in the same context (e.g. 305 phone calls last longer than 100ms, but less than one week). 307 + The nominal interval between packets (incT) and the packet 308 size(s) (p(j)) should not define an equivalent bit rate that 309 exceeds the capacity of the egress port of Src, the ingress port 310 of Dst, or the capacity of the intervening network(s), if known. 311 There may be exceptional cases to test the response of the 312 application to overload conditions in the transport networks, but 313 these cases should be strictly controlled. 315 + Real delay values will be positive. Therefore, it does not make 316 sense to report a negative value as a real delay. However, an 317 individual zero or negative delay value might be useful as part 318 of a stream when trying to discover a distribution of the 319 delay errors.
321 + Depending on measurement topology, delay values may be as low as 322 100 usec to 10 msec, whereby it may be important for Src and Dst 323 to synchronize very closely. GPS systems afford one way to 324 achieve synchronization to within several 10s of usec. Ordinary 325 application of NTP may allow synchronization to within several 326 msec, but this depends on the stability and symmetry of delay 327 properties among the NTP agents used, and this delay is what we 328 are trying to measure. 330 + A given methodology will have to include a way to determine 331 whether a packet was lost or whether its delay is merely very large 332 (and the packet is yet to arrive at Dst). The global metric 333 parameter dTloss defines a time interval such that delays larger 334 than dTloss are interpreted as losses. {Comment: For many 335 applications, the treatment of a large delay as infinite/loss will 336 be inconsequential. A TCP data packet, for example, that arrives 337 only after several multiples of the usual RTT may as well have 338 been lost.} 340 5.5 Additional Methodology Aspects 342 As with other Type-P-* metrics, the detailed methodology will depend 343 on the Type-P (e.g., protocol number, UDP/TCP port number, size, 344 precedence). 346 5.6 Errors and uncertainties 348 The description of any specific measurement method should include an 349 accounting and analysis of various sources of error or uncertainty. 350 The Framework RFC [3] provides general guidance on this point, but 351 we note here the following specifics related to periodic streams and 352 delay metrics: 354 + Error due to variation of incT. The reasons for this can include 355 uneven process scheduling, possibly due to CPU load. 357 + Errors or uncertainties due to uncertainties in the clocks of the 358 MP(Src) and MP(Dst) measurement points. 360 + Errors or uncertainties due to the difference between 'wire time' 361 and 'host time'. 363 5.6.1.
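The dTloss rule described above can be sketched as follows; this is a non-normative Python illustration, and the function names are invented for the example:

```python
def classify(delay, dTloss):
    """Per section 5.4: a delay larger than dTloss is interpreted
    as a loss; delay None means the packet was never observed at
    MP(Dst) within the collection window."""
    if delay is None or delay > dTloss:
        return "lost"
    return "ok"

def loss_status(delays, dTloss):
    """Apply the dTloss rule to a list of delay singletons."""
    return [classify(d, dTloss) for d in delays]
```

With dTloss = 2 s, a 100 ms delay is "ok" while a 3 s delay is treated the same as a packet that never arrived.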
Errors or uncertainties related to Clocks 365 The uncertainty in a measurement of one-way delay is related, in 366 part, to uncertainties in the clocks of MP(Src) and MP(Dst). In the 367 following, we refer to the clock used to measure when the packet was 368 measured at MP(Src) as the MP(Src) clock and we refer to the clock 369 used to measure when the packet was received at MP(Dst) as the 370 MP(Dst) clock. Alluding to the notions of synchronization, 371 accuracy, resolution, and skew, we note the following: 373 + Any error in the synchronization between the MP(Src) clock and 374 the MP(Dst) clock will contribute to error in the delay 375 measurement. We say that the MP(Src) clock and the MP(Dst) clock 376 have a synchronization error of Tsynch if the MP(Src) clock is 377 Tsynch ahead of the MP(Dst) clock. Thus, if we know the value of 378 Tsynch exactly, we could correct for clock synchronization by 379 adding Tsynch to the uncorrected value of Tstamp(Dst)[i] - 380 Tstamp(Src) [i]. 382 + The resolution of a clock adds to uncertainty about any time 383 measured with it. Thus, if the MP(Src) clock has a resolution of 384 10 msec, then this adds 10 msec of uncertainty to any time value 385 measured with it. We will denote the resolution of the source 386 clock and the MP(Dst) clock as ResMP(Src) and ResMP(Dst), 387 respectively. 389 + The skew of a clock is not so much an additional issue as it is a 390 realization of the fact that Tsynch is itself a function of time. 391 Thus, if we attempt to measure or to bound Tsynch, this needs to 392 be done periodically. Over some periods of time, this function 393 can be approximated as a linear function plus some higher order 394 terms; in these cases, one option is to use knowledge of the 395 linear component to correct the clock. Using this correction, 396 the residual Tsynch is made smaller, but remains a source of 397 uncertainty that must be accounted for. 
We use the function 398 Esynch(t) to denote an upper bound on the uncertainty in 399 synchronization. Thus, |Tsynch(t)| <= Esynch(t). 401 Taking these items together, we note that the naive computation 402 Tstamp(Dst)[i] - Tstamp(Src)[i] will be off by Tsynch(t) +/- 403 (ResMP(Src) + ResMP(Dst)). Using the notion of Esynch(t), we note 404 that these clock-related problems introduce a total uncertainty of 405 Esynch(t) + ResMP(Src) + ResMP(Dst). This estimate of total clock-related 406 uncertainty should be included in the error/uncertainty analysis of 407 any measurement implementation. 409 5.6.2. Errors or uncertainties related to Wire-time vs Host-time 411 We would like to measure the time between when a packet is measured 412 and time-stamped at MP(Src) and when it arrives and is time-stamped 413 at MP(Dst), and we refer to these as "wire times." If timestamps are 414 applied by software on Src and Dst, however, then this software can 415 only directly measure the time between when Src generates the packet 416 just prior to sending the test packet and when Dst has started to 417 process the packet after having received the test packet, and we 418 refer to these two points as "host times". 420 To the extent that the difference between wire time and host time is 421 accurately known, this knowledge can be used to correct host 422 time measurements, and the corrected value more accurately estimates 423 the desired (wire time) metric, and vice versa. 425 To the extent, however, that the difference between wire time and 426 host time is uncertain, this uncertainty must be accounted for in an 427 analysis of a given measurement method. We denote by Hsource an 428 upper bound on the uncertainty in the difference between wire time 429 of MP(Src) and host time on the Src host, and similarly define Hdest 430 for the difference between the host time on the Dst host and the 431 wire time of MP(Dst). We then note that these problems introduce a 432 total uncertainty of Hsource+Hdest.
This estimate of total wire-vs- 433 host uncertainty should be included in the error/uncertainty 434 analysis of any measurement implementation. 436 5.6.3. Calibration 438 Generally, the measured values can be decomposed as follows: 440 measured value = true value + systematic error + random error 442 If the systematic error (the constant bias in measured values) can 443 be determined, it can be compensated for in the reported results. 445 reported value = measured value - systematic error 447 therefore 449 reported value = true value + random error 451 The goal of calibration is to determine the systematic and random 452 error generated by the instruments themselves in as much detail as 453 possible. At a minimum, a bound ("e") should be found such that the 454 reported value is in the range (true value - e) to (true value + e) 455 at least 95 percent of the time. We call "e" the calibration error 456 for the measurements. It represents the degree to which the values 457 produced by the measurement instrument are repeatable; that is, how 458 closely an actual delay of 30 ms is reported as 30 ms. 459 {Comment: 95 percent was chosen due to reasons discussed in [4], 460 briefly summarized as (1) some confidence level is desirable to be 461 able to remove outliers, which will be found in measuring any 462 physical property; (2) a particular confidence level should be 463 specified so that the results of independent implementations can be 464 compared.} 465 From the discussion in the previous two sections, the error in 466 measurements could be bounded by determining all the individual 467 uncertainties, and adding them together to form 469 Esynch(t) + ResMP(Src) + ResMP(Dst) + Hsource + Hdest. 
471 However, reasonable bounds on both the clock-related uncertainty 472 captured by the first three terms and the host-related uncertainty 473 captured by the last two terms should be possible by careful design 474 techniques and calibrating the instruments using a known, isolated, 475 network in a lab. 477 For example, the clock-related uncertainties are greatly reduced 478 through the use of a GPS time source. The sum of Esynch(t) + 479 ResMP(Src) + ResMP(Dst) is small, and is also bounded for the 480 duration of the measurement because of the global time source. 481 The host-related uncertainties, Hsource + Hdest, could be bounded by 482 connecting two instruments back-to-back with a high-speed serial 483 link or isolated LAN segment. In this case, repeated measurements 484 are measuring the same one-way delay. 486 If the test packets are small, such a network connection has a 487 minimal delay that may be approximated by zero. The measured delay 488 therefore contains only systematic and random error in the 489 instrumentation. The "average value" of repeated measurements is 490 the systematic error, and the variation is the random error. 491 One way to compute the systematic error, and the random error to a 492 95% confidence is to repeat the experiment many times - at least 493 hundreds of tests. The systematic error would then be the median. 494 The random error could then be found by removing the systematic 495 error from the measured values. The 95% confidence interval would 496 be the range from the 2.5th percentile to the 97.5th percentile of 497 these deviations from the true value. The calibration error "e" 498 could then be taken to be the largest absolute value of these two 499 numbers, plus the clock-related uncertainty. {Comment: as 500 described, this bound is relatively loose since the uncertainties 501 are added, and the absolute value of the largest deviation is used. 
502 As long as the resulting value is not a significant fraction of the 503 measured values, it is a reasonable bound. If the resulting value 504 is a significant fraction of the measured values, then more exact 505 methods will be needed to compute the calibration error.} 507 Note that random error is a function of measurement load. For 508 example, if many paths will be measured by one instrument, this 509 might increase interrupts, process scheduling, and disk I/O (for 510 example, recording the measurements), all of which may increase the 511 random error in measured singletons. Therefore, in addition to 512 minimal load measurements to find the systematic error, calibration 513 measurements should be performed with the same measurement load that 514 the instruments will see in the field. 516 We wish to reiterate that this statistical treatment refers to the 517 calibration of the instrument; it is used to "calibrate the meter 518 stick" and say how well the meter stick reflects reality. 520 5.6.4 Errors in incT 522 The nominal interval between packets, incT, can vary during either 523 active or passive measurements. In passive measurement, packet 524 headers may include a timestamp applied prior to most of the 525 protocol stack, and the actual sending time may vary due to 526 processor scheduling. For example, H.323 systems are required to 527 have packets ready for the network stack within 5 ms of their ideal 528 time. There may be additional variation from the network between the 529 Src and the MP(Src). Active measurement systems may encounter 530 similar errors, but to a lesser extent. These errors must be 531 accounted for in some types of analysis. 533 5.7 Reporting 535 The calibration and context in which the method is used MUST be 536 carefully considered, and SHOULD always be reported along with 537 metric results. 
We next present five items to consider: the Type-P 538 of test packets, the threshold of delay equivalent to loss, error 539 calibration, the path traversed by the test packets, and background 540 conditions at Src, Dst, and the intervening networks during a 541 sample. This list is not exhaustive; any additional information that 542 could be useful in interpreting applications of the metrics should 543 also be reported. 545 5.7.1. Type-P 547 As noted in the Framework document [3], the value of a metric may 548 depend on the type of IP packets used to make the measurement, or 549 "type-P". The value of Type-P-One-way-Delay-Periodic-Stream could change 550 if the protocol (UDP or TCP), port number, size, or arrangement for 551 special treatment (e.g., IP precedence or RSVP) changes. The exact 552 Type-P used to make the measurements MUST be reported. 554 5.7.2. Threshold for delay equivalent to loss 556 In addition, the threshold for delay equivalent to loss (or the 557 methodology used to determine this threshold) MUST be reported. 559 5.7.3. Calibration results 561 + If the systematic error can be determined, it SHOULD be removed 562 from the measured values. 563 + You SHOULD also report the calibration error, e, such that the 564 true value is the reported value plus or minus e, with 95% 565 confidence (see the previous section). 567 + If possible, the conditions under which a test packet with finite 568 delay is reported as lost due to resource exhaustion on the 569 measurement instrument SHOULD be reported. 571 5.7.4. Path 573 The path traversed by the packets SHOULD be reported, if possible. 574 In general it is impractical to know the precise path a given packet 575 takes through the network. The precise path may be known for 576 certain Type-P packets on short or stable paths.
If Type-P includes 577 the record route (or loose-source route) option in the IP header, 578 and the path is short enough, and all routers on the path support 579 record (or loose-source) route, then the path will be precisely 580 recorded. 582 This may be impractical because the route must be short enough, many 583 routers do not support (or are not configured for) record route, and 584 use of this feature would often artificially worsen the performance 585 observed by removing the packet from common-case processing. 586 However, partial information is still valuable context. For example, 587 if a host can choose between two links (and hence two separate 588 routes from Src to Dst), then the initial link used is valuable 589 context. {Comment: For example, with one commercial setup, a Src on 590 one NAP can reach a Dst on another NAP by either of several 591 different backbone networks.} 593 6. Additional discussion on periodic sampling 595 Fig.1 illustrates measurements on multiple protocol levels that are 596 relevant to this memo. The user's focus is on transport quality 597 evaluation from application point of view. However, to properly 598 separate the quality contribution of the operating system and codec 599 on packet voice, for example, it is beneficial to be able to measure 600 quality at IP level [6]. Link layer monitoring provides a way of 601 accounting for link layer characteristics such as bit error rates. 603 --------------- 604 | application | 605 --------------- 606 | transport | <-- 607 --------------- 608 | network | <-- 609 --------------- 610 | link | <-- 611 --------------- 612 | physical | 613 --------------- 615 Fig. 1: Different possibilities for performing measurements: a 616 protocol view. Above, "application" refers to all layers above L4 617 and is not used in the OSI sense. 
619 In general, the results of measurements may be influenced by 620 individual application requirements/responses related to the 621 following issues: 623 + Lost packets: Applications may have varying tolerance to lost 624 packets. Another consideration is the distribution of lost 625 packets (i.e. random or bursty). 626 + Long delays: Many applications will consider packets delayed 627 longer than a certain value to be equivalent to lost packets 628 (e.g. real-time applications). 629 + Duplicate packets: Some applications may be perturbed if 630 duplicate packets are received. 631 + Reordering: Some applications may be perturbed if packets arrive 632 out of sequence. This may be in addition to the possibility of 633 exceeding the "long" delay threshold as a result of being out of 634 sequence. 635 + Corrupt packet header: Most applications will probably treat a 636 packet with a corrupt header as equivalent to a lost packet. 637 + Corrupt packet payload: Some applications (e.g. digital voice 638 codecs) may accept corrupt packet payload. In some cases, the 639 packet payload may contain application-specific forward error 640 correction (FEC) that can compensate for some level of 641 corruption. 642 + Spurious packet: Dst may receive spurious packets (i.e. packets 643 that are not sent by the Src as part of the metric). Many 644 applications may be perturbed by spurious packets. 646 Depending on, e.g., the observed protocol level, some issues listed 647 above may be indistinguishable from others by the application; 648 nevertheless, it may be important to preserve the distinction for 649 the operators of Src, Dst, and/or the intermediate network(s). 651 6.1 Measurement applications 653 This sampling method provides a way to perform measurements 654 irrespective of the possible QoS mechanisms utilized in the IP 655 network.
As an example, for a QoS mechanism without hard guarantees, 656 measurements may be used to ascertain that the "best" class gets the 657 service that has been promised for the traffic class in question. 658 Moreover, an operator could study the quality of a cheap, low- 659 guarantee service implemented using possible slack bandwidth in 660 other classes. Such measurements could be made either in studying 661 the feasibility of a new service, or on a regular basis. 663 IP delivery service measurements have been discussed within the 664 International Telecommunications Union (ITU). A framework for IP 665 service level measurements (with references to the framework for IP 666 performance [3]) that is intended to be suitable for service 667 planning has been approved as I.380 [7]. ITU-T Recommendation I.380 668 covers abstract definitions of performance metrics. This memo 669 describes a method that is useful both for service planning and end- 670 user testing purposes, in both active and passive measurements. 672 Delay measurements can be one-way [3,4], paired one-way, or round- 673 trip [8]. Accordingly, the measurements may be performed either with 674 synchronized or unsynchronized Src/Dst host clocks. Different 675 possibilities are listed below. 677 The reference measurement setup for all measurement types is shown 678 in Fig. 2. 680 ----------------< IP >-------------------- 681 | | | | 682 ------- ------- -------- -------- 683 | Src | | MP | | MP | | Dst | 684 ------- |(Src)| |(Dst) | -------- 685 ------- -------- 687 Fig. 2: Example measurement setup. 689 An example of the use of the method is a setup with a source host 690 (Src), a destination host (Dst), and corresponding measurement 691 points (MP(Src) and MP(Dst)) as shown in Figure 2. Separate 692 equipment for measurement points may be used if having Src and/or 693 Dst conduct the measurement may significantly affect the delay 694 performance to be measured. 
MP(Src) should be placed close 695 to the egress point of packets from Src. MP(Dst) should be 696 placed close to the ingress point of packets for Dst. 697 "Close" is defined as a distance sufficiently small so that 698 application-level performance characteristics measured (such as 699 delay) can be expected to follow the corresponding performance 700 characteristic between Src and Dst to an adequate accuracy. The 701 basic principle here is that measurement results between MP(Src) and 702 MP(Dst) should be the same as for a measurement between Src and Dst, 703 within the general error margin target of the measurement (e.g., < 1 704 ms; the number of lost packets is the same). If this is not 705 possible, the difference between the MP-to-MP measurement and the 706 Src-to-Dst measurement should preferably be systematic. 708 The test setup just described fulfills two important criteria: 1) 709 The test is made with realistic stream metrics, emulating - for 710 example - a full-duplex Voice over IP (VoIP) call. 2) Either one-way 711 or round-trip characteristics may be obtained. 713 It is also possible to have intermediate measurement points between 714 MP(Src) and MP(Dst), but that is beyond the scope of this document. 716 6.1.1 One way measurement 718 In the interests of specifying metrics that are as generally usable 719 as possible, application-level measurements based on one-way delays 720 are used in the example metrics. The implication of application-level 721 measurement for bi-directional applications such as 722 interactive multimedia conferencing is discussed below. 724 Performing a single one-way measurement only yields information on 725 network behavior in one direction. Moreover, the stream at the 726 network transport level does not accurately emulate a full-duplex 727 multimedia connection. 729 6.1.2 Paired one way measurement 731 Paired one way delay refers to two multimedia streams: Src to Dst 732 and Dst to Src for the same Src and Dst.
By way of example, for some 733 applications, the delay performance of each one way path is more 734 important than the round trip delay. This is the case for 735 delay-limited signals such as VoIP. A possible reason for the 736 difference between one-way delays is different routing of the 737 streams from Src to Dst vs. Dst to Src. 739 For example, a paired one way measurement may show that Src to Dst 740 has an average delay of 30ms while Dst to Src has an average delay 741 of 120ms. To a round trip delay measurement, this example would look 742 like an average of 150ms delay. Without knowledge of the 743 asymmetry, we might miss a problem that the application at either 744 end may have with delays averaging more than 100ms. 746 Moreover, paired one way delay measurement emulates a full-duplex 747 VoIP call more accurately than a single one-way measurement. 749 6.1.3 Round trip measurement 751 From the point of view of periodic multimedia streams, round-trip 752 measurements have two advantages: they avoid the need for host clock 753 synchronization and they allow for a simulation of full-duplex 754 communication. The former aspect means that a measurement is easily 755 performed, since no special equipment or NTP setup is needed. The 756 latter property means that measurement streams are transmitted in 757 both directions. Thus, the measurement provides information on the 758 quality of service as experienced by two-way applications. 760 The downsides of round-trip measurement are the need for more 761 bandwidth than a one-way test and more complex accounting of packet 762 loss. Moreover, the stream that is returning towards the original 763 sender may be more bursty than the one on the first "leg" of the 764 round-trip journey. In practice, this last issue means that the 765 returning stream may experience worse QoS than the outgoing one, so 766 the performance estimates thus obtained are pessimistic ones.
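The 30ms/120ms example above can be made concrete with a short sketch. This is illustrative only: the function name and the 100ms per-direction tolerance are taken from the worked example in the text, not defined by this memo.

```python
def summarize_paired_one_way(fwd_delays, rev_delays, per_direction_limit):
    """Average each direction of a paired one-way sample separately,
    and compare with the round-trip view, which only sees the sum."""
    avg_fwd = sum(fwd_delays) / len(fwd_delays)
    avg_rev = sum(rev_delays) / len(rev_delays)
    return {
        "avg_src_to_dst": avg_fwd,
        "avg_dst_to_src": avg_rev,
        # A round-trip measurement of the same path pair would report
        # approximately the sum of the two one-way averages.
        "round_trip_view": avg_fwd + avg_rev,
        "direction_exceeds_limit": max(avg_fwd, avg_rev) > per_direction_limit,
    }

# Delays in ms, mirroring the example in the text: 30 ms out, 120 ms back,
# with an assumed application tolerance of 100 ms per direction.
report = summarize_paired_one_way([30.0] * 10, [120.0] * 10, 100.0)
```

Here the round-trip view averages 150ms, which reveals nothing unusual on its own, while the per-direction averages show that the Dst-to-Src path alone exceeds the assumed 100ms tolerance.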
767 The possibility of asymmetric routing and queuing must be taken into 768 account during analysis of the results. 770 Note that with suitable arrangements, round-trip measurements may be 771 performed using paired one way measurements. 773 6.2 Statistics calculable from one sample 775 Some statistics may be particularly relevant to applications 776 simulated by periodic streams, such as the range of delay values 777 recorded during the sample. 779 For example, a sample metric generates 100 packets at MP(Src) with 780 the following measurements at MP(Dst): 782 + 80 packets received with delay [i] <= 20 ms 783 + 8 packets received with delay [i] > 20 ms 784 + 5 packets received with corrupt packet headers 785 + 4 packets from MP(Src) with no matching packet recorded at 786 MP(Dst) (effectively lost) 787 + 3 packets received with corrupt packet payload and 788 delay [i] <= 20 ms 789 + 2 packets that duplicate one of the 80 packets received 790 correctly as indicated in the first item 792 For this example, packets are considered acceptable if they are 793 received with less than or equal to 20ms delays and without corrupt 794 packet headers or packet payload. In this case, the percentage of 795 acceptable packets is 80/100 = 80%. 797 For a different application which will accept packets with corrupt 798 packet payload and no delay bound (so long as the packet is 799 received), the percentage of acceptable packets is (80+8+3)/100 = 800 91%. 802 6.3 Statistics calculable from multiple samples 804 There may be value in running multiple tests using this method to 805 collect a "sample of samples". For example, it may be more 806 appropriate to simulate 1,000 two-minute VoIP calls rather than a 807 single 2,000 minute call. When considering collection of multiple 808 samples, issues like the interval between samples (e.g. minutes, 809 hours), composition of samples (e.g. equal Tf-T0 duration, different 810 packet sizes), and network considerations (e.g. 
run different 811 samples over different intervening link-host combinations) should be 812 taken into account. For items like the interval between samples, 813 the usage pattern for the application of interest should be 814 considered. 816 When computing statistics for multiple samples, more general 817 statistics (e.g. median, percentile) may have relevance with a 818 larger number of packets. 820 6.4 Background conditions 821 In many cases, the results may be influenced by conditions at Src, 822 Dst, and/or any intervening networks. Factors that may affect the 823 results include: traffic levels and/or bursts during the sample, 824 link and/or host failures, etc. Information about the background 825 conditions may only be available by external means (e.g. phone 826 calls, television) and may only become available days after samples 827 are taken. 829 6.5 Considerations related to delay 831 For interactive multimedia sessions, end-to-end delay is an 832 important factor. Too large a delay reduces the quality of the 833 multimedia session as perceived by the participants. One approach 834 for managing end-to-end delays on an Internet path involving 835 heterogeneous link layer technologies is to use per-domain delay 836 quotas (e.g. 50 ms for a particular IP domain). However, this scheme 837 has clear inefficiencies, and can over-constrain the problem of 838 achieving some end-to-end delay objective. A more flexible 839 implementation ought to address issues like the possibility of 840 asymmetric delays on paths, and the sensitivity of an application to 841 delay variations in a given domain. There are several alternatives 842 as to the delay statistic one ought to use in managing end-to-end 843 QoS. This question, although very interesting, is not within the 844 scope of this memo and is not discussed further here. 846 7.
Security Considerations 848 7.1 Denial of Service Attacks 850 This method generates a periodic stream of packets from one host 851 (Src) to another host (Dst) through intervening networks. This 852 method could be abused for denial of service attacks directed at Dst 853 and/or the intervening network(s). 855 Administrators of Src, Dst, and the intervening network(s) should 856 establish bilateral or multi-lateral agreements regarding the 857 timing, size, and frequency of collection of sample metrics. Use of 858 this method in excess of the terms agreed between the participants 859 may be cause for immediate rejection or discard of packets or other 860 escalation procedures defined between the affected parties. 862 7.2 User data confidentiality 864 Active use of this method generates packets for a sample, rather 865 than taking samples based on user data, and does not threaten user 866 data confidentiality. Passive measurement must restrict attention to 867 the headers of interest. Since user payloads may be temporarily 868 stored for length analysis, suitable precautions MUST be taken to 869 keep this information safe and confidential. 871 7.3 Interference with the metric 873 It may be possible to identify that a certain packet or stream of 874 packets is part of a sample. With that knowledge at Dst and/or the 875 intervening networks, it is possible to change the processing of the 876 packets (e.g. increasing or decreasing delay) that may distort the 877 measured performance. It may also be possible to generate 878 additional packets that appear to be part of the sample metric. 879 These additional packets are likely to perturb the results of the 880 sample measurement. 882 To discourage the kind of interference mentioned above, packet 883 interference checks, such as cryptographic hash, MAY be used. 885 8. IANA Considerations 887 Since this method and metric do not define a protocol or well-known 888 values, there are no IANA considerations in this memo. 890 9. 
Normative References 892 1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP 893 9, RFC 2026, October 1996. 895 2 Bradner, S., "Key words for use in RFCs to Indicate Requirement 896 Levels", BCP 14, RFC 2119, March 1997. 898 3 Paxson, V., Almes, G., Mahdavi, J., and Mathis, M., "Framework 899 for IP Performance Metrics", RFC 2330, May 1998. 901 4 Almes, G., Kalidindi, S., and Zekauskas, M., "A one-way delay 902 metric for IPPM", RFC 2679, September 1999. 904 10. Informative References 906 5 Demichelis, C., and Chimento, P., "IP Packet Delay Variation 907 Metric for IPPM", work in progress. 909 6 "End-to-end Quality of Service in TIPHON systems; Part 5: Quality 910 of Service (QoS) measurement methodologies", ETSI standard TS 101 911 329-5 V1.1.2 (2002-01). 913 7 International Telecommunications Union, "Internet protocol data 914 communication service - IP packet transfer and availability 915 performance parameters", Telecommunications Sector Recommendation 916 I.380 (to be re-designated Y.1540), February 1999. 918 8 Almes, G., Kalidindi, S., and Zekauskas, M., "A round-trip delay 919 metric for IPPM", RFC 2681, September 1999. 921 11. Acknowledgments 922 The authors wish to thank the chairs of the IPPM WG (Matt Zekauskas 923 and Merike Kaeo) for comments that have made the present draft 924 clearer and more focused. Howard Stanislevic and Will Leland have 925 also presented useful comments and questions. We also acknowledge 926 Henk Uijterwaal's continued challenge to develop the motivation for 927 this method. The authors have built on the substantial foundation 928 laid by the authors of the framework for IP performance [3]. 930 12. Authors' Addresses 932 Vilho Raisanen 933 Nokia Networks 934 P.O. Box 300 935 FIN-00045 Nokia Group 936 Finland 937 Phone +358 7180 8000 Fax +358 9 4376 6852 938 940 Glenn Grotefeld 941 Motorola, Inc. 942 1501 W.
Shure Drive, MS 2F1 943 Arlington Heights, IL 60004 USA 944 Phone +1 847 435-0730 Fax +1 847 632-6800 945 947 Al Morton 948 AT&T Labs 949 Room D3 - 3C06 950 200 Laurel Ave. South 951 Middletown, NJ 07748 USA 952 Phone +1 732 420 1571 Fax +1 732 368 1192 953 955 Full Copyright Statement 957 Copyright (C) The Internet Society (date). All Rights Reserved. 958 This document and translations of it may be copied and furnished to 959 others, and derivative works that comment on or otherwise explain it 960 or assist in its implementation may be prepared, copied, published 961 and distributed, in whole or in part, without restriction of any 962 kind, provided that the above copyright notice and this paragraph 963 are included on all such copies and derivative works. However, this 964 document itself may not be modified in any way, such as by removing 965 the copyright notice or references to the Internet Society or other 966 Internet organizations, except as needed for the purpose of 967 developing Internet standards in which case the procedures for 968 copyrights defined in the Internet Standards process must be 969 followed, or as required to translate it into languages other than 970 English. 972 The limited permissions granted above are perpetual and will not be 973 revoked by the Internet Society or its successors or assigns. 975 This document and the information contained herein is provided on an 976 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 977 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 978 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 979 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 980 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.