idnits 2.17.1

draft-morton-ippm-advance-metrics-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 336 has weird spacing: '...seconds micr...'

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 25, 2010) is 4930 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'NistNet' is mentioned on line 477, but not defined

  == Unused Reference: 'RFC4814' is defined on line 541, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5226' is defined on line 545, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2679 (Obsoleted by RFC 7679)

  ** Obsolete normative reference: RFC 2680 (Obsoleted by RFC 7680)

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

     Summary: 3 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                          A. Morton
3    Internet-Draft                                                 AT&T Labs
4    Intended status: Informational                          October 25, 2010
5    Expires: April 28, 2011

7          Lab Test Results for Advancing Metrics on the Standards Track
8                      draft-morton-ippm-advance-metrics-02

10   Abstract

12      This memo supports the process of progressing performance metric RFCs
13      along the standards track.  Observing that the metric definitions
14      themselves should be the primary focus rather than the
15      implementations of metrics, this memo describes results of example
16      lab test procedures to evaluate specific metric RFC requirement
17      clauses to determine if the requirement has been implemented as
18      intended.  A single implementation has been tested against the key
19      specifications of RFC 2679 on One-way Delay.

21   Requirements Language

23      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
24      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
25      document are to be interpreted as described in RFC 2119 [RFC2119].

27   Status of this Memo

29      This Internet-Draft is submitted in full conformance with the
30      provisions of BCP 78 and BCP 79.

32      Internet-Drafts are working documents of the Internet Engineering
33      Task Force (IETF).  Note that other groups may also distribute
34      working documents as Internet-Drafts.  The list of current Internet-
35      Drafts is at http://datatracker.ietf.org/drafts/current/.

37      Internet-Drafts are draft documents valid for a maximum of six months
38      and may be updated, replaced, or obsoleted by other documents at any
39      time.  It is inappropriate to use Internet-Drafts as reference
40      material or to cite them other than as "work in progress."

42      This Internet-Draft will expire on April 28, 2011.

44   Copyright Notice

46      Copyright (c) 2010 IETF Trust and the persons identified as the
47      document authors.  All rights reserved.

49      This document is subject to BCP 78 and the IETF Trust's Legal
50      Provisions Relating to IETF Documents
51      (http://trustee.ietf.org/license-info) in effect on the date of
52      publication of this document.  Please review these documents
53      carefully, as they describe your rights and restrictions with respect
54      to this document.  Code Components extracted from this document must
55      include Simplified BSD License text as described in Section 4.e of
56      the Trust Legal Provisions and are provided without warranty as
57      described in the Simplified BSD License.

59      This document may contain material from IETF Documents or IETF
60      Contributions published or made publicly available before November
61      10, 2008.  The person(s) controlling the copyright in some of this
62      material may not have granted the IETF Trust the right to allow
63      modifications of such material outside the IETF Standards Process.
64      Without obtaining an adequate license from the person(s) controlling
65      the copyright in such materials, this document may not be modified
66      outside the IETF Standards Process, and derivative works of it may
67      not be created outside the IETF Standards Process, except to format
68      it for publication as an RFC or to translate it into languages other
69      than English.

71   Table of Contents

73      1.  Introduction
74      2.  A Definition-centric metric advancement process
75      3.  Lab test results to check metric definitions
76        3.1.  One-way Delay, Loss threshold, RFC 2679
77          3.1.1.  NetProbe Lab results for Loss Threshold
78          3.1.2.  XXX Lab Results for Loss Threshold
79          3.1.3.  Conclusions on Lab Results for Loss Threshold
80        3.2.  One-way Delay, First-bit to Last-bit, RFC 2679
81          3.2.1.  NetProbe Lab results for Serialization
82        3.3.  One-way Delay, Difference Sample Metric (Lab)
83          3.3.1.  NetProbe Lab results for Differential Delay
84        3.4.  One-way Delay, ADK Sample Metric (Lab)
85          3.4.1.  NetProbe Lab results for ADK
86        3.5.  Error Calibration, RFC 2679
87          3.5.1.  NetProbe Error and Type-P
88      4.  Notes on Network Emulator Loss Generation
89      5.  Security Considerations
90      6.  IANA Considerations
91      7.  Acknowledgements
92      8.  Normative References
93      Author's Address

95   1.  Introduction

97      The IETF (IP Performance Metrics working group) has been considering
98      how to advance its metrics along the standards track since 2001,
99      with the initial publication of Bradner/Paxson/Mankin's memo [ref to
100     work in progress, draft-bradner-metricstest-].  The original proposal
101     was to compare the results of implementations of the metrics, because
102     the usual procedures for advancing protocols did not appear to apply.
103     It was found to be difficult to achieve consensus on exactly how to
104     compare implementations, since there were many legitimate sources of
105     variation that would emerge in the results despite the best attempts
106     to keep the network path equal for both, and because considerable
107     variation was allowed in the parameters of each metric.

109     A renewed work effort sought to investigate ways in which the
110     measurement variability could be reduced and thereby simplify the
111     problem of comparison for equivalence.  An earlier version of this
112     draft, titled "Problems and Possible Solutions for Advancing Metrics
113     on the Standards Track", brought many issues to light and offered
114     some solutions.  Sections from the earlier draft have now been
115     combined with [draft-geib-ippm-metrictest], resulting in an IPPM
116     working group draft, [draft-ippm-metrictest-00.txt].  The plan now
117     emphasizes evaluating the metric specifications themselves, as a
118     result of this interaction.

120     There is now consensus that the metric definitions should be the
121     primary focus rather than the implementations of metrics, and
122     equivalent results are deemed to be evidence that the metric
123     specifications are clear and unambiguous.  This is the metric
124     specification equivalent of protocol interoperability.  The
125     advancement process either produces confidence that the metric
126     definitions and supporting material are clearly worded and
127     unambiguous, OR, identifies ways in which the metric definitions
128     should be revised to achieve clarity.

130     The process should also permit identification of options that were
131     not implemented, so that they can be removed from the advancing
132     specification (this is an aspect more typical of protocol advancement
133     along the standards track).

135     This memo's purpose is to add more support for the current approach
136     as the author perceives it to be.  It was prepared to help progress
137     discussions on the topic of metric advancement, both through e-mail
138     and at the upcoming IPPM meeting at IETF-79 in Beijing.

140     Another aspect of the metric RFC advancement process which has
141     received limited attention is the requirement to document the work
142     and results.  The procedures of [RFC2026] are expanded in [RFC5657],
143     including sample implementation and interoperability reports.
144     Section 3 of this memo can serve as a template for the report that
145     accompanies the protocol action request submitted to the Area
146     Director, including description of the test set-up, procedures,
147     results for each implementation and conclusions.

149     We have also agreed that test plan and procedures should include the
150     threshold for determining equivalence, and this information should be
151     available in advance of cross-implementation comparisons.  This memo
152     investigates that topic by outlining a procedure that includes same-
153     implementation comparisons to help set the equivalence threshold.

155     This memo also discusses an issue with some network emulators, namely
156     correlated loss or burst loss generation.

158     Finally, this memo is also an open invitation to developers or
159     testers who would be willing to use their equipment to help advance
160     the IPPM metrics through lab tests, like the tests described below.

162  2.  A Definition-centric metric advancement process

164     The process described in Section 3.5 of
165     [draft-ippm-metrictest-00.txt] takes as a first principle that the
166     metric definitions, embodied in the text of the RFCs, are the objects
167     that require evaluation and possible revision in order to advance to
168     the next step on the standards track.

170     IF two implementations do not measure an equivalent singleton, or
171     sample, or produce an equivalent statistic,

173     AND sources of measurement error do not adequately explain the lack
174     of agreement,

176     THEN the details of each implementation should be audited along with
177     the exact definition text, to determine if there is a lack of clarity
178     that has caused the implementations to vary in a way that affects the
179     correspondence of the results.

181     IF there was a lack of clarity or multiple legitimate interpretations
182     of the definition text,

184     THEN the text should be modified and the resulting memo proposed for
185     consensus and advancement along the standards track.

187     Finally, all the findings MUST be documented in a report that can
188     support advancement on the standards track, similar to those
189     described in [RFC5657].  The list of measurement devices used in
190     testing satisfies the implementation requirement, while the test
191     results provide information on the quality of each specification in
192     the metric RFC (the surrogate for feature interoperability).

194     The figure below illustrates this process:

196          ,---.
197         /     \
198        ( Start )
199         \     /    Implementations
200          `-+-'        +-------+
201            |         /|   1   `.
202        +---+----+   / +-------+ `.-----------+       ,-------.
203        |  RFC   |  /             |Check for  |     ,' was RFC `. YES
204        |        | /              |Equivalence.....   clause x   -------+
205        |        |/    +-------+  |under      |     `. clear?  ,'       |
206        | Metric \.....|   2   ....relevant   |       `---+---'    +----+---+
207        | Metric |\    +-------+  |identical  |        No |        |Report  |
208        | Metric | \              |network    |       +---+---.    |results+|
209        |  ...   |  \             |conditions |       |Modify |    |Advance |
210        |        |   \ +-------+  |           |       |Spec   +----+  RFC   |
211        +--------+    \|   n   |.'+-----------+       +-------+    |request?|
212                       +-------+                                   +--------+

214  3.  Lab test results to check metric definitions

216     This section describes some results from lab tests with test devices
217     and a network emulator to create relevant conditions and determine
218     whether the metric definitions were interpreted consistently by
219     implementors.  The procedures are slightly modified from the original
220     procedures contained in Appendix A.1 of
221     [draft-ippm-metrictest-00.txt].  The principal modification is the
222     use of the mean statistic for comparisons.

224     The metric implementation used was NetProbe version 5.8.5 (an
225     earlier version is used in the WIPM system and deployed world-wide).
226     Accuracy of NetProbe measurements is usually limited by NTP
227     synchronization performance (~1ms error or greater), although this
228     lab environment often exhibits errors much less than typical for NTP.

230     The network emulator is a host running Fedora Core Linux
231     [http://fedoraproject.org/] with IP forwarding enabled and the NIST
232     Net emulator 2.0.12b [http://snad.ncsl.nist.gov/nistnet/] loaded and
233     operating.

235     The links between NetProbe hosts and the NIST Net emulator host were
236     100baseTx-FD (100Mbps full duplex) as reported by "mii-tool", except
237     as noted below.

239     For these tests, a stream of at least 30 packets was sent from
240     Source to Destination in each implementation.  Periodic streams (as
241     per [RFC3432]) with 1 second spacing were used, except as noted.

243     These examples do not entirely avoid the problem of declaring
244     equivalence with a statistical test, but the lab conditions should
245     simplify the problem by removing as much variability as possible.

247     Note that there are only five instances of the requirement term
248     "MUST" in [RFC2679] outside of the boilerplate and [RFC2119]
249     reference.

251  3.1.  One-way Delay, Loss threshold, RFC 2679

253     This test determines if implementations use the same configured
254     maximum waiting time delay from one measurement to another under
255     different delay conditions, and correctly declare packets arriving in
256     excess of the waiting time threshold as lost.

258     See Section 3.5 of [RFC2679], 3rd bullet point and also Section 3.8.2
259     of [RFC2679].

261     1.  configure a path with 1 sec one-way constant delay

263     2.  measure (average) one-way delay with 2 or more implementations,
264         using identical waiting time thresholds for loss set at 2 seconds

266     3.  configure the path with 3 sec one-way delay (or change the path
267         delay while test is in progress, when there are sufficient
268         packets at the first delay setting)

270     4.  repeat/continue measurements

272     5.  observe that the increase measured in step 4 caused all packets
273         with 3 sec delay to be declared lost, and that all packets that
274         arrive successfully in step 2 are assigned a valid one-way delay.

276  3.1.1.  NetProbe Lab results for Loss Threshold

278     In NetProbe, the Loss Threshold is implemented uniformly over all
279     packets as a post-processing routine.  With the Loss Threshold set at
280     2 seconds, all packets with one-way delay >2 seconds are marked
281     "Lost" and included in the Lost Packet list with their transmission
282     time (as required in Section 3.3 of [RFC2680]).  22 of 38 packets were
283     declared lost.

285  3.1.2.  XXX Lab Results for Loss Threshold

287     >>> Comment: this section is a placeholder

289  3.1.3.  Conclusions on Lab Results for Loss Threshold

291     >>> Comment: this section is a placeholder

293  3.2.  One-way Delay, First-bit to Last-bit, RFC 2679

295     This test determines if implementations register the same relative
296     increase in delay from one measurement to another under different
297     delay conditions.  This test tends to cancel the sources of error
298     which may be present in an implementation.

300     See Section 3.7.2 of [RFC2679], and Section 10.2 of [RFC2330].

302     1.  configure a path with X ms one-way constant delay, and ideally
303         including a low-speed link

305     2.  measure (average) one-way delay with 2 or more implementations,
306         using identical options and equal size small packets (e.g., 100
307         octet IP payload)

309     3.  maintain the same path with X ms one-way delay

311     4.  measure (average) one-way delay with 2 or more implementations,
312         using identical options and equal size large packets (e.g., 1500
313         octet IP payload)

315     5.  observe that the increase measured in steps 2 and 4 is equivalent
316         to the increase in ms expected due to the larger serialization
317         time for each implementation.  Most of the measurement errors in
318         each system should cancel, if they are stationary.

320  3.2.1.  NetProbe Lab results for Serialization

322     For this test only, the link between the NetProbe Source host and the
323     NIST Net emulator host was changed to 10baseT-FD (10Mbps full duplex)
324     as configured by "mii-tool".

326     The value of X = 1000 ms was used in the NIST Net emulator.

328     When the UDP payload size was increased from 32 octets to 1400
329     octets, the NIST Net emulator exhibited a bi-modal delay
330     distribution.  Investigation confirmed that the NetProbe
331     implementations tested did not exhibit bi-modal delay on an alternate
332     (network management) path.

334     1400 byte payload    32 byte payload
335     Delay for each mode  (one mode)       Delay Diff    Expected Diff
336     microseconds         microseconds     microseconds  microseconds
337     1001621              1000356          1265          1094.4
338     1002735              1000356          2379          1094.4

340     Average Delay over 60 packets for different payload sizes with Delay
341     computations and comparison with expected delay difference for
342     serialization.

344     For the lower-delay mode, the Delay Difference between payload sizes
345     is about 170 microseconds higher than expected.  However, it is clear
346     that delay increased with a larger payload as expected when the
347     measurement is conducted First-bit to Last-bit and includes
348     serialization time.

350     The higher mode appears on almost every other packet in the stream,
351     and comments are sought on possible configuration changes that would
352     remove this bi-modal behavior without significant sacrifices in other
353     dimensions of performance.

355     UPDATE: Additional investigation suggests that the modal
356     behavior is related to interrupt-to-frame arrival settings of the
357     specific interface board.  Various options appear to be configurable,
358     but only when the interface driver is compiled as a module.  Also,
359     the board/driver does not support the "coalesce" options of ethtool.
360     Until we can rebuild the Linux machine with this and other planned
361     modifications, confirmation will have to wait.

363  3.3.  One-way Delay, Difference Sample Metric (Lab)

365     This test determines if implementations register the same relative
366     increase in delay from one measurement to another under different
367     delay conditions.  This test tends to cancel the sources of error
368     which may be present in an implementation.

370     This test is intended to evaluate measurements in sections 3 and 4 of
371     [RFC2679].

373     1.  configure a path with X ms one-way constant delay

375     2.  measure (average) one-way delay with 2 or more implementations,
376         using identical options

378     3.  configure the path with X+Y ms one-way delay

380     4.  repeat measurements

381     5.  observe that the (average) increase measured in steps 2 and 4 is
382         ~Y ms for each implementation.  Most of the measurement errors in
383         each system should cancel, if they are stationary.

385  3.3.1.  NetProbe Lab results for Differential Delay

387     In this test, X=1000ms and Y=2000ms.

389     Average pre-increase delay, microseconds     1000276.6
390     Average post 2s additional, microseconds     3000282.6
391     Difference (should be ~= Y = 2s)             2000006

393     Average delays before/after 2 second increase

395     The NetProbe implementation exhibited a 2 second increase with a 6
396     microsecond error (assuming that the NIST Net emulated delay
397     difference is exact).

399  3.4.  One-way Delay, ADK Sample Metric (Lab)

401     This test determines if implementations produce results that appear
402     to come from the same delay distribution.  In addition, same-
403     implementation results help to set the threshold of equivalence that
404     will be applied to cross-implementation comparisons.

406     This test is intended to evaluate measurements in sections 3 and 4 of
407     [RFC2679].

409     1.  Configure a path with X ms one-way constant delay.

411     2.  Measure a sample of one-way delay singletons with 2 or more
412         implementations, using identical options.

414     3.  Measure a sample of one-way delay singletons with additional
415         instances of the *same* implementations, using identical options,
416         noting that connectivity differences MUST be the same as for the
417         cross-implementation testing.

419     4.  Apply the ADK comparison procedures (see Appendix C of
420         [metricstest]) and determine the resolution and confidence factor
421         for distribution equivalence of each same-implementation
422         comparison and each cross-implementation comparison.

424     5.  Take the largest resolution and confidence factor for
425         distribution equivalence from the same-implementation pairs as
426         the equivalence threshold for these experimental conditions.  >>>
427         Question: do we need to account for additional cross-
428         implementation error?  How much?

430     6.  Compare the cross-implementation ADK performance with the
431         equivalence threshold determined in step 5 to determine if
432         equivalence can be declared.

434  3.4.1.  NetProbe Lab results for ADK

436     To be provided.  The same-implementation lab tests have been
437     completed, but the analysis was not ready in time for publication.

439     ADK Results for same-implementation

441  3.5.  Error Calibration, RFC 2679

443     This is a simple check to determine if an implementation reports the
444     error calibration as required in Section 4.8 of [RFC2679].  Note that
445     the context (Type-P) must also be reported.

447  3.5.1.  NetProbe Error and Type-P

449     NetProbe error is dependent on the specific version and installation
450     details, and was discussed briefly above.

452     Type-P for this test was IP-UDP with Best Effort DSCP.

454  4.  Notes on Network Emulator Loss Generation

456     While network emulators can be expected to generate independent
457     random loss, it is well-understood that real loss tends to be
458     correlated to some extent.

460     NistNet and many earlier and current network emulators use the same
461     effective function to generate correlated values for delay and
462     correlated values for comparison with a loss threshold.  The
463     correlation relationship in many emulator descriptions takes the
464     following form:

466     Corr_value = Last_value * corr_coeff + New_value * (1-corr_coeff)

468     where:

470     o  New_value is the random value from some distribution

472     o  Last_value is the result of this equation for the previous packet

474     o  corr_coeff is the correlation coefficient, [-1, +1]

475     o  Corr_value is the revised random value with correlation

477     This seems to work adequately for delay, as seen in [NistNet].
478     However, it does not appear to be possible to produce long loss
479     bursts with low probability using this equation.  We note that a
480     somewhat more complicated relationship is implemented in the NistNet
481     code, and avoids range violations that may be possible with
482     correlations at the end of range.

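The correlation relationship above can be explored with a short
simulation.  The following Python sketch is illustrative only and is
not taken from the NIST Net code: the function names, the uniform
[0, 1) distribution for New_value, the restriction to
0 <= corr_coeff < 1, and the rule that a packet is lost when its
correlated value falls below the loss threshold are all assumptions
made for this example.

```python
import random


def correlated_values(count, corr_coeff, rng):
    """Generate values with the recurrence quoted in the text.

    Assumptions (not from the memo): New_value is uniform on [0, 1)
    and 0 <= corr_coeff < 1, so each Corr_value stays between 0 and 1.
    """
    last_value = rng.random()  # arbitrary starting point for Last_value
    values = []
    for _ in range(count):
        new_value = rng.random()
        corr_value = last_value * corr_coeff + new_value * (1 - corr_coeff)
        values.append(corr_value)
        last_value = corr_value  # becomes Last_value for the next packet
    return values


def loss_burst_lengths(values, loss_threshold):
    """Return the lengths of runs of consecutive lost packets.

    Assumption: a packet is declared lost when its value is below
    loss_threshold.
    """
    bursts = []
    run = 0
    for value in values:
        if value < loss_threshold:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return bursts


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so runs are repeatable
    for coeff in (0.0, 0.5, 0.9):
        bursts = loss_burst_lengths(correlated_values(20000, coeff, rng), 0.05)
        longest = max(bursts) if bursts else 0
        print(coeff, len(bursts), longest)
```

Comparing the number of bursts and the longest burst across
coefficients is one way to examine the observation above about long,
low-probability loss bursts.  The sketch makes no claim about what any
particular emulator implements.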
484     Investigation of a similar but alternative relationship to generate
485     loss bursts has begun as part of this effort, and a candidate
486     equation has been developed.  Integration with an existing emulator
487     is in progress.

489     It bears note that some network emulators can produce deterministic
490     loss durations in time and/or in lost packets, but the frequent
491     appearance of the relationship above is disturbing, given its poor
492     ability to produce burst loss, as far as existing tests show.

494  5.  Security Considerations

496     There are no security issues raised by discussing the topic of metric
497     RFC advancement along the standards track.

499     The security considerations that apply to any active measurement of
500     live networks are relevant here as well.  See [RFC4656] and
501     [RFC5357].

503  6.  IANA Considerations

505     This memo makes no requests of IANA, and hopes that IANA will leave
506     it alone, as well.

508  7.  Acknowledgements

510     The author would like to thank Len Ciavattone for continued
511     consultations on the laboratory aspects of this work, and Yaakov
512     Stein for a useful discussion on the bi-modal delay behavior observed
513     in the Linux-based router and network emulator used here.

515  8.  Normative References

517     [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
518                3", BCP 9, RFC 2026, October 1996.

520     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
521                Requirement Levels", BCP 14, RFC 2119, March 1997.

523     [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
524                "Framework for IP Performance Metrics", RFC 2330,
525                May 1998.

527     [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
528                Delay Metric for IPPM", RFC 2679, September 1999.

530     [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
531                Packet Loss Metric for IPPM", RFC 2680, September 1999.

533     [RFC3432]  Raisanen, V., Grotefeld, G., and A. Morton, "Network
534                performance measurement with periodic streams", RFC 3432,
535                November 2002.

537     [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
538                Zekauskas, "A One-way Active Measurement Protocol
539                (OWAMP)", RFC 4656, September 2006.

541     [RFC4814]  Newman, D. and T. Player, "Hash and Stuffing: Overlooked
542                Factors in Network Device Benchmarking", RFC 4814,
543                March 2007.

545     [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
546                IANA Considerations Section in RFCs", BCP 26, RFC 5226,
547                May 2008.

549     [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
550                Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
551                RFC 5357, October 2008.

553     [RFC5657]  Dusseault, L. and R. Sparks, "Guidance on Interoperation
554                and Implementation Reports for Advancement to Draft
555                Standard", BCP 9, RFC 5657, September 2009.

557  Author's Address

559     Al Morton
560     AT&T Labs
561     200 Laurel Avenue South
562     Middletown, NJ  07748
563     USA

565     Phone: +1 732 420 1571
566     Fax:   +1 732 368 1192
567     Email: acmorton@att.com
568     URI:   http://home.comcast.net/~acmacm/