Network Working Group                                     L. Ciavattone
Internet-Draft                                                 AT&T Labs
Intended status: Informational                                   R. Geib
Expires: August 21, 2013                                Deutsche Telekom
                                                               A. Morton
                                                               AT&T Labs
                                                               M. Wieser
                                          Technical University Darmstadt
                                                       February 17, 2013

  Test Plan and Results for Advancing RFC 2680 on the Standards Track
                  draft-ietf-ippm-testplan-rfc2680-02

Abstract

   This memo proposes to advance a performance metric RFC along the
   standards track, specifically RFC 2680 on One-way Loss Metrics.
   Observing that the metric definitions themselves should be the
   primary focus rather than the implementations of metrics, this memo
   describes the test procedures to evaluate specific metric
   requirement clauses to determine if the requirement has been
   interpreted and implemented as intended.  Two completely independent
   implementations have been tested against the key specifications of
   RFC 2680.

   In this version, the results are presented in the R-tool output
   form.  Beautification is future work.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 21, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  RFC 2680 Coverage
   2.  A Definition-centric metric advancement process
   3.  Test configuration
   4.  Error Calibration, RFC 2680
     4.1.  Clock Synchronization Calibration
     4.2.  Packet Loss Determination Error
   5.  Pre-determined Limits on Equivalence
   6.  Tests to evaluate RFC 2680 Specifications
     6.1.  One-way Loss, ADK Sample Comparison
       6.1.1.  340B/Periodic Cross-imp. results
       6.1.2.  64B/Periodic Cross-imp. results
       6.1.3.  64B/Poisson Cross-imp. results
       6.1.4.  Conclusions on the ADK Results for One-way Packet Loss
     6.2.  One-way Loss, Delay threshold
       6.2.1.  NetProbe results for Loss Threshold
       6.2.2.  Perfas Results for Loss Threshold
       6.2.3.  Conclusions for Loss Threshold
     6.3.  One-way Loss with Out-of-Order Arrival
     6.4.  Poisson Sending Process Evaluation
       6.4.1.  NetProbe Results
       6.4.2.  Perfas Results
       6.4.3.  Conclusions for Goodness-of-Fit
     6.5.  Implementation of Statistics for One-way Loss
   7.  Conclusions for RFC 2680bis
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The IETF (IP Performance Metrics working group, IPPM) has considered
   how to advance its metrics along the standards track since 2001.

   A renewed work effort sought to investigate ways in which the
   measurement variability could be reduced, thereby simplifying the
   problem of comparison for equivalence.

   There is consensus [RFC6576] that the metric definitions should be
   the primary focus of evaluation rather than the implementations of
   metrics, and equivalent results are deemed to be evidence that the
   metric specifications are clear and unambiguous.  This is the metric
   specification equivalent of protocol interoperability.  The
   advancement process either produces confidence that the metric
   definitions and supporting material are clearly worded and
   unambiguous, OR identifies ways in which the metric definitions
   should be revised to achieve clarity.

   The process should also permit identification of options that were
   not implemented, so that they can be removed from the advancing
   specification (this is an aspect more typical of protocol
   advancement along the standards track).

   This memo's purpose is to implement the current approach for
   [RFC2680].

   In particular, this memo documents consensus on the extent of
   tolerable errors when assessing equivalence in the results.
   In discussions, the IPPM working group agreed that the test plan and
   procedures should include the threshold for determining equivalence,
   and that this information should be available in advance of cross-
   implementation comparisons.  This memo includes procedures for same-
   implementation comparisons to help set the equivalence threshold.

   Another aspect of the metric RFC advancement process is the
   requirement to document the work and results.  The procedures of
   [RFC2026] are expanded in [RFC5657], including sample implementation
   and interoperability reports.  This memo follows the template in
   [I-D.morton-ippm-advance-metrics] for the report that accompanies
   the protocol action request submitted to the Area Director,
   including a description of the test set-up, procedures, results for
   each implementation, and conclusions.

   Although the conclusion reached through testing is that [RFC2680]
   should be advanced on the Standards Track with modifications, the
   revised text of RFC 2680bis is not yet ready for review.  Therefore,
   this memo documents the information to support [RFC2680]
   advancement, and the approval of RFC 2680bis is left for future
   action.

1.1.  RFC 2680 Coverage

   This plan is intended to cover all critical requirements and
   sections of [RFC2680].

   Note that there are only five instances of the requirement term
   "MUST" in [RFC2680] outside of the boilerplate and the [RFC2119]
   reference.

   Material may be added as it is "discovered" (apparently, not all
   requirements use requirements language).

2.  A Definition-centric metric advancement process

   The process described in Section 3.5 of [RFC6576] takes as a first
   principle that the metric definitions, embodied in the text of the
   RFCs, are the objects that require evaluation and possible revision
   in order to advance to the next step on the standards track.

   IF two implementations do not measure an equivalent singleton or
   sample, or do not produce an equivalent statistic,

   AND sources of measurement error do not adequately explain the lack
   of agreement,

   THEN the details of each implementation should be audited along with
   the exact definition text, to determine if there is a lack of
   clarity that has caused the implementations to vary in a way that
   affects the correspondence of the results.

   IF there was a lack of clarity or multiple legitimate
   interpretations of the definition text,

   THEN the text should be modified and the resulting memo proposed for
   consensus and advancement along the standards track.

   Finally, all the findings MUST be documented in a report that can
   support advancement on the standards track, similar to those
   described in [RFC5657].  The list of measurement devices used in
   testing satisfies the implementation requirement, while the test
   results provide information on the quality of each specification in
   the metric RFC (the surrogate for feature interoperability).

3.  Test configuration

   One metric implementation used was NetProbe version 5.8.5 (an
   earlier version is used in the WIPM system and deployed world-wide
   [WIPM]).  NetProbe uses UDP packets of variable size and can produce
   test streams with Periodic [RFC3432] or Poisson [RFC2330] sample
   distributions.

   The other metric implementation used was Perfas+ version 3.1,
   developed by Deutsche Telekom [Perfas].
   Perfas+ uses UDP unicast packets of variable size (but also supports
   TCP and multicast).  Test streams with periodic, Poisson, or uniform
   sample distributions may be used.

   Figure 1 shows a view of the test path as each implementation's test
   flows pass through the Internet and the L2TPv3 tunnel IDs (1 and 2),
   based on Figure 1 of [RFC6576].

   [The two ASCII diagrams of Figure 1 are not reproduced here.  The
   upper diagram shows the Imp1 pair (VLANs 100 and 200) and the Imp2
   pair (VLANs 300 and 400), each connected through an Ethernet switch
   and a tunnel head router (A and B) to the Internet, with the
   "netem" network emulator adjacent to router B and VLAN U-turns
   (V300 to V400, and V100 to V200) at each end.  The lower diagram
   shows example flows F1 and F2 traveling between the two
   implementations through Tunnel IDs 1 and 2 and a remote switch.]

   Illustrations of a test setup with a bi-directional tunnel.  The
   upper diagram emphasizes the VLAN connectivity and geographical
   location.  The lower diagram shows example flows traveling between
   two measurement implementations (for simplicity, only two flows are
   shown).

                                 Figure 1

   The testing employs the Layer 2 Tunnel Protocol, version 3 (L2TPv3)
   [RFC3931] tunnel between test sites on the Internet.  The tunnel IP
   and L2TPv3 headers are intended to conceal the test equipment
   addresses and ports from hash functions that would tend to spread
   different test streams across parallel network resources, with
   likely variation in performance as a result.

   At each end of the tunnel, one pair of VLANs encapsulated in the
   tunnel is looped back so that test traffic is returned to each test
   site.  Thus, test streams traverse the L2TP tunnel twice, but appear
   to be one-way tests from the test equipment point of view.

   The network emulator is a host running Fedora 14 Linux
   [http://fedoraproject.org/] with IP forwarding enabled and the
   "netem" network emulator, part of the Fedora kernel 2.6.35.11
   [http://www.linuxfoundation.org/collaborate/workgroups/networking/
   netem], loaded and operating.  Connectivity across the netem/Fedora
   host was accomplished by bridging Ethernet VLAN interfaces together
   with "brctl" commands (e.g., eth1.100 <-> eth2.100).  The netem
   emulator was activated on one interface (eth1) and only operates on
   test streams traveling in one direction.  In some tests, independent
   netem instances operated separately on each VLAN.

   The links between the netem emulator host and the router and switch
   were found to be 100baseTx-HD (100 Mbps half duplex), as reported by
   "mii-tool" when the testing was complete.  Use of half duplex was
   not intended, but probably added a small amount of delay variation
   that could have been avoided in full-duplex mode.

   Each individual test was run with common packet rates (1 pps and
   10 pps), Poisson/Periodic distributions, and IP packet sizes of 64,
   340, and 500 bytes.

   For these tests, a stream of at least 300 packets was sent from
   Source to Destination in each implementation.  Periodic streams (as
   per [RFC3432]) with 1-second spacing were used, except as noted.

   As required in Section 2.8.1 of [RFC2680], the packet Type-P must be
   reported.  The packet Type-P for this test was IP-UDP with Best
   Effort DSCP.  These headers were encapsulated according to the
   L2TPv3 specifications [RFC3931], and thus may not influence the
   treatment received as the packets traversed the Internet.

   With the L2TPv3 tunnel in use, the metric name for the testing
   configured here (with respect to the IP header exposed to Internet
   processing) is:

   Type-IP-protocol-115-One-way-Packet-Loss--Stream

   with (Section 3.2 of [RFC2680]) Metric Parameters:

   +  Src, the IP address of a host (12.3.167.16 or 193.159.144.8)

   +  Dst, the IP address of a host (193.159.144.8 or 12.3.167.16)

   +  T0, a time

   +  Tf, a time

   +  lambda, a rate in reciprocal seconds

   +  Thresh, a maximum waiting time in seconds (see Section 2.8.2 of
      [RFC2680])

   and (Section 3.8 of [RFC2680]) Metric Units: a sequence of pairs;
   the elements of each pair are:

   +  T, a time, and

   +  L, either a zero or a one

   The values of T in the sequence are monotonic increasing.  Note that
   T would be a valid parameter to the *singleton* Type-P-One-way-
   Packet-Loss, and that L would be a valid value of Type-P-One-way-
   Packet-Loss (see Section 2 of [RFC2680]).
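
   As a minimal illustration of this data structure (a sketch, not
   output from either tested implementation; the object names are
   hypothetical), a sample of one-way loss singletons can be held in R
   as follows:

      # Hypothetical sample of one-way loss singletons per RFC 2680:
      # T = packet send times (seconds), L = 1 if lost, 0 if received.
      sample_OWPL <- data.frame(T = c(0.98, 2.07, 3.11, 4.02, 5.25),
                                L = c(0,    0,    1,    0,    1))

      # The values of T must be monotonic increasing:
      stopifnot(!is.unsorted(sample_OWPL$T))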

   Also, Section 2.8.4 of [RFC2680] recommends that the path SHOULD be
   reported.  In this test set-up, most of the path details will be
   concealed from the implementations by the L2TPv3 tunnels; thus, a
   more informative path trace route can be conducted by the routers at
   each location.

   When NetProbe is used in production, a traceroute is conducted in
   parallel at the outset of measurements.

   Perfas+ does not support traceroute.

      IPLGW#traceroute 193.159.144.8

      Type escape sequence to abort.
      Tracing the route to 193.159.144.8

      1 12.126.218.245 [AS 7018] 0 msec 0 msec 4 msec
      2 cr84.n54ny.ip.att.net (12.123.2.158) [AS 7018] 4 msec 4 msec
        cr83.n54ny.ip.att.net (12.123.2.26) [AS 7018] 4 msec
      3 cr1.n54ny.ip.att.net (12.122.105.49) [AS 7018] 4 msec
        cr2.n54ny.ip.att.net (12.122.115.93) [AS 7018] 0 msec
        cr1.n54ny.ip.att.net (12.122.105.49) [AS 7018] 0 msec
      4 n54ny02jt.ip.att.net (12.122.80.225) [AS 7018] 4 msec 0 msec
        n54ny02jt.ip.att.net (12.122.80.237) [AS 7018] 4 msec
      5 192.205.34.182 [AS 7018] 0 msec
        192.205.34.150 [AS 7018] 0 msec
        192.205.34.182 [AS 7018] 4 msec
      6 da-rg12-i.DA.DE.NET.DTAG.DE (62.154.1.30) [AS 3320] 88 msec
        88 msec 88 msec
      7 217.89.29.62 [AS 3320] 88 msec 88 msec 88 msec
      8 217.89.29.55 [AS 3320] 88 msec 88 msec 88 msec
      9 * * *

   It was only possible to conduct the traceroute for the measured path
   on one of the tunnel-head routers (the normal trace facilities of
   the measurement systems are confounded by the L2TPv3 tunnel
   encapsulation).

4.  Error Calibration, RFC 2680

   An implementation is required to report calibration results on clock
   synchronization in Section 2.8.3 of [RFC2680] (also required in
   Section 3.7 of [RFC2680] for sample metrics).

   Also, it is recommended to report the probability that a packet
   successfully arriving at the destination network interface is
   incorrectly designated as lost due to resource exhaustion, in
   Section 2.8.3 of [RFC2680].

4.1.  Clock Synchronization Calibration

   For NetProbe and Perfas+ clock synchronization test results, refer
   to Section 4 of [RFC6808].

4.2.  Packet Loss Determination Error

   Since both measurement implementations have resource limitations, it
   is theoretically possible that these limits could be exceeded and a
   packet that arrived at the destination successfully might be
   discarded in error.

   In previous test efforts [I-D.morton-ippm-advance-metrics], NetProbe
   produced 6 multicast streams with an aggregate bit rate over 53
   Mbit/s, in order to characterize the 1-way capacity of a NISTNet-
   based emulator.  Neither the emulator nor the pair of NetProbe
   implementations used in this testing dropped any packets in these
   streams.

   The maximum load used here between any 2 NetProbe implementations
   was 11.5 Mbit/s, divided equally among 3 unicast test streams.  We
   conclude that steady resource usage does not contribute error
   (additional loss) to the measurements.

5.  Pre-determined Limits on Equivalence

   In this section, we provide the numerical limits on comparisons
   between implementations, in order to declare that the results are
   equivalent and, therefore, that the tested specification is clear.

   A key point is that the allowable errors, corrections, and
   confidence levels only need to be sufficient to detect mis-
   interpretation of the tested specification resulting in diverging
   implementations.

   Also, the allowable error must be sufficient to compensate for
   measured path differences.  It was simply not possible to measure
   fully identical paths in the VLAN-loopback test configuration used,
   and this practical compromise must be taken into account.

   For Anderson-Darling K-sample (ADK) [ADK] comparisons, the required
   confidence factor for the cross-implementation comparisons SHALL be
   the smallest of:

   o  0.95 confidence factor at 1 packet resolution, or

   o  the smallest confidence factor (in combination with resolution)
      of the two same-implementation comparisons for the same test
      conditions (if the number of streams is sufficient to allow such
      comparisons).

   For Anderson-Darling Goodness-of-Fit (ADGoF) [Radgof] comparisons,
   the required level of significance for the same-implementation
   Goodness-of-Fit (GoF) SHALL be 0.05 or 5%, as specified in Section
   11.4 of [RFC2330].  This is equivalent to a 95% confidence factor.

6.  Tests to evaluate RFC 2680 Specifications

   This section describes some results from production network (cross-
   Internet) tests with measurement devices implementing IPPM metrics
   and a network emulator to create relevant conditions, to determine
   whether the metric definitions were interpreted consistently by
   implementors.

   The procedures are similar to those contained in Appendix A.1 of
   [RFC6576] for One-way Delay.

6.1.  One-way Loss, ADK Sample Comparison

   This test determines if implementations produce results that appear
   to come from a common packet loss distribution, as an overall
   evaluation of Section 3 of [RFC2680], "A Definition for Samples of
   One-way Packet Loss".

   Same-implementation comparison results help to set the threshold of
   equivalence that will be applied to cross-implementation
   comparisons.

   This test is intended to evaluate measurements in Sections 2, 3, and
   4 of [RFC2680].

   By testing the extent to which the one-way packet loss counts on
   different test streams of two [RFC2680] implementations appear to be
   from the same loss process, we reduce comparison steps, because
   comparing the resulting summary statistics (as defined in Section 4
   of [RFC2680]) would require a redundant set of equivalence
   evaluations.  We can easily check whether the single statistic in
   Section 4 of [RFC2680] was implemented, and report on that fact.

   1.  Configure an L2TPv3 path between test sites, and each pair of
       measurement devices to operate tests in their designated pair of
       VLANs.

   2.  Measure a sample of one-way packet loss singletons with 2 or
       more implementations, using identical options and network
       emulator settings (if used).

   3.  Measure a sample of one-way packet loss singletons with *four or
       more* instances of the *same* implementations, using identical
       options, noting that connectivity differences SHOULD be the same
       as for the cross-implementation testing.

   4.  If fewer than ten test streams are available, skip to step 7.

   5.  Apply the ADK comparison procedures (see Appendix C of
       [RFC6576]) and determine the resolution and confidence factor
       for distribution equivalence of each same-implementation
       comparison and each cross-implementation comparison.

   6.  Take the coarsest resolution and confidence factor for
       distribution equivalence from the same-implementation pairs, or
       the limit defined in Section 5 above, as a limit on the
       equivalence threshold for these experimental conditions.

   7.  Compare the cross-implementation ADK performance with the
       equivalence threshold determined in step 6 to determine if
       equivalence can be declared (see the sketch following the
       analysis notes below).

   The common parameters used for tests in this section are given in
   the subsections that follow.

   The cross-implementation comparison uses a simple ADK analysis
   [Rtool] [Radk], where all NetProbe loss counts are compared with all
   Perfas+ loss results.

   In the result analysis of this section:

   o  All comparisons used 1 packet resolution.

   o  No Correction Factors were applied.

   o  The 0.95 confidence factor (1.960 for cross-implementation
      comparison) was used.
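
   To make the equivalence decision concrete, the following R sketch
   (assuming the "adk" package [Radk] used throughout this section is
   installed) applies the ADK test to the per-stream loss counts of the
   two implementations and checks the adjusted-for-ties statistic
   against the 1.960 limit from Section 5.  The vectors repeat the
   Section 6.1.1 data; the result-field access is an assumption about
   the adk 1.0 object layout, not a documented interface:

      library(adk)   # Anderson-Darling K-sample test package [Radk]

      # Per-stream one-way loss counts (the data of Section 6.1.1).
      imp1_loss <- c(114, 175, 138, 142, 181, 105)   # NetProbe
      imp2_loss <- c(115, 128, 136, 127, 139, 138)   # Perfas+

      result <- adk.test(imp1_loss, imp2_loss)
      print(result)   # prints the t.obs / P-value table shown in the
                      # subsections below

      # Equivalence is declared when the "adj. for ties" t.obs does
      # not exceed the 0.95 confidence factor limit (1.960).  NOTE:
      # result$adk is assumed to be the 2-row statistic matrix of adk
      # version 1.0 (rows: not adj. for ties / adj. for ties).
      equivalent <- result$adk[2, 1] <= 1.960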

6.1.1.  340B/Periodic Cross-imp. results

   Tests described in this section used:

   o  IP header + payload = 340 octets

   o  Periodic sampling at 1 packet per second

   o  Test duration = 1200 seconds (during April 7, 2011, EDT)

   The netem emulator was set for 100 ms constant delay, with a 10%
   loss ratio.  In this experiment, the netem emulator was configured
   to operate independently on each VLAN, and thus the emulator itself
   is a potential source of error when comparing streams that traverse
   the test path in different directions.

      A07bps_loss <- c(114, 175, 138, 142, 181, 105)   (NetProbe)
      A07per_loss <- c(115, 128, 136, 127, 139, 138)   (Perfas)

      > A07bps_loss <- c(114, 175, 138, 142, 181, 105)
      > A07per_loss <- c(115, 128, 136, 127, 139, 138)
      >
      > A07cross_loss_ADK <- adk.test(A07bps_loss, A07per_loss)
      > A07cross_loss_ADK
      Anderson-Darling k-sample test.

      Number of samples:  2
      Sample sizes: 6 6
      Total number of values: 12
      Number of unique values: 11

      Mean of Anderson Darling Criterion: 1
      Standard deviation of Anderson Darling Criterion: 0.6569

      T = (Anderson Darling Criterion - mean)/sigma

      Null Hypothesis: All samples come from a common population.

                            t.obs  P-value  extrapolation
      not adj. for ties   0.52043  0.20604              0
      adj. for ties       0.62679  0.18607              0

   The cross-implementation comparisons pass the ADK criterion.

6.1.2.  64B/Periodic Cross-imp. results

   Tests described in this section used:

   o  IP header + payload = 64 octets

   o  Periodic sampling at 1 packet per second

   o  Test duration = 300 seconds (during March 24, 2011, EDT)

   The netem emulator was set for 0 ms constant delay, with a 10% loss
   ratio.

      > M24per_loss <- c(42,34,35,35)          (Perfas)
      > M24apd_23BC_loss <- c(27,39,29,24)     (NetProbe)
      > M24apd_loss23BC_ADK <- adk.test(M24apd_23BC_loss,M24per_loss)
      > M24apd_loss23BC_ADK
      Anderson-Darling k-sample test.

      Number of samples:  2
      Sample sizes: 4 4
      Total number of values: 8
      Number of unique values: 7

      Mean of Anderson Darling Criterion: 1
      Standard deviation of Anderson Darling Criterion: 0.60978

      T = (Anderson Darling Criterion - mean)/sigma

      Null Hypothesis: All samples come from a common population.

                            t.obs  P-value  extrapolation
      not adj. for ties   0.76921  0.16200              0
      adj. for ties       0.90935  0.14113              0

      Warning: At least one sample size is less than 5.
        p-values may not be very accurate.

   The cross-implementation comparisons pass the ADK criterion.

6.1.3.  64B/Poisson Cross-imp. results

   Tests described in this section used:

   o  IP header + payload = 64 octets

   o  Poisson sampling at lambda = 1 packet per second

   o  Test duration = 20 minutes (during April 27, 2011, EDT)

   The netem configuration was 0 ms delay and 10% loss, but there were
   two passes through an emulator for each stream, and loss emulation
   was present for 18 minutes of the 20-minute test.

      A27aps_loss <- c(91,110,113,102,111,109,112,113)   (NetProbe)
      A27per_loss <- c(95,123,126,114)                   (Perfas)

      A27cross_loss_ADK <- adk.test(A27aps_loss, A27per_loss)

      > A27cross_loss_ADK
      Anderson-Darling k-sample test.

      Number of samples:  2
      Sample sizes: 8 4
      Total number of values: 12
      Number of unique values: 11

      Mean of Anderson Darling Criterion: 1
      Standard deviation of Anderson Darling Criterion: 0.65642

      T = (Anderson Darling Criterion - mean)/sigma

      Null Hypothesis: All samples come from a common population.

                            t.obs  P-value  extrapolation
      not adj. for ties   2.15099  0.04145              0
      adj. for ties       1.93129  0.05125              0

      Warning: At least one sample size is less than 5.
        p-values may not be very accurate.
      >

   The cross-implementation comparisons barely pass the ADK criterion
   at 95% = 1.960 when adjusting for ties.

6.1.4.  Conclusions on the ADK Results for One-way Packet Loss

   We conclude that the two implementations are capable of producing
   equivalent one-way packet loss measurements based on their
   interpretation of [RFC2680].

6.2.  One-way Loss, Delay threshold

   This test determines if implementations use the same configured
   maximum waiting time delay from one measurement to another under
   different delay conditions, and correctly declare packets arriving
   in excess of the waiting time threshold as lost.

   See Section 2.8.2 of [RFC2680].

   1.  Configure an L2TPv3 path between test sites, and each pair of
       measurement devices to operate tests in their designated pair of
       VLANs.

   2.  Configure the network emulator to add 1.0 sec one-way constant
       delay in one direction of transmission.

   3.  Measure (average) one-way delay with 2 or more implementations,
       using identical waiting time thresholds (Thresh) for loss set at
       3 seconds.

   4.  Configure the network emulator to add 3 sec one-way constant
       delay in one direction of transmission, equivalent to 2 seconds
       of additional one-way delay (or change the path delay while the
       test is in progress, when there are sufficient packets at the
       first delay setting).

   5.  Repeat/continue measurements.

   6.  Observe that the increase measured in step 5 caused all packets
       with 2 sec additional delay to be declared lost, and that all
       packets that arrive successfully in step 3 are assigned a valid
       one-way delay.

   The common parameters used for tests in this section are:

   o  IP header + payload = 64 octets

   o  Poisson sampling at lambda = 1 packet per second

   o  Test duration = 900 seconds total (March 21)

   The netem emulator was set to add constant delays as specified in
   the procedure above.

6.2.1.  NetProbe results for Loss Threshold

   In NetProbe, the Loss Threshold is implemented uniformly over all
   packets as a post-processing routine.  With the Loss Threshold set
   at 3 seconds, all packets with one-way delay >3 seconds are marked
   "Lost" and included in the Lost Packet list with their transmission
   time (as required in Section 3.3 of [RFC2680]).  This resulted in
   342 packets designated as lost in one of the test streams (with
   average delay = 3.091 sec).

6.2.2.  Perfas Results for Loss Threshold

   Perfas+ uses a fixed Loss Threshold that was not adjustable during
   this study.  The Loss Threshold is approximately one minute, and
   emulation of a delay of this size was not attempted.  However, it is
   possible to implement any delay threshold desired with a post-
   processing routine and subsequent analysis.  Using this method, 195
   packets would be declared lost (with average delay = 3.091 sec).

6.2.3.  Conclusions for Loss Threshold

   Both implementations assume that any constant delay value desired
   can be used as the Loss Threshold, since all delays are stored as a
   pair, as required in [RFC2680].  This is a simple way to enforce the
   constant loss threshold envisioned in [RFC2680] (see the specific
   section reference above).  We take the position that the assumption
   of post-processing is compliant, and that the text of the RFC should
   be revised slightly to include this point.
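
   A minimal sketch of such a post-processing routine in R follows.  It
   is not taken from either implementation; the data layout (a send
   time plus either a finite delay or NA for packets that never
   arrived) is assumed for illustration:

      # Hypothetical raw results: per-packet send time and one-way
      # delay in seconds (NA = never arrived within the session).
      raw <- data.frame(T     = c(1.0,   2.0,   3.0, 4.0),
                        delay = c(0.102, 3.410, NA,  0.098))

      Thresh <- 3   # maximum waiting time, Section 2.8.2 of RFC 2680

      # A packet is lost (L = 1) if it never arrived or arrived with
      # delay greater than Thresh; otherwise, it keeps a valid delay.
      raw$L <- ifelse(is.na(raw$delay) | raw$delay > Thresh, 1, 0)

      sum(raw$L)   # packets declared lost at this threshold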

6.3.  One-way Loss with Out-of-Order Arrival

   Section 3.6 of [RFC2680] indicates that implementations need to
   ensure that reordered packets are handled correctly, using an
   uncapitalized "must".  In essence, this is an implied requirement,
   because the correct packet must be identified as lost if it fails to
   arrive before its delay threshold under all circumstances, and
   reordering is always a possibility on IP network paths.  See
   [RFC4737] for the definition of reordering used in IETF standards-
   compliant measurements.

   Using the procedure of Section 6.1, the netem emulator was set to
   introduce significant delay (2000 ms), delay variation (1000 ms),
   and 10% loss; the delay variation was sufficient to produce packet
   reordering because each packet's emulated delay is independent of
   the others.

   The tests described in this section used:

   o  IP header + payload = 64 octets

   o  Periodic sampling = 1 packet per second

   o  Test duration = 600 seconds (during May 2, 2011, EDT)

      > Y02aps_loss <- c(53,45,67,55)   (NetProbe)
      > Y02per_loss <- c(59,62,67,69)   (Perfas)
      > Y02cross_loss_ADK <- adk.test(Y02aps_loss, Y02per_loss)
      > Y02cross_loss_ADK
      Anderson-Darling k-sample test.

      Number of samples:  2
      Sample sizes: 4 4
      Total number of values: 8
      Number of unique values: 7

      Mean of Anderson Darling Criterion: 1
      Standard deviation of Anderson Darling Criterion: 0.60978

      T = (Anderson Darling Criterion - mean)/sigma

      Null Hypothesis: All samples come from a common population.

                            t.obs  P-value  extrapolation
      not adj. for ties   1.11282  0.11531              0
      adj. for ties       1.19571  0.10616              0

      Warning: At least one sample size is less than 5.
        p-values may not be very accurate.
      >

   The test results indicate that extensive reordering was present.
   Both implementations capture the extensive delay variation between
   adjacent packets.  In NetProbe, packet arrival order is preserved in
   the raw measurement files, so an examination of arrival packet
   sequence numbers also indicates reordering.

   Despite extensive continuous packet reordering present in the
   transmission path, the distributions of loss counts from the two
   implementations pass the ADK criterion at 95% = 1.960.

6.4.  Poisson Sending Process Evaluation

   Section 3.7 of [RFC2680] indicates that implementations need to
   ensure that their sending process is reasonably close to a classic
   Poisson distribution when used.  Much more detail on sample
   distribution generation and Goodness-of-Fit testing is specified in
   Section 11.4 of [RFC2330] and the Appendix of [RFC2330].

   In this section, each implementation's Poisson distribution is
   compared with an idealized version of the distribution available in
   the base functionality of the R-tool for Statistical Analysis
   [Rtool], and the comparison is performed using the Anderson-Darling
   Goodness-of-Fit test package (ADGofTest) [Radgof].  The Goodness-of-
   Fit criterion derived from [RFC2330] requires a test statistic value
   AD <= 2.492 for 5% significance.  The Appendix of [RFC2330] also
   notes that there may be difficulty satisfying the ADGoF test when
   the sample includes many packets (when 8192 were used, the test
   always failed, but smaller sets of the stream passed).

   Both implementations were configured to produce Poisson
   distributions with lambda = 1 packet per second.
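
   The evaluation method can be illustrated with an idealized sending
   process generated in R itself (a sketch; neither implementation
   generates its schedule this way, and the ADGofTest package [Radgof]
   is assumed to be installed):

      library(ADGofTest)   # provides ad.test() [Radgof]

      # An ideal Poisson sending process has exponentially distributed
      # inter-packet intervals; here lambda = 1 packet per second.
      set.seed(1)                        # for a reproducible example
      intervals <- rexp(300, rate = 1)   # 300 simulated intervals

      # Anderson-Darling GoF test against the exponential distribution
      # with rate 1; the criterion from Section 11.4 of RFC 2330 is
      # AD <= 2.492 at 5% significance.
      ad.test(intervals, pexp, 1)

   The same call pattern, applied to the inter-packet intervals
   recorded for each implementation, produces the results shown in the
   following subsections.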

6.4.1.  NetProbe Results

   Section 11.4 of [RFC2330] suggests three possible measurement points
   to evaluate the Poisson distribution.  The NetProbe analysis uses
   "user-level timestamps made just before or after the system call for
   transmitting the packet".

   The statistical summary for two NetProbe streams is below:

      > summary(a27ms$s1[2:1152])
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       0.0100  0.2900  0.6600  0.9846  1.3800  8.6390
      > summary(a27ms$s2[2:1152])
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        0.010   0.280   0.670   0.979   1.365   8.829

   We see that both of the Means are near the specified lambda = 1.

   The results of ADGoF tests for these two streams are shown below:

      > ad.test( a27ms$s1[2:101], pexp, 1)

              Anderson-Darling GoF Test

      data:  a27ms$s1[2:101] and pexp
      AD = 0.8908, p-value = 0.4197
      alternative hypothesis: NA

      > ad.test( a27ms$s1[2:1001], pexp, 1)

              Anderson-Darling GoF Test

      data:  a27ms$s1[2:1001] and pexp
      AD = 0.9284, p-value = 0.3971
      alternative hypothesis: NA

      > ad.test( a27ms$s2[2:101], pexp, 1)

              Anderson-Darling GoF Test

      data:  a27ms$s2[2:101] and pexp
      AD = 0.3597, p-value = 0.8873
      alternative hypothesis: NA

      > ad.test( a27ms$s2[2:1001], pexp, 1)

              Anderson-Darling GoF Test

      data:  a27ms$s2[2:1001] and pexp
      AD = 0.6913, p-value = 0.5661
      alternative hypothesis: NA

   We see that the 100- and 1000-packet sets from two different streams
   (s1 and s2) all passed the AD <= 2.492 criterion.

6.4.2.  Perfas Results

   Section 11.4 of [RFC2330] suggests three possible measurement points
   to evaluate the Poisson distribution.  The Perfas+ analysis uses
   "wire times for the packets as recorded using a packet filter".
   However, due to limited access at the Perfas+ side of the test
   setup, the captures were made after the Perfas+ streams traversed
   the production network, adding a small amount of unwanted delay
   variation to the wire times (and possibly error due to packet loss).

   The statistical summary for two Perfas+ streams is below:

      > summary(a27pe$p1)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        0.004   0.347   0.788   1.054   1.548   4.231
      > summary(a27pe$p2)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       0.0010  0.2710  0.7080  0.9696  1.3740  7.1160

   We see that both of the Means are near the specified lambda = 1.

   The results of ADGoF tests for these two streams are shown below:

      > ad.test(a27pe$p1, pexp, 1 )

              Anderson-Darling GoF Test

      data:  a27pe$p1 and pexp
      AD = 1.1364, p-value = 0.2930
      alternative hypothesis: NA

      > ad.test(a27pe$p2, pexp, 1 )

              Anderson-Darling GoF Test

      data:  a27pe$p2 and pexp
      AD = 0.5041, p-value = 0.7424
      alternative hypothesis: NA

      > ad.test(a27pe$p1[1:100], pexp, 1 )

              Anderson-Darling GoF Test

      data:  a27pe$p1[1:100] and pexp
      AD = 0.7202, p-value = 0.5419
      alternative hypothesis: NA

      > ad.test(a27pe$p1[101:193], pexp, 1 )

              Anderson-Darling GoF Test

      data:  a27pe$p1[101:193] and pexp
      AD = 1.4046, p-value = 0.201
      alternative hypothesis: NA

      > ad.test(a27pe$p2[1:100], pexp, 1 )

              Anderson-Darling GoF Test

      data:  a27pe$p2[1:100] and pexp
      AD = 0.4758, p-value = 0.7712
      alternative hypothesis: NA

      > ad.test(a27pe$p2[101:193], pexp, 1 )

              Anderson-Darling GoF Test

      data:  a27pe$p2[101:193] and pexp
      AD = 0.3381, p-value = 0.9068
      alternative hypothesis: NA

      >

   We see that the 193-, 100-, and 93-packet sets from two different
   streams (p1 and p2) all passed the AD <= 2.492 criterion.

6.4.3.  Conclusions for Goodness-of-Fit

   Both the NetProbe and Perfas+ implementations produce adequate
   Poisson distributions according to the Anderson-Darling Goodness-of-
   Fit test at the 5% significance level (alpha = 0.05, or 95%
   confidence level).

6.5.  Implementation of Statistics for One-way Loss

   We check which statistics were implemented, and report on those
   facts, noting that Section 4 of [RFC2680] does not specify the
   calculations exactly, and gives only some illustrative examples.

                                                   NetProbe   Perfas

   4.1. Type-P-One-way-Packet-Loss-Average           yes        yes
   (this is more commonly referred to as a loss ratio)

                Implementation of Section 4 Statistics

   We note that the implementations refer to this metric as a loss
   ratio, and this is an area for likely revision of the text to make
   it more consistent with widespread usage.
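
   As a small worked example of the Section 4 statistic (hypothetical
   values, not measured data), note that the "average" of the loss
   indicator values L is exactly the ratio of lost packets to total
   packets, which is why the loss-ratio name fits:

      L <- c(0, 0, 1, 0, 1, 0, 0, 0, 1, 0)   # 10 singletons, 3 lost

      mean(L)            # Type-P-One-way-Packet-Loss-Average = 0.3
      sum(L) / length(L) # the same value, read as a loss ratio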

7.  Conclusions for RFC 2680bis

   This memo concludes that [RFC2680] should be advanced on the
   standards track, and recommends the following edits to improve the
   text (which are not deemed significant enough to affect maturity):

   o  Revise Type-P-One-way-Packet-Loss-Average to Type-P-One-way-
      Packet-Loss-Ratio.

   o  Regarding implementation of the loss delay threshold (Section
      6.2), the assumption of post-processing is compliant, and the
      text of RFC 2680bis should be revised slightly to include this
      point.

   o  The IETF has reached consensus on guidance for reporting metrics
      in [RFC6703], and this memo should be referenced in RFC 2680bis
      to incorporate recent experience where appropriate.

   We note that there are at least two errata on [RFC2680], and these
   should be processed as part of the editing process.

8.  Security Considerations

   The security considerations that apply to any active measurement of
   live networks are relevant here as well.  See [RFC4656] and
   [RFC5357].

9.  IANA Considerations

   This memo makes no requests of IANA, and the authors hope that IANA
   personnel will be able to use their valuable time in other
   worthwhile pursuits.

10.  Acknowledgements

   The authors thank Lars Eggert for his continued encouragement to
   advance the IPPM metrics during his tenure as AD Advisor.

   Nicole Kowalski supplied the needed CPE router for the NetProbe side
   of the test set-up, and graciously managed her testing in spite of
   issues caused by dual use of the router.  Thanks, Nicole!

   The "NetProbe Team" also acknowledges many useful discussions on
   statistical interpretation with Ganga Maguluri.

11.  References

11.1.  Normative References

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
              "Framework for IP Performance Metrics", RFC 2330,
              May 1998.

   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Delay Metric for IPPM", RFC 2679, September 1999.

   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Packet Loss Metric for IPPM", RFC 2680, September 1999.

   [RFC3432]  Raisanen, V., Grotefeld, G., and A. Morton, "Network
              performance measurement with periodic streams",
              RFC 3432, November 2002.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov,
              S., and J. Perser, "Packet Reordering Metrics", RFC 4737,
              November 2006.

   [RFC4814]  Newman, D. and T. Player, "Hash and Stuffing: Overlooked
              Factors in Network Device Benchmarking", RFC 4814,
              March 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC5657]  Dusseault, L. and R. Sparks, "Guidance on Interoperation
              and Implementation Reports for Advancement to Draft
              Standard", BCP 9, RFC 5657, September 2009.

   [RFC6576]  Geib, R., Morton, A., Fardid, R., and A. Steinmitz, "IP
              Performance Metrics (IPPM) Standard Advancement Testing",
              BCP 176, RFC 6576, March 2012.

   [RFC6703]  Morton, A., Ramachandran, G., and G. Maguluri, "Reporting
              IP Network Performance Metrics: Different Points of
              View", RFC 6703, August 2012.

   [RFC6808]  Ciavattone, L., Geib, R., Morton, A., and M. Wieser,
              "Test Plan and Results Supporting Advancement of RFC 2679
              on the Standards Track", RFC 6808, December 2012.

11.2.  Informative References

   [ADK]      Scholz, F. and M. Stephens, "K-sample Anderson-Darling
              Tests of fit, for continuous and discrete cases",
              University of Washington, Technical Report No. 81,
              May 1986.

   [I-D.morton-ippm-advance-metrics]
              Morton, A., "Lab Test Results for Advancing Metrics on
              the Standards Track", draft-morton-ippm-advance-
              metrics-02 (work in progress), October 2010.

   [Perfas]   Heidemann, C., "Qualitaet in IP-Netzen Messverfahren",
              published by ITG Fachgruppe 5.2.3 (NGN), 2nd meeting,
              http://www.itg523.de/oeffentlich/01nov/
              Heidemann_QOS_Messverfahren.pdf, November 2001.

   [RFC3931]  Lau, J., Townsley, M., and I. Goyret, "Layer Two
              Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931,
              March 2005.

   [Radgof]   Bellosta, C., "ADGofTest: Anderson-Darling Goodness-of-
              Fit Test.  R package version 0.3.",
              http://cran.r-project.org/web/packages/ADGofTest/
              index.html, December 2011.

   [Radk]     Scholz, F., "adk: Anderson-Darling K-Sample Test and
              Combinations of Such Tests.  R package version 1.0.",
              2008.

   [Rtool]    R Development Core Team, "R: A language and environment
              for statistical computing.  R Foundation for Statistical
              Computing, Vienna, Austria.  ISBN 3-900051-07-0, URL
              http://www.R-project.org/", 2011.

   [WIPM]     "AT&T Global IP Network",
              http://ipnetwork.bgtmo.ip.att.net/pws/index.html, 2012.

Authors' Addresses

   Len Ciavattone
   AT&T Labs
   200 Laurel Avenue South
   Middletown, NJ  07748
   USA

   Phone: +1 732 420 1239
   Email: lencia@att.com

   Ruediger Geib
   Deutsche Telekom
   Heinrich Hertz Str. 3-7
   Darmstadt  64295
   Germany

   Phone: +49 6151 58 12747
   Email: Ruediger.Geib@telekom.de

   Al Morton
   AT&T Labs
   200 Laurel Avenue South
   Middletown, NJ  07748
   USA

   Phone: +1 732 420 1571
   Fax:   +1 732 368 1192
   Email: acmorton@att.com
   URI:   http://home.comcast.net/~acmacm/

   Matthias Wieser
   Technical University Darmstadt
   Darmstadt
   Germany

   Email: matthias_michael.wieser@stud.tu-darmstadt.de