Internet Engineering Task Force                            R. Geib, Ed.
Internet-Draft                                         Deutsche Telekom
Intended status: Standards Track                              A. Morton
Expires: January 3, 2011                                       AT&T Labs
                                                               R. Fardid
                                                    Cariden Technologies
                                                            A. Steinmitz
                                                                HS Fulda
                                                            July 2, 2010

                   IPPM standard advancement testing
                     draft-ietf-ippm-metrictest-00

Abstract

   This document specifies tests to determine if multiple independent
   instantiations of a performance metric RFC have implemented the
   specifications in the same way.  This is the performance metric
   equivalent of interoperability, required to advance RFCs along the
   standards track.  Results from different implementations of metric
   RFCs will be collected under the same underlying network conditions
   and compared using state of the art statistical methods.  The goal
   is an evaluation of the metric RFC itself, whether its definitions
   are clear and unambiguous to implementors and therefore a candidate
   for advancement on the IETF standards track.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 3, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Basic idea
   3.  Verification of conformance to a metric specification
     3.1.  Tests of an individual implementation against a metric
           specification
     3.2.  Test setup resulting in identical live network testing
           conditions
     3.3.  Tests of two or more different implementations against a
           metric specification
     3.4.  Clock synchronisation
     3.5.  Recommended Metric Verification Measurement Process
     3.6.  Miscellaneous
   4.  Acknowledgements
   5.  Contributors
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  An example of a One-way Delay metric validation
     A.1.  Compliance to Metric specification requirements
     A.2.  Examples related to statistical tests for One-way Delay
   Appendix B.  Anderson-Darling 2 sample C++ code
   Appendix C.  Glossary
   Authors' Addresses

1.  Introduction

   The Internet Standards Process RFC2026 [RFC2026] requires that for
   an IETF specification to advance beyond the Proposed Standard level,
   at least two genetically unrelated implementations must be shown to
   interoperate correctly with all features and options.  This
   requirement can be met by supplying:

   o  evidence that (at least a sub-set of) the specification has been
      implemented by multiple parties, thus indicating adoption by the
      IETF community and the extent of feature coverage.
   o  evidence that each feature of the specification is sufficiently
      well-described to support interoperability, as demonstrated
      through testing and/or user experience with deployment.

   In the case of a protocol specification, the notion of
   "interoperability" is reasonably intuitive - the implementations
   must successfully "talk to each other", while exercising all
   features and options.  To achieve interoperability, two
   implementors need to interpret the protocol specifications in
   equivalent ways.  In the case of IP Performance Metrics (IPPM),
   this definition of interoperability is only useful for test and
   control protocols like the One-Way Active Measurement Protocol,
   OWAMP [RFC4656], and the Two-Way Active Measurement Protocol, TWAMP
   [RFC5357].

   A metric specification RFC describes one or more metric
   definitions, methods of measurement and a way to report the results
   of measurement.  One example would be a way to test and report the
   One-way Delay that data packets incur while being sent from one
   network location to another, the One-way Delay Metric.

   In the case of metric specifications, the conditions that satisfy
   the "interoperability" requirement are less obvious, and there was
   a need for IETF agreement on practices to judge metric
   specification "interoperability" in the context of the IETF
   Standards Process.  This memo provides methods which should be
   suitable to evaluate metric specifications for standards track
   advancement.  The methods proposed here MAY be generally applicable
   to metric specification RFCs beyond those developed under the IPPM
   Framework [RFC2330].

   Since many implementations of IP metrics are embedded in
   measurement systems that do not interact with one another (they
   were built before OWAMP and TWAMP), the interoperability evaluation
   called for in the IETF standards process cannot be determined by
   observing that independent implementations interact properly for
   various protocol exchanges.  Instead, verifying that different
   implementations give statistically equivalent results under
   controlled measurement conditions takes the place of
   interoperability observations.  Even when evaluating OWAMP and
   TWAMP RFCs for standards track advancement, the methods described
   here are useful to evaluate the measurement results, because their
   validity would not be ascertained in typical interoperability
   testing.

   The standards advancement process aims at producing confidence that
   the metric definitions and supporting material are clearly worded
   and unambiguous, or at revealing ways in which the metric
   definitions can be revised to achieve clarity.  The process also
   permits identification of options that were not implemented, so
   that they can be removed from the advancing specification.  Thus,
   the product of this process is information about the metric
   specification RFC itself: determination of the specifications or
   definitions that are clear and unambiguous and those that are not
   (as opposed to an evaluation of the implementations which assist in
   the process).

   This document defines a process to verify that implementations (or
   practically, measurement systems) have interpreted the metric
   specifications in equivalent ways, and produce equivalent results.

   Testing for statistical equivalence requires ensuring identical
   test setups (or awareness of differences) to the best possible
   extent.  Thus, producing identical test conditions is a core goal
   of the memo.
   Another important aspect of this process is to test individual
   implementations against specific requirements in the metric
   specifications, using customized tests for each requirement.  These
   tests can distinguish whether interpretations of each specific
   requirement are equivalent.

   Conclusions on equivalence are reached by two measures.

   First, implementations are compared against individual metric
   specifications to make sure that differences in implementation are
   minimised or at least known.

   Second, a test setup is proposed ensuring identical networking
   conditions, so that unknowns are minimized and comparisons are
   simplified.  The resulting separate data sets may be seen as
   samples taken from the same underlying distribution.  Using state
   of the art statistical methods, the equivalence of the results is
   verified.  To illustrate application of the process and methods
   defined here, an evaluation of the One-way Delay Metric [RFC2679]
   is provided in an Appendix.  While test setups will vary with the
   metrics to be validated, the general methodology of determining
   equivalent results will not.  Documents defining test setups to
   evaluate other metrics should be developed once the process
   proposed here has been agreed and approved.

   The metric RFC advancement process begins with a request for
   protocol action accompanied by a memo that documents the supporting
   tests and results.  The procedures of [RFC2026] are expanded in
   [RFC5657], including sample implementation and interoperability
   reports.  Section 3 of [morton-advance-metrics-01] can serve as a
   template for a metric RFC report which accompanies the protocol
   action request to the Area Director, including description of the
   test set-up, procedures, results for each implementation and
   conclusions.

   Changes from prior ID -02 to WG -00 draft

   o  Incorporation of aspects of reporting to support the protocol
      action request in the Introduction and section 3.5

   o  Overhaul of section 3.2 regarding tunneling: Added generic
      tunneling requirements and L2TPv3 as an example tunneling
      mechanism fulfilling the tunneling requirements.  Removed and
      adapted some of the prior references to other tunneling
      protocols

   o  Softened a requirement within section 3.4 (MUST to SHOULD on
      precision) and removed some comments of the authors.

   o  Updated contact information of one author and added a new
      author.

   o  Added example C++ code of an Anderson-Darling two sample test
      implementation.

   Changes from ID -01 to ID -02 version

   o  Major editorial review, rewording and clarifications on all
      contents.

   o  Additional text on parallel testing using VLANs and GRE or
      Pseudowire tunnels.

   o  Additional examples and a glossary.

   Changes from ID -00 to ID -01 version

   o  Addition of a comparison of individual metric implementations
      against the metric specification (trying to pick up problems
      and solutions for metric advancement [morton-advance-metrics]).

   o  More emphasis on the requirement to carefully design and
      document the measurement setup of the metric comparison.

   o  Proposal of testing conditions under identical WAN network
      conditions using IP in IP tunneling or Pseudo Wires and parallel
      measurement streams.

   o  Proposing the requirement to document the smallest resolution at
      which an ADK test was passed by 95%.
      As no minimum resolution is specified, IPPM metric compliance is
      not linked to a particular performance of an implementation.

   o  Reference to RFC 2330 and RFC 2679 for the 95% confidence
      interval as preferred criterion to decide on statistical
      equivalence

   o  Reducing the proposed statistical test to ADK with 95%
      confidence.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Basic idea

   The implementation of a standards-compliant metric is expected to
   meet the requirements of the related metric specification.  So
   before comparing two metric implementations, each metric
   implementation is individually compared against the metric
   specification.

   Most metric specifications leave freedom to implementors on non-
   fundamental aspects of an individual metric (or options).
   Comparing different measurement results using a statistical test
   with the assumption of identical test path and testing conditions
   requires knowledge of all differences in the overall test setup.
   Metric specification options chosen by implementors have to be
   documented.  It is REQUIRED to use identical implementation options
   wherever possible for any test proposed here.  Calibrations
   proposed by metric standards should be performed to further
   identify (and possibly reduce) potential sources of errors in the
   test setup.

   The Framework for IP Performance Metrics [RFC2330] expects that a
   "methodology for a metric should have the property that it is
   repeatable: if the methodology is used multiple times under
   identical conditions, it should result in consistent measurements."
   This means an implementation is expected to repeatedly measure a
   metric with consistent results (repeatability with the same
   result).  Small deviations in the test setup are expected to lead
   to small deviations in results only.  To characterise statistical
   equivalence in the case of small deviations, RFC 2330 and [RFC2679]
   suggest applying a 95% confidence interval.  Quoting RFC 2679, "95
   percent was chosen because ... a particular confidence level should
   be specified so that the results of independent implementations can
   be compared."

   Two different implementations are expected to produce statistically
   equivalent results if they both measure a metric under the same
   networking conditions.  Formulated in statistical terms: separate
   metric implementations collect separate samples from the same
   underlying statistical process (the same network conditions).  The
   statistical hypothesis to be tested is the expectation that both
   samples do not expose statistically different properties.  This
   requires careful test design:

   o  The measurement test setup must be self-consistent to the
      largest possible extent.  To minimize the influence of the test
      and measurement setup on the result, network conditions and
      paths MUST be identical for the compared implementations to the
      largest possible degree.  This includes both the stability and
      non-ambiguity of routes taken by the measurement packets.  See
      RFC 2330 for a discussion on self-consistency.

   o  The error induced by the sample size must be small enough to
      minimize its influence on the test result.
      This may have to be respected especially if two implementations
      measure with different average probing rates.

   o  Every comparison must be repeated several times based on
      different measurement data to avoid random indications of
      compatibility (or the lack of it).

   o  To minimize the influence of implementation options on the
      result, metric implementations SHOULD use identical options and
      parameters for the metric under evaluation.

   o  The implementation with the lowest probing frequency determines
      the smallest temporal interval for which samples can be
      compared.

   The metric specifications themselves are the primary focus of
   evaluation, rather than the implementations of metrics.  The
   documentation produced by the advancement process should identify
   which metric definitions and supporting material were found to be
   clearly worded and unambiguous, or it should identify ways in which
   the metric specification text should be revised to achieve clarity
   and unified interpretation.

   The process should also permit identification of options that were
   not implemented, so that they can be removed from the advancing
   specification (this is an aspect more typical of protocol
   advancement along the standards track).

   Note that this document does not propose to base interoperability
   indications of performance metric implementations on comparisons of
   individual singletons.  Individual singletons may be impacted by
   many statistical effects while they are measured.  Comparing two
   singletons of different implementations may result in failures with
   higher probability than comparing samples.

3.  Verification of conformance to a metric specification

   This section specifies how to verify compliance of two or more IPPM
   implementations against a metric specification.  This document only
   proposes a general methodology.  Compliance criteria for a specific
   metric implementation need to be defined for each individual metric
   specification.  The only exception is the statistical test
   comparing two metric implementations which are simultaneously
   tested.  This test is applicable without metric-specific decision
   criteria.

3.1.  Tests of an individual implementation against a metric
      specification

   A metric implementation MUST support the requirements classified as
   "MUST" and "REQUIRED" of the related metric specification to be
   compliant with the latter.

   Further, supported options of a metric implementation SHOULD be
   documented in sufficient detail.  The documentation of chosen
   options is RECOMMENDED to minimise (and recognise) differences in
   the test setup if two metric implementations are compared.
   Further, this documentation is used to validate and improve the
   underlying metric specification option, and to remove, from the
   metric specification to be promoted to a standard, options which
   saw no implementation or which are poorly specified.  This
   documentation SHOULD be made for all implementation-relevant
   specifications of a metric picked for a comparison which are not
   explicitly marked as "MUST" or "REQUIRED" in the metric
   specification.  This applies for the following sections of all
   metric specifications:

   o  Singleton Definition of the Metric.

   o  Sample Definition of the Metric.

   o  Statistics Definition of the Metric.
      As statistics are compared by the test specified here, this
      documentation is required even in the case that the metric
      specification does not contain a Statistics Definition.

   o  Timing and Synchronisation related specification (if relevant
      for the Metric).

   o  Any other technical part present or missing in the metric
      specification which is relevant for the implementation of the
      Metric.

   RFC2330 and RFC2679 emphasise precision as an aim of IPPM metric
   implementations.  A single IPPM conformant implementation MUST,
   under otherwise identical network conditions, produce precise
   results for repeated measurements of the same metric.

   RFC 2330 prefers the "empirical distribution function" (EDF) to
   describe collections of measurements.  RFC 2330 states that,
   "unless otherwise stated, IPPM goodness-of-fit tests are done using
   5% significance."  The goodness of fit test determines the
   precision with which two or more samples of a metric implementation
   belong to the same underlying distribution (of measured network
   performance events).  The goodness of fit test to be applied is the
   Anderson-Darling K sample test (ADK sample test, where K stands for
   the number of samples to be compared) [ADK].  Please note that RFC
   2330 and RFC 2679 apply an Anderson-Darling goodness of fit test
   too.

   The results of a repeated test with a single implementation MUST
   pass an ADK sample test with a confidence level of 95%.  The
   resolution for which the ADK test has been passed with the
   specified confidence level MUST be documented.  To formulate this
   differently: the requirement is to document the smallest
   resolution at which the results of the tested metric
   implementation pass an ADK test with a confidence level of 95%.
   The minimum resolution available in the reported results from each
   implementation MUST be taken into account in the ADK test.

3.2.  Test setup resulting in identical live network testing
      conditions

   Two major issues complicate tests for metric compliance across live
   networks under identical testing conditions.  One is the general
   point that metric definition implementations cannot be conveniently
   examined in field measurement scenarios.  The other one is more
   broadly described as "parallelism in devices and networks",
   including mechanisms like those that achieve load balancing (see
   [RFC4928]).

   This section proposes two measures to deal with both issues.
   Tunneling mechanisms can be used to avoid parallel processing of
   different flows in the network.  Measuring by separate parallel
   probe flows results in repeated collection of data.  If both
   measures are combined, WAN network conditions are identical for a
   number of independent measurement flows, no matter what the network
   conditions are in detail.

   Any measurement setup MUST be designed to avoid the probing traffic
   itself impeding the metric measurement.  The created measurement
   load MUST NOT result in congestion at the access link connecting
   the measurement implementation to the WAN.  The created measurement
   load MUST NOT overload the measurement implementation itself, e.g.,
   by causing a high CPU load or by creating imprecision due to
   internal transmit (or receive, respectively) probe packet
   collisions.

   Tunneling multiple flows reaching a network element on a single
   physical port may allow all packets of the tunnel to be transmitted
   via the same path.
   Applying tunnels to avoid undesired influence of standard routing
   for measurement purposes is a concept known from the literature;
   see, e.g., GRE encapsulated multicast probing [GU+Duffield].  An
   existing IP in IP tunnel protocol can be applied to avoid Equal-
   Cost Multi-Path (ECMP) routing of different measurement streams if
   it meets the following criteria:

   o  Inner IP packets from different measurement implementations are
      mapped into a single tunnel with a single outer IP origin and
      destination address as well as origin and destination port
      numbers which are identical for all packets.

   o  An easily accessible commodity tunneling protocol allows a
      metric test to be carried out from more test sites.

   o  A low operational overhead may enable a broader audience to set
      up a metric test with the desired properties.

   o  The tunneling protocol should be reliable and stable in set up
      and operation to avoid disturbances or influence on the test
      results.

   o  The tunneling protocol should not incur any extra cost for those
      interested in setting up a metric test.

   An illustration of a test setup with two tunnels and two flows
   between two linecards of one implementation is given in Figure 1.

   [Figure 1: the original ASCII art is not fully recoverable from the
   source text.  It shows two linecards (LC1 and LC2) of one
   implementation, two measurement flows F1 and F2 which both traverse
   Tunnel 1 towards the remote site and Tunnel 2 on the return path
   across the Internet, and a remote router which loops both flows
   back.]

   Illustration of a test setup with two tunnels.  For simplicity,
   only two linecards of one implementation and two flows F between
   them are shown.

                                Figure 1

   Figure 2 shows the network elements required to set up GRE tunnels
   or Pseudowires as shown by figure 1.

   Implementation

   +-----+                     ,---.
   | LC1 |                    /     \
   +-----+                   /       \                       +------+
      |        +-------+    (         )    +-------+         |Remote|
   +--------+  |       |    |         |    |       |         |      |
   |Ethernet|  | Tunnel|    |Internet |    | Tunnel|         |      |
   |Switch  |--| Head  |----|         |----| Head  |---------|      |
   +--------+  | Router|    |         |    | Router|         |      |
      |        |       |    (         )    |       |         |Router|
   +-----+     +-------+     \       /     +-------+         +------+
   | LC2 |                    \     /
   +-----+                     `-+-'

   Illustration of a hardware setup to realise the test setup
   illustrated by figure 1 with GRE tunnels or Pseudowires.

                                Figure 2

   If tunneling is applied, two tunnels MUST carry all test traffic
   between the test site and the remote site.  For example, if 802.1Q
   Ethernet Virtual LANs (VLAN) are applied and the measurement
   streams are carried in different VLANs, the IP tunnel or Pseudo
   Wires respectively MUST be set up in physical port mode to avoid
   setting up Pseudo Wires per VLAN (which may see different paths due
   to ECMP routing), see RFC 4448.  The remote router and the Ethernet
   switch shown in figure 2 must support 802.1Q in this setup.

   The IP packet size of the metric implementation SHOULD be chosen
   small enough to avoid fragmentation due to the added Ethernet and
   tunnel headers.  Otherwise, the impact of tunnel overhead on
   fragmentation and interface MTU size MUST be understood and taken
   into account (see [RFC4459]).
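   The worst-case overhead of an Ethernet over L2TPv3 encapsulation
   can be added up from the header sizes listed further below.  The
   following minimal C++ sketch illustrates this; the 1500 byte path
   MTU and the stacking of all headers at their maximum sizes are
   assumptions made for this example only, not requirements of this
   memo:

   #include <iostream>

   int main()
   {
       const int path_mtu        = 1500;  // assumed outer path MTU
       const int outer_ipv4      = 20;    // outer IPv4 header
       const int l2tpv3_over_udp = 28;    // L2TPv3 over UDP, maximum
       const int dot1q_ethernet  = 22;    // encapsulated 802.1Q header
       const int mpls_labels     = 4 * 4; // up to 4 carrier MPLS labels

       // worst-case bytes added around the inner IP packet
       int overhead = outer_ipv4 + l2tpv3_over_udp + dot1q_ethernet
                    + mpls_labels;

       std::cout << "worst case overhead: " << overhead << " bytes\n"
                 << "largest inner IP packet avoiding fragmentation: "
                 << path_mtu - overhead << " bytes\n";
       return 0;
   }

   With these figures, the overhead is 86 bytes, so inner IP packets
   of up to roughly 1414 bytes would avoid fragmentation under the
   stated assumptions.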
   An Ethernet port mode IP tunnel carrying several 802.1Q VLANs, each
   containing measurement traffic of a single measurement system, was
   set up as a proof of concept using RFC4719 [RFC4719], Transport of
   Ethernet Frames over L2TPv3.  Ethernet over L2TPv3 seems to fulfill
   most of the desired tunneling protocol criteria mentioned above.

   The following headers may have to be accounted for when calculating
   total packet length, if VLANs and Ethernet over L2TPv3 tunnels are
   applied:

   o  Ethernet 802.1Q: 22 Bytes.

   o  L2TPv3 Header: 4-16 Bytes for L2TPv3 data messages over IP;
      16-28 Bytes for L2TPv3 data messages over UDP.

   o  IPv4 Header (outer IP header): 20 Bytes.

   o  MPLS Labels may be added by a carrier.  Each MPLS Label has a
      length of 4 Bytes.  At the time of writing, between 1 and 4
      labels seems to be a fair guess of what to expect.

   The applicability of one or more of the following tunneling
   protocols may be investigated by interested parties if Ethernet
   over L2TPv3 is felt to be not suitable: IP in IP [RFC2003] or
   Generic Routing Encapsulation (GRE) [RFC2784].  RFC 4928 [RFC4928]
   proposes measures to avoid ECMP treatment in MPLS networks.

   L2TP is a commodity tunneling protocol [RFC2661].  At the time of
   writing, L2TPv3 [RFC3931] is the latest version of L2TP.

   Ethernet Pseudo Wires may also be set up on MPLS networks
   [RFC4448].  While there's no technical issue with this solution,
   MPLS interfaces are mostly found in the network provider domain.
   Hence not all of the above tunneling criteria are met.

   Each test is repeated several times.  WAN conditions may change
   over time.  Sequential testing is desirable, but may therefore not
   be a useful metric test option.  It is RECOMMENDED that tests be
   carried out by establishing N different parallel measurement flows.
   Two or three linecards per implementation serving to send or
   receive measurement flows should be sufficient to create 5 or more
   parallel measurement flows.  If three linecards are used, each card
   sends and receives 2 flows.  Other options are to separate flows by
   DiffServ marks (without deploying any QoS in the inner or outer
   tunnel) or using a single CBR flow and evaluating every n-th
   singleton as belonging to a specific measurement flow.

   Some additional rules to calculate and compare samples have to be
   respected to perform a metric test:

   o  Comparing different probes of a common underlying distribution
      in terms of metrics characterising a communication network
      requires respecting the temporal nature for which the assumption
      of a common underlying distribution may hold.  Any singletons or
      samples to be compared MUST be captured within the same time
      interval.

   o  Whenever statistical events like singletons or rates are used to
      characterise measured metrics of a time-interval, at least 5
      singletons of a relevant metric SHOULD be present to ensure a
      minimum confidence in the reported value (see Wikipedia on
      confidence [Rule of thumb]).  Note that this criterion also has
      to be respected, e.g., when comparing packet loss metrics.  Any
      packet loss measurement interval to be compared with the results
      of another implementation SHOULD contain at least five lost
      packets to have a minimum confidence that the observed loss rate
      was not caused by a small number of random packet drops.
   o  The minimum number of singletons or samples to be compared by an
      Anderson-Darling test SHOULD be 100 per tested metric
      implementation.  Note that the Anderson-Darling test detects
      small differences in distributions fairly well and will fail for
      high numbers of compared results (RFC2330 mentions an example
      with 8192 measurements where an Anderson-Darling test always
      failed).

   o  Generally, the Anderson-Darling test is sensitive to differences
      in the accuracy or bias associated with varying implementations
      or test conditions.  These dissimilarities may result in
      differing averages of samples to be compared.  An example may be
      different packet sizes, resulting in a constant delay difference
      between compared samples.  Therefore, samples to be compared by
      an Anderson-Darling test MAY be calibrated by the difference of
      the average values of the samples.  Any calibration of this kind
      MUST be documented in the test result.

3.3.  Tests of two or more different implementations against a metric
      specification

   RFC2330 expects "a methodology for a given metric [to] exhibit
   continuity if, for small variations in conditions, it results in
   small variations in the resulting measurements.  Slightly more
   precisely, for every positive epsilon, there exists a positive
   delta, such that if two sets of conditions are within delta of each
   other, then the resulting measurements will be within epsilon of
   each other."  A small variation in conditions in the context of the
   metric test proposed here can be seen as different implementations
   measuring the same metric along the same path.

   IPPM metric specifications however allow for implementor options to
   the largest possible degree.  It can't be expected that two
   implementors pick identical options for their implementations.
   Implementors SHOULD, to the highest degree possible, pick the same
   configurations for their systems when comparing their
   implementations by a metric test.

   In some cases, a goodness of fit test may not be possible or may
   show disappointing results.  To clarify the difficulties arising
   from different implementation options, the individual options
   picked for every compared implementation SHOULD be documented in
   sufficient detail.  Based on this documentation, the underlying
   metric specification should be improved before it is promoted to a
   standard.

   The same statistical test as is applied to quantify the precision
   of a single metric implementation MUST be passed to compare metric
   conformance of different implementations.  To document
   compatibility, the smallest measurement resolution at which the
   compared implementations passed the ADK sample test MUST be
   documented.

   For different implementations of the same metric, "variations in
   conditions" are reasonably expected.  The ADK test comparing
   samples of the different implementations may result in a lower
   precision than the test for precision of each implementation
   individually.

3.4.  Clock synchronisation

   Clock synchronization effects require special attention.  Accuracy
   of one-way active delay measurements for any metric implementation
   depends on clock synchronization between the source and destination
   of tests.
   Ideally, one-way active delay measurement (RFC 2679, [RFC2679])
   test endpoints either have direct access to independent GPS or
   CDMA-based time sources or indirect access to nearby NTP primary
   (stratum 1) time sources, equipped with GPS receivers.  Access to
   these time sources may not be available at all test locations
   associated with different Internet paths, for a variety of reasons
   out of scope of this document.

   When secondary (stratum 2 and above) time sources are used with NTP
   running across the same network whose metrics are subject to
   comparative implementation tests, network impairments can affect
   clock synchronization and distort sample one-way values and their
   interval statistics.  It is RECOMMENDED to discard sample one-way
   delay values for any implementation when one of the following
   reliability conditions is met:

   o  Delay is measured and is finite in one direction, but not the
      other.

   o  The absolute value of the difference between the sum of one-way
      measurements in both directions and the round-trip measurement
      is greater than X% of the latter value.

   Examination of the second condition requires an RTT measurement for
   reference, e.g., based on TWAMP (RFC 5357, [RFC5357]), in
   conjunction with the one-way delay measurement.

   Specification of X% to strike a balance between identification of
   unreliable one-way delay samples and misidentification of reliable
   samples under a wide range of Internet path RTTs probably requires
   further study.

   An IPPM compliant metric implementation whose measurement requires
   synchronized clocks is however expected to provide precise
   measurement results.  Any IPPM metric implementation SHOULD be of a
   precision of 1 ms (+/- 500 us) with a confidence of 95% if the
   metric is captured along an Internet path which is stable and not
   congested during a measurement duration of an hour or more.

3.5.  Recommended Metric Verification Measurement Process

   In order to meet their obligations under the IETF Standards
   Process, the IESG must be convinced that each metric specification
   advanced to Draft Standard or Internet Standard status is clearly
   written, that there are the required multiple verifiably equivalent
   implementations, and that all options have been implemented.

   In the context of this document, metrics are designed to measure
   some characteristic of a data network.  An aim of any metric
   definition should be that it is specified in a way that allows the
   specific characteristic to be measured reliably and repeatably.

   Each metric, statistic or option of those to be validated MUST be
   compared against a reference measurement or another implementation
   by at least 5 different basic data sets, each one with sufficient
   size to reach the specified level of confidence, as specified by
   this document.

   Finally, the metric definitions, embodied in the text of the RFCs,
   are the objects that require evaluation and possible revision in
   order to advance to the next step on the standards track.
   IF two (or more) implementations do not measure an equivalent
   metric as specified by this document,

   AND sources of measurement error do not adequately explain the lack
   of agreement,

   THEN the details of each implementation should be audited along
   with the exact definition text, to determine if there is a lack of
   clarity that has caused the implementations to vary in a way that
   affects the correspondence of the results.

   IF there was a lack of clarity or multiple legitimate
   interpretations of the definition text,

   THEN the text should be modified and the resulting memo proposed
   for consensus and advancement along the standards track.

   Finally, all the findings MUST be documented in a report that can
   support advancement on the standards track, similar to those
   described in [RFC5657].  The list of measurement devices used in
   testing satisfies the implementation requirement, while the test
   results provide information on the quality of each specification in
   the metric RFC (the surrogate for feature interoperability).

   The complete process of advancing a metric specification to a
   standard as defined by this document is illustrated in Figure 3.

   [Figure 3: the original ASCII art is not fully recoverable from the
   source text.  It shows a flow chart from Start through checks of
   each clause of the metric RFC for equivalence by implementations 1
   to n under relevant identical network conditions, to the decision
   "was RFC clause x clear?"; YES leads to "Report results + Advance
   RFC request", No leads to "Modify Spec".]

   Illustration of the metric standardisation process

                                Figure 3

   Any recommendation for the advancement of a metric specification
   MUST be accompanied by an implementation report, as is the case
   with all requests for the advancement of IETF specifications.  The
   implementation report needs to include the tests performed, the
   applied test setup, the specific metrics in the RFC and reports of
   the tests performed with two or more implementations.  The test
   plan needs to specify the precision reached for each measured
   metric and thus define the meaning of "statistically equivalent"
   for the specific metrics being tested.

   Ideally, the test plan would co-evolve with the development of the
   metric, since that's when people have the most context in their
   thinking regarding the different subtleties that can arise.

   In particular, the implementation report MUST as a minimum
   document:

   o  The metric compared and the RFC specifying it.  This includes
      statements as required by the section "Tests of an individual
      implementation against a metric specification" of this document.

   o  The measurement configuration and setup.

   o  A complete specification of the measurement stream (mean rate,
      statistical distribution of packets, packet size or mean packet
      size and their distribution), DSCP and any other measurement
      stream properties which could result in deviating results.
      Deviations in results can also be caused if the IP addresses and
      ports chosen by different implementations result in different
      layer 2 or layer 3 paths due to the operation of Equal-Cost
      Multi-Path routing in an operational network.
   o  The duration of each measurement to be used for a metric
      validation, the number of measurement points collected for each
      metric during each measurement interval (i.e. the probe size)
      and the level of confidence derived from this probe size for
      each measurement interval.

   o  The result of the statistical tests performed for each metric
      validation as required by the section "Tests of two or more
      different implementations against a metric specification" of
      this document.

   o  A parameterization of laboratory conditions and applied traffic
      and network conditions allowing reproduction of these laboratory
      conditions for readers of the implementation report.

   o  The documentation helping to improve metric specifications, as
      defined by this section.

   All of the tests for each set SHOULD be run in a test setup as
   specified in the section "Test setup resulting in identical live
   network testing conditions."

   If a different test setup is chosen, it is RECOMMENDED to avoid
   effects of real data networks (like parallelism in devices and
   networks) which falsify the results of validation measurements.
   Data networks may forward packets differently in the case of:

   o  Different packet sizes chosen for different metric
      implementations.  A proposed countermeasure is selecting the
      same packet size when validating results of two samples or a
      sample against an original distribution.

   o  Selection of differing IP addresses and ports used by different
      metric implementations during metric validation tests.  If ECMP
      is applied on the IP or MPLS level, different paths can result
      (note that it may be impossible to detect an MPLS ECMP path from
      an IP endpoint).  A proposed countermeasure is to connect the
      measurement equipment to be compared by a NAT device, or to
      establish a single tunnel to transport all measurement traffic.
      The aim is to have the same IP addresses and ports for all
      measurement packets or to avoid ECMP based local routing
      diversion by using a layer 2 tunnel.

   o  Different IP options.

   o  Different DSCP.

   o  If the N measurements are captured using sequential measurements
      instead of simultaneous ones, then the following factors come
      into play: time-varying paths and load conditions.

3.6.  Miscellaneous

   In the case that a metric validation requires capturing rare
   events, an impairment generator may have to be added to the test
   setup.  Inclusion of an impairment generator and the
   parameterisation of the impairments generated MUST be documented.
   Rare events could be packet duplications, packet loss rates above
   single-digit percentages, loss patterns or packet re-ordering and
   so on.

   As specified above, 5 singletons are the recommended basis to
   minimise interference of random events with the statistical test
   proposed by this document.  In the case of ratio measurements (like
   packet loss), the underlying sum of basic events, against which the
   metric's monitored singletons are "rated", determines the
   resolution of the test.  A packet loss statistic with a resolution
   of 1% requires one packet loss statistic-datapoint to consist of
   500 delay singletons (of which at least 5 were lost).  To compare
   EDFs on packet loss requires one hundred such statistics per flow.
   That means, all in all, at least 50 000 delay singletons (500 * 100)
   are required per single measurement flow, as the sketch below
   illustrates.
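   The sizing arithmetic of this and the following paragraph can be
   summarised in a short C++ sketch.  The 1% loss ratio and the 5 hour
   busy interval are the assumptions used in the surrounding text:

   #include <iostream>

   int main()
   {
       const double loss_ratio     = 0.01; // assumed live network loss
       const int    min_lost       = 5;    // minimum losses per statistic
       const int    num_statistics = 100;  // statistics per compared EDF

       // 5 losses at 1% resolution require 500 singletons per statistic
       int pkts_per_statistic = static_cast<int>(min_lost / loss_ratio);
       // 100 such statistics require 50 000 singletons per flow
       int pkts_per_flow = pkts_per_statistic * num_statistics;

       double busy_seconds = 5 * 3600.0;   // assumed 5 hour busy interval
       std::cout << "singletons per flow: " << pkts_per_flow << "\n"
                 << "minimum probing rate: "
                 << pkts_per_flow / busy_seconds
                 << " packets/s\n";        // roughly 2.8 packets/s
       return 0;
   }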
   Live network packet loss is assumed to be present during main
   traffic hours only.  Let this interval be 5 hours.  The required
   minimum rate of a single measurement flow in that case is 2.8
   packets/sec (assuming a loss of 1% during 5 hours).  If this
   measurement is too demanding under live network conditions, an
   impairment generator should be used.

4.  Acknowledgements

   Gerhard Hasslinger commented on a first version of this document,
   suggested statistical tests and the evaluation of time series
   information.  Henk Uijterwaal pushed this work, and Mike Hamilton,
   Scott Bradner and Emile Stephan commented on versions of this draft
   before initial publication.  Carol Davids reviewed the 01 version
   of this draft.

5.  Contributors

   Scott Bradner, Vern Paxson and Allison Mankin drafted
   [bradner-metrictest], and major parts of it are included in this
   document.

6.  IANA Considerations

   This memo includes no request to IANA.

7.  Security Considerations

   This draft does not raise any specific security issues.

8.  References

8.1.  Normative References

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
              "Framework for IP Performance Metrics", RFC 2330,
              May 1998.

   [RFC2661]  Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
              G., and B. Palter, "Layer Two Tunneling Protocol
              "L2TP"", RFC 2661, August 1999.

   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Delay Metric for IPPM", RFC 2679, September 1999.

   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
              Packet Loss Metric for IPPM", RFC 2680, September 1999.

   [RFC2681]  Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-
              trip Delay Metric for IPPM", RFC 2681, September 1999.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              March 2000.

   [RFC3931]  Lau, J., Townsley, M., and I. Goyret, "Layer Two
              Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931,
              March 2005.

   [RFC4448]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
              "Encapsulation Methods for Transport of Ethernet over
              MPLS Networks", RFC 4448, April 2006.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and
              M. Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC4719]  Aggarwal, R., Townsley, M., and M. Dos Santos,
              "Transport of Ethernet Frames over Layer 2 Tunneling
              Protocol Version 3 (L2TPv3)", RFC 4719, November 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding
              Equal Cost Multipath Treatment in MPLS Networks",
              BCP 128, RFC 4928, June 2007.

   [RFC5657]  Dusseault, L. and R. Sparks, "Guidance on Interoperation
              and Implementation Reports for Advancement to Draft
              Standard", BCP 9, RFC 5657, September 2009.

8.2.  Informative References

   [ADK]      Scholz, F. and M.
Stephens, "K-sample Anderson-Darling 953 Tests of fit, for continuous and discrete cases", 954 University of Washington, Technical Report No. 81, 955 May 1986. 957 [GU+Duffield] 958 Gu, Y., Duffield, N., Breslau, L., and S. Sen, "GRE 959 Encapsulated Multicast Probing: A Scalable Technique for 960 Measuring One-Way Loss", SIGMETRICS'07 San Diego, 961 California, USA, June 2007. 963 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. 964 Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", 965 RFC 5357, October 2008. 967 [Rule of thumb] 968 Hardy, M., "Confidence interval", March 2010. 970 [bradner-metrictest] 971 Bradner, S., Mankin, A., and V. Paxson, "Advancement of 972 metrics specifications on the IETF Standards Track", 973 draft -bradner-metricstest-03, (work in progress), 974 July 2007. 976 [morton-advance-metrics] 977 Morton, A., "Problems and Possible Solutions for Advancing 978 Metrics on the Standards Track", draft -morton-ippm- 979 advance-metrics-00, (work in progress), July 2009. 981 [morton-advance-metrics-01] 982 Morton, A., "Lab Test Results for Advancing Metrics on the 983 Standards Track", draft -morton-ippm-advance-metrics-01, 984 (work in progress), June 2010. 986 Appendix A. An example on a One-way Delay metric validation 988 The text of this appendix is not binding. It is an example how parts 989 of a One-way Delay metric test could look like. 990 http://xml.resource.org/public/rfc/bibxml/ 992 A.1. Compliance to Metric specification requirements 994 One-way Delay, Loss threshold, RFC 2679 996 This test determines if implementations use the same configured 997 maximum waiting time delay from one measurement to another under 998 different delay conditions, and correctly declare packets arriving in 999 excess of the waiting time threshold as lost. See Section 3.5 of 1000 RFC2679, 3rd bullet point and also Section 3.8.2 of RFC2679. 1002 (1) Configure a path with 1 sec one-way constant delay. 1004 (2) Measure one-way delay with 2 or more implementations, using 1005 identical waiting time thresholds for loss set at 2 seconds. 1007 (3) Configure the path with 3 sec one-way delay. 1009 (4) Repeat measurements. 1011 (5) Observe that the increase measured in step 4 caused all packets 1012 to be declared lost, and that all packets that arrive 1013 successfully in step 2 are assigned a valid one-way delay. 1015 One-way Delay, First-bit to Last bit, RFC 2679 1017 This test determines if implementations register the same relative 1018 increase in delay from one measurement to another under different 1019 delay conditions. This test tends to cancel the sources of error 1020 which may be present in an implementation. See Section 3.7.2 of 1021 RFC2679, and Section 10.2 of RFC2330. 1023 (1) Configure a path with X ms one-way constant delay, and ideally 1024 including a low-speed link. 1026 (2) Measure one-way delay with 2 or more implementations, using 1027 identical options and equal size small packets (e.g., 100 octet 1028 IP payload). 1030 (3) Maintain the same path with X ms one-way delay. 1032 (4) Measure one-way delay with 2 or more implementations, using 1033 identical options and equal size large packets (e.g., 1500 octet 1034 IP payload). 1036 (5) Observe that the increase measured in steps 2 and 4 is 1037 equivalent to the increase in ms expected due to the larger 1038 serialization time for each implementation. Most of the 1039 measurement errors in each system should cancel, if they are 1040 stationary. 
   One-way Delay, RFC 2679

   This test determines if implementations register the same relative
   increase in delay from one measurement to another under different
   delay conditions.  This test tends to cancel the sources of error
   which may be present in an implementation.  This test is intended
   to evaluate measurements in sections 3 and 4 of RFC2679.

   (1)  Configure a path with X ms one-way constant delay.

   (2)  Measure one-way delay with 2 or more implementations, using
        identical options.

   (3)  Configure the path with X+Y ms one-way delay.

   (4)  Repeat measurements.

   (5)  Observe that the increase measured in steps 2 and 4 is ~Y ms
        for each implementation.  Most of the measurement errors in
        each system should cancel, if they are stationary.

   Error Calibration, RFC 2679

   This is a simple check to determine if an implementation reports
   the error calibration as required in Section 4.8 of RFC2679.  Note
   that the context (Type-P) must also be reported.

A.2.  Examples related to statistical tests for One-way Delay

   A one way delay measurement may pass an ADK test with a timestamp
   resolution of 1 ms.  The same test may fail if timestamps with a
   resolution of 100 microseconds are evaluated.  The implementation
   is then conforming to the metric specification up to a timestamp
   resolution of 1 ms.

   Let's assume another one way delay measurement comparison between
   implementation 1, probing with a frequency of 2 probes per second,
   and implementation 2, probing at a rate of 2 probes every 3
   minutes.  To ensure reasonable confidence in results, sample
   metrics are calculated from at least 5 singletons per compared
   time interval.  This means sample delay values are calculated for
   each system for identical 6 minute intervals for the whole test
   duration.  Per 6 minute interval, the sample metric is calculated
   from 720 singletons for implementation 1 and from 6 singletons for
   implementation 2.  Note that, if outliers are not filtered, moving
   averages are an option for an evaluation too.  The minimum move of
   an averaging interval is three minutes in this example.

   The data in table 1 may result from measuring One-Way Delay with
   implementation 1 (see column Implemnt_1) and implementation 2 (see
   column Implemnt_2).  Each data point in the table represents a
   (rounded) average of the sampled delay values per interval.  The
   resolution of the clock is one microsecond.  The difference in the
   delay values may result, e.g., from different probe packet sizes.
   +------------+------------+-----------------------------+
   | Implemnt_1 | Implemnt_2 | Implemnt_2 - Delta_Averages |
   +------------+------------+-----------------------------+
   |    5000    |    6549    |            4997             |
   |    5008    |    6555    |            5003             |
   |    5012    |    6564    |            5012             |
   |    5015    |    6565    |            5013             |
   |    5019    |    6568    |            5016             |
   |    5022    |    6570    |            5018             |
   |    5024    |    6573    |            5021             |
   |    5026    |    6575    |            5023             |
   |    5027    |    6577    |            5025             |
   |    5029    |    6580    |            5028             |
   |    5030    |    6585    |            5033             |
   |    5032    |    6586    |            5034             |
   |    5034    |    6587    |            5035             |
   |    5036    |    6588    |            5036             |
   |    5038    |    6589    |            5037             |
   |    5039    |    6591    |            5039             |
   |    5041    |    6592    |            5040             |
   |    5043    |    6599    |            5047             |
   |    5046    |    6606    |            5054             |
   |    5054    |    6612    |            5060             |
   +------------+------------+-----------------------------+

                                Table 1

   Average values of sample metrics captured during identical time
   intervals are compared.  This excludes random differences caused by
   differing probing intervals or differing temporal distance of
   singletons resulting from their Poisson distributed sending times.

   In the example, 20 values have been picked (note that at least 100
   values are recommended for a single run of a real test).  Data must
   be ordered by ascending rank.  The data of Implemnt_1 and
   Implemnt_2 as shown in the first two columns of table 1 clearly
   fails an ADK test with 95% confidence.

   The results of Implemnt_2 are now reduced by the difference of the
   averages of column 2 (rounded to 6581 us) and column 1 (rounded to
   5029 us), which is 1552 us.  The result may be found in column 3 of
   table 1.  Comparing column 1 and column 3 of the table by an ADK
   test shows that the data contained in these columns passes an ADK
   test with 95% confidence.

Appendix B.  Anderson-Darling 2 sample C++ code

   /* Routines for computing the Anderson-Darling 2 sample
    * test statistic.
    *
    * Implemented based on the description in
    * "Anderson-Darling K Sample Test" Heckert, Alan and
    * Filliben, James, editors, Dataplot Reference Manual,
    * Chapter 15 Auxiliary, NIST, 2004.
    * Official Reference by 2010
    * Heckert, N. A. (2001). Dataplot website at the
    * National Institute of Standards and Technology:
    * http://www.itl.nist.gov/div898/software/dataplot.html/
    * June 2001.
    */

   #include <iostream>
   #include <vector>
   #include <cmath>
   #include <cstdlib>

   using namespace std;

   vector<double> vec1, vec2;
   double adk_result;
   double adk_criterium = 1.993;

   /* vec1 and vec2 to be initialised with sample 1 and
    * sample 2 values in ascending order.
1142 Appendix B.  Anderson-Darling 2 sample C++ code

1143 /* Routines for computing the Anderson-Darling 2 sample
1144  * test statistic.
1145  *
1146  * Implemented based on the description in
1147  * "Anderson-Darling K Sample Test", Heckert, Alan and
1148  * Filliben, James, editors, Dataplot Reference Manual,
1149  * Chapter 15 Auxiliary, NIST, 2004.
1150  * Official reference as of 2010:
1151  * Heckert, N. A. (2001). Dataplot website at the
1152  * National Institute of Standards and Technology:
1153  * http://www.itl.nist.gov/div898/software/dataplot.html/
1154  * June 2001.
1155  */

1157 #include <iostream>
1158 #include <vector>

1162 using namespace std;

1164 vector<double> *vec1, *vec2;
1165 double adk_result;
1166 double adk_criterium = 1.993;

1168 /* vec1 and vec2 are to point at sample 1 and sample 2,
1169  * each holding its values in ascending order from index 1
1170  * on; index [0] holds a dummy value.
      */

1172 /* example for iterating the vectors
1173  * for (vector<double>::iterator it = vec1->begin();
1174  *      it != vec1->end(); it++)
1175  * {
1176  *     cout << *it << endl;
1177  * }
1178  */

1180 static int k, val_st_z_samp1, val_st_z_samp2,
1181     val_eq_z_samp1, val_eq_z_samp2,
1182     j, n_total, n_sample1, n_sample2;
1185 static double z, sum_adk_samp1,
1186     sum_adk_samp2, z_aux;
1187 static double H_j, F1j, hj, F2j, denom_1_aux, denom_2_aux;
1188 static bool next_z_sample2, equal_z_both_samples;
1189 static int stop_loop1, stop_loop2, stop_loop3;

1192 /* The statement block below is wrapped in a function so
      * that the fragment compiles as given; callers set vec1
      * and vec2 first and read adk_result afterwards.
      */
     void compute_adk_2_sample()
     {

1194 k = 2;
1195 n_sample1 = (int) vec1->size() - 1;
1196 n_sample2 = (int) vec2->size() - 1;

1198 // -1 because vec[0] is a dummy value

1200 n_total = n_sample1 + n_sample2;

1202 /* index of the line holding the last value equal to the
1203  * current zj in sample 1. Here j=1, so the line is 1.
1204  */

1206 val_eq_z_samp1 = 1;

1208 /* index of the line holding the last value equal to the
1209  * current zj in sample 2. Here j=1, so the line is 1.
1210  */

1212 val_eq_z_samp2 = 1;

1214 /* index of the last line with a value < zj
1215  * in sample 1. Here j=1, so the line is 0.
1216  */

1218 val_st_z_samp1 = 0;

1220 /* index of the last line with a value < zj
1221  * in sample 2. Here j=1, so the line is 0.
1222  */

1224 val_st_z_samp2 = 0;

1226 sum_adk_samp1 = 0;
1227 sum_adk_samp2 = 0;
1228 j = 1;

1230 // as mentioned above, j=1

1232 equal_z_both_samples = false;
1233 next_z_sample2 = false;

1235 // assuming the next z to be of sample 1

1237 stop_loop1 = n_sample1 + 1;

1239 // + 1 because vec[0] is a dummy, see n_sample1 declaration

1240 stop_loop2 = n_sample2 + 1;
1241 stop_loop3 = n_total + 1;
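/* Overview of the loop below: it performs a merge walk over the
 * two ascending samples.  In each pass, the smallest value not
 * yet processed becomes the next zj of the combined sample, and
 * duplicates of zj within a sample are consumed in the same pass.
 * Three cases are distinguished:
 *   - zj occurs in both samples (equal_z_both_samples),
 *   - zj occurs in sample 2 only (next_z_sample2),
 *   - zj occurs in sample 1 only (the final else branch).
 * A val_eq_z_samp1 or val_eq_z_samp2 of 0 marks an exhausted
 * sample.
 */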
1243 /* The required z values are calculated until all values
1244  * of both samples have been taken into account. See the
1245  * lines above for the stop_loop values. This construct is
1246  * required to avoid a mathematical operation in the while
1247  * condition.
1248  */

1249 while (((stop_loop1 > val_eq_z_samp1)
1250     || (stop_loop2 > val_eq_z_samp2)) && stop_loop3 > j)
1251 {
1252     if (val_eq_z_samp1 < n_sample1 + 1)
1253     {

1255 /* here, a preliminary zj value is set.
1256  * See below how the actual zj is determined.
1257  */

1259         z = (*vec1)[val_eq_z_samp1];

1261 /* this while sequence advances to the last value equal
1262  * to z.  Note: <= n_sample1, so that the last line of
1263  * the sample is compared too.
1264  */

1265         while ((val_eq_z_samp1 + 1 <= n_sample1)
1266             && z == (*vec1)[val_eq_z_samp1 + 1])
1267         {
1268             val_eq_z_samp1++;
1269         }
1270     }
1271     else
1272     {
1273         val_eq_z_samp1 = 0;
1274         val_st_z_samp1 = n_sample1;

1276 // sample 1 is exhausted; 0 marks this state

1277     }

1279     if (val_eq_z_samp2 < n_sample2 + 1)
1280     {
1281         z_aux = (*vec2)[val_eq_z_samp2];

1283 /* this while sequence advances to the last value equal
1284  * to z_aux
1285  */

1287         while ((val_eq_z_samp2 + 1 <= n_sample2)
1288             && z_aux == (*vec2)[val_eq_z_samp2 + 1])
1289         {
1290             val_eq_z_samp2++;
1291         }

1293 /* the smaller of the two candidate data values is
1294  * picked as the next zj.
1295  */

1297         if (z > z_aux)
1298         {
1299             z = z_aux;
1300             next_z_sample2 = true;
1301         }
1302         else
1303         {

1305 /* This is the case if the last value of sample 1 is
1306  * smaller than the remaining values of sample 2; z then
1307  * still holds a stale value of sample 1, so this case
1308  * must be checked before the equality comparison.
1309  */
1312             if (val_eq_z_samp1 == 0)
1313             {
1314                 z = z_aux;
1315                 next_z_sample2 = true;
1316             }
1317             else if (z == z_aux)
1318             {
1319                 equal_z_both_samples = true;
1320             }
1321         }
1322     }
1323     else
1324     {
1325         val_eq_z_samp2 = 0;
1326         val_st_z_samp2 = n_sample2;

1327 // sample 2 is exhausted; 0 marks this state

1328     }

1329 /* in the following, the sum over j = 1 to L is calculated
1330  * for sample 1 and sample 2.
1331  */

1332     if (equal_z_both_samples)
1333     {

1335 /* hj is the number of values in the combined sample
1336  * equal to zj
1337  */
1338         hj = val_eq_z_samp1 - val_st_z_samp1
1339             + val_eq_z_samp2 - val_st_z_samp2;

1341 /* H_j is the number of values in the combined sample
1342  * smaller than zj plus one half the number of values
1343  * in the combined sample equal to zj (that's hj/2).
1344  */

1347         H_j = val_st_z_samp1 + val_st_z_samp2
1348             + hj / 2;

1350 /* F1j is the number of values in the 1st sample
1351  * which are less than zj plus one half the number
1352  * of values in this sample which are equal to zj.
1353  */

1355         F1j = val_st_z_samp1 + (double)
1356             (val_eq_z_samp1 - val_st_z_samp1) / 2;

1358 /* F2j is the number of values in the 2nd sample
1359  * which are less than zj plus one half the number
1360  * of values in this sample which are equal to zj.
1361  */
1362         F2j = val_st_z_samp2 + (double)
1363             (val_eq_z_samp2 - val_st_z_samp2) / 2;

1365 /* Set the line of values equal to zj to the actual
1366  * line of the last value picked for zj of each sample.
1367  * This is required as data smaller than zj is accounted
1368  * differently than values equal to zj.
1369  */
1370         val_st_z_samp1 = val_eq_z_samp1;
1376         val_st_z_samp2 = val_eq_z_samp2;

1378 /* next the lines of the next value z, i.e. zj+1, are
1379  * addressed in both samples.
1380  */

1382         val_eq_z_samp1++;
1388         val_eq_z_samp2++;
1389     }
1390     else
1391     {

1393 /* the smaller z value was contained in sample 2,
1394  * hence this value is the zj to base the following
1395  * calculations on.
1396  */
1397         if (next_z_sample2)
1398         {

1400 /* hj is the number of values in the combined
1401  * sample equal to zj, in this case these are
1402  * within sample 2 only.
1403  */
1404             hj = val_eq_z_samp2 - val_st_z_samp2;

1406 /* H_j is the number of values in the combined sample
1407  * smaller than zj plus one half the number of values
1408  * in the combined sample equal to zj (that's hj/2).
1409  */

1412             H_j = val_st_z_samp1 + val_st_z_samp2
1413                 + hj / 2;

1415 /* F1j is the number of values in the 1st sample which
1416  * are less than zj plus one half the number of values in
1417  * this sample which are equal to zj.  As no value of
1418  * sample 1 equals zj here, this is val_st_z_samp1 only.
1419  */
1421             F1j = val_st_z_samp1;

1423 /* F2j is the number of values in the 2nd sample which
1424  * are less than zj plus one half the number of values in
1425  * this sample which are equal to zj.  The latter are from
1426  * sample 2 only in this case.
1427  */

1429             F2j = val_st_z_samp2 + (double)
1430                 (val_eq_z_samp2 - val_st_z_samp2) / 2;

1432 /* Set the line of values equal to zj to the actual line
1433  * of the last value picked for zj, of sample 2 only in
1434  * this case.
1435  */
1436             val_st_z_samp2 = val_eq_z_samp2;

1438 /* next the line of the next value z, i.e. zj+1, is
1439  * addressed.  Here, only sample 2 must be advanced.
1440  */

1442             val_eq_z_samp2++;
1443             if (val_eq_z_samp1 == 0)
1444             {
1445                 val_eq_z_samp1 = stop_loop1;
1446             }
1447         }

1449 /* otherwise the smaller z value was contained in
1450  * sample 1, hence this value is the zj to base the
1451  * following calculations on.
1452  */

1454         else
1455         {

1457 /* hj is the number of values in the combined
1458  * sample equal to zj, in this case these are
1459  * within sample 1 only.
1460  */
1461             hj = val_eq_z_samp1 - val_st_z_samp1;

1463 /* H_j is the number of values in the combined sample
1464  * smaller than zj plus one half the number of values
1465  * in the combined sample equal to zj (that's hj/2).
1466  */

1469             H_j = val_st_z_samp1 + val_st_z_samp2
1470                 + hj / 2;

1472 /* F1j is the number of values in the 1st sample which
1473  * are less than zj plus one half the number of values
1474  * in this sample which are equal to zj.  The latter are
1475  * from sample 1 only in this case.
1476  */

1479             F1j = val_st_z_samp1 + (double)
1480                 (val_eq_z_samp1 - val_st_z_samp1) / 2;

1482 /* F2j is the number of values in the 2nd sample which
1483  * are less than zj plus one half the number of values
1484  * in this sample which are equal to zj.  As no value of
1485  * sample 2 equals zj here, this is val_st_z_samp2 only.
1486  */

1489             F2j = val_st_z_samp2;

1491 /* Set the line of values equal to zj to the actual line
1492  * of the last value picked for zj, of sample 1 only in
1493  * this case.
1494  */

1496             val_st_z_samp1 = val_eq_z_samp1;

1498 /* next the line of the next value z, i.e. zj+1, is
1499  * addressed.  Here, only sample 1 must be advanced.
1500  */
1501             val_eq_z_samp1++;

1503             if (val_eq_z_samp2 == 0)
1504             {
1505                 val_eq_z_samp2 = stop_loop2;
1506             }
1507         }
1508     }
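/* With hj, H_j, F1j and F2j determined for the current zj, the
 * two statements below accumulate, per sample i, the inner sum
 * of the discrete-version ADK statistic described in the
 * Dataplot Reference Manual cited above:
 *
 *   sum_i = sum over j of
 *           hj * (n_total*Fij - n_i*H_j)^2
 *           / (H_j*(n_total - H_j) - n_total*hj/4)
 */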
1510     denom_1_aux = n_total * F1j - n_sample1 * H_j;
1511     denom_2_aux = n_total * F2j - n_sample2 * H_j;

1513     sum_adk_samp1 = sum_adk_samp1 + hj
1514         * (denom_1_aux * denom_1_aux) /
1515         (H_j * (n_total - H_j)
1516         - n_total * hj / 4);
1517     sum_adk_samp2 = sum_adk_samp2 + hj
1518         * (denom_2_aux * denom_2_aux) /
1519         (H_j * (n_total - H_j)
1520         - n_total * hj / 4);

1522     next_z_sample2 = false;
1523     equal_z_both_samples = false;

1525 /* index counting the z values. It is only required to
1526  * prevent the while loop from executing endlessly.
1527  */
1528     j++;
1529 }

1531 // calculating the adk value is the final step.

1533 adk_result = (double) (n_total - 1) / ((double) n_total
1534     * n_total * (k - 1))
1535     * (sum_adk_samp1 / n_sample1
1536     + sum_adk_samp2 / n_sample2);

1538 /* if (adk_result <= adk_criterium)
1539  * the adk_2_sample test is passed
1540  */
1541 }

1542                              Figure 4
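   The listing in Figure 4 is a fragment.  A minimal way to exercise
   it is sketched below; the sketch is not part of the NIST reference
   and assumes the statement block of Figure 4 is wrapped in the
   function compute_adk_2_sample() as shown above.  The two samples
   are deliberately short to keep the example readable; a real
   verification run is expected to use at least 100 values per sample.

   #include <algorithm>

   int main()
   {
       vector<double> s1, s2;
       double d1[] = { 0, 5000, 5008, 5012, 5015, 5019 };
       double d2[] = { 0, 4997, 5003, 5012, 5013, 5016 };
       // index [0] is the dummy value, samples start at index 1
       s1.assign(d1, d1 + 6);
       s2.assign(d2, d2 + 6);
       sort(s1.begin() + 1, s1.end());  // samples must ascend
       sort(s2.begin() + 1, s2.end());
       vec1 = &s1;
       vec2 = &s2;
       compute_adk_2_sample();
       cout << "adk_result = " << adk_result
            << (adk_result <= adk_criterium ?
                " : pass" : " : fail") << endl;
       return 0;
   }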
1544 Appendix C.  Glossary

1546 +-------------+-----------------------------------------------------+
1547 | ADK         | Anderson-Darling K-Sample test, a test used to      |
1548 |             | check whether two samples have the same statistical |
1549 |             | distribution.                                       |
1550 | ECMP        | Equal Cost Multipath, a load balancing mechanism    |
1551 |             | evaluating MPLS label stacks, IP addresses and      |
1552 |             | ports.                                              |
1553 | EDF         | The "Empirical Distribution Function" of a set of   |
1554 |             | scalar measurements is a function F(x) which for    |
1555 |             | any x gives the fractional proportion of the total  |
1556 |             | measurements that were smaller than or equal to x.  |
1557 | Metric      | A measured quantity related to the performance and  |
1558 |             | reliability of the Internet, expressed by a value.  |
1559 |             | This could be a singleton (single value), a sample  |
1560 |             | of single values or a statistic based on a sample   |
1561 |             | of singletons.                                      |
1562 | OWAMP       | One-way Active Measurement Protocol, a protocol for |
1563 |             | communication between IPPM measurement systems,     |
1564 |             | specified by IPPM.                                  |
1565 | OWD         | One-Way Delay, a performance metric specified by    |
1566 |             | IPPM.                                               |
1567 | Sample      | A sample metric is derived from a given singleton   |
1568 | metric      | metric by evaluating a number of distinct instances |
1569 |             | together.                                           |
1570 | Singleton   | A singleton metric is, in a sense, one atomic       |
1571 | metric      | measurement of this metric.                         |
1572 | Statistical | A 'statistical' metric is derived from a given      |
1573 | metric      | sample metric by computing some statistic of the    |
1574 |             | values defined by the singleton metric on the       |
1575 |             | sample.                                             |
1576 | TWAMP       | Two-way Active Measurement Protocol, a protocol for |
1577 |             | communication between IPPM measurement systems,     |
1578 |             | specified by IPPM.                                  |
1579 +-------------+-----------------------------------------------------+

1581                                 Table 2

1583 Authors' Addresses

1585 Ruediger Geib (editor)
1586 Deutsche Telekom
1587 Heinrich Hertz Str. 3-7
1588 Darmstadt, 64295
1589 Germany

1591 Phone: +49 6151 628 2747
1592 Email: Ruediger.Geib@telekom.de

1594 Al Morton
1595 AT&T Labs
1596 200 Laurel Avenue South
1597 Middletown, NJ 07748
1598 USA

1600 Phone: +1 732 420 1571
1601 Fax:   +1 732 368 1192
1602 Email: acmorton@att.com
1603 URI:   http://home.comcast.net/~acmacm/

1605 Reza Fardid
1606 Cariden Technologies
1607 888 Villa Street, Suite 500
1608 Mountain View, CA 94041
1609 USA

1611 Phone:
1612 Email: rfardid@cariden.com

1613 Alexander Steinmitz
1614 HS Fulda
1615 Marquardstr. 35
1616 Fulda, 36039
1617 Germany

1619 Phone:
1620 Email: steinionline@gmx.de