IP Performance Working Group                                   M. Mathis
Internet-Draft                                                Google, Inc
Intended status: Experimental                                   A. Morton
Expires: September 10, 2015                                     AT&T Labs
                                                            March 9, 2015

                  Model Based Bulk Performance Metrics
               draft-ietf-ippm-model-based-metrics-04.txt

Abstract

We introduce a new class of model based metrics designed to determine if an end-to-end Internet path can meet predefined bulk transport performance targets by applying a suite of IP diagnostic tests to successive subpaths.  The subpath-at-a-time tests can be robustly applied to key infrastructure, such as interconnects, to accurately detect if any part of the infrastructure will prevent the full end-to-end paths traversing it from meeting the specified target performance.

The diagnostic tests consist of precomputed traffic patterns and statistical criteria for evaluating packet delivery.  The traffic patterns are precomputed to mimic TCP or other transport protocols over a long path, but are constructed in such a way that they are independent of the actual details of the subpath under test, end systems or applications.  Likewise the success criteria depend on the packet delivery statistics of the subpath, as evaluated against a protocol model applied to the target performance.  The success criteria also do not depend on the details of the subpath, end systems or application.  This makes the measurements open loop, eliminating most of the difficulties encountered by traditional bulk transport metrics.

Model based metrics exhibit several important new properties not present in other Bulk Transport Capacity metrics, including the ability to reason about concatenated or overlapping subpaths.  The results are vantage independent, which is critical for supporting independent validation of test results from multiple measurement points.
This document does not define diagnostic tests directly, but provides a framework for designing suites of diagnostic tests that are tailored to confirming that infrastructure can meet the target performance.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 10, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. TODO
2. Terminology
3. New requirements relative to RFC 2330
4. Background
   4.1. TCP properties
   4.2. Diagnostic Approach
5. Common Models and Parameters
   5.1. Target End-to-end parameters
   5.2. Common Model Calculations
   5.3. Parameter Derating
6. Common testing procedures
   6.1. Traffic generating techniques
        6.1.1. Paced transmission
        6.1.2. Constant window pseudo CBR
        6.1.3. Scanned window pseudo CBR
        6.1.4. Concurrent or channelized testing
   6.2. Interpreting the Results
        6.2.1. Test outcomes
        6.2.2. Statistical criteria for estimating run_length
        6.2.3. Reordering Tolerance
   6.3. Test Preconditions
7. Diagnostic Tests
   7.1. Basic Data Rate and Delivery Statistics Tests
        7.1.1. Delivery Statistics at Paced Full Data Rate
        7.1.2. Delivery Statistics at Full Data Windowed Rate
        7.1.3. Background Delivery Statistics Tests
   7.2. Standing Queue Tests
        7.2.1. Congestion Avoidance
        7.2.2. Bufferbloat
        7.2.3. Non excessive loss
        7.2.4. Duplex Self Interference
   7.3. Slowstart tests
        7.3.1. Full Window slowstart test
        7.3.2. Slowstart AQM test
   7.4. Sender Rate Burst tests
   7.5. Combined and Implicit Tests
        7.5.1. Sustained Bursts Test
        7.5.2. Streaming Media
8. An Example
9. Validation
10. Security Considerations
11. Acknowledgements
12. IANA Considerations
13. References
    13.1. Normative References
    13.2. Informative References
Appendix A. Model Derivations
   A.1. Queueless Reno
Appendix B. Complex Queueing
Appendix C. Version Control
Authors' Addresses

1. Introduction

Bulk performance metrics evaluate an Internet path's ability to carry bulk data.  Model based bulk performance metrics rely on mathematical TCP models to design a targeted diagnostic suite (TDS) of IP performance tests which can be applied independently to each subpath of the full end-to-end path.  These targeted diagnostic suites allow independent tests of subpaths to accurately detect if any subpath will prevent the full end-to-end path from delivering bulk data at the specified performance target, independent of the measurement vantage points or other details of the test procedures used for each measurement.

The end-to-end target performance is determined by the needs of the user or application, which are outside the scope of this document.  For bulk data transport, the primary performance parameter of interest is the target data rate.  However, since TCP's ability to compensate for less than ideal network conditions is fundamentally affected by the Round Trip Time (RTT) and the Maximum Transmission Unit (MTU) of the entire end-to-end path that the data traverses, these parameters must also be specified in advance.  They may reflect a specific real path through the Internet or an idealized path representing a typical user community.  The target values for these three parameters, Data Rate, RTT and MTU, inform the mathematical models used to design the TDS.

Each IP diagnostic test in a TDS consists of a precomputed traffic pattern and statistical criteria for evaluating packet delivery.
Mathematical models are used to design traffic patterns that mimic TCP or another bulk transport protocol operating at the target data rate, MTU and RTT over a full range of conditions, including flows that are bursty at multiple time scales.  The traffic patterns are computed in advance based on the three target parameters of the end-to-end path and are independent of the properties of individual subpaths.  As much as possible the measurement traffic is generated deterministically in ways that minimize the extent to which test methodology, measurement points, measurement vantage or path partitioning affect the details of the measurement traffic.

Mathematical models are also used to compute the bounds on the packet delivery statistics for acceptable IP performance.  Since these statistics, such as packet loss, are typically aggregated from all subpaths of the end-to-end path, the end-to-end statistical bounds need to be apportioned as a separate bound for each subpath.  Note that links that are expected to be bottlenecks are also expected to contribute a larger fraction of the total packet loss and/or delay.  In compensation, other links have to be constrained to contribute less packet loss and delay.  The criterion for passing each test of a TDS is an apportioned share of the total bound determined by the mathematical model from the end-to-end target performance.

In addition to passing or failing, a test can be deemed to be inconclusive for a number of reasons, including: the precomputed traffic pattern was not accurately generated; the measurement results were not statistically significant; or other reasons, such as failing to meet some required test preconditions.

This document describes a framework for deriving traffic patterns and delivery statistics for model based metrics.  It does not fully specify any measurement techniques.  Important details such as packet type-p selection, sampling techniques, vantage selection, etc. are not specified here.  We imagine Fully Specified Targeted Diagnostic Suites (FSTDS) that define all of these details.  We use TDS to refer to the subset of such a specification that is in scope for this document.  A TDS includes the target parameters, documentation of the models and assumptions used to derive the diagnostic test parameters, specifications for the traffic and delivery statistics for the tests themselves, and a description of a test setup that can be used to validate the tests and models.

Section 2 defines terminology used throughout this document.

It has been difficult to develop Bulk Transport Capacity [RFC3148] metrics due to some overlooked requirements described in Section 3 and some intrinsic problems with using protocols for measurement, described in Section 4.

In Section 5 we describe the models and common parameters used to derive the targeted diagnostic suite.  In Section 6 we describe common testing procedures.  Each subpath is evaluated using a suite of far simpler and more predictable diagnostic tests described in Section 7.  In Section 8 we present an example TDS that might be representative of HD video, and illustrate how MBM can be used to address difficult measurement situations, such as confirming that intercarrier exchanges have sufficient performance and capacity to deliver HD video between ISPs.
There exists a small risk that the model based metrics themselves might yield a false pass result, in the sense that every subpath of an end-to-end path passes every IP diagnostic test and yet a real application fails to attain the performance target over the end-to-end path.  If this happens, then the validation procedure described in Section 9 needs to be used to check and potentially revise the models.

Future documents may define model based metrics for other traffic classes and application types, such as real time streaming media.

1.1. TODO

This section to be removed prior to publication.

Please send comments about this draft to ippm@ietf.org.  See http://goo.gl/02tkD for more information including: interim drafts, an up to date todo list and information on contributing.

Formatted: Mon Mar 9 14:37:24 PDT 2015

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

For terminology about paths, etc., see [RFC2330] and [RFC7398].

[data] sender: Host sending data and receiving ACKs.
[data] receiver: Host receiving data and sending ACKs.
subpath: A portion of the full path.  Note that there is no requirement that subpaths be non-overlapping.
Measurement Point: Measurement points as described in [RFC7398].
test path: A path between two measurement points that includes a subpath of the end-to-end path under test, and could include infrastructure between the measurement points and the subpath.
[Dominant] Bottleneck: The bottleneck that generally dominates traffic statistics for the entire path.  It typically determines a flow's self clock timing, packet loss and ECN marking rate.  See Section 4.1.
front path: The subpath from the data sender to the dominant bottleneck.
back path: The subpath from the dominant bottleneck to the receiver.
return path: The path taken by the ACKs from the data receiver to the data sender.
cross traffic: Other, potentially interfering, traffic competing for network resources (bandwidth and/or queue capacity).

The following properties are determined by the end-to-end path and the application.  They are described in more detail in Section 5.1.

Application Data Rate: General term for the data rate as seen by the application above the transport layer.  This is the payload data rate, and excludes transport and lower level headers (TCP/IP or other protocols) as well as retransmissions and other data that does not contribute to the total quantity of data delivered to the application.
Link Data Rate: General term for the data rate as seen by the link or lower layers.  The link data rate includes transport and IP headers, retransmissions and other transport layer overhead.  This document is agnostic as to whether the link data rate includes or excludes framing, MAC, or other lower layer overheads, except that they must be treated uniformly.
end-to-end target parameters: Application or transport performance goals for the end-to-end path.  They include the target data rate, RTT and MTU described below.
Target Data Rate: The application data rate, typically the ultimate user's performance goal.
Target RTT (Round Trip Time): The baseline (minimum) RTT of the longest end-to-end path over which the application expects to be able to meet the target performance.  TCP and other transport protocols' ability to compensate for path problems is generally proportional to the number of round trips per second.  The Target RTT determines both key parameters of the traffic patterns (e.g. burst sizes) and the thresholds on acceptable traffic statistics.  The Target RTT must be specified considering authentic packet sizes: MTU sized packets on the forward path, ACK sized packets (typically header_overhead) on the return path.
Target MTU (Maximum Transmission Unit): The maximum MTU supported by the end-to-end path over which the application expects to meet the target performance.  Assume 1500 Byte packets unless otherwise specified.  If some subpath forces a smaller MTU, then it becomes the target MTU, and all model calculations and subpath tests must use the same smaller MTU.
Effective Bottleneck Data Rate: This is the bottleneck data rate inferred from the ACK stream, by looking at how much data the ACK stream reports delivered per unit time.  If the path is thinning ACKs or batching packets the effective bottleneck rate can be much higher than the average link rate.  See Section 4.1 and Appendix B for more details.
[sender | interface] rate: The burst data rate, constrained by the data sender's interfaces.  Today 1 or 10 Gb/s are typical.
Header_overhead: The IP and TCP header sizes, which are the portion of each MTU not available for carrying application payload.  Without loss of generality this is assumed to be the size for returning acknowledgements (ACKs).  For TCP, the Maximum Segment Size (MSS) is the Target MTU minus the header_overhead.

The following basic parameters are common to the models and the subpath tests.  They are described in more detail in Section 5.2.  Note that they are mixed between application transport performance (which excludes headers) and link IP performance (which includes headers).

pipe size: A general term for the number of packets needed in flight (the window size) to exactly fill some network path or subpath.  This window size normally marks the onset of queueing.
target_pipe_size: The number of packets in flight (the window size) needed to exactly meet the target rate, with a single stream and no cross traffic, for the specified application target data rate, RTT, and MTU.  It is the amount of circulating data required to meet the target data rate, and implies the scale of the bursts that the network might experience.
run length: A general term for the observed, measured, or specified number of packets that are (to be) delivered between losses or ECN marks.  Nominally one over the loss or ECN marking probability, if they are independently and identically distributed.
target_run_length: The target_run_length is an estimate of the minimum number of good packets needed between losses or ECN marks to attain the target_data_rate over a path with the specified target_RTT and target_MTU, as computed by a mathematical model of TCP congestion control.  A reference calculation is shown in Section 5.2 and alternatives in Appendix A.

Ancillary parameters used for some tests:

derating: Under some conditions the standard models are too conservative.
The modeling framework permits some latitude in relaxing or "derating" some test parameters as described in Section 5.3, in exchange for more stringent TDS validation procedures, described in Section 9.
subpath_data_rate: The maximum IP data rate supported by a subpath.  This typically includes TCP/IP overhead, including headers, retransmits, etc.
test_path_RTT: The RTT between two measurement points using appropriate data and ACK packet sizes.
test_path_pipe: The amount of data necessary to fill a test path.  Nominally the test path RTT times the subpath_data_rate (which should be part of the end-to-end subpath).
test_window: The window necessary to meet the target_rate over a subpath.  Typically test_window=target_data_rate*test_RTT/(target_MTU - header_overhead).

Tests can be classified into groups according to their applicability.

Capacity tests: determine if a network subpath has sufficient capacity to deliver the target performance.  As long as the test traffic is within the proper envelope for the target end-to-end performance, the average packet losses or ECN marks must be below the threshold computed by the model.  As such, capacity tests reflect parameters that can transition from passing to failing as a consequence of cross traffic, additional presented load or the actions of other network users.  By definition, capacity tests also consume significant network resources (data capacity and/or buffer space), and the test schedules must be balanced by their cost.
Monitoring tests: are designed to capture the most important aspects of a capacity test, but without presenting excessive ongoing load themselves.  As such they may miss some details of the network's performance, but can serve as a useful reduced-cost proxy for a capacity test.
Engineering tests: evaluate how network algorithms (such as AQM and channel allocation) interact with TCP-style self clocked protocols and adaptive congestion control based on packet loss and ECN marks.  These tests are likely to have complicated interactions with cross traffic and under some conditions can be inversely sensitive to load.  For example a test to verify that an AQM algorithm causes ECN marks or packet drops early enough to limit queue occupancy may experience a false pass result in the presence of cross traffic.  It is important that engineering tests be performed under a wide range of conditions, including both in situ and bench testing, and over a wide variety of load conditions.  Ongoing monitoring is less likely to be useful for engineering tests, although sparse in situ testing might be appropriate.

General Terminology:

Targeted Diagnostic Suite (TDS): A set of IP diagnostic tests designed to determine if a subpath can sustain flows at a specific target_data_rate over a path that has a target_RTT, using target_MTU sized packets.
Fully Specified Targeted Diagnostic Suite (FSTDS): A TDS together with additional specifications such as "type-p", etc., which are out of scope for this document, but need to be drawn from other standards documents.
apportioned: To divide and allocate, as in budgeting packet loss rates across multiple subpaths such that they accumulate to less than a specified end-to-end loss rate.
open loop: A control theory term used to describe a class of techniques where systems that naturally exhibit circular dependencies can be analyzed by suppressing some of the dependencies, such that the resulting dependency graph is acyclic.

Bulk performance metrics: Bulk performance metrics evaluate an Internet path's ability to carry bulk data, such as transporting large files, streaming (non-real time) video, and at some scales, web images and content.  (For very fast networks, web performance is dominated by pure RTT effects.)  The metrics presented in this document reflect the evolution of [RFC3148].
traffic patterns: The temporal patterns or statistics of traffic generated by applications over transport protocols such as TCP.  There are several mechanisms that cause bursts at various time scales.  Our goal here is to mimic the range of common patterns (burst sizes and rates, etc), without tying our applicability to specific applications, implementations or technologies, which are sure to become stale.
delivery statistics: Raw or summary statistics about packet delivery properties of the IP layer including packet losses, ECN marks, reordering, or any other properties that may be germane to transport performance.
IP performance tests: Measurements or diagnostic tests to determine delivery statistics.

3. New requirements relative to RFC 2330

Model Based Metrics are designed to fulfill some additional requirements that were not recognized at the time RFC 2330 was written [RFC2330].  These missing requirements may have significantly contributed to policy difficulties in the IP measurement space.  Some additional requirements are:
o IP metrics must be actionable by the ISP - they have to be interpreted in terms of behaviors or properties at the IP or lower layers, that an ISP can test, repair and verify.
o Metrics should be spatially composable, such that measures of concatenated paths should be predictable from subpaths.  Ideally they should also be differentiable: the metrics of a subpath should be predictable from the difference between measurements of paths that do and do not include that subpath.
o Metrics must be vantage point invariant over a significant range of measurement point choices, including off path measurement points.  The only requirements on MP selection should be that the portion of the test path between the MP and the subpath under test is effectively ideal, or is non-ideal in ways that can be calibrated out of the measurements, and that the test RTT between the MPs is below some reasonable bound.
o Metrics must be repeatable by multiple parties with no specialized access to MPs or diagnostic infrastructure.  It must be possible for different parties to make the same measurement and observe the same results.  In particular it is specifically important that both a consumer (or their delegate) and ISP be able to perform the same measurement and get the same result.  Note that vantage independence is key to this requirement.

4. Background

At the time the IPPM WG was chartered, sound Bulk Transport Capacity measurement was known to be well beyond our capabilities.  In hindsight it is now clear why it is such a hard problem:
o TCP is a control system with circular dependencies - everything affects performance, including components that are explicitly not part of the test.
o Congestion control is an equilibrium process, such that transport protocols change the network (raise the loss probability and/or RTT) to conform to their behavior.
o TCP's ability to compensate for network flaws is directly proportional to the number of roundtrips per second (i.e. inversely proportional to the RTT).  As a consequence a flawed link may pass a short RTT local test even though it fails when the path is extended by a perfect network to some larger RTT.
o TCP has a meta Heisenberg problem - measurement and cross traffic interact in unknown and ill defined ways.  The situation is actually worse than the traditional physics problem where you can at least estimate bounds on the relative momentum of the measurement and measured particles.  For network measurement you cannot in general determine the relative "elasticity" of the measurement traffic and cross traffic, so you cannot even gauge the relative magnitude of their effects on each other.

These properties are a consequence of the equilibrium behavior intrinsic to how all throughput optimizing protocols interact with the Internet.  The protocols rely on control systems based on multiple network estimators to regulate the quantity of data traffic sent into the network.  The data traffic in turn alters the network and the properties observed by the estimators, such that there are circular dependencies between every component and every property.  Since some of these properties are non-linear, the entire system is nonlinear, and any change anywhere causes difficult to predict changes in every parameter.

Model Based Metrics overcome these problems by forcing the measurement system to be open loop: the delivery statistics (akin to the network estimators) do not affect the traffic or traffic patterns (bursts), which are computed on the basis of the target performance.  In order for a network to pass, the resulting delivery statistics and corresponding network estimators have to be such that they would not cause the control systems to slow the traffic below the target rate.

4.1. TCP properties

TCP and SCTP are self clocked protocols.  The dominant steady state behavior is to have an approximately fixed quantity of data and acknowledgements (ACKs) circulating in the network.  The receiver reports arriving data by returning ACKs to the data sender, and the data sender typically responds by sending exactly the same quantity of data back into the network.  The total quantity of data plus the data represented by ACKs circulating in the network is referred to as the window.  The mandatory congestion control algorithms incrementally adjust the window by sending slightly more or less data in response to each ACK.  The fundamentally important property of this system is that it is entirely self clocked: the data transmissions are a reflection of the ACKs that were delivered by the network, and the ACKs are a reflection of the data arriving from the network.

A number of phenomena can cause bursts of data, even in idealized networks that are modeled as simple queueing systems.

During slowstart the data rate is doubled on each RTT by sending twice as much data as was delivered to the receiver on the prior RTT.
For slowstart to be able to fill such a network, the network must be able to tolerate slowstart bursts up to the full pipe size inflated by the anticipated window reduction on the first loss or ECN mark.  For example, with classic Reno congestion control, an optimal slowstart has to end with a burst that is twice the bottleneck rate for exactly one RTT in duration.  This burst causes a queue which is exactly equal to the pipe size (i.e. the window is exactly twice the pipe size), so when the window is halved in response to the first loss, the new window will be exactly the pipe size.

Note that if the bottleneck data rate is significantly slower than the rest of the path, the slowstart bursts will not cause significant queues anywhere else along the path; they primarily exercise the queue at the dominant bottleneck.

Other sources of bursts include application pauses and channel allocation mechanisms.  Appendix B describes the treatment of channel allocation systems.  If the application pauses (stops reading or writing data) for some fraction of one RTT, state-of-the-art TCP catches up to the earlier window size by sending a burst of data at the full sender interface rate.  To fill such a network with a realistic application, the network has to be able to tolerate interface rate bursts from the data sender large enough to cover application pauses.

Although the interface rate bursts are typically smaller than the last burst of a slowstart, they are at a higher data rate so they potentially exercise queues at arbitrary points along the front path from the data sender up to and including the queue at the dominant bottleneck.  There is no model for what frequency or size of sender rate bursts should be tolerated.

To verify that a path can meet a performance target, it is necessary to independently confirm that the path can tolerate bursts in the dimensions that can be caused by these mechanisms.  Three cases are likely to be sufficient:

o Slowstart bursts sufficient to get connections started properly.
o Frequent sender interface rate bursts that are small enough that they can be assumed not to significantly affect delivery statistics.  (Implicitly derated by selecting the burst size.)
o Infrequent sender interface rate full target_pipe_size bursts that do affect the delivery statistics.  (Target_run_length may be derated.)

4.2. Diagnostic Approach

The MBM approach is to open loop TCP by precomputing traffic patterns that are typically generated by TCP operating at the given target parameters, and evaluating delivery statistics (packet loss, ECN marks and delay).  In this approach the measurement software explicitly controls the data rate, transmission pattern or cwnd (TCP's primary congestion control state variable) to create repeatable traffic patterns that mimic TCP behavior but are independent of the actual behavior of the subpath under test.  These patterns are manipulated to probe the network to verify that it can deliver all of the traffic patterns that a transport protocol is likely to generate under normal operation at the target rate and RTT.
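As a purely illustrative aid (not part of the framework), the following Python sketch restates the dimensions of the terminal slowstart burst described in Section 4.1; the function and parameter names, and the example values, are hypothetical.

   def slowstart_burst_dimensions(target_pipe_size, target_rtt):
       """Largest slowstart burst a path must tolerate, assuming classic
       Reno congestion control that halves the window on the first loss
       or ECN mark (see Section 4.1)."""
       return {
           "window_packets": 2 * target_pipe_size,  # window peaks at twice the pipe size
           "queue_packets": target_pipe_size,       # standing queue at the dominant bottleneck
           "burst_rate_multiplier": 2,              # twice the bottleneck rate ...
           "burst_duration_seconds": target_rtt,    # ... for exactly one RTT
       }

   # Hypothetical example: a 70 packet pipe and a 100 ms target RTT imply a
   # 140 packet burst that momentarily queues 70 packets at the bottleneck.
   print(slowstart_burst_dimensions(70, 0.100))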
By opening the protocol control loops, we remove most sources of temporal and spatial correlation in the traffic delivery statistics, such that each subpath's contribution to the end-to-end statistics can be assumed to be independent and stationary.  (The delivery statistics depend on the fine structure of the data transmissions, but not on long time scale state embedded in the sender, receiver or other network components.)  Therefore each subpath's contribution to the end-to-end delivery statistics can be assumed to be independent, and spatial composition techniques such as [RFC5835] and [RFC6049] apply.

In typical networks, the dominant bottleneck contributes the majority of the packet loss and ECN marks.  Often the rest of the path makes an insignificant contribution to these properties.  A TDS should apportion the end-to-end budget for the specified parameters (primarily packet loss and ECN marks) to each subpath or group of subpaths.  For example the dominant bottleneck may be permitted to contribute 90% of the loss budget, while the rest of the path is only permitted to contribute 10%.

A TDS or FSTDS MUST apportion all relevant packet delivery statistics between successive subpaths, such that the spatial composition of the apportioned metrics will yield end-to-end statistics which are within the bounds determined by the models.

A network is expected to be able to sustain a Bulk TCP flow of a given data rate, MTU and RTT when all of the following conditions are met:
1. The raw link rate is higher than the target data rate.  See Section 7.1 or any number of data rate tests outside of MBM.
2. The observed packet delivery statistics are better than required by a suitable TCP performance model (e.g. fewer losses or ECN marks).  See Section 7.1 or any number of low rate packet loss tests outside of MBM.
3. There is sufficient buffering at the dominant bottleneck to absorb a slowstart rate burst large enough to get the flow out of slowstart at a suitable window size.  See Section 7.3.
4. There is sufficient buffering in the front path to absorb and smooth sender interface rate bursts at all scales that are likely to be generated by the application, any channel arbitration in the ACK path or any other mechanisms.  See Section 7.4.
5. When there is a standing queue at a bottleneck for a shared media subpath (e.g. half duplex), there are suitable bounds on how the data and ACKs interact, for example due to the channel arbitration mechanism.  See Section 7.2.4.
6. When there is a slowly rising standing queue at the bottleneck, the onset of packet loss has to be at an appropriate point (time or queue depth) and progressive.  See Section 7.2.

Note that conditions 1 through 4 require load tests for confirmation, and thus need to be monitored on an ongoing basis.  Conditions 5 and 6 require engineering tests.  They won't generally fail due to load, but may fail in the field due to configuration errors, etc., and should be spot checked.

We are developing a tool that can perform many of the tests described here [MBMSource].

5. Common Models and Parameters

5.1. Target End-to-end parameters

The target end-to-end parameters are the target data rate, target RTT and target MTU as defined in Section 2.
These parameters are determined by the needs of the application or the ultimate end user and the end-to-end Internet path over which the application is expected to operate.  The target parameters are in units that make sense to upper layers: payload bytes delivered to the application, above TCP.  They exclude overheads associated with TCP and IP headers, retransmits and other protocols (e.g. DNS).

Other end-to-end parameters defined in Section 2 include the effective bottleneck data rate, the sender interface data rate and the TCP/IP header sizes (overhead).

The target data rate must be smaller than all link data rates by enough headroom to carry the transport protocol overhead, explicitly including retransmissions and an allowance for fluctuations in the actual data rate needed to meet the specified average rate.  Specifying a target rate with insufficient headroom is likely to result in brittle measurements having little predictive value.

Note that the target parameters can be specified for a hypothetical path, for example to construct a TDS designed for bench testing in the absence of a real application, or for a real physical test, for in situ testing of production infrastructure.

The number of concurrent connections is explicitly not a parameter to this model.  If a subpath requires multiple connections in order to meet the specified performance, that must be stated explicitly and the procedure described in Section 6.1.4 applies.

5.2. Common Model Calculations

The end-to-end target parameters are used to derive the target_pipe_size and the reference target_run_length.

The target_pipe_size is the average window size in packets needed to meet the target rate, for the specified target RTT and MTU.  It is given by:

target_pipe_size = ceiling( target_rate * target_RTT / ( target_MTU - header_overhead ) )

Target_run_length is an estimate of the minimum required number of unmarked packets that must be delivered between losses or ECN marks, as computed by a mathematical model of TCP congestion control.  The derivation here follows [MSMO97], and by design is quite conservative.  The alternate models described in Appendix A generally yield smaller run_lengths (higher acceptable loss or ECN marking rates), but may not apply in all situations.  A FSTDS that uses an alternate model MUST compare it to the reference target_run_length computed here.

Reference target_run_length is derived as follows: assume the subpath_data_rate is infinitesimally larger than the target_data_rate plus the required header_overhead.  Then target_pipe_size also predicts the onset of queueing.  A larger window will cause a standing queue at the bottleneck.

Assume the transport protocol is using standard Reno style Additive Increase, Multiplicative Decrease congestion control [RFC5681] (but not Appropriate Byte Counting [RFC3465]) and the receiver is using standard delayed ACKs.  Reno increases the window by one packet every pipe_size worth of ACKs.  With delayed ACKs this takes 2 Round Trip Times per increase.  To exactly fill the pipe, losses must be no closer than when the peak of the AIMD sawtooth reaches exactly twice the target_pipe_size; otherwise the multiplicative window reduction triggered by the loss would cause the network to be underfilled.
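For illustration only (this is not part of the specification), the two reference calculations of this section can be sketched in Python; target_run_length uses the 3*(target_pipe_size^2) result derived in the remainder of this section, and the example values, including the 52 byte header_overhead, are hypothetical.

   import math

   def target_pipe_size(target_rate, target_rtt, target_mtu, header_overhead):
       """Average window, in packets, needed to meet the target rate."""
       return math.ceil(target_rate * target_rtt / (target_mtu - header_overhead))

   def target_run_length(pipe_size):
       """Reference minimum spacing between losses or ECN marks, in packets."""
       return 3 * pipe_size ** 2

   # Hypothetical example: 1,000,000 bytes/s payload rate, 100 ms RTT,
   # 1500 byte MTU and 52 bytes of TCP/IP header overhead.
   pipe = target_pipe_size(1_000_000, 0.100, 1500, 52)
   print(pipe, target_run_length(pipe))     # -> 70 14700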
Following [MSMO97], the number of packets between losses must be the area under the AIMD sawtooth.  They must be no more frequent than every 1 in ((3/2)*target_pipe_size)*(2*target_pipe_size) packets, which simplifies to:

target_run_length = 3*(target_pipe_size^2)

Note that this calculation is very conservative and is based on a number of assumptions that may not apply.  Appendix A discusses these assumptions and provides some alternative models.  If a different model is used, a fully specified TDS or FSTDS MUST document the actual method for computing target_run_length and the ratio between the alternate target_run_length and the reference target_run_length calculated above, along with a discussion of the rationale for the underlying assumptions.

These two parameters, target_pipe_size and target_run_length, directly imply most of the individual parameters for the tests in Section 7.

5.3. Parameter Derating

Since some aspects of the models are very conservative, the MBM framework permits some latitude in derating test parameters.  Rather than trying to formalize more complicated models we permit some test parameters to be relaxed as long as they meet some additional procedural constraints:

o The TDS or FSTDS MUST document and justify the actual method used to compute the derated metric parameters.
o The validation procedures described in Section 9 must be used to demonstrate the feasibility of meeting the performance targets with infrastructure that infinitesimally passes the derated tests.
o The validation process itself must be documented in such a way that other researchers can duplicate the validation experiments.

Except as noted, all tests below assume no derating.  Tests where there is not currently a well established model for the required parameters explicitly include derating as a way to indicate flexibility in the parameters.

6. Common testing procedures

6.1. Traffic generating techniques

6.1.1. Paced transmission

Paced (burst) transmissions: send bursts of data on a timer to meet a particular target rate and pattern.  In all cases the specified data rate can be either the application or the link rate.  Header overheads must be included in the calculations as appropriate.

Headway: Time interval between packets or bursts, specified from the start of one to the start of the next.  e.g. if packets are sent with a 1 ms headway, there will be exactly 1000 packets per second.
Paced single packets: Send individual packets at the specified rate or headway.
Burst: Send sender interface rate bursts on a timer.  Specify any 3 of: average rate, packet size, burst size (number of packets) and burst headway (burst start to start).  These bursts are typically sent as back-to-back packets at the tester's interface rate.
Slowstart bursts: Send 4 packet sender interface rate bursts at an average data rate equal to twice the effective bottleneck link rate (but not more than the sender interface rate).  This corresponds to the average rate during a TCP slowstart when Appropriate Byte Counting [RFC3465] is present or delayed ACK is disabled.  Note that if the effective bottleneck link rate is more than half of the sender interface rate, slowstart rate bursts become sender interface rate bursts.
Repeated Slowstart bursts: Slowstart bursts are typically part of a larger scale pattern of repeated bursts, such as sending target_pipe_size packets as slowstart bursts on a target_RTT headway (burst start to burst start).  Such a stream has three different average rates, depending on the averaging interval.  At the finest time scale the average rate is the same as the sender interface rate, at a medium scale the average rate is twice the effective bottleneck link rate and at the longest time scales the average rate is equal to the target data rate.

Note that in conventional measurement theory, exponential distributions are often used to eliminate many sorts of correlations.  For the procedures above, the correlations are created by the network elements and accurately reflect their behavior.  At some point in the future, it will be desirable to introduce noise sources into the above pacing models, but they are not warranted at this time.

6.1.2. Constant window pseudo CBR

Implement pseudo constant bit rate by running a standard protocol such as TCP with a fixed window size, such that it is self clocked.  Data packets arriving at the receiver trigger acknowledgements (ACKs) which travel back to the sender where they trigger additional transmissions.  The window size is computed from the target_data_rate and the actual RTT of the test path.  The rate is only maintained on average over each RTT, and is subject to limitations of the transport protocol.

Since the window size is constrained to be an integer number of packets, for small RTTs or low data rates there may not be sufficiently precise control over the data rate.  Rounding the window size up (the default) is likely to result in data rates that are higher than the target rate, but reducing the window by one packet may result in data rates that are too small.  Also cross traffic potentially raises the RTT, implicitly reducing the rate.  Cross traffic that raises the RTT nearly always makes the test more strenuous.  A FSTDS specifying a constant window CBR test MUST explicitly indicate under what conditions errors in the data rate cause tests to be inconclusive.  See the discussion of test outcomes in Section 6.2.1.

Since constant window pseudo CBR testing is sensitive to RTT fluctuations, it cannot accurately control the data rate in environments with fluctuating delays.

6.1.3. Scanned window pseudo CBR

Scanned window pseudo CBR is similar to the constant window CBR described above, except the window is scanned across a range of sizes designed to include two key events, the onset of queueing and the onset of packet loss or ECN marks.  The window is scanned by incrementing it by one packet every 2*target_pipe_size delivered packets.  This mimics the additive increase phase of standard TCP congestion avoidance when delayed ACKs are in effect.  It normally separates the window increases by approximately twice the target_RTT.

There are two ways to implement this test: one built by applying a window clamp to standard congestion control in a standard protocol such as TCP, and the other built by stiffening a non-standard transport protocol.  When standard congestion control is in effect, any losses or ECN marks cause the transport to revert to a window smaller than the clamp, such that the scanning clamp loses control of the window size.
The NPAD pathdiag tool is an example of this class of algorithms [Pathdiag].

Alternatively a non-standard congestion control algorithm can respond to losses by transmitting extra data, such that it maintains the specified window size independent of losses or ECN marks.  Such a stiffened transport explicitly violates mandatory Internet congestion control [RFC5681] and is not suitable for in situ testing.  It is only appropriate for engineering testing under laboratory conditions.  The Windowed Ping tool implements such a test [WPING].  The tool described in the paper has been updated [mpingSource].

The test procedures in Section 7.2 describe how to partition the scans into regions and how to interpret the results.

6.1.4. Concurrent or channelized testing

The procedures described in this document are only directly applicable to single stream performance measurement, e.g. one TCP connection.  In an ideal world, we would disallow all performance claims based on multiple concurrent streams, but this is not practical due to at least two different issues.  First, many very high rate link technologies are channelized and pin individual flows to specific channels to minimize reordering or other problems, and second, TCP itself has scaling limits.  Although the former problem might be overcome through different design decisions, the latter problem is more deeply rooted.

All congestion control algorithms that are philosophically aligned with the standard [RFC5681] (e.g. claim some level of TCP friendliness) have scaling limits, in the sense that as a long fast network (LFN) with a fixed RTT and MTU gets faster, these congestion control algorithms get less accurate and as a consequence have difficulty filling the network [CCscaling].  These properties are a consequence of the original Reno AIMD congestion control design and the requirement in [RFC5681] that all transport protocols have a uniform response to congestion.

There are a number of reasons to want to specify performance in terms of multiple concurrent flows, however this approach is not recommended for data rates below several megabits per second, which can be attained with run lengths under 10000 packets.  Since the required run length goes as the square of the data rate, at higher rates the run lengths can be unreasonably large, and multiple connections might be the only feasible approach.

If multiple connections are deemed necessary to meet aggregate performance targets then this MUST be stated both in the design of the TDS and in any claims about network performance.  The tests MUST be performed concurrently with the specified number of connections.  For the tests that use bursty traffic, the bursts should be synchronized across flows.

6.2. Interpreting the Results

6.2.1. Test outcomes

To perform an exhaustive test of an end-to-end network path, each test of the TDS is applied to each subpath of an end-to-end path.  If any subpath fails any test then an application running over the end-to-end path can also be expected to fail to attain the target performance under some conditions.

In addition to passing or failing, a test can be deemed to be inconclusive for a number of reasons.  Proper instrumentation and treatment of inconclusive outcomes is critical to the accuracy and robustness of Model Based Metrics.
Tests can be inconclusive if the precomputed traffic pattern or data rates were not accurately generated; the measurement results were not statistically significant; or for other causes, such as failing to meet some required preconditions for the test.

For example consider a test that implements Constant Window Pseudo CBR (Section 6.1.2) by adding rate controls and detailed traffic instrumentation to TCP (e.g. [RFC4898]).  TCP includes built in control systems which might interfere with the sending data rate.  If such a test meets the required delivery statistics (e.g. run length) while failing to attain the specified data rate, it must be treated as an inconclusive result, because we cannot a priori determine if the reduced data rate was caused by a TCP problem or a network problem, or if the reduced data rate had a material effect on the observed delivery statistics.

Note that for load tests, if the observed delivery statistics fail to meet the targets, the test can be considered to have failed, because it does not matter that the test did not attain the required data rate.

The really important new properties of MBM, such as vantage independence, are a direct consequence of opening the control loops in the protocols, such that the test traffic does not depend on network conditions or traffic received.  Any mechanism that introduces feedback between the path's measurements and the traffic generation is at risk of introducing nonlinearities that spoil these properties.  Any exceptional event that indicates that such feedback has happened should cause the test to be considered inconclusive.

One way to view inconclusive tests is that they reflect situations where a test outcome is ambiguous between limitations of the network and some unknown limitation of the diagnostic test itself, which may have been caused by some uncontrolled feedback from the network.

Note that procedures that attempt to sweep the target parameter space to find the limits on some parameter, such as target_data_rate, are at risk of breaking the location independent properties of Model Based Metrics if the boundary between passing and inconclusive is at all sensitive to RTT.

One of the goals for evolving TDS designs will be to keep sharpening the distinction between inconclusive, passing and failing tests.  The criteria for passing, failing and inconclusive tests MUST be explicitly stated for every test in the TDS or FSTDS.

One of the goals of evolving the testing process, procedures, tools and measurement point selection should be to minimize the number of inconclusive tests.

It may be useful to keep raw data delivery statistics for deeper study of the behavior of the network path and to measure the tools themselves.  Raw delivery statistics can help to drive tool evolution.  Under some conditions it might be possible to reevaluate the raw data for satisfying alternate performance targets.  However it is important to guard against sampling bias and other implicit feedback which can cause false results and exhibit measurement point vantage sensitivity.

6.2.2. Statistical criteria for estimating run_length

When evaluating the observed run_length, we need to determine appropriate packet stream sizes and acceptable error levels for efficient measurement.
In practice, can we compare the empirically estimated packet loss and ECN marking probabilities with the targets as the sample size grows?  How large a sample is needed to say that the measurements of packet transfer indicate a particular run length is present?

The generalized measurement can be described as recursive testing: send packets (individually or in patterns) and observe the packet delivery performance (loss ratio or other metric, any marking we define).

As each packet is sent and measured, we have an ongoing estimate of the performance in terms of the ratio of packet loss or ECN marks to total packets (i.e. an empirical probability).  We continue to send until conditions support a conclusion or a maximum sending limit has been reached.

We have a target_mark_probability, 1 mark per target_run_length, where a "mark" is defined as a lost packet, a packet with ECN mark, or other signal.  This constitutes the null Hypothesis:

H0: no more than one mark in target_run_length = 3*(target_pipe_size)^2 packets

and we can stop sending packets if ongoing measurements support accepting H0 with the specified Type I error = alpha (= 0.05 for example).

We also have an alternative Hypothesis to evaluate: that performance is significantly lower than the target_mark_probability.  Based on analysis of typical values and practical limits on measurement duration, we choose four times the H0 probability:

H1: one or more marks in (target_run_length/4) packets

and we can stop sending packets if measurements support rejecting H0 with the specified Type II error = beta (= 0.05 for example), thus preferring the alternate hypothesis H1.

H0 and H1 constitute the Success and Failure outcomes described elsewhere in the memo, and while the ongoing measurements do not support either hypothesis the current status of measurements is inconclusive.

The problem above is formulated to match the Sequential Probability Ratio Test (SPRT) [StatQC].  Note that as originally framed the events under consideration were all manufacturing defects.  In networking, ECN marks and lost packets are not defects but signals, indicating that the transport protocol should slow down.

The Sequential Probability Ratio Test also starts with a pair of hypotheses specified as above:

H0: p0 = one defect in target_run_length
H1: p1 = one defect in target_run_length/4

As packets are sent and measurements collected, the tester evaluates the cumulative defect count against two boundaries representing H0 Acceptance or Rejection (and acceptance of H1):

Acceptance line: Xa = -h1 + s*n
Rejection line:  Xr = h2 + s*n

where n increases linearly for each packet sent and

h1 = { log((1-alpha)/beta) }/k
h2 = { log((1-beta)/alpha) }/k
k  = log{ (p1(1-p0)) / (p0(1-p1)) }
s  = [ log{ (1-p0)/(1-p1) } ]/k

for p0 and p1 as defined in the null and alternative Hypotheses statements above, and alpha and beta as the Type I and Type II errors.

The SPRT specifies simple stopping rules:

o Xa < defect_count(n) < Xr: continue testing
o defect_count(n) <= Xa: Accept H0
o defect_count(n) >= Xr: Accept H1

The calculations above are implemented in the R-tool for Statistical Analysis [Rtool], in the add-on package for Cross-Validation via Sequential Testing (CVST) [CVST].
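As a purely illustrative aid (the reference implementation is the R CVST package cited above), the boundary and stopping rule calculations can be sketched in Python.  The alpha = beta = 0.05 values follow the examples in the text; the function names and the example target_run_length are hypothetical.

   from math import log, ceil

   def sprt_bounds(target_run_length, alpha=0.05, beta=0.05):
       p0 = 1.0 / target_run_length      # H0: one mark per target_run_length
       p1 = 4.0 / target_run_length      # H1: one mark per target_run_length/4
       k  = log((p1 * (1 - p0)) / (p0 * (1 - p1)))
       h1 = log((1 - alpha) / beta) / k
       h2 = log((1 - beta) / alpha) / k
       s  = log((1 - p0) / (1 - p1)) / k
       return h1, h2, s

   def sprt_decision(n, defect_count, h1, h2, s):
       """Return 'pass' (accept H0), 'fail' (accept H1) or 'continue'."""
       if defect_count <= -h1 + s * n:   # at or below the acceptance line
           return "pass"
       if defect_count >= h2 + s * n:    # at or above the rejection line
           return "fail"
       return "continue"

   # Hypothetical example: target_run_length = 14700 packets and zero marks
   # observed; H0 can be accepted once n reaches h1/s (see the minimum test
   # length calculation below).
   h1, h2, s = sprt_bounds(14700)
   print(ceil(h1 / s))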
1056 Using the equations above, we can calculate the minimum number of 1057 packets (n) needed to accept H0 when x defects are observed. For 1058 example, when x = 0:

1060 Xa = 0 = -h1 + s*n
1061 and thus n = h1 / s

1063 6.2.3. Reordering Tolerance

1065 All tests must be instrumented for packet level reordering [RFC4737]. 1066 However, there is no consensus about how much reordering should be 1067 acceptable. Over the last two decades the general trend has been to 1068 make protocols and applications more tolerant of reordering (see for 1069 example [RFC4015]), in response to the gradual increase in reordering 1070 in the network. This increase has been due to the deployment of 1071 technologies such as multi threaded routing lookups and Equal Cost 1072 MultiPath (ECMP) routing. These techniques increase parallelism in the 1073 network and are critical to enabling overall Internet growth to 1074 exceed Moore's Law.

1076 Note that transport retransmission strategies can trade off 1077 reordering tolerance against how quickly they can repair losses and the 1078 overhead from spurious retransmissions. In advance of new 1079 retransmission strategies we propose the following strawman: 1080 transport protocols should be able to adapt to reordering as long as 1081 the reordering extent is no more than the maximum of one quarter 1082 window or 1 ms, whichever is larger. Within this limit on reordering 1083 extent, there should be no bound on reordering density.

1085 By implication, reordering which is within these bounds should not 1086 be treated as a network impairment. However [RFC4737] still applies: 1087 reordering should be instrumented and the maximum reordering that can 1088 be properly characterized by the test (e.g. bound on history buffers) 1089 should be recorded with the measurement results.

1091 Reordering tolerance and diagnostic limitations, such as history 1092 buffer size, MUST be specified in a FSTDS.

1094 6.3. Test Preconditions

1096 Many tests have preconditions which are required to assure their 1097 validity. Examples include the presence or absence of cross traffic 1098 on specific subpaths, or appropriate preloading to put reactive 1099 network elements into the proper state [RFC7312]. If preconditions 1100 are not properly satisfied for some reason, the tests should be 1101 considered to be inconclusive. In general it is useful to preserve 1102 diagnostic information about why the preconditions were not met, and 1103 any test data that was collected even if it is not useful for the 1104 intended test. Such diagnostic information and partial test data may 1105 be useful for improving the test in the future.

1107 It is important to preserve the record that a test was scheduled, 1108 because otherwise precondition enforcement mechanisms can introduce 1109 sampling bias. For example, canceling tests due to cross traffic on 1110 subscriber access links might introduce sampling bias in tests of the 1111 rest of the network by reducing the number of tests during peak 1112 network load.

1114 Test preconditions and failure actions MUST be specified in a FSTDS.

1116 7. Diagnostic Tests

1118 The diagnostic tests below are organized by traffic pattern: basic 1119 data rate and delivery statistics, standing queues, slowstart bursts, 1120 and sender rate bursts. We also introduce some combined tests which 1121 are more efficient when networks are expected to pass, but conflate 1122 diagnostic signatures when they fail.

1124 There are a number of test details which are not fully defined here.
1125 They must be fully specified in a FSTDS. From a standardization 1126 perspective, this lack of specificity will weaken this version of 1127 Model Based Metrics, however it is anticipated that this will be more 1128 than offset by the extent to which MBM suppresses the problems caused 1129 by using transport protocols for measurement; e.g. non-specific MBM 1130 metrics are likely to have better repeatability than many existing 1131 BTC-like metrics. Once we have good field experience, the missing 1132 details can be fully specified.

1134 7.1. Basic Data Rate and Delivery Statistics Tests

1136 We propose several versions of the basic data rate and delivery 1137 statistics test. All measure the number of packets delivered between 1138 losses or ECN marks, using a data stream that is rate controlled at 1139 or below the target_data_rate.

1141 The tests below differ in how the data rate is controlled. The data 1142 can be paced on a timer, or window controlled at the full target data 1143 rate. The first two tests implicitly confirm that the subpath has 1144 sufficient raw capacity to carry the target_data_rate. They are 1145 recommended for relatively infrequent testing, such as an installation 1146 or periodic auditing process. The third, background delivery 1147 statistics, is a low rate test designed for ongoing monitoring for 1148 changes in subpath quality.

1150 All rely on the receiver accumulating packet delivery statistics as 1151 described in Section 6.2.2 to score the outcome:

1153 Pass: it is statistically significant that the observed interval 1154 between losses or ECN marks is larger than the target_run_length.

1156 Fail: it is statistically significant that the observed interval 1157 between losses or ECN marks is smaller than the target_run_length.

1159 A test is considered to be inconclusive if it failed to meet the data 1160 rate as specified below, failed to meet the qualifications defined in 1161 Section 6.3, or if neither run length statistical hypothesis was 1162 confirmed in the allotted test duration.

1164 7.1.1. Delivery Statistics at Paced Full Data Rate

1166 Confirm that the observed run length is at least the 1167 target_run_length while relying on a timer to send data at the 1168 target_rate, using the procedure described in Section 6.1.1 with a 1169 burst size of 1 (single packets) or 2 (packet pairs).

1171 The test is considered to be inconclusive if the packet transmission 1172 cannot be accurately controlled for any reason.

1174 RFC 6673 [RFC6673] is appropriate for measuring delivery statistics 1175 at full data rate.

1177 7.1.2. Delivery Statistics at Full Data Windowed Rate

1179 Confirm that the observed run length is at least the 1180 target_run_length while sending at an average rate approximately 1181 equal to the target_data_rate, by controlling (or clamping) the 1182 window size of a conventional transport protocol to a fixed value 1183 computed from the properties of the test path, typically 1184 test_window=target_data_rate*test_RTT/target_MTU. Note that if there 1185 is any interaction between the forward and return path, test_window 1186 may need to be adjusted slightly to compensate for the resulting 1187 inflated RTT.

1189 Since losses and ECN marks generally cause transport protocols to at 1190 least temporarily reduce their data rates, this test is expected to 1191 be less precise about controlling its data rate. It should not be 1192 considered inconclusive as long as at least some of the round trips 1193 reached the full target_data_rate without incurring losses or ECN 1194 marks. To pass this test the network MUST deliver target_pipe_size 1195 packets in target_RTT time without any losses or ECN marks at least 1196 once per two target_pipe_size round trips, in addition to meeting the 1197 run length statistical test.
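As a non-normative illustration of the window clamp above, the following Python helper computes test_window; it assumes the data rate is expressed in bits per second, the RTT in seconds and the MTU in bytes (the formula itself does not prescribe units):

   import math

   def test_window(target_data_rate, test_rtt, target_mtu):
       # test_window = target_data_rate * test_RTT / target_MTU,
       # rounded up to a whole number of packets.
       return math.ceil(target_data_rate * test_rtt / (target_mtu * 8))

   # Hypothetical test path: a 2.5 Mb/s target over a 10 ms test_RTT with
   # a 1500 byte MTU gives a clamp of 3 packets.
   # test_window(2.5e6, 0.010, 1500)  ->  3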
1199 7.1.3. Background Delivery Statistics Tests

1201 The background run length is a low rate version of the target 1202 rate test above, designed for ongoing lightweight monitoring for 1203 changes in the observed subpath run length without disrupting users. 1204 It should be used in conjunction with one of the above full rate 1205 tests because it does not confirm that the subpath can support the raw 1206 data rate.

1208 RFC 6673 [RFC6673] is appropriate for measuring background delivery 1209 statistics.

1211 7.2. Standing Queue Tests

1213 These engineering tests confirm that the bottleneck is well behaved 1214 across the onset of packet loss, which typically follows after the 1215 onset of queueing. Well behaved generally means lossless for 1216 transient queues, but once the queue has been sustained for a 1217 sufficient period of time (or reaches a sufficient queue depth) there 1218 should be a small number of losses to signal to the transport 1219 protocol that it should reduce its window. Losses that are too early 1220 can prevent the transport from averaging at the target_data_rate. 1221 Losses that are too late indicate that the queue might be subject to 1222 bufferbloat [wikiBloat] and inflict excess queuing delays on all 1223 flows sharing the bottleneck queue. Excess losses (more than half of 1224 the window) at the onset of congestion make loss recovery problematic 1225 for the transport protocol. Non-linear, erratic or excessive RTT 1226 increases suggest poor interactions between the channel acquisition 1227 algorithms and the transport self clock. All of the tests in this 1228 section use the same basic scanning algorithm, described here, but 1229 score the link on the basis of how well it avoids each of these 1230 problems.

1232 For some technologies the data might not be subject to increasing 1233 delays, in which case the data rate will vary with the window size 1234 all the way up to the onset of load induced losses or ECN marks. For 1235 these technologies, the discussion of queueing does not apply, but 1236 it is still required that the onset of losses or ECN marks be at an 1237 appropriate point and progressive.

1239 Use the procedure in Section 6.1.3 to sweep the window across the 1240 onset of queueing and the onset of loss. The tests below all assume 1241 that the scan emulates standard additive increase and delayed ACK by 1242 incrementing the window by one packet for every 2*target_pipe_size 1243 packets delivered. A scan can typically be divided into three 1244 regions: below the onset of queueing, a standing queue, and at or 1245 beyond the onset of loss.

1247 Below the onset of queueing the RTT is typically fairly constant, and 1248 the data rate varies in proportion to the window size. Once the data 1249 rate reaches the link rate, the data rate becomes fairly constant, 1250 and the RTT increases in proportion to the increase in window size. 1251 The precise transition across the start of queueing can be identified 1252 by the maximum network power, defined to be the ratio of the data rate 1253 to the RTT. The network power can be computed at each window size, and 1254 the window with the maximum is taken as the start of the queueing 1255 region.
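A minimal, purely illustrative sketch of this selection step is shown below; it assumes the scan has been reduced to a list of (window, data_rate, RTT) samples, which is not a format defined by this document:

   def onset_of_queueing(scan):
       # scan: list of (window, data_rate, rtt) samples from the window sweep.
       # Returns the window with the maximum network power (data_rate / RTT),
       # i.e. the start of the queueing region.
       best_window, best_power = None, float("-inf")
       for window, data_rate, rtt in scan:
           power = data_rate / rtt
           if power > best_power:
               best_window, best_power = window, power
       return best_window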
1257 For technologies that do not have conventional queues, start the scan 1258 at a window equal to test_window=target_data_rate*test_RTT/ 1259 target_MTU, i.e. starting at the target rate, instead of the power 1260 point.

1262 If there is random background loss (e.g. bit errors), precise 1263 determination of the onset of queue induced packet loss may require 1264 multiple scans. Above the onset of queueing loss, all transport 1265 protocols are expected to experience periodic losses determined by 1266 the interaction between the congestion control and AQM algorithms. 1267 For standard congestion control algorithms the periodic losses are 1268 likely to be relatively widely spaced and the details are typically 1269 dominated by the behavior of the transport protocol itself. For the 1270 stiffened transport protocol case (with non-standard, aggressive 1271 congestion control algorithms) the details of periodic losses will be 1272 dominated by how the window increase function responds to loss.

1274 7.2.1. Congestion Avoidance

1276 A link passes the congestion avoidance standing queue test if more 1277 than target_run_length packets are delivered between the onset of 1278 queueing (as determined by the window with the maximum network power) 1279 and the first loss or ECN mark. If this test is implemented using a 1280 standard congestion control algorithm with a clamp, it can be 1281 performed in situ in the production Internet as a capacity test. For 1282 an example of such a test see [Pathdiag].

1284 For technologies that do not have conventional queues, use the 1285 test_window in place of the onset of queueing, i.e. a link passes the 1286 congestion avoidance standing queue test if more than 1287 target_run_length packets are delivered between the start of the scan at 1288 test_window and the first loss or ECN mark.

1290 7.2.2. Bufferbloat

1292 This test confirms that there is some mechanism to limit buffer 1293 occupancy (e.g. that prevents bufferbloat). Note that this is not 1294 strictly a requirement for single stream bulk performance, however if 1295 there is no mechanism to limit buffer queue occupancy then a single 1296 stream with sufficient data to deliver is likely to cause the 1297 problems described in [RFC2309], [I-D.ietf-aqm-recommendation] and 1298 [wikiBloat]. This may cause only minor symptoms for the dominant 1299 flow, but has the potential to make the link unusable for other flows 1300 and applications.

1302 Pass if the onset of loss occurs before a standing queue has 1303 introduced more delay than twice the target_RTT, or some other well 1304 defined and specified limit. Note that there is not yet a model for 1305 how much standing queue is acceptable. The factor of two chosen here 1306 reflects a rule of thumb. In conjunction with the previous test, 1307 this test implies that the first loss should occur at a queueing 1308 delay which is between one and two times the target_RTT.

1310 Specified RTT limits that are larger than twice the target_RTT must 1311 be fully justified in the FSTDS.
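The pass criteria of the two preceding tests reduce to two simple predicates. The following Python fragment is illustrative only, and the variable names are hypothetical:

   def congestion_avoidance_pass(packets_before_first_mark, target_run_length):
       # Section 7.2.1: more than target_run_length packets must be delivered
       # between the onset of queueing and the first loss or ECN mark.
       return packets_before_first_mark > target_run_length

   def bufferbloat_pass(queueing_delay_at_first_loss, target_rtt, limit=2.0):
       # Section 7.2.2: the standing queue must add no more than (by default)
       # twice the target_RTT of delay before the onset of loss.
       return queueing_delay_at_first_loss <= limit * target_rtt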
1313 7.2.3. Non excessive loss

1315 This test confirms that the onset of loss is not excessive. Pass if 1316 losses are equal to or less than the increase in the cross traffic plus 1317 the test traffic window increase on the previous RTT. This could be 1318 restated as non-decreasing link throughput at the onset of loss, 1319 which is easy to meet as long as discarding packets is not more 1320 expensive than delivering them. (Note that when there is a transient drop 1321 in link throughput, outside of a standing queue test, a link that 1322 passes other queue tests in this document will have sufficient queue 1323 space to hold one RTT's worth of data.)

1325 Note that conventional Internet traffic policers will not pass this 1326 test, which is correct. TCP often fails to come into equilibrium at 1327 more than a small fraction of the available capacity, if the capacity 1328 is enforced by a policer. [Citation Pending].

1330 7.2.4. Duplex Self Interference

1332 This engineering test confirms a bound on the interactions between 1333 the forward data path and the ACK return path.

1335 Some historical half duplex technologies had the property that each 1336 direction held the channel until it had completely drained its queue. 1337 When a self clocked transport protocol, such as TCP, has data and 1338 ACKs passing in opposite directions through such a link, the behavior 1339 often reverts to stop-and-wait. Each additional packet added to the 1340 window raises the observed RTT by two forward path packet times, once 1341 as it passes through the data path, and once for the additional delay 1342 incurred by the ACK waiting on the return path.

1344 The duplex self interference test fails if the RTT rises by more than 1345 some fixed bound above the expected queueing time computed from 1346 the excess window divided by the link data rate. This bound must be 1347 smaller than target_RTT/2 to avoid reverting to stop and wait 1348 behavior. (e.g. packets have to be released at least twice per RTT 1349 to avoid stop and wait behavior.)

1351 7.3. Slowstart tests

1353 These tests mimic slowstart: data is sent at twice the effective 1354 bottleneck rate to exercise the queue at the dominant bottleneck.

1356 In general they are deemed inconclusive if the elapsed time to send 1357 the data burst is not less than half of the time to receive the ACKs 1358 (i.e. sending data too fast is acceptable, but sending it slower than twice 1359 the actual bottleneck rate as indicated by the ACKs is deemed 1360 inconclusive). Space the bursts such that the average data rate is 1361 equal to the target_data_rate.

1363 7.3.1. Full Window slowstart test

1365 This is a capacity test to confirm that slowstart is not likely to 1366 exit prematurely. Send slowstart bursts that are target_pipe_size 1367 total packets.

1369 Accumulate packet delivery statistics as described in Section 6.2.2 1370 to score the outcome. Pass if it is statistically significant that 1371 the observed number of good packets delivered between losses or ECN 1372 marks is larger than the target_run_length. Fail if it is 1373 statistically significant that the observed interval between losses 1374 or ECN marks is smaller than the target_run_length.

1376 Note that these are the same parameters as the Sender Full Window 1377 burst test, except the burst rate is at the slowstart rate, rather than 1378 the sender interface rate.

1380 7.3.2. Slowstart AQM test

1382 Do a continuous slowstart (send data continuously at slowstart_rate) 1383 until the first loss, stop, allow the network to drain and repeat, 1384 gathering statistics on the last packet delivered before the loss, 1385 the loss pattern, maximum observed RTT and window size. Justify the 1386 results.
There is not currently sufficient theory justifying 1387 requiring any particular result, however design decisions that affect 1388 the outcome of this test also affect how the network balances 1389 between long and short flows (the "mice and elephants" problem). The 1390 queueing delay at the time of the first loss should be at least one half of 1391 the target_RTT.

1393 This is an engineering test: it would be best performed on a 1394 quiescent network or testbed, since cross traffic has the potential 1395 to change the results.

1397 7.4. Sender Rate Burst tests

1399 These tests determine how well the network can deliver bursts sent at 1400 the sender's interface rate. Note that this test most heavily exercises 1401 the front path, and is likely to include infrastructure that may be out of 1402 scope for an access ISP, even though the bursts might be caused by 1403 ACK compression, thinning or channel arbitration in the access ISP. 1404 See Appendix B.

1406 Also, there are several details that are not precisely defined. 1407 For starters, there is not a standard server interface rate. 1 Gb/s 1408 and 10 Gb/s are very common today, but higher rates will become cost 1409 effective and can be expected to be dominant some time in the future.

1411 Current standards permit TCP to send full window bursts following 1412 an application pause. (Congestion Window Validation [RFC2861] is 1413 not required, but even if it was, it does not take effect until an 1414 application pause is longer than an RTO.) Since full window bursts 1415 are consistent with standard behavior, it is desirable that the 1416 network be able to deliver such bursts, otherwise application pauses 1417 will cause unwarranted losses. Note that the AIMD sawtooth requires 1418 a peak window that is twice target_pipe_size, so the worst case burst 1419 may be 2*target_pipe_size.

1421 It is also understood in the application and serving community that 1422 interface rate bursts have a cost to the network that has to be 1423 balanced against other costs in the servers themselves. For example 1424 TCP Segmentation Offload (TSO) reduces server CPU in exchange for 1425 larger network bursts, which increase the stress on network buffer 1426 memory.

1428 There is not yet theory to unify these costs or to provide a 1429 framework for trying to optimize global efficiency. We do not yet 1430 have a model for how much the network should tolerate server rate 1431 bursts. Some bursts must be tolerated by the network, but it is 1432 probably unreasonable to expect the network to be able to efficiently 1433 deliver all data as a series of bursts.

1435 For this reason, this is the only test for which we encourage 1436 derating. A TDS could include a table of pairs of derating 1437 parameters: what burst size to use as a fraction of the 1438 target_pipe_size, and how much each burst size is permitted to reduce 1439 the run length, relative to the target_run_length.

1441 7.5. Combined and Implicit Tests

1443 Combined tests efficiently confirm multiple network properties in a 1444 single test, possibly as a side effect of normal content delivery. 1445 They require less measurement traffic than other testing strategies 1446 at the cost of conflating diagnostic signatures when they fail. 1447 These are by far the most efficient for monitoring networks that are 1448 nominally expected to pass all tests.

1450 7.5.1. Sustained Bursts Test

1452 The sustained burst test implements a combined worst case version of 1453 all of the load tests above. It is simply:

1455 Send target_pipe_size bursts of packets at server interface rate with 1456 target_RTT headway (burst start to burst start). Verify that the 1457 observed delivery statistics meet the target_run_length.
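The traffic pattern itself is easy to approximate. The toy Python sender below is illustrative only: it uses UDP, ignores the receiver-side scoring described in Section 6.2.2, makes no attempt at true interface-rate pacing, and the destination address, payload size and duration are arbitrary assumptions of this sketch; a real tester would use instrumented TCP [RFC4898], for example the tools in [MBMSource].

   import socket
   import time

   def sustained_bursts(dst, target_pipe_size, target_rtt,
                        payload_size=1448, duration=10.0):
       # Send target_pipe_size back-to-back packets every target_RTT seconds
       # (burst start to burst start), for `duration` seconds.
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       payload = bytes(payload_size)
       seq = 0
       next_burst = time.monotonic()
       end = next_burst + duration
       while next_burst < end:
           for _ in range(target_pipe_size):
               sock.sendto(seq.to_bytes(4, "big") + payload, dst)
               seq += 1
           next_burst += target_rtt
           time.sleep(max(0.0, next_burst - time.monotonic()))

   # Example with the Section 8 parameters and a hypothetical receiver:
   # sustained_bursts(("192.0.2.1", 9000), 11, 0.050)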
1459 Key observations:
1460 o The subpath under test is expected to go idle for some fraction of 1461 the time: (subpath_data_rate-target_rate)/subpath_data_rate. 1462 Failing to do so indicates a problem with the procedure and an 1463 inconclusive test result.
1464 o The burst sensitivity can be derated by sending smaller bursts 1465 more frequently. E.g. send target_pipe_size*derate packet bursts 1466 every target_RTT*derate.
1467 o When not derated, this test is the most strenuous load test.
1468 o A link that passes this test is likely to be able to sustain 1469 higher rates (close to subpath_data_rate) for paths with RTTs 1470 significantly smaller than the target_RTT.
1471 o This test can be implemented with instrumented TCP [RFC4898], 1472 using a specialized measurement application at one end [MBMSource] 1473 and a minimal service at the other end [RFC0863] [RFC0864].
1474 o This test is efficient to implement, since it does not require 1475 per-packet timers, and can make use of TSO in modern NIC hardware.
1476 o This test by itself is not sufficient: the standing window 1477 engineering tests are also needed to ensure that the link is well 1478 behaved at and beyond the onset of congestion.
1479 o Assuming the link passes relevant standing window engineering 1480 tests (particularly that it has a progressive onset of loss at an 1481 appropriate queue depth), a passing sustained burst test is 1482 (believed to be) sufficient to verify that the subpath will not 1483 impair streams running at the target performance under all conditions. 1484 Proving this statement will be the subject of ongoing research.

1486 Note that this test is clearly independent of the subpath RTT, or 1487 other details of the measurement infrastructure, as long as the 1488 measurement infrastructure can accurately and reliably deliver the 1489 required bursts to the subpath under test.

1491 7.5.2. Streaming Media

1493 Model Based Metrics can be implicitly implemented as a side effect of 1494 serving any non-throughput maximizing traffic, such as streaming 1495 media, with some additional controls and instrumentation in the 1496 servers. The essential requirement is that the traffic be 1497 constrained such that even with arbitrary application pauses, bursts 1498 and data rate fluctuations, the traffic stays within the envelope 1499 defined by the individual tests described above.

1501 If the application's serving_data_rate is less than or equal to the 1502 target_data_rate and the serving_RTT (the RTT between the sender and 1503 client) is less than the target_RTT, this constraint is most easily 1504 implemented by clamping the transport window size to be no larger 1505 than:

1507 serving_window_clamp=target_data_rate*serving_RTT/ 1508 (target_MTU-header_overhead)

1510 Under the above constraints the serving_window_clamp will limit 1511 both the serving data rate and burst sizes to be no larger than those 1512 called for by the procedures in Section 7.1.2 and Section 7.4 or Section 7.5.1. 1513 Since the serving RTT is smaller than the target_RTT, the worst case bursts 1514 that might be generated under these conditions will be smaller than 1515 called for by Section 7.4 and the sender rate burst sizes are 1516 implicitly derated by at least the serving_window_clamp divided by the 1517 target_pipe_size. (Depending on the application 1518 behavior, the data traffic might be significantly smoother than 1519 specified by any of the burst tests.)
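As a non-normative illustration, the clamp can be computed as follows; the unit conventions and the 30 ms serving RTT in the example are assumptions of this sketch:

   import math

   def serving_window_clamp(target_data_rate, serving_rtt,
                            target_mtu, header_overhead):
       # serving_window_clamp = target_data_rate * serving_RTT /
       #                        (target_MTU - header_overhead),
       # with the rate in bits/s, RTT in seconds and sizes in bytes,
       # rounded up to whole packets.
       payload = target_mtu - header_overhead
       return math.ceil(target_data_rate * serving_rtt / (payload * 8))

   # Example: 2.5 Mb/s target, 30 ms serving RTT (hypothetical), 1500 byte
   # MTU and 64 bytes of header overhead give a clamp of about 7 packets.
   # serving_window_clamp(2.5e6, 0.030, 1500, 64)  ->  7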
1521 Note that it is important that the target_data_rate be above the 1522 actual average rate needed by the application so it can recover after 1523 transient pauses caused by congestion or the application itself.

1525 In an alternative implementation the data rate and bursts might be 1526 explicitly controlled by a host shaper or pacing at the sender. This 1527 would provide better control over transmissions but it is 1528 substantially more complicated to implement and would be likely to 1529 have a higher CPU overhead.

1531 Note that these techniques can be applied to any content delivery 1532 that can be subjected to a reduced data rate in order to inhibit TCP 1533 equilibrium behavior.

1535 8. An Example

1537 In this section we illustrate a TDS designed to confirm that an 1538 access ISP can reliably deliver HD video from multiple content 1539 providers to all of their customers. With modern codecs, minimal HD 1540 video (720p) generally fits in 2.5 Mb/s. Due to their geographical 1541 size, network topology and modem designs the ISP determines that most 1542 content is within a 50 ms RTT of their users. (This is sufficient 1543 to cover continental Europe or either US coast from a single serving 1544 site.)

1545 2.5 Mb/s over a 50 ms path

1547 +----------------------+-------+---------+
1548 | End to End Parameter | value | units   |
1549 +----------------------+-------+---------+
1550 | target_rate          | 2.5   | Mb/s    |
1551 | target_RTT           | 50    | ms      |
1552 | target_MTU           | 1500  | bytes   |
1553 | header_overhead      | 64    | bytes   |
1554 | target_pipe_size     | 11    | packets |
1555 | target_run_length    | 363   | packets |
1556 +----------------------+-------+---------+

1558 Table 1

1560 Table 1 shows the default TCP model with no derating, and as such is 1561 quite conservative. The simplest TDS would be to use the sustained 1562 burst test, described in Section 7.5.1. Such a test would send 11 1563 packet bursts every 50 ms, and confirm that there is no more than 1564 1 packet loss per 33 bursts (363 total packets in 1.650 seconds).

1566 Since this number represents the entire end-to-end loss budget, 1567 independent subpath tests could be implemented by apportioning the 1568 loss rate across subpaths. For example 50% of the losses might be 1569 allocated to the access or last mile link to the user, 40% to the 1570 interconnects with other ISPs and 1% to each internal hop (assuming 1571 no more than 10 internal hops). Then all of the subpaths can be 1572 tested independently, and the spatial composition of passing subpaths 1573 would be expected to be within the end-to-end loss budget.

1575 Testing interconnects has generally been problematic: conventional 1576 performance tests, run between Measurement Points adjacent to either 1577 side of the interconnect, are not generally useful. Unconstrained 1578 TCP tests, such as iperf [iperf], are usually overly aggressive 1579 because the RTT is so small (often less than 1 ms). With a short RTT 1580 these tools are likely to report inflated numbers because for short 1581 RTTs these tools can tolerate very high loss rates and can push 1582 other cross traffic off of the network. As a consequence they are 1583 useless for predicting actual user performance, and may themselves be 1584 quite disruptive. Model Based Metrics solves this problem.
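The numbers used in this example are easy to recompute. The following Python fragment is illustrative only (the unit conventions and the rounding to whole bursts are assumptions of this sketch); it reproduces Table 1 and the 40% interconnect budget used below:

   import math

   target_rate = 2.5e6      # bits per second
   target_rtt = 0.050       # seconds
   target_mtu = 1500        # bytes

   target_pipe_size = math.ceil(target_rate * target_rtt / (target_mtu * 8))
   target_run_length = 3 * target_pipe_size ** 2
   # target_pipe_size  -> 11 packets
   # target_run_length -> 363 packets

   # Sustained burst test: 11 packet bursts every 50 ms, at most one loss
   # per 363 / 11 = 33 bursts (1.650 seconds of traffic).
   bursts_per_loss = target_run_length // target_pipe_size        # 33

   # Apportioning 40% of the loss budget to an interconnect stretches the
   # required interval between losses to 363 / 0.40 packets, i.e. 82 whole
   # bursts or 902 packets.
   interconnect_bursts = int((target_run_length / 0.40) // target_pipe_size)
   interconnect_packets = interconnect_bursts * target_pipe_size  # 902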
The same 1585 test pattern as used on other links can be applied to the 1586 interconnect. For our example, when apportioned 40% of the losses, 1587 11 packet bursts sent every 50 ms should have fewer than one loss per 1588 82 bursts (902 packets).

1590 9. Validation

1592 Since some aspects of the models are likely to be too conservative, 1593 Section 5.2 permits alternate protocol models and Section 5.3 permits 1594 test parameter derating. If either of these techniques is used, we 1595 require demonstrations that such a TDS can robustly detect links that 1596 will prevent authentic applications using state-of-the-art protocol 1597 implementations from meeting the specified performance targets. This 1598 correctness criterion is potentially difficult to prove, because it 1599 implicitly requires validating a TDS against all possible links and 1600 subpaths. The procedures described here are still experimental.

1602 We suggest two approaches, both of which should be applied: first, 1603 publish a fully open description of the TDS, including what 1604 assumptions were used and how it was derived, such that the 1605 research community can evaluate the design decisions, test them and 1606 comment on their applicability; and second, demonstrate that 1607 applications running over an infinitesimally passing testbed do meet 1608 the performance targets.

1610 An infinitesimally passing testbed resembles an epsilon-delta proof 1611 in calculus. Construct a test network such that all of the 1612 individual tests of the TDS pass by only small (infinitesimal) 1613 margins, and demonstrate that a variety of authentic applications 1614 running over real TCP implementations (or other protocols as 1615 appropriate) meet the end-to-end target parameters over such a 1616 network. The workloads should include multiple types of streaming 1617 media and transaction oriented short flows (e.g. synthetic web 1618 traffic).

1620 For example, for the HD streaming video TDS described in Section 8, 1621 the link layer bottleneck data rate should be exactly the header 1622 overhead above 2.5 Mb/s, the per packet random background loss 1623 probability should be 1/363, for a run length of 363 packets, the 1624 bottleneck queue should be 11 packets and the front path should have 1625 just enough buffering to withstand 11 packet interface rate bursts. 1626 We want every one of the TDS tests to fail if we slightly increase 1627 the relevant test parameter, so for example sending 12 packet 1628 bursts should cause excess (possibly deterministic) packet drops at 1629 the dominant queue at the bottleneck. On this infinitesimally 1630 passing network it should be possible for a real application using a 1631 stock TCP implementation in the vendor's default configuration to 1632 attain 2.5 Mb/s over a 50 ms path.

1634 The most difficult part of setting up such a testbed is arranging for 1635 it to infinitesimally pass the individual tests. Two approaches: 1636 constraining the network devices not to use all available resources 1637 (e.g. by limiting available buffer space or data rate); and 1638 preloading subpaths with cross traffic. Note that it is important 1639 that a single environment be constructed which infinitesimally 1640 passes all tests at the same time, otherwise there is a chance that 1641 TCP can exploit extra latitude in some parameters (such as data rate) 1642 to partially compensate for constraints in other parameters (such as queue 1643 space), or vice versa.
1645 To the extent that a TDS is used to inform public dialog, it should be 1646 fully publicly documented, including the details of the tests, what 1647 assumptions were used and how it was derived. All of the details of 1648 the validation experiment should also be published with sufficient 1649 detail for the experiments to be replicated by other researchers. 1650 All components should be either open source or fully described 1651 proprietary implementations that are available to the research 1652 community.

1654 10. Security Considerations

1656 Measurement is often used to inform business and policy decisions, 1657 and as a consequence is potentially subject to manipulation for 1658 illicit gains. Model Based Metrics are expected to be a huge step 1659 forward because equivalent measurements can be performed from 1660 multiple vantage points, such that performance claims can be 1661 independently validated by multiple parties.

1663 Much of the acrimony in the Net Neutrality debate is due to the 1664 historical lack of any effective vantage independent tools to 1665 characterize network performance. Traditional methods for measuring 1666 bulk transport capacity are sensitive to RTT and as a consequence 1667 often yield very different results local to an ISP and end-to-end. 1668 Neither the ISP nor customer can repeat the other's measurements, 1669 leading to high levels of distrust and acrimony. Model Based Metrics 1670 are expected to greatly improve this situation.

1672 This document only describes a framework for designing Fully 1673 Specified Targeted Diagnostic Suites. Each FSTDS MUST include its own 1674 security section.

1676 11. Acknowledgements

1678 Ganga Maguluri suggested the statistical test for measuring loss 1679 probability in the target run length. Alex Gilgur helped with 1680 the statistics.

1682 Meredith Whittaker improved the clarity of the communications.

1684 This work was inspired by Measurement Lab: open tools running on an 1685 open platform, using open tools to collect open data. See 1686 http://www.measurementlab.net/

1688 12. IANA Considerations

1690 This document has no actions for IANA.

1692 13. References

1694 13.1. Normative References

1696 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1697 Requirement Levels", BCP 14, RFC 2119, March 1997.

1699 13.2. Informative References

1701 [RFC0863] Postel, J., "Discard Protocol", STD 21, RFC 863, May 1983.

1703 [RFC0864] Postel, J., "Character Generator Protocol", STD 22, 1704 RFC 864, May 1983.

1706 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1707 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1708 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1709 S., Wroclawski, J., and L. Zhang, "Recommendations on 1710 Queue Management and Congestion Avoidance in the 1711 Internet", RFC 2309, April 1998.

1713 [RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, 1714 "Framework for IP Performance Metrics", RFC 2330, 1715 May 1998.

1717 [RFC2861] Handley, M., Padhye, J., and S. Floyd, "TCP Congestion 1718 Window Validation", RFC 2861, June 2000.

1720 [RFC3148] Mathis, M. and M. Allman, "A Framework for Defining 1721 Empirical Bulk Transfer Capacity Metrics", RFC 3148, 1722 July 2001.

1724 [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte 1725 Counting (ABC)", RFC 3465, February 2003.

1727 [RFC4015] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 1728 for TCP", RFC 4015, February 2005.
1730 [RFC4737] Morton, A., Ciavattone, L., Ramachandran, G., Shalunov, 1731 S., and J. Perser, "Packet Reordering Metrics", RFC 4737, 1732 November 2006.

1734 [RFC4898] Mathis, M., Heffner, J., and R. Raghunarayan, "TCP 1735 Extended Statistics MIB", RFC 4898, May 2007.

1737 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1738 Control", RFC 5681, September 2009.

1740 [RFC5835] Morton, A. and S. Van den Berghe, "Framework for Metric 1741 Composition", RFC 5835, April 2010.

1743 [RFC6049] Morton, A. and E. Stephan, "Spatial Composition of 1744 Metrics", RFC 6049, January 2011.

1746 [RFC6673] Morton, A., "Round-Trip Packet Loss Metrics", RFC 6673, 1747 August 2012.

1749 [RFC7312] Fabini, J. and A. Morton, "Advanced Stream and Sampling 1750 Framework for IP Performance Metrics (IPPM)", RFC 7312, 1751 August 2014.

1753 [RFC7398] Bagnulo, M., Burbridge, T., Crawford, S., Eardley, P., and 1754 A. Morton, "A Reference Path and Measurement Points for 1755 Large-Scale Measurement of Broadband Performance", 1756 RFC 7398, February 2015.

1758 [I-D.ietf-aqm-recommendation] 1759 Baker, F. and G. Fairhurst, "IETF Recommendations 1760 Regarding Active Queue Management", 1761 draft-ietf-aqm-recommendation-11 (work in progress), 1762 February 2015.

1764 [MSMO97] Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The 1765 Macroscopic Behavior of the TCP Congestion Avoidance 1766 Algorithm", Computer Communications Review volume 27, 1767 number 3, July 1997.

1769 [WPING] Mathis, M., "Windowed Ping: An IP Level Performance 1770 Diagnostic", INET 94, June 1994.

1772 [mpingSource] 1773 Fan, X., Mathis, M., and D. Hamon, "Git Repository for 1774 mping: An IP Level Performance Diagnostic", Sept 2013, 1775 .

1777 [MBMSource] 1778 Hamon, D., Stuart, S., and H. Chen, "Git Repository for 1779 Model Based Metrics", Sept 2013, 1780 .

1782 [Pathdiag] 1783 Mathis, M., Heffner, J., O'Neil, P., and P. Siemsen, 1784 "Pathdiag: Automated TCP Diagnosis", Passive and Active 1785 Measurement, June 2008.

1787 [iperf] Wikipedia Contributors, "iPerf", Wikipedia, The Free 1788 Encyclopedia, cited March 2015, .

1791 [StatQC] Montgomery, D., "Introduction to Statistical Quality 1792 Control - 2nd ed.", ISBN 0-471-51988-X, 1990.

1794 [Rtool] R Development Core Team, "R: A language and environment 1795 for statistical computing. R Foundation for Statistical 1796 Computing, Vienna, Austria. ISBN 3-900051-07-0, URL 1797 http://www.R-project.org/", 2011.

1799 [CVST] Krueger, T. and M. Braun, "R package: Fast Cross- 1800 Validation via Sequential Testing", version 0.1, November 2012.

1802 [AFD] Pan, R., Breslau, L., Prabhakar, B., and S. Shenker, 1803 "Approximate fairness through differential dropping", 1804 SIGCOMM Comput. Commun. Rev. 33, 2, April 2003.

1806 [wikiBloat] 1807 Wikipedia, "Bufferbloat", http://en.wikipedia.org/w/ 1808 index.php?title=Bufferbloat&oldid=608805474, March 2015.

1810 [CCscaling] 1811 Fernando, F., Doyle, J., and S. Steven, "Scalable laws for 1812 stable network congestion control", Proceedings of 1813 Conference on Decision and 1814 Control, http://www.ee.ucla.edu/~paganini, December 2001.

1816 Appendix A. Model Derivations

1818 The reference target_run_length described in Section 5.2 is based on 1819 very conservative assumptions: that any window in excess of target_pipe_size 1820 contributes to a standing queue that raises the RTT, and that classic 1821 Reno congestion control with delayed ACKs is in effect. In this 1822 section we provide two alternative calculations using different 1823 assumptions.
1825 It may seem out of place to allow such latitude in a measurement 1826 standard, but this section provides offsetting requirements.

1828 The estimates provided by these models make the most sense if network 1829 performance is viewed logarithmically. In the operational Internet, 1830 data rates span more than 8 orders of magnitude, RTT spans more than 1831 3 orders of magnitude, and loss probability spans at least 8 orders 1832 of magnitude. When viewed logarithmically (as in decibels), these 1833 correspond to 80 dB of dynamic range. On an 80 dB scale, a 3 dB 1834 error is less than 4% of the scale, even though it might represent a 1835 factor of 2 in the untransformed parameter.

1837 This document gives a lot of latitude for calculating 1838 target_run_length, however people designing a TDS should consider the 1839 effect of their choices on the ongoing tussle about the relevance of 1840 "TCP friendliness" as an appropriate model for Internet capacity 1841 allocation. Choosing a target_run_length that is substantially 1842 smaller than the reference target_run_length specified in Section 5.2 1843 strengthens the argument that it may be appropriate to abandon "TCP 1844 friendliness" as the Internet fairness model. This gives developers 1845 incentive and permission to develop even more aggressive applications 1846 and protocols, for example by increasing the number of connections 1847 that they open concurrently.

1849 A.1. Queueless Reno

1851 In Section 5.2 it was assumed that the link rate matches the target 1852 rate plus overhead, such that the excess window needed for the AIMD 1853 sawtooth causes a fluctuating queue at the bottleneck.

1855 An alternate situation would be a bottleneck where there is no 1856 significant queue and losses are caused by some mechanism that does 1857 not involve extra delay, for example by the use of a virtual queue as 1858 in Approximate Fair Dropping [AFD]. A flow controlled by such a 1859 bottleneck would have a constant RTT and a data rate that fluctuates 1860 in a sawtooth due to AIMD congestion control. Assume the losses are 1861 being controlled to make the average data rate meet some goal which 1862 is equal to or greater than the target_rate. The necessary run length 1863 can be computed as follows:

1865 For some value of Wmin, the window will sweep from Wmin packets to 1866 2*Wmin packets in 2*Wmin RTTs (due to delayed ACKs). Unlike the 1867 queueing case where Wmin = target_pipe_size, we want the average of 1868 Wmin and 2*Wmin to be the target_pipe_size, so the average rate is 1869 the target rate. Thus we want Wmin = (2/3)*target_pipe_size.

1871 Between losses each sawtooth delivers (1/2)*(Wmin+2*Wmin)*(2*Wmin) 1872 packets in 2*Wmin round trip times.

1874 Substituting these together we get:

1876 target_run_length = (4/3)*(target_pipe_size^2)

1878 Note that this is 44% of the reference_run_length computed earlier. 1879 This makes sense because under the assumptions in Section 5.2 the 1880 AIMD sawtooth caused a queue at the bottleneck, which raised the 1881 effective RTT by 50%.
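For comparison, the two models can be evaluated side by side. The following Python fragment is illustrative only; the example value of 11 packets is the target_pipe_size from Section 8:

   def reference_run_length(target_pipe_size):
       # Section 5.2 reference model.
       return 3 * target_pipe_size ** 2

   def queueless_reno_run_length(target_pipe_size):
       # Appendix A.1 queueless Reno model.
       return (4.0 / 3.0) * target_pipe_size ** 2

   # reference_run_length(11)       -> 363 packets
   # queueless_reno_run_length(11)  -> about 161 packets (roughly 44% of 363)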
1883 Appendix B. Complex Queueing

1885 For many network technologies simple queueing models don't apply: the 1886 network schedules, thins or otherwise alters the timing of ACKs and 1887 data, generally to raise the efficiency of the channel allocation 1888 when confronted with relatively widely spaced small ACKs. These 1889 efficiency strategies are ubiquitous for half duplex, wireless and 1890 broadcast media.

1892 Altering the ACK stream generally has two consequences: it raises the 1893 effective bottleneck data rate, causing slowstart to burst at higher 1894 rates (possibly as high as the sender's interface rate), and it 1895 effectively raises the RTT by the average time that the ACKs and data 1896 were delayed. The first effect can be partially mitigated by 1897 reclocking ACKs once they are beyond the bottleneck on the return 1898 path to the sender, however this further raises the effective RTT.

1900 The most extreme example of this sort of behavior would be a half 1901 duplex channel that is not released as long as the end point currently 1902 holding the channel has more traffic (data or ACKs) to send. Such 1903 environments cause self clocked protocols under full load to revert 1904 to extremely inefficient stop and wait behavior, where they send an 1905 entire window of data as a single burst on the forward path, followed 1906 by the entire window of ACKs on the return path. It is important to 1907 note that due to self clocking, ill conceived channel allocation 1908 mechanisms can increase the stress on upstream links in a long path: 1909 they cause larger and faster bursts.

1911 If a particular end-to-end path contains a link or device that alters 1912 the ACK stream, then the entire path from the sender up to the 1913 bottleneck must be tested at the burst parameters implied by the ACK 1914 scheduling algorithm. The most important parameter is the Effective 1915 Bottleneck Data Rate, which is the average rate at which the ACKs 1916 advance snd.una. Note that thinning the ACKs (relying on the 1917 cumulative nature of seg.ack to permit discarding some ACKs) 1918 implies an effectively infinite bottleneck data rate.

1920 Holding data or ACKs for channel allocation or other reasons (such as 1921 forward error correction) always raises the effective RTT relative to 1922 the minimum delay for the path. Therefore it may be necessary to 1923 replace target_RTT in the calculation in Section 5.2 by an 1924 effective_RTT, which includes the target_RTT plus a term to account 1925 for the extra delays introduced by these mechanisms.

1927 Appendix C. Version Control

1929 This section to be removed prior to publication.

1931 Formatted: Mon Mar 9 14:37:24 PDT 2015

1933 Authors' Addresses

1935 Matt Mathis
1936 Google, Inc
1937 1600 Amphitheater Parkway
1938 Mountain View, California 94043
1939 USA

1941 Email: mattmathis@google.com

1943 Al Morton
1944 AT&T Labs
1945 200 Laurel Avenue South
1946 Middletown, NJ 07748
1947 USA

1949 Phone: +1 732 420 1571
1950 Email: acmorton@att.com
1951 URI: http://home.comcast.net/~acmacm/