idnits 2.17.1 draft-ietf-ippm-model-based-metrics-03.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here.

Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 7 instances of lines with non-RFC2606-compliant FQDNs in the document. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 578: '... A TDS or FSTDS MUST apportion all re...' RFC 2119 keyword, line 685: '...ecified TDS or FSTDS MUST document the...' RFC 2119 keyword, line 702: '...The TDS or FSTDS MUST document and jus...' RFC 2119 keyword, line 833: '...argets then this MUST be stated both t...' RFC 2119 keyword, line 834: '...etwork performance. The tests MUST be...' (2 more instances...)

Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 354 has weird spacing: '...y tests deter...' == Line 365 has weird spacing: '...g tests are d...' == Line 370 has weird spacing: '...g tests evalu...' == Line 1006 has weird spacing: '... and n = h1...' -- The document date (July 3, 2014) is 3575 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.

Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Missing Reference: 'Dominant' is mentioned on line 242, but not defined == Missing Reference: 'W' is mentioned on line 1859, but not defined -- Obsolete informational reference (is this intentional?): RFC 2309 (Obsoleted by RFC 7567) -- Obsolete informational reference (is this intentional?): RFC 2861 (Obsoleted by RFC 7661) == Outdated reference: A later version (-07) exists of draft-ietf-ippm-lmap-path-04

Summary: 3 errors (**), 0 flaws (~~), 9 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. --------------------------------------------------------------------------------

2 IP Performance Working Group M. Mathis 3 Internet-Draft Google, Inc 4 Intended status: Experimental A. Morton 5 Expires: January 4, 2015 AT&T Labs 6 July 3, 2014

8 Model Based Bulk Performance Metrics 9 draft-ietf-ippm-model-based-metrics-03.txt

11 Abstract

13 We introduce a new class of model based metrics designed to determine 14 if an end-to-end Internet path can meet predefined transport 15 performance targets by applying a suite of IP diagnostic tests to 16 successive subpaths.
The subpath-at-a-time tests can be robustly 17 applied to key infrastructure, such as interconnects, to accurately 18 detect if it will prevent the full end-to-end paths that traverse it 19 from meeting the specified target performance.

21 Each IP diagnostic test consists of a precomputed traffic pattern and 22 statistical criteria for evaluating packet delivery. The traffic 23 patterns are precomputed to mimic TCP or another transport protocol 24 operating over a long path but are independent of the actual details of the 25 subpath under test. Likewise the success criteria depend on the 26 target performance for the long path and not the details of the 27 subpath. This makes the measurements open loop, which introduces 28 several important new properties and eliminates most of the 29 difficulties encountered by traditional bulk transport metrics.

31 This document does not define diagnostic tests, but provides a 32 framework for designing suites of diagnostic tests that are tailored 33 to confirming the target performance.

35 Interim DRAFT Formatted: Thu Jul 3 20:19:04 PDT 2014

37 Status of this Memo

39 This Internet-Draft is submitted in full conformance with the 40 provisions of BCP 78 and BCP 79.

42 Internet-Drafts are working documents of the Internet Engineering 43 Task Force (IETF). Note that other groups may also distribute 44 working documents as Internet-Drafts. The list of current Internet- 45 Drafts is at http://datatracker.ietf.org/drafts/current/.

47 Internet-Drafts are draft documents valid for a maximum of six months 48 and may be updated, replaced, or obsoleted by other documents at any 49 time. It is inappropriate to use Internet-Drafts as reference 50 material or to cite them other than as "work in progress."

52 This Internet-Draft will expire on January 4, 2015.

54 Copyright Notice

56 Copyright (c) 2014 IETF Trust and the persons identified as the 57 document authors. All rights reserved.

59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (http://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document. Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License.

69 Table of Contents

71 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 72 1.1. TODO . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 73 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7 74 3. New requirements relative to RFC 2330 . . . . . . . . . . . . 11 75 4. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 11 76 4.1. TCP properties . . . . . . . . . . . . . . . . . . . . . . 12 77 4.2. Diagnostic Approach . . . . . . . . . . . . . . . . . . . 14 78 5. Common Models and Parameters . . . . . . . . . . . . . . . . . 15 79 5.1. Target End-to-end parameters . . . . . . . . . . . . . . . 15 80 5.2. Common Model Calculations . . . . . . . . . . . . . . . . 16 81 5.3. Parameter Derating . . . . . . . . . . . . . . . . . . . . 17 82 6. Common testing procedures . . . . . . . . . . . . . . . . . . 17 83 6.1. Traffic generating techniques . . . . . . . . . . . . . . 17 84 6.1.1. Paced transmission . . . . . . . . . . . . . . . . . . 17 85 6.1.2. Constant window pseudo CBR . . . . . . . . . .
. . . . 18 86 6.1.3. Scanned window pseudo CBR . . . . . . . . . . . . . . 19 87 6.1.4. Concurrent or channelized testing . . . . . . . . . . 19 88 6.2. Interpreting the Results . . . . . . . . . . . . . . . . . 20 89 6.2.1. Test outcomes . . . . . . . . . . . . . . . . . . . . 20 90 6.2.2. Statistical criteria for measuring run_length . . . . 22 91 6.2.2.1. Alternate criteria for measuring run_length . . . 23 92 6.2.3. Reordering Tolerance . . . . . . . . . . . . . . . . . 25 93 6.3. Test Preconditions . . . . . . . . . . . . . . . . . . . . 25 94 7. Diagnostic Tests . . . . . . . . . . . . . . . . . . . . . . . 26 95 7.1. Basic Data Rate and Delivery Statistics Tests . . . . . . 26 96 7.1.1. Delivery Statistics at Paced Full Data Rate . . . . . 27 97 7.1.2. Delivery Statistics at Full Data Windowed Rate . . . . 27 98 7.1.3. Background Delivery Statistics Tests . . . . . . . . . 27 99 7.2. Standing Queue Tests . . . . . . . . . . . . . . . . . . . 28 100 7.2.1. Congestion Avoidance . . . . . . . . . . . . . . . . . 29 101 7.2.2. Bufferbloat . . . . . . . . . . . . . . . . . . . . . 29 102 7.2.3. Non excessive loss . . . . . . . . . . . . . . . . . . 30 103 7.2.4. Duplex Self Interference . . . . . . . . . . . . . . . 30 104 7.3. Slowstart tests . . . . . . . . . . . . . . . . . . . . . 30 105 7.3.1. Full Window slowstart test . . . . . . . . . . . . . . 31 106 7.3.2. Slowstart AQM test . . . . . . . . . . . . . . . . . . 31 107 7.4. Sender Rate Burst tests . . . . . . . . . . . . . . . . . 31 108 7.5. Combined Tests . . . . . . . . . . . . . . . . . . . . . . 32 109 7.5.1. Sustained burst test . . . . . . . . . . . . . . . . . 32 110 7.5.2. Streaming Media . . . . . . . . . . . . . . . . . . . 33 111 8. An Example . . . . . . . . . . . . . . . . . . . . . . . . . . 34 112 9. Validation . . . . . . . . . . . . . . . . . . . . . . . . . . 35 113 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37 114 11. Informative References . . . . . . . . . . . . . . . . . . . . 37 115 Appendix A. Model Derivations . . . . . . . . . . . . . . . . . . 40 116 A.1. Queueless Reno . . . . . . . . . . . . . . . . . . . . . . 40 117 A.2. CUBIC . . . . . . . . . . . . . . . . . . . . . . . . . . 41 118 Appendix B. Complex Queueing . . . . . . . . . . . . . . . . . . 42 119 Appendix C. Version Control . . . . . . . . . . . . . . . . . . . 43 120 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 43 122 1. Introduction 124 Bulk performance metrics evaluate an Internet path's ability to carry 125 bulk data. Model based bulk performance metrics rely on mathematical 126 TCP models to design a targeted diagnostic suite (TDS) of IP 127 performance tests which can be applied independently to each subpath 128 of the full end-to-end path. These targeted diagnostic suites allow 129 independent tests of subpaths to accurately detect if any subpath 130 will prevent the full end-to-end path from delivering bulk data at 131 the specified performance target, independent of the measurement 132 vantage points or other details of the test procedures used for each 133 measurement. 135 The end-to-end target performance is determined by the needs of the 136 user or application, outside the scope of this document. For bulk 137 data transport, the primary performance parameter of interest is the 138 target data rate. 
However, since TCP's ability to compensate for 139 less than ideal network conditions is fundamentally affected by the 140 Round Trip Time (RTT) and the Maximum Transmission Unit (MTU) of the 141 entire end-to-end path that the data traverses, these 142 parameters must also be specified in advance. They may reflect a 143 specific real path through the Internet or an idealized path 144 representing a typical user community. The target values for these 145 three parameters, Data Rate, RTT and MTU, inform the mathematical 146 models used to design the TDS.

148 Each IP diagnostic test in a TDS consists of a precomputed traffic 149 pattern and statistical criteria for evaluating packet delivery.

151 Mathematical models are used to design traffic patterns that mimic 152 TCP or another bulk transport protocol operating at the target data 153 rate, MTU and RTT over a full range of conditions, including flows 154 that are bursty at multiple time scales. The traffic patterns are 155 computed in advance based on the three target parameters of the end- 156 to-end path and independent of the properties of individual subpaths. 157 As much as possible the measurement traffic is generated 158 deterministically in ways that minimize the extent to which test 159 methodology, measurement points, measurement vantage or path 160 partitioning affect the details of the measurement traffic.

162 Mathematical models are also used to compute the bounds on the packet 163 delivery statistics for acceptable IP performance. Since these 164 statistics, such as packet loss, are typically aggregated from all 165 subpaths of the end-to-end path, the end-to-end statistical bounds 166 need to be apportioned as a separate bound for each subpath. Note 167 that links that are expected to be bottlenecks will naturally 168 contribute more packet loss and/or delay. In compensation, other 169 links have to be constrained to contribute less packet loss and 170 delay. The criterion for passing each test of a TDS is an apportioned 171 share of the total bound determined by the mathematical model from 172 the end-to-end target performance.

174 In addition to passing or failing, a test can be deemed to be 175 inconclusive for a number of reasons, including: the precomputed 176 traffic pattern was not accurately generated; the measurement results 177 were not statistically significant; or other causes such as failing to 178 meet some required test preconditions.

180 This document describes a framework for deriving traffic patterns and 181 delivery statistics for model based metrics. It does not fully 182 specify any measurement techniques. Important details such as packet 183 type-p selection, sampling techniques, vantage selection, etc. are 184 not specified here. We imagine Fully Specified Targeted Diagnostic 185 Suites (FSTDS) that define all of these details. We use TDS to 186 refer to the subset of such a specification that is in scope for this 187 document. A TDS includes the target parameters, documentation of the 188 models and assumptions used to derive the diagnostic test parameters, 189 specifications for the traffic and delivery statistics for the tests 190 themselves, and a description of a test setup that can be used to 191 validate the tests and models.

193 Section 2 defines terminology used throughout this document.
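To make the role of the three target parameters concrete, the following minimal sketch computes the two derived quantities of the reference model in Section 5.2. The variable names follow the terminology of Section 2; the numeric values are hypothetical examples, not recommendations.

<CODE BEGINS>
# Python sketch: derive model parameters from end-to-end targets.
# All numeric values are hypothetical, for illustration only.

target_rate = 1.25e6      # target data rate [bytes/s] (10 Mb/s)
target_RTT = 0.1          # target round trip time [s]
target_MTU = 1500         # target MTU [bytes]
header_overhead = 52      # TCP/IP header bytes per packet

# Average window, in packets, needed to meet the target rate
# (Section 5.2).
target_pipe_size = (target_rate * target_RTT /
                    (target_MTU - header_overhead))

# Reference minimum spacing between losses or ECN marks
# (Section 5.2, following [MSMO97]).
target_run_length = 3 * target_pipe_size ** 2

print(round(target_pipe_size))   # ~86 packets
print(round(target_run_length))  # ~22,000 packets between marks
<CODE ENDS>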
195 It has been difficult to develop Bulk Transport Capacity [RFC3148] 196 metrics due to some overlooked requirements described in Section 3 197 and some intrinsic problems with using protocols for measurement, 198 described in Section 4.

200 In Section 5 we describe the models and common parameters used to 201 derive the targeted diagnostic suite. In Section 6 we describe 202 common testing procedures. Each subpath is evaluated using a suite of 203 far simpler and more predictable diagnostic tests described in 204 Section 7. In Section 8 we present an example TDS that might be 205 representative of HD video, and illustrate how MBM can be used to 206 address difficult measurement situations, such as confirming that 207 intercarrier exchanges have sufficient performance and capacity to 208 deliver HD video between ISPs.

210 There exists a small risk that the model based metrics themselves might yield 211 a false pass result, in the sense that every subpath of an end-to-end 212 path passes every IP diagnostic test and yet a real application fails 213 to attain the performance target over the end-to-end path. If this 214 happens, then the validation procedure described in Section 9 needs 215 to be used to validate and potentially revise the models.

217 Future documents will define model based metrics for other traffic 218 classes and application types, such as real time streaming media.

220 1.1. TODO

222 Please send comments about this draft to ippm@ietf.org. See 223 http://goo.gl/02tkD for more information including: interim drafts, 224 an up to date todo list and information on contributing.

226 Formatted: Thu Jul 3 20:19:04 PDT 2014

228 2. Terminology

230 Terminology about paths, etc. See [RFC2330] and 231 [I-D.ietf-ippm-lmap-path].

233 [data] sender Host sending data and receiving ACKs. 234 [data] receiver Host receiving data and sending ACKs. 235 subpath A portion of the full path. Note that there is no 236 requirement that subpaths be non-overlapping. 237 Measurement Point Measurement points as described in 238 [I-D.ietf-ippm-lmap-path]. 239 test path A path between two measurement points that includes a 240 subpath of the end-to-end path under test, and could include 241 infrastructure between the measurement points and the subpath. 242 [Dominant] Bottleneck The bottleneck that generally dominates 243 traffic statistics for the entire path. It typically determines a 244 flow's self clock timing, packet loss and ECN marking rate. See 245 Section 4.1. 246 front path The subpath from the data sender to the dominant 247 bottleneck. 248 back path The subpath from the dominant bottleneck to the receiver. 249 return path The path taken by the ACKs from the data receiver to the 250 data sender. 251 cross traffic Other, potentially interfering, traffic competing for 252 network resources (bandwidth and/or queue capacity).

254 Properties determined by the end-to-end path and application. They 255 are described in more detail in Section 5.1.

257 Application Data Rate General term for the data rate as seen by the 258 application above the transport layer. This is the payload data 259 rate, and excludes transport and lower level headers (TCP/IP or 260 other protocols) as well as retransmissions and other data 261 that does not contribute to the total quantity of data delivered 262 to the application.

264 Link Data Rate General term for the data rate as seen by the link or 265 lower layers. The link data rate includes transport and IP 266 headers, retransmits and other transport layer overhead.
This 267 document is agnostic as to whether the link data rate includes or 268 excludes framing, MAC, or other lower layer overheads, except that 269 they must be treated uniformly. 270 end-to-end target parameters: Application or transport performance 271 goals for the end-to-end path. They include the target data rate, 272 RTT and MTU described below. 273 Target Data Rate: The application data rate, typically the ultimate 274 user's performance goal. 275 Target RTT (Round Trip Time): The baseline (minimum) RTT of the 276 longest end-to-end path over which the application expects to be 277 able to meet the target performance. TCP and other transport 278 protocols' ability to compensate for path problems is generally 279 proportional to the number of round trips per second. The Target 280 RTT determines both key parameters of the traffic patterns (e.g. 281 burst sizes) and the thresholds on acceptable traffic statistics. 282 The Target RTT must be specified considering authentic packet 283 sizes: MTU sized packets on the forward path, ACK sized packets 284 (typically header_overhead) on the return path. 285 Target MTU (Maximum Transmission Unit): The maximum MTU supported by 286 the end-to-end path over which the application expects to meet 287 the target performance. Assume 1500 Byte packets unless otherwise 288 specified. If some subpath forces a smaller MTU, then it becomes 289 the target MTU, and all model calculations and subpath tests must 290 use the same smaller MTU. 291 Effective Bottleneck Data Rate: This is the bottleneck data rate 292 inferred from the ACK stream, by looking at how much data the ACK 293 stream reports delivered per unit time. If the path is thinning 294 ACKs or batching packets the effective bottleneck rate can be much 295 higher than the average link rate. See Section 4.1 and Appendix B 296 for more details. 297 [sender | interface] rate: The burst data rate, constrained by the 298 data sender's interfaces. Today 1 or 10 Gb/s are typical. 299 Header_overhead: The IP and TCP header sizes, which are the portion 300 of each MTU not available for carrying application payload. 301 Without loss of generality this is assumed to be the size for 302 returning acknowledgements (ACKs). For TCP, the Maximum Segment 303 Size (MSS) is the Target MTU minus the header_overhead.

305 Basic parameters common to models and subpath tests. They are 306 described in more detail in Section 5.2. Note that these are mixed 307 between application transport performance (excludes headers) and link 308 IP performance (includes headers).

310 pipe size A general term for the number of packets needed in flight (the 311 window size) to exactly fill some network path or subpath. This 312 is the window size at which queueing normally begins. 313 target_pipe_size: The number of packets in flight (the window size) 314 needed to exactly meet the target rate, with a single stream and 315 no cross traffic for the specified application target data rate, 316 RTT, and MTU. It is the amount of circulating data required to 317 meet the target data rate, and implies the scale of the bursts 318 that the network might experience. 319 Delivery Statistics Raw or summary statistics about packet delivery, 320 packet losses, ECN marks, reordering, or any other properties of 321 packet delivery that may be germane to transport performance. 322 run length A general term for the observed, measured, or specified 323 number of packets that are (to be) delivered between losses or ECN 324 marks.
Nominally one over the loss or ECN marking probability, if 325 marks are independently and identically distributed. 326 target_run_length The target_run_length is an estimate of the 327 minimum required headway between losses or ECN marks necessary to 328 attain the target_data_rate over a path with the specified 329 target_RTT and target_MTU, as computed by a mathematical model of 330 TCP congestion control. A reference calculation is shown in 331 Section 5.2 and alternatives in Appendix A.

333 Ancillary parameters used for some tests

335 derating: Under some conditions the standard models are too 336 conservative. The modeling framework permits some latitude in 337 relaxing or "derating" some test parameters as described in 338 Section 5.3 in exchange for more stringent TDS validation 339 procedures, described in Section 9. 340 subpath_data_rate The maximum IP data rate supported by a subpath. 341 This typically includes TCP/IP overhead, including headers, 342 retransmits, etc. 343 test_path_RTT The RTT between two measurement points using 344 appropriate data and ACK packet sizes. 345 test_path_pipe The amount of data necessary to fill a test path. 346 Nominally the test path RTT times the subpath_data_rate (which 347 should be part of the end-to-end subpath). 348 test_window The window necessary to meet the target_rate over a 349 subpath. Typically test_window=target_data_rate*test_RTT/ 350 (target_MTU - header_overhead). A worked sketch appears after the 351 test classification below.

352 Tests can be classified into groups according to their applicability.

354 Capacity tests determine if a network subpath has sufficient 355 capacity to deliver the target performance. As long as the test 356 traffic is within the proper envelope for the target end-to-end 357 performance, the average packet loss or ECN marking rate must be below the 358 threshold computed by the model. As such, capacity tests reflect 359 parameters that can transition from passing to failing as a 360 consequence of cross traffic, additional presented load or the 361 actions of other network users. By definition, capacity tests 362 also consume significant network resources (data capacity and/or 363 buffer space), and the test schedules must be balanced by their 364 cost. 365 Monitoring tests are designed to capture the most important aspects 366 of a capacity test, but without presenting excessive ongoing load 367 themselves. As such they may miss some details of the network's 368 performance, but can serve as a useful reduced-cost proxy for a 369 capacity test. 370 Engineering tests evaluate how network algorithms (such as AQM and 371 channel allocation) interact with TCP-style self clocked protocols 372 and adaptive congestion control based on packet loss and ECN 373 marks. These tests are likely to have complicated interactions 374 with other traffic and under some conditions can be inversely 375 sensitive to load. For example a test to verify that an AQM 376 algorithm causes ECN marks or packet drops early enough to limit 377 queue occupancy may experience a false pass result in the presence 378 of bursty cross traffic. It is important that engineering tests 379 be performed under a wide range of conditions, including both in 380 situ and bench testing, and over a wide variety of load 381 conditions. Ongoing monitoring is less likely to be useful for 382 engineering tests, although sparse in situ testing might be 383 appropriate.
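As forecast in the test_window definition above, a minimal sketch of the ancillary window calculation for one concrete test path. The numeric values are hypothetical and the formula is the test_window definition given earlier.

<CODE BEGINS>
# Python sketch: ancillary parameters for one test path.
# Numeric values are hypothetical, for illustration only.

target_data_rate = 1.25e6   # [bytes/s] (10 Mb/s)
target_MTU = 1500           # [bytes]
header_overhead = 52        # [bytes]
test_path_RTT = 0.04        # measured RTT between the MPs [s]

# Window needed to meet the target_rate over this test path,
# per the test_window definition above.
test_window = (target_data_rate * test_path_RTT /
               (target_MTU - header_overhead))
print(round(test_window))   # ~35 packets
<CODE ENDS>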
385 General Terminology:

387 Targeted Diagnostic Suite (TDS) A set of IP Diagnostics designed to 388 determine if a subpath can sustain flows at a specific 389 target_data_rate over a path that has a target_RTT using 390 target_MTU sized packets. 391 Fully Specified Targeted Diagnostic Suite (FSTDS) A TDS together with 392 additional specifications such as "type-p", etc., which are out of 393 scope for this document, but need to be drawn from other standards 394 documents. 395 apportioned To divide and allocate, as in budgeting packet loss 396 rates across multiple subpaths so that they accumulate to less than a specified 397 end-to-end loss rate.

399 open loop A control theory term used to describe a class of 400 techniques where systems that exhibit circular dependencies can be 401 analyzed by suppressing some of the dependencies, such that the 402 resulting dependency graph is acyclic.

404 3. New requirements relative to RFC 2330

406 Model Based Metrics are designed to fulfill some additional 407 requirements that were not recognized at the time RFC 2330 was written 408 [RFC2330]. These missing requirements may have significantly 409 contributed to policy difficulties in the IP measurement space. Some 410 additional requirements are: 411 o IP metrics must be actionable by the ISP - they have to be 412 interpreted in terms of behaviors or properties at the IP or lower 413 layers that an ISP can test, repair and verify. 414 o Metrics must be vantage point invariant over a significant range 415 of measurement point choices, including off path measurement 416 points. The only requirements on MP selection should be that the 417 portion of the test path that is not under test is effectively 418 ideal (or is non ideal in ways that can be calibrated out of the 419 measurements) and the test RTT between the MPs is below some 420 reasonable bound. 421 o Metrics must be repeatable by multiple parties with no specialized 422 access to MPs or diagnostic infrastructure. It must be possible 423 for different parties to make the same measurement and observe the 424 same results. In particular it is specifically important that 425 both a consumer (or their delegate) and ISP be able to perform the 426 same measurement and get the same result.

428 NB: All of the metric requirements in RFC 2330 should be reviewed and 429 potentially revised. If such a document is opened soon enough, this 430 entire section should be dropped.

432 4. Background

434 At the time the IPPM WG was chartered, sound Bulk Transport Capacity 435 measurement was known to be beyond our capabilities. In hindsight it 436 is now clear why it is such a hard problem: 437 o TCP is a control system with circular dependencies - everything 438 affects performance, including components that are explicitly not 439 part of the test. 440 o Congestion control is an equilibrium process, such that transport 441 protocols change the network (raise loss probability and/or RTT) 442 to conform to their behavior.

444 o TCP's ability to compensate for network flaws is directly 445 proportional to the number of roundtrips per second (i.e. 446 inversely proportional to the RTT). As a consequence a flawed 447 link may pass a short RTT local test even though it fails when the 448 path is extended by a perfect network to some larger RTT. 449 o TCP has a meta Heisenberg problem - Measurement and cross traffic 450 interact in unknown and ill-defined ways.
The situation is 451 actually worse than the traditional physics problem where you can 452 at least estimate the relative momentum of the measurement and 453 measured particles. For network measurement you can not in 454 general determine the relative "elasticity" of the measurement 455 traffic and cross traffic, so you can not even gauge the relative 456 magnitude of their effects on each other.

458 These properties are a consequence of the equilibrium behavior 459 intrinsic to how all throughput optimizing protocols interact with 460 the network. The protocols rely on control systems based on multiple 461 network estimators to regulate the quantity of data sent into the 462 network. The data in turn alters the network and the properties observed 463 by the estimators, such that there are circular dependencies between 464 every component and every property. Since some of these estimators 465 are non-linear, the entire system is nonlinear, and any change 466 anywhere causes difficult to predict changes in every parameter.

468 Model Based Metrics overcome these problems by forcing the 469 measurement system to be open loop: the delivery statistics (akin to 470 the network estimators) do not affect the traffic. The traffic and 471 traffic patterns (bursts) are computed on the basis of the target 472 performance. In order for a network to pass, the resulting delivery 473 statistics and corresponding network estimators have to be such that 474 they would not cause the control systems to slow the traffic below the 475 target rate.

477 4.1. TCP properties

479 TCP and SCTP are self clocked protocols. The dominant steady state 480 behavior is to have an approximately fixed quantity of data and 481 acknowledgements (ACKs) circulating in the network. The receiver 482 reports arriving data by returning ACKs to the data sender, and the data 483 sender typically responds by sending exactly the same quantity of 484 data back into the network. The total quantity of data plus the data 485 represented by ACKs circulating in the network is referred to as the 486 window. The mandatory congestion control algorithms incrementally 487 adjust the window by sending slightly more or less data in response 488 to each ACK. The fundamentally important property of this system is 489 that it is entirely self clocked: The data transmissions are a 490 reflection of the ACKs that were delivered by the network, the ACKs 491 are a reflection of the data arriving from the network.

493 A number of phenomena can cause bursts of data, even in idealized 494 networks that are modeled as simple queueing systems.

496 During slowstart the data rate is doubled on each RTT by sending 497 twice as much data as was delivered to the receiver on the prior RTT. 498 For slowstart to be able to fill such a network, the network must be 499 able to tolerate slowstart bursts up to the full pipe size inflated 500 by the anticipated window reduction on the first loss or ECN mark. 501 For example, with classic Reno congestion control, an optimal 502 slowstart has to end with a burst that is twice the bottleneck rate 503 for exactly one RTT in duration. This burst causes a queue which is 504 exactly equal to the pipe size (i.e. the window is exactly twice the 505 pipe size) so when the window is halved in response to the first 506 loss, the new window will be exactly the pipe size.
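A worked instance of this Reno slowstart arithmetic, as a minimal sketch (the pipe size value is hypothetical):

<CODE BEGINS>
# Python sketch of the Reno slowstart arithmetic described above.
# pipe_size is hypothetical; units are packets.

pipe_size = 100                 # packets needed to just fill the path

# An optimal slowstart ends with one RTT at twice the bottleneck
# rate, i.e. a window of twice the pipe size ...
final_slowstart_window = 2 * pipe_size

# ... which leaves a standing queue equal to the pipe size ...
queue_at_first_loss = final_slowstart_window - pipe_size

# ... so halving the window on the first loss or ECN mark lands
# exactly on the pipe size: the path stays full, the queue drains.
window_after_loss = final_slowstart_window // 2
assert window_after_loss == pipe_size == queue_at_first_loss
<CODE ENDS>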
508 Note that if the bottleneck data rate is significantly slower than 509 the rest of the path, the slowstart bursts will not cause significant 510 queues anywhere else along the path; they primarily exercise the 511 queue at the dominant bottleneck.

513 Other sources of bursts include application pauses and channel 514 allocation mechanisms. Appendix B describes the treatment of channel 515 allocation systems. If the application pauses (stops reading or 516 writing data) for some fraction of one RTT, state-of-the-art TCP 517 catches up to the earlier window size by sending a burst of data at 518 the full sender interface rate. To fill such a network with a 519 realistic application, the network has to be able to tolerate 520 interface rate bursts from the data sender large enough to cover 521 application pauses.

523 Although the interface rate bursts are typically smaller than the last 524 burst of a slowstart, they are at a higher data rate so they 525 potentially exercise queues at arbitrary points along the front path 526 from the data sender up to and including the queue at the dominant 527 bottleneck. There is no model for what frequency or size of 528 sender rate bursts should be tolerated.

530 To verify that a path can meet a performance target, it is necessary 531 to independently confirm that the path can tolerate bursts in the 532 dimensions that can be caused by these mechanisms. Three cases are 533 likely to be sufficient:

535 o Slowstart bursts sufficient to get connections started properly. 536 o Frequent sender interface rate bursts that are small enough that 537 they can be assumed not to significantly affect delivery 538 statistics. (Implicitly derated by selecting the burst size).

540 o Infrequent sender interface rate full target_pipe_size bursts that 541 do affect the delivery statistics. (Target_run_length is 542 derated).

544 4.2. Diagnostic Approach

546 The MBM approach is to open loop TCP by precomputing traffic patterns 547 that are typically generated by TCP operating at the given target 548 parameters, and evaluating delivery statistics (packet loss, ECN 549 marks and delay). In this approach the measurement software 550 explicitly controls the data rate, transmission pattern or cwnd 551 (TCP's primary congestion control state variables) to create 552 repeatable traffic patterns that mimic TCP behavior but are 553 independent of the actual behavior of the subpath under test. These 554 patterns are manipulated to probe the network to verify that it can 555 deliver all of the traffic patterns that a transport protocol is 556 likely to generate under normal operation at the target rate and RTT.

558 By opening the protocol control loops, we remove most sources of 559 temporal and spatial correlation in the traffic delivery statistics, 560 such that each subpath's contribution to the end-to-end statistics 561 can be assumed to be independent and stationary. (The delivery 562 statistics depend on the fine structure of the data transmissions, 563 but not on long time scale state embedded in the sender, receiver or 564 other network components.) Therefore each subpath's contribution to 565 the end-to-end delivery statistics can be assumed to be independent, 566 and spatial composition techniques such as [RFC5835] and [RFC6049] 567 apply.

569 In typical networks, the dominant bottleneck contributes the majority 570 of the packet loss and ECN marks. Often the rest of the path makes 571 an insignificant contribution to these properties.
A TDS should 572 apportion the end-to-end budget for the specified parameters 573 (primarily packet loss and ECN marks) to each subpath or group of 574 subpaths. For example the dominant bottleneck may be permitted to 575 contribute 90% of the loss budget, while the rest of the path is only 576 permitted to contribute 10%.

578 A TDS or FSTDS MUST apportion all relevant packet delivery statistics 579 between different subpaths, such that the spatial composition of the 580 apportioned metrics yields end-to-end statistics which are within the 581 bounds determined by the models.

583 A network is expected to be able to sustain a Bulk TCP flow of a 584 given data rate, MTU and RTT when the following conditions are met: 585 o The raw link rate is higher than the target data rate.

587 o The observed delivery statistics are better than required by a 588 suitable TCP performance model (e.g. fewer losses). 589 o There is sufficient buffering at the dominant bottleneck to absorb 590 a slowstart rate burst large enough to get the flow out of 591 slowstart at a suitable window size. 592 o There is sufficient buffering in the front path to absorb and 593 smooth sender interface rate bursts at all scales that are likely 594 to be generated by the application, any channel arbitration in the 595 ACK path or other mechanisms. 596 o When there is a standing queue at a bottleneck for a shared media 597 subpath, there are suitable bounds on how the data and ACKs 598 interact, for example due to the channel arbitration mechanism. 599 o When there is a slowly rising standing queue at the bottleneck the 600 onset of packet loss has to be at an appropriate point (time or 601 queue depth) and progressive. This typically requires some form 602 of Active Queue Management [RFC2309].

604 We are developing a tool that can perform many of the tests described 605 here [MBMSource].

607 5. Common Models and Parameters

609 5.1. Target End-to-end parameters

611 The target end-to-end parameters are the target data rate, target RTT 612 and target MTU as defined in Section 2. These parameters are 613 determined by the needs of the application or the ultimate end user 614 and the end-to-end Internet path over which the application is 615 expected to operate. The target parameters are in units that make 616 sense to upper layers: payload bytes delivered to the application, 617 above TCP. They exclude overheads associated with TCP and IP 618 headers, retransmits and other protocols (e.g. DNS).

620 Other end-to-end parameters defined in Section 2 include the 621 effective bottleneck data rate, the sender interface data rate and 622 the TCP/IP header sizes (overhead).

624 The target data rate must be smaller than all link data rates by 625 enough headroom to carry the transport protocol overhead, explicitly 626 including retransmissions and an allowance for fluctuations in the 627 actual data rate needed to meet the specified average rate. 628 Specifying a target rate with insufficient headroom is likely to 629 result in brittle measurements having little predictive value.

631 Note that the target parameters can be specified for a hypothetical 632 path, for example to construct a TDS designed for bench testing in the 633 absence of a real application, or for a real physical test, for in 634 situ testing of production infrastructure.

636 The number of concurrent connections is explicitly not a parameter to 637 this model.
If a subpath requires multiple connections in order to 638 meet the specified performance, that must be stated explicitly and 639 the procedure described in Section 6.1.4 applies.

641 5.2. Common Model Calculations

643 The end-to-end target parameters are used to derive the 644 target_pipe_size and the reference target_run_length.

646 The target_pipe_size is the average window size in packets needed to 647 meet the target rate, for the specified target RTT and MTU. It is 648 given by:

650 target_pipe_size = target_rate * target_RTT / ( target_MTU - 651 header_overhead )

653 Target_run_length is an estimate of the minimum required headway 654 between losses or ECN marks, as computed by a mathematical model of 655 TCP congestion control. The derivation here follows [MSMO97], and by 656 design is quite conservative. The alternate models described in 657 Appendix A generally yield smaller run_lengths (higher loss rates), 658 but may not apply in all situations. In any case alternate models 659 should be compared to the reference target_run_length computed here.

661 Reference target_run_length is derived as follows: assume the 662 subpath_data_rate is infinitesimally larger than the target_data_rate 663 plus the required header_overhead. Then target_pipe_size also 664 predicts the onset of queueing. A larger window will cause a 665 standing queue at the bottleneck.

667 Assume the transport protocol is using standard Reno style Additive 668 Increase, Multiplicative Decrease congestion control [RFC5681] (but 669 not Appropriate Byte Counting [RFC3465]) and the receiver is using 670 standard delayed ACKs. Reno increases the window by one packet every 671 pipe_size worth of ACKs. With delayed ACKs this takes 2 Round Trip 672 Times per increase. To exactly fill the pipe, losses must be no 673 closer together than when the peak of the AIMD sawtooth reaches exactly twice 674 the target_pipe_size; otherwise the multiplicative window reduction 675 triggered by the loss would cause the network to be underfilled. 676 Following [MSMO97] the number of packets between losses must be the 677 area under the AIMD sawtooth. They must be no more frequent than 678 every 1 in ((3/2)*target_pipe_size)*(2*target_pipe_size) packets, 679 which simplifies to:

681 target_run_length = 3*(target_pipe_size^2)

682 Note that this calculation is very conservative and is based on a 683 number of assumptions that may not apply. Appendix A discusses these 684 assumptions and provides some alternative models. If a different 685 model is used, a fully specified TDS or FSTDS MUST document the 686 actual method for computing target_run_length along with the 687 rationale for the underlying assumptions and the ratio of chosen 688 target_run_length to the reference target_run_length calculated 689 above.

691 These two parameters, target_pipe_size and target_run_length, 692 directly imply most of the individual parameters for the tests in 693 Section 7.

695 5.3. Parameter Derating

697 Since some aspects of the models are very conservative, this 698 framework permits some latitude in derating test parameters. Rather 699 than trying to formalize more complicated models we permit some test 700 parameters to be relaxed as long as they meet some additional 701 procedural constraints: 702 o The TDS or FSTDS MUST document and justify the actual method used 703 to compute the derated metric parameters.
704 o The validation procedures described in Section 9 must be used to 705 demonstrate the feasibility of meeting the performance targets 706 with infrastructure that infinitesimally passes the derated tests. 707 o The validation process itself must be documented in such a way 708 that other researchers can duplicate the validation experiments.

710 Except as noted, all tests below assume no derating. Tests where 711 there is not currently a well established model for the required 712 parameters explicitly include derating as a way to indicate 713 flexibility in the parameters.

715 6. Common testing procedures

717 6.1. Traffic generating techniques

719 6.1.1. Paced transmission

721 Paced (burst) transmissions: send bursts of data on a timer to meet a 722 particular target rate and pattern. In all cases the specified data 723 rate can be either the application or the link rate. Header overheads 724 must be included in the calculations as appropriate.

726 Paced single packets: Send individual packets at the specified rate 727 or headway. 728 Burst: Send sender interface rate bursts on a timer. Specify any 3 729 of: average rate, packet size, burst size (number of packets) and 730 burst headway (burst start to start). These bursts are typically 731 sent as back-to-back packets at the tester's interface rate. 732 Slowstart bursts: Send 4 packet sender interface rate bursts at an 733 average data rate equal to twice the effective bottleneck link rate 734 (but not more than the sender interface rate). This corresponds 735 to the average rate during a TCP slowstart when Appropriate Byte 736 Counting [RFC3465] is present or delayed ACK is disabled. Note 737 that if the effective bottleneck link rate is more than half of 738 the sender interface rate, slowstart bursts become sender 739 interface rate bursts. 740 Repeated Slowstart bursts: Slowstart bursts are typically part of a 741 larger scale pattern of repeated bursts, such as sending 742 target_pipe_size packets as slowstart bursts on a target_RTT 743 headway (burst start to burst start). Such a stream has three 744 different average rates, depending on the averaging interval. At 745 the finest time scale the average rate is the same as the sender 746 interface rate, at a medium scale the average rate is twice the 747 effective bottleneck link rate and at the longest time scales the 748 average rate is equal to the target data rate.

750 Note that in conventional measurement theory, exponential 751 distributions are often used to eliminate many sorts of correlations. 752 For the procedures above, the correlations are created by the network 753 elements and accurately reflect their behavior. At some point in the 754 future, it may be desirable to introduce noise sources into the above 755 pacing models, but they are not warranted at this time.

757 6.1.2. Constant window pseudo CBR

759 Implement pseudo constant bit rate by running a standard protocol 760 such as TCP with a fixed window size. The rate is only maintained on 761 average over each RTT, and is subject to limitations of the transport 762 protocol.

764 The window size is computed from the target_data_rate and the actual 765 RTT of the test path.

767 If the transport protocol fails to maintain the test rate within 768 prescribed limits the test would typically be considered inconclusive 769 or failing, depending on what mechanism caused the reduced rate. See 770 the discussion of test outcomes in Section 6.2.1.

772 6.1.3.
Scanned window pseudo CBR

774 Same as the above, except the window is scanned across a range of 775 sizes designed to include two key events, the onset of queueing and 776 the onset of packet loss or ECN marks. The window is scanned by 777 incrementing it by one packet for every 2*target_pipe_size delivered 778 packets. This mimics the additive increase phase of standard TCP 779 congestion avoidance and normally separates the window increases 780 by approximately twice the target_RTT.

782 There are two versions of this test: one built by applying a window 783 clamp to standard congestion control and the other built by 784 stiffening a non-standard transport protocol. When standard 785 congestion control is in effect, any losses or ECN marks cause the 786 transport to revert to a window smaller than the clamp such that the 787 scanning clamp loses control of the window size. The NPAD pathdiag tool 788 is an example of this class of algorithms [Pathdiag].

790 Alternatively a non-standard congestion control algorithm can respond 791 to losses by transmitting extra data, such that it maintains the 792 specified window size independent of losses or ECN marks. Such a 793 stiffened transport explicitly violates mandatory Internet congestion 794 control and is not suitable for in situ testing. It is only 795 appropriate for engineering testing under laboratory conditions. The 796 Windowed Ping tool implemented such a test [WPING]. The tool 797 described in the paper has been updated [mpingSource].

799 The test procedures in Section 7.2 describe how to partition the 800 scans into regions and how to interpret the results.

802 6.1.4. Concurrent or channelized testing

804 The procedures described in this document are only directly 805 applicable to single stream performance measurement, e.g. one TCP 806 connection. In an ideal world, we would disallow all performance 807 claims based on multiple concurrent streams, but this is not practical 808 due to at least two different issues. First, many very high rate 809 link technologies are channelized and pin individual flows to 810 specific channels to minimize reordering or other problems and 811 second, TCP itself has scaling limits. Although the former problem 812 might be overcome through different design decisions, the latter 813 problem is more deeply rooted.

815 All standard [RFC5681] and de facto standard congestion control 816 algorithms [CUBIC] have scaling limits, in the sense that as a long 817 fast network (LFN) with a fixed RTT and MTU gets faster, these 818 congestion control algorithms get less accurate and as a consequence 819 have difficulty filling the network [CCscaling]. These properties are 820 a consequence of the original Reno AIMD congestion control design and 821 the requirement in [RFC5681] that all transport protocols have a 822 uniform response to congestion.

824 There are a number of reasons to want to specify performance in terms 825 of multiple concurrent flows, however this approach is not 826 recommended for data rates below several megabits per second, which 827 can be attained with run lengths under 10000 packets. Since the 828 required run length goes as the square of the data rate, at higher 829 rates the run lengths can be unreasonably large, and multiple 830 connections might be the only feasible approach.

832 If multiple connections are deemed necessary to meet aggregate 833 performance targets then this MUST be stated both in the design of the 834 TDS and in any claims about network performance.
The tests MUST be 835 performed concurrently with the specified number of connections. For 836 the tests that use bursty traffic, the bursts should be 837 synchronized across flows.

839 6.2. Interpreting the Results

841 6.2.1. Test outcomes

843 To perform an exhaustive test of an end-to-end network path, each 844 test of the TDS is applied to each subpath of an end-to-end path. If 845 any subpath fails any test then an application running over the end- 846 to-end path can also be expected to fail to attain the target 847 performance under some conditions.

849 In addition to passing or failing, a test can be deemed to be 850 inconclusive for a number of reasons. Proper instrumentation and 851 treatment of inconclusive outcomes is critical to the accuracy and 852 robustness of Model Based Metrics. Tests can be inconclusive if the 853 precomputed traffic pattern or data rates were not accurately 854 generated; the measurement results were not statistically 855 significant; or other causes such as failing to meet some required 856 preconditions for the test.

858 For example consider a test that implements Constant Window Pseudo 859 CBR (Section 6.1.2) by adding rate controls and detailed traffic 860 instrumentation to TCP (e.g. [RFC4898]). TCP includes built in 861 control systems which might interfere with the sending data rate. If 862 such a test meets the required delivery statistics (e.g. run length) 863 while failing to attain the specified data rate it must be treated as 864 an inconclusive result, because we can not a priori determine if the 865 reduced data rate was caused by a TCP problem or a network problem, 866 or if the reduced data rate had a material effect on the delivery 867 statistics themselves.

869 Note that for load tests such as this example, if the observed 870 delivery statistics fail to meet the targets, the test can be 871 considered to have failed, because it doesn't really matter 872 that the test didn't attain the required data rate.

874 The really important new properties of MBM, such as vantage 875 independence, are a direct consequence of opening the control loops 876 in the protocols, such that the test traffic does not depend on 877 network conditions or traffic received. Any mechanism that 878 introduces feedback between the traffic measurements and the traffic 879 generation is at risk of introducing nonlinearities that spoil these 880 properties. Any exceptional event that indicates that such feedback 881 has happened should cause the test to be considered inconclusive.

883 One way to view inconclusive tests is that they reflect situations 884 where a test outcome is ambiguous between limitations of the network 885 and some unknown limitation of the diagnostic test itself, which may 886 have been caused by some uncontrolled feedback from the network.

888 Note that procedures that attempt to sweep the target parameter space 889 to find the limits on some parameter (for example to find the highest 890 data rate for a subpath) are likely to break the location independent 891 properties of Model Based Metrics, because the boundary between 892 passing and inconclusive is generally sensitive to RTT. This 893 interaction is because TCP's ability to compensate for flaws in the 894 network scales with the number of round trips per second. Repeating 895 the same procedure from a different vantage point with a larger RTT 896 is likely to get a different result, because with the larger RTT, TCP will 897 control the data rate less accurately.
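One hedged way to encode the outcome logic described above is sketched below. The function and flag names are illustrative only; a FSTDS would define the actual criteria.

<CODE BEGINS>
# Python sketch: scoring one load test per the discussion above.
# All names are illustrative; they are not part of any specification.

def score_test(preconditions_ok, rate_ok, stats_conclusive, stats_ok):
    """preconditions_ok: Section 6.3 preconditions were met.
    rate_ok: the precomputed pattern/data rate was accurately generated.
    stats_conclusive: the run length hypothesis test reached a verdict.
    stats_ok: the delivery statistics met the model targets."""
    if not preconditions_ok:
        return "inconclusive"
    if stats_conclusive and not stats_ok:
        # Failing delivery statistics dominate: the test fails even
        # if the required data rate was never attained.
        return "fail"
    if not rate_ok or not stats_conclusive:
        # Cannot separate a tester or TCP problem from a network
        # problem, so no conclusion can be drawn.
        return "inconclusive"
    return "pass"
<CODE ENDS>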
899 One of the goals for evolving TDS designs will be to keep sharpening 900 the distinction between inconclusive, passing and failing tests. The 901 criteria for passing, failing and inconclusive tests MUST be 902 explicitly stated for every test in the TDS or FSTDS.

904 One of the goals of evolving the testing process, procedures, tools 905 and measurement point selection should be to minimize the number of 906 inconclusive tests.

908 It may be useful to keep raw data delivery statistics for deeper 909 study of the behavior of the network path and to measure the tools. 910 Raw delivery statistics can help to drive tool evolution. Under some 911 conditions it might be possible to reevaluate the raw data for 912 satisfying alternate performance targets. However it is important to 913 guard against sampling bias and other implicit feedback which can 914 cause false results and exhibit measurement point vantage 915 sensitivity.

917 6.2.2. Statistical criteria for measuring run_length

919 When evaluating the observed run_length, we need to determine 920 appropriate packet stream sizes and acceptable error levels for 921 efficient measurement. In practice, can we compare the empirically 922 estimated packet loss and ECN marking probabilities with the targets 923 as the sample size grows? How large a sample is needed to say that 924 the measurements of packet transfer indicate a particular run length 925 is present?

927 The generalized measurement can be described as recursive testing: 928 send packets (individually or in patterns) and observe the packet 929 delivery performance (loss ratio or other metric, any marking we 930 define).

932 As each packet is sent and measured, we have an ongoing estimate of 933 the performance in terms of the ratio of packet loss or ECN mark to 934 total packets (i.e. an empirical probability). We continue to send 935 until conditions support a conclusion or a maximum sending limit has 936 been reached.

938 We have a target_mark_probability, 1 mark per target_run_length, 939 where a "mark" is defined as a lost packet, a packet with ECN mark, 940 or other signal. This constitutes the null Hypothesis:

942 H0: no more than one mark in target_run_length = 943 3*(target_pipe_size)^2 packets

945 and we can stop sending packets if on-going measurements support 946 accepting H0 with the specified Type I error = alpha (= 0.05 for 947 example).

949 We also have an alternative Hypothesis to evaluate: whether performance is 950 significantly lower than the target_mark_probability. Based on 951 analysis of typical values and practical limits on measurement 952 duration, we choose four times the H0 probability:

954 H1: one or more marks in (target_run_length/4) packets

956 and we can stop sending packets if measurements support rejecting H0 957 with the specified Type II error = beta (= 0.05 for example), thus 958 preferring the alternate hypothesis H1.

960 H0 and H1 constitute the Success and Failure outcomes described 961 elsewhere in the memo, and while the ongoing measurements do not 962 support either hypothesis the current status of measurements is 963 inconclusive.

965 The problem above is formulated to match the Sequential Probability 966 Ratio Test (SPRT) [StatQC]. Note that as originally framed the 967 events under consideration were all manufacturing defects. In 968 networking, ECN marks and lost packets are not defects but signals, 969 indicating that the transport protocol should slow down.
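For concreteness, a minimal sketch of this sequential procedure follows; the boundary constants implement the equations spelled out in the next paragraphs, and the alpha and beta defaults are the example values above.

<CODE BEGINS>
# Python sketch of the SPRT described above; the h1, h2, s constants
# follow the equations given in the next paragraphs.

from math import log

def sprt_constants(target_run_length, alpha=0.05, beta=0.05):
    p0 = 1.0 / target_run_length      # H0: one mark per run length
    p1 = 4.0 / target_run_length      # H1: one mark per run_length/4
    k = log((p1 * (1 - p0)) / (p0 * (1 - p1)))
    s = log((1 - p0) / (1 - p1)) / k
    h1 = log((1 - alpha) / beta) / k
    h2 = log((1 - beta) / alpha) / k
    return h1, h2, s

def sprt_decision(n, mark_count, h1, h2, s):
    """Evaluate the stopping rules after n packets and mark_count marks."""
    if mark_count <= -h1 + s * n:
        return "accept H0"    # run length meets the target
    if mark_count >= h2 + s * n:
        return "accept H1"    # run length is significantly worse
    return "continue"
<CODE ENDS>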
971 The Sequential Probability Ratio Test also starts with a pair of 972 hypotheses specified as above:

974 H0: p0 = one defect in target_run_length 975 H1: p1 = one defect in target_run_length/4

976 As packets are sent and measurements collected, the tester evaluates 977 the cumulative defect count against two boundaries representing H0 978 Acceptance or Rejection (and acceptance of H1):

980 Acceptance line: Xa = -h1 + sn 981 Rejection line: Xr = h2 + sn

982 where n increases linearly for each packet sent and

984 h1 = { log((1-alpha)/beta) }/k 985 h2 = { log((1-beta)/alpha) }/k 986 k = log{ (p1(1-p0)) / (p0(1-p1)) } 987 s = [ log{ (1-p0)/(1-p1) } ]/k

988 for p0 and p1 as defined in the null and alternative Hypotheses 989 statements above, and alpha and beta as the Type I and Type II errors.

991 The SPRT specifies simple stopping rules:

993 o Xa < defect_count(n) < Xr: continue testing 994 o defect_count(n) <= Xa: Accept H0 995 o defect_count(n) >= Xr: Accept H1

997 The calculations above are implemented in the R-tool for Statistical 998 Analysis [Rtool], in the add-on package for Cross-Validation via 999 Sequential Testing (CVST) [CVST].

1001 Using the equations above, we can calculate the minimum number of 1002 packets (n) needed to accept H0 when x defects are observed. For 1003 example, when x = 0:

1005 Xa = 0 = -h1 + sn 1006 and n = h1 / s

1008 6.2.2.1. Alternate criteria for measuring run_length

1010 An alternate calculation, contributed by Alex Gilgur (Google).

1012 The probability of failure within an interval whose length is 1013 target_run_length is given by an exponential distribution with rate = 1014 1 / target_run_length (a memoryless process). The implication of 1015 this is that the predicted probability depends on the total count of 1016 packets that have been through the pipe, the formula being:

1018 P(t1 < T < t2) = R(t1) - R(t2),

1020 where

1022 T = number of packets at which a failure will occur with probability P; 1023 t = number of packets: 1024 t1 = number of packets (e.g., when failure last occurred) 1025 t2 = t1 + target_run_length 1026 R = survival function: 1027 R(t1) = exp (-t1/target_run_length) 1028 R(t2) = exp (-t2/target_run_length)

1030 The algorithm:

1032 initialize the packet.counter = 0
1033 initialize the failed.packet.counter = 0
1034 start the loop
1035     if packet_response = ACK:
1036         increment the packet.counter
1037     else:
1038         ### The packet failed
1039         increment the packet.counter
1040         increment the failed.packet.counter
1042         P_fail_observed = failed.packet.counter / packet.counter
1044         upper_bound = packet.counter + target.run.length / 2
1045         lower_bound = packet.counter - target.run.length / 2
1047         R1 = exp( -upper_bound / target.run.length )
1048         R0 = exp( -max(0, lower_bound) / target.run.length )
1050         P_fail_predicted = R0 - R1
1051         Compare P_fail_observed vs. P_fail_predicted
1052     end-if
1053 continue the loop

1055 This algorithm allows accurate comparison of the observed failure 1056 probability with the corresponding values predicted based on a fixed 1057 target_failure_rate, which is equal to 1.0 / target_run_length.

1059 6.2.3. Reordering Tolerance

1061 All tests must be instrumented for packet level reordering [RFC4737]. 1062 However, there is no consensus for how much reordering should be 1063 acceptable. Over the last two decades the general trend has been to 1064 make protocols and applications more tolerant to reordering (see for 1065 example [RFC4015]), in response to the gradual increase in reordering 1066 in the network.
6.2.3.  Reordering Tolerance

   All tests must be instrumented for packet level reordering [RFC4737].  However, there is no consensus for how much reordering should be acceptable.  Over the last two decades the general trend has been to make protocols and applications more tolerant to reordering (see for example [RFC4015]), in response to the gradual increase in reordering in the network.  This increase has been due to the gradual deployment of technologies such as multi-threaded routing lookups and Equal Cost Multipath (ECMP) routing.  These techniques increase parallelism in the network and are critical to enabling overall Internet growth to exceed Moore's Law.

   Note that transport retransmission strategies can trade off reordering tolerance vs. how quickly they can repair losses vs. overhead from spurious retransmissions.  In advance of new retransmission strategies we propose the following strawman: transport protocols should be able to adapt to reordering as long as the reordering extent is no more than the maximum of one half window or 1 ms, whichever is larger.  Within this limit on reorder extent, there should be no bound on reordering density.

   By implication, reordering which is less than these bounds should not be treated as a network impairment.  However [RFC4737] still applies: reordering should be instrumented and the maximum reordering that can be properly characterized by the test (e.g. bound on history buffers) should be recorded with the measurement results.

   Reordering tolerance and diagnostic bounds must be specified in a FSTDS.
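   As an illustration only (the memo does not define this calculation; converting the 1 ms time bound into packets via the bottleneck rate is our assumption), the strawman limit might be evaluated as:

      def reordering_within_strawman(extent_packets, window_packets,
                                     bottleneck_rate_pps):
          """True if the observed reordering extent is within the
          strawman bound: max(half the window, 1 ms of packets)."""
          one_ms_packets = bottleneck_rate_pps * 0.001
          limit = max(window_packets / 2.0, one_ms_packets)
          return extent_packets <= limit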
6.3.  Test Preconditions

   Many tests have preconditions which are required to assure their validity, for example the presence or absence of cross traffic on specific subpaths, or appropriate preloading to put reactive network elements into the proper states [I-D.ietf-ippm-2330-update].  If preconditions are not properly satisfied for some reason, the tests should be considered to be inconclusive.  In general it is useful to preserve diagnostic information about why the preconditions were not met, and the test data that was collected, if any.

   It is important to preserve the record that a test was scheduled, because otherwise precondition enforcement mechanisms can introduce sampling bias.  For example, canceling tests due to load on subscriber access links may introduce sampling bias for tests of the rest of the network by reducing the number of tests during peak network load.

   Test preconditions and failure actions must be specified in a FSTDS.

7.  Diagnostic Tests

   The diagnostic tests below are organized by traffic pattern: basic data rate and delivery statistics, standing queues, slowstart bursts, and sender rate bursts.  We also introduce some combined tests which are more efficient when networks are expected to pass, but conflate diagnostic signatures when they fail.

   There are a number of test details which are not fully defined here.  They must be fully specified in a FSTDS.  From a standardization perspective, this lack of specificity will weaken this version of Model Based Metrics; however, it is anticipated that this weakness will be more than offset by the extent to which MBM suppresses the problems caused by using transport protocols for measurement, e.g. non-specific MBM metrics are likely to have better repeatability than many existing BTC-like metrics.  Once we have good field experience, the missing details can be fully specified.

7.1.  Basic Data Rate and Delivery Statistics Tests

   We propose several versions of the basic data rate and delivery statistics test.  All measure the number of packets delivered between losses or ECN marks, using a data stream that is rate controlled at or below the target_data_rate.

   The tests below differ in how the data rate is controlled.  The data can be paced on a timer, or window controlled at the full target data rate.  The first two tests implicitly confirm that the subpath has sufficient raw capacity to carry the target_data_rate.  They are recommended for relatively infrequent testing, such as an installation or periodic auditing process.  The third, background delivery statistics, is a low rate test designed for ongoing monitoring for changes in subpath quality.

   All rely on the receiver accumulating packet delivery statistics as described in Section 6.2.2 to score the outcome:

      Pass: it is statistically significant that the observed interval
      between losses or ECN marks is larger than the
      target_run_length.

      Fail: it is statistically significant that the observed interval
      between losses or ECN marks is smaller than the
      target_run_length.

   A test is considered to be inconclusive if it failed to meet the data rate as specified below, failed to meet the qualifications defined in Section 6.3, or if neither run length statistical hypothesis was confirmed in the allotted test duration.

7.1.1.  Delivery Statistics at Paced Full Data Rate

   Confirm that the observed run length is at least the target_run_length while relying on a timer to send data at the target_data_rate, using the procedure described in Section 6.1.1 with a burst size of 1 (single packets) or 2 (packet pairs).

   The test is considered to be inconclusive if the packet transmission cannot be accurately controlled for any reason.

   RFC 6673 [RFC6673] is appropriate for measuring delivery statistics at full data rate.

7.1.2.  Delivery Statistics at Full Data Windowed Rate

   Confirm that the observed run length is at least the target_run_length while sending at an average rate approximately equal to the target_data_rate, by controlling (or clamping) the window size of a conventional transport protocol to a fixed value computed from the properties of the test path, typically test_window=target_data_rate*test_RTT/target_MTU.  Note that if there is any interaction between the forward and return path, test_window may need to be adjusted slightly to compensate for the resulting inflated RTT.  A sketch of this computation appears at the end of this section.

   Since losses and ECN marks generally cause transport protocols to at least temporarily reduce their data rates, this test is expected to be less precise about controlling its data rate.  It should not be considered inconclusive as long as at least some of the round trips reached the full target_data_rate without incurring losses or ECN marks.  To pass this test the network MUST deliver target_pipe_size packets in target_RTT time without any losses or ECN marks at least once per two target_pipe_size round trips, in addition to meeting the run length statistical test.
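   For illustration only (the memo does not mandate units or rounding; both are our assumptions here), the window clamp might be computed as:

      import math

      def test_window(target_data_rate_bps, test_rtt_s,
                      target_mtu_bytes):
          """test_window = target_data_rate * test_RTT / target_MTU,
          rounded up to whole packets."""
          bytes_per_rtt = (target_data_rate_bps / 8.0) * test_rtt_s
          return math.ceil(bytes_per_rtt / target_mtu_bytes)

      print(test_window(2.5e6, 0.010, 1500))  # -> 3 packets for a
                                              #    10 ms test path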
7.1.3.  Background Delivery Statistics Tests

   The background run length is a low rate version of the target rate test above, designed for ongoing lightweight monitoring for changes in the observed subpath run length without disrupting users.  It should be used in conjunction with one of the above full rate tests because it does not confirm that the subpath can support the raw target data rate.

   RFC 6673 [RFC6673] is appropriate for measuring background delivery statistics.

7.2.  Standing Queue Tests

   These tests confirm that the bottleneck is well behaved across the onset of packet loss, which typically follows the onset of queueing.  Well behaved generally means lossless for transient queues, but once the queue has been sustained for a sufficient period of time (or reaches a sufficient queue depth) there should be a small number of losses to signal to the transport protocol that it should reduce its window.  Losses that are too early can prevent the transport from averaging at the target_data_rate.  Losses that are too late indicate that the queue might be subject to bufferbloat [wikiBloat] and inflict excess queueing delays on all flows sharing the bottleneck queue.  Excess losses (more than a few per RTT) make loss recovery problematic for the transport protocol.  Non-linear or erratic RTT fluctuations suggest poor interactions between the channel acquisition algorithms and the transport self clock.  All of the tests in this section use the same basic scanning algorithm, described here, but score the link on the basis of how well it avoids each of these problems.

   For some technologies the data might not be subject to increasing delays, in which case the data rate will vary with the window size all the way up to the onset of load induced losses or ECN marks.  For these technologies, the discussion of queueing does not apply, but it is still required that the onset of losses (or ECN marks) be at an appropriate point and progressive.

   Use the procedure in Section 6.1.3 to sweep the window across the onset of queueing and the onset of loss.  The tests below all assume that the scan emulates standard additive increase and delayed ACK by incrementing the window by one packet for every 2*target_pipe_size packets delivered.  A scan can typically be divided into three regions: below the onset of queueing, a standing queue, and at or beyond the onset of loss.

   Below the onset of queueing the RTT is typically fairly constant, and the data rate varies in proportion to the window size.  Once the data rate reaches the link rate, the data rate becomes fairly constant, and the RTT increases in proportion to the increase in window size.  The precise transition across the start of queueing can be identified by the maximum network power, defined to be the ratio of the data rate over the RTT.  The network power can be computed at each window size, and the window with the maximum is taken as the start of the queueing region.

   For technologies that do not have conventional queues, start the scan at a window equal to test_window=target_data_rate*test_RTT/target_MTU, i.e. starting at the target rate, instead of the power point.
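   A minimal sketch of the power point computation (our illustration; the layout of the scan results is an assumption):

      def onset_of_queueing(scan):
          """Given scan results as (window_packets, data_rate_bps,
          rtt_s) tuples, return the window with maximum network
          power, defined as data rate divided by RTT."""
          return max(scan, key=lambda sample: sample[1] / sample[2])[0]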
   If there is random background loss (e.g. bit errors, etc.), precise determination of the onset of queue induced packet loss may require multiple scans.  Above the onset of queueing loss, all transport protocols are expected to experience periodic losses determined by the interaction between the congestion control and AQM algorithms.  For standard congestion control algorithms the periodic losses are likely to be relatively widely spaced and the details are typically dominated by the behavior of the transport protocol itself.  For the stiffened transport protocol case (with non-standard, aggressive congestion control algorithms) the details of periodic losses will be dominated by how the window increase function responds to loss.

7.2.1.  Congestion Avoidance

   A link passes the congestion avoidance standing queue test if more than target_run_length packets are delivered between the onset of queueing (as determined by the window with the maximum network power) and the first loss or ECN mark.  If this test is implemented using a standard congestion control algorithm with a clamp, it can be used in situ in the production Internet as a capacity test.  For an example of such a test see [Pathdiag].

   For technologies that do not have conventional queues, use the test_window in place of the onset of queueing, i.e. a link passes the congestion avoidance standing queue test if more than target_run_length packets are delivered between the start of the scan at test_window and the first loss or ECN mark.

7.2.2.  Bufferbloat

   This test confirms that there is some mechanism to limit buffer occupancy (e.g. that prevents bufferbloat).  Note that this is not strictly a requirement for single stream bulk performance, however if there is no mechanism to limit buffer queue occupancy then a single stream with sufficient data to deliver is likely to cause the problems described in [RFC2309] and [wikiBloat].  This may cause only minor symptoms for the dominant flow, but has the potential to make the link unusable for other flows and applications.

   Pass if the onset of loss occurs before a standing queue has introduced more delay than twice the target_RTT, or another well defined and specified limit.  Note that there is not yet a model for how much standing queue is acceptable.  The factor of two chosen here reflects a rule of thumb.  In conjunction with the previous test, this test implies that the first loss should occur at a queueing delay which is between one and two times the target_RTT.

   Specified RTT limits that are larger than twice the target_RTT must be fully justified in the FSTDS.
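   The pass criteria of Sections 7.2.1 and 7.2.2 might be scored together as in the following sketch (the instrumentation names are our assumptions, not normative procedure):

      def standing_queue_score(pkts_between_onset_and_first_loss,
                               queue_delay_at_first_loss_s,
                               target_run_length, target_rtt_s):
          """Score one standing queue scan: the run between the
          onset of queueing and the first loss must exceed the
          target_run_length (Section 7.2.1), and the standing queue
          delay at the first loss must stay below twice the
          target_RTT (Section 7.2.2)."""
          congestion_avoidance_ok = (
              pkts_between_onset_and_first_loss > target_run_length)
          bufferbloat_ok = (
              queue_delay_at_first_loss_s < 2 * target_rtt_s)
          return congestion_avoidance_ok and bufferbloat_ok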
7.2.3.  Non excessive loss

   This test confirms that the onset of loss is not excessive.  Pass if losses are equal to or less than the increase in the cross traffic plus the test traffic window increase on the previous RTT.  This could be restated as non-decreasing link throughput at the onset of loss, which is easy to meet as long as discarding packets is not more expensive than delivering them.  (Note that when there is a transient drop in link throughput, outside of a standing queue test, a link that passes other queue tests in this document will have sufficient queue space to hold one RTT worth of data.)

7.2.4.  Duplex Self Interference

   This engineering test confirms a bound on the interactions between the forward data path and the ACK return path.

   Some historical half duplex technologies had the property that each direction held the channel until it completely drained its queue.  When a self clocked transport protocol, such as TCP, has data and ACKs passing in opposite directions through such a link, the behavior often reverts to stop-and-wait.  Each additional packet added to the window raises the observed RTT by two forward path packet times, once as it passes through the data path, and once for the additional delay incurred by the ACK waiting on the return path.

   The duplex self interference test fails if the RTT rises by more than some fixed bound above the expected queueing time computed from the excess window divided by the link data rate.

7.3.  Slowstart tests

   These tests mimic slowstart: data is sent at twice the effective bottleneck rate to exercise the queue at the dominant bottleneck.

   In general they are deemed inconclusive if the elapsed time to send the data burst is not less than half of the time to receive the ACKs (i.e. sending data too fast is OK, but sending it slower than twice the actual bottleneck rate as indicated by the ACKs is deemed inconclusive).  Space the bursts such that the average data rate is equal to the target_data_rate.
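   The implied burst timing can be sketched as follows (our arithmetic and names; the memo does not prescribe an implementation):

      def slowstart_burst_schedule(burst_packets, target_mtu_bytes,
                                   effective_bottleneck_rate_bps,
                                   target_data_rate_bps):
          """Return (burst_duration_s, headway_s): each burst is
          sent at twice the effective bottleneck rate, and bursts
          are spaced so the long term average rate equals the
          target_data_rate."""
          burst_bits = burst_packets * target_mtu_bytes * 8
          burst_duration = (burst_bits /
                            (2.0 * effective_bottleneck_rate_bps))
          headway = burst_bits / target_data_rate_bps
          return burst_duration, headway

   For example, an 11 packet burst of 1500 byte packets over a 10 Mb/s effective bottleneck yields a 6.6 ms burst every 52.8 ms for a 2.5 Mb/s target (values illustrative only).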
7.3.1.  Full Window slowstart test

   This is a capacity test to confirm that slowstart is not likely to exit prematurely.  Send slowstart bursts that are target_pipe_size total packets.

   Accumulate packet delivery statistics as described in Section 6.2.2 to score the outcome.  Pass if it is statistically significant that the observed interval between losses or ECN marks is larger than the target_run_length.  Fail if it is statistically significant that the observed interval between losses or ECN marks is smaller than the target_run_length.

   Note that these are the same parameters as the Sender Full Window burst test, except the burst rate is at the slowstart rate, rather than the sender interface rate.

7.3.2.  Slowstart AQM test

   Do a continuous slowstart (send data continuously at slowstart_rate) until the first loss, stop, allow the network to drain and repeat, gathering statistics on the last packet delivered before the loss, the loss pattern, the maximum observed RTT and window size.  Justify the results.  There is not currently sufficient theory to justify requiring any particular result, however design decisions that affect the outcome of this test also affect how the network balances between long and short flows (the "mice and elephants" problem).  The queueing delay at the time of the first loss should be at least one half of the target_RTT.

   This is an engineering test: it would be best performed on a quiescent network or testbed, since cross traffic has the potential to change the results.

7.4.  Sender Rate Burst tests

   These tests determine how well the network can deliver bursts sent at the sender's interface rate.  Note that this test most heavily exercises the front path, and is likely to include infrastructure that may be out of scope for a subscriber ISP.

   Also, there are several details that are not precisely defined.  For starters there is not a standard server interface rate.  1 Gb/s and 10 Gb/s are very common today, but higher rates will become cost effective and can be expected to be dominant some time in the future.

   Current standards permit TCP to send full window bursts following an application pause.  (Congestion Window Validation [RFC2861] is not required, but even if it was, it does not take effect until an application pause is longer than an RTO.)  Since full window bursts are consistent with standard behavior, it is desirable that the network be able to deliver such bursts, otherwise application pauses will cause unwarranted losses.  Note that the AIMD sawtooth requires a peak window that is twice target_pipe_size, so the worst case burst may be 2*target_pipe_size.

   It is also understood in the application and serving community that interface rate bursts have a cost to the network that has to be balanced against other costs in the servers themselves.  For example TCP Segmentation Offload (TSO) reduces server CPU in exchange for larger network bursts, which increase the stress on network buffer memory.

   There is not yet theory to unify these costs or to provide a framework for trying to optimize global efficiency.  We do not yet have a model for how much the network should tolerate server rate bursts.  Some bursts must be tolerated by the network, but it is probably unreasonable to expect the network to be able to efficiently deliver all data as a series of bursts.

   For this reason, this is the only test for which we explicitly encourage derating.  A TDS should include a table of pairs of derating parameters: what burst size to use as a fraction of the target_pipe_size, and how much each burst size is permitted to reduce the run length, relative to the target_run_length.
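   For example, such a table might take the following form (the values here are entirely hypothetical and are shown only to illustrate the form; a real TDS must derive and justify its own entries):

      # Hypothetical derating table: (burst size as a fraction of
      # target_pipe_size, permitted run length as a fraction of
      # target_run_length).
      derating_table = [
          (1.00, 0.50),   # full-size bursts may halve the run length
          (0.50, 0.80),
          (0.25, 1.00),   # small bursts must meet the full target
      ]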
7.5.  Combined Tests

   Combined tests efficiently confirm multiple network properties in a single test, possibly as a side effect of production content delivery.  They require less measurement traffic than other testing strategies at the cost of conflating diagnostic signatures when they fail.  These are by far the most efficient for testing networks that are expected to pass all tests.

7.5.1.  Sustained burst test

   The sustained burst test implements a combined worst case version of all of the capacity tests above.  In its simplest form, send target_pipe_size bursts of packets at the server interface rate with target_RTT headway (burst start to burst start).  Verify that the observed delivery statistics meet the target_run_length.  Key observations:

   o  The subpath under test is expected to go idle for some fraction
      of the time: (subpath_data_rate-target_rate)/subpath_data_rate.
      Failing to do so indicates a problem with the procedure and an
      inconclusive test result.
   o  The burst sensitivity can be derated by sending smaller bursts
      more frequently.  E.g. send target_pipe_size*derate packet
      bursts every target_RTT*derate.
   o  When not derated this test is more strenuous than the slowstart
      capacity tests.
   o  A link that passes this test is likely to be able to sustain
      higher rates (close to subpath_data_rate) for paths with RTTs
      significantly smaller than the target_RTT.  Offsetting this
      performance underestimation is part of the rationale behind
      permitting derating in general.
   o  This test can be implemented with instrumented TCP [RFC4898],
      using a specialized measurement application at one end
      [MBMSource] and a minimal service at the other end [RFC0863]
      [RFC0864].  A prototype tool exists and is under evaluation.
   o  This test is efficient to implement, since it does not require
      per-packet timers, and can make use of TSO in modern NIC
      hardware.
   o  This test is not completely sufficient: the standing window
      engineering tests are also needed to ensure that the link is
      well behaved at and beyond the onset of congestion.  Links that
      exhibit punitive behaviors such as sudden high loss under
      overload may not interact well with TCP's self clock.
   o  Assuming the link passes the relevant standing window
      engineering tests (particularly that it has a progressive onset
      of loss at an appropriate queue depth), a passing sustained
      burst test is (believed to be) sufficient to verify that the
      subpath will not impair a stream at the target performance
      under all conditions.  Proving this statement is the subject of
      ongoing research.

   Note that this test is clearly independent of the subpath RTT, or other details of the measurement infrastructure, as long as the measurement infrastructure can accurately and reliably deliver the required bursts to the subpath under test.

7.5.2.  Streaming Media

   Model Based Metrics can be implemented as a side effect of serving any non-throughput maximizing traffic*, such as streaming media, with some additional controls and instrumentation in the servers.  The essential requirement is that the traffic be constrained such that even with arbitrary application pauses, bursts and data rate fluctuations, the traffic stays within the envelope defined by the individual tests described above.

   If the serving_data_rate is less than or equal to the target_data_rate and the serving_RTT (the RTT between the sender and client) is less than the target_RTT, this constraint is most easily implemented by clamping the transport window size to be no larger than:

      serving_window_clamp=target_data_rate*serving_RTT/
                           (target_MTU-header_overhead)

   Under the above constraints the serving_window_clamp will limit both the serving data rate and burst sizes to be no larger than specified by the procedures in Section 7.1.2 and Section 7.4 or Section 7.5.1.  Since the serving RTT is smaller than the target_RTT, the worst case bursts that might be generated under these conditions will be smaller than called for by Section 7.4, and the sender rate burst sizes are implicitly derated by the serving_window_clamp divided by the target_pipe_size at the very least.  (The traffic might be smoother than specified by the sender interface rate bursts test.)
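   For illustration only (units, rounding and names are our assumptions), the clamp for a 2.5 Mb/s target served over a 20 ms serving path might be computed as:

      import math

      def serving_window_clamp(target_data_rate_bps, serving_rtt_s,
                               target_mtu_bytes,
                               header_overhead_bytes):
          """serving_window_clamp = target_data_rate * serving_RTT /
          (target_MTU - header_overhead), in whole packets, rounded
          down so the clamp never exceeds the envelope."""
          payload = target_mtu_bytes - header_overhead_bytes
          bytes_per_rtt = (target_data_rate_bps / 8.0) * serving_rtt_s
          return math.floor(bytes_per_rtt / payload)

      print(serving_window_clamp(2.5e6, 0.020, 1500, 64))
      # -> 4 packets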
   Note that it is important that the target_data_rate be above the actual average rate needed by the application so it can recover after transient pauses caused by congestion or the application itself.

   In an alternative implementation the data rate and bursts might be explicitly controlled by a host shaper or pacing at the sender.  This would provide better control over transmissions but it is substantially more complicated to implement and would be likely to have a higher CPU overhead.

   * Note that these techniques can be applied to any content delivery that can be subjected to a reduced data rate in order to inhibit TCP equilibrium behavior.

8.  An Example

   In this section we illustrate a TDS designed to confirm that an access ISP can reliably deliver HD video from multiple content providers to all of its customers.  With modern codecs, HD video generally fits in 2.5 Mb/s [@@HDvideo].  Due to its geographical size, network topology and modem designs, the ISP determines that most content is within a 50 ms RTT of its users.  (This is a sufficient RTT to cover continental Europe or either US coast from a single serving site.)

   2.5 Mb/s over a 50 ms path

      +----------------------+-------+---------+
      | End to End Parameter | value | units   |
      +----------------------+-------+---------+
      | target_rate          | 2.5   | Mb/s    |
      | target_RTT           | 50    | ms      |
      | target_MTU           | 1500  | bytes   |
      | header_overhead      | 64    | bytes   |
      | target_pipe_size     | 11    | packets |
      | target_run_length    | 363   | packets |
      +----------------------+-------+---------+

                                Table 1

   Table 1 shows the default TCP model with no derating, and as such is quite conservative.  The simplest TDS would be to use the sustained burst test, described in Section 7.5.1.  Such a test would send 11 packet bursts every 50 ms, and confirm that there was no more than 1 packet loss per 33 bursts (363 total packets in 1.650 seconds).

   Since this number represents the entire end-to-end loss budget, independent subpath tests could be implemented by apportioning the loss rate across subpaths.  For example 50% of the losses might be allocated to the access or last mile link to the user, 40% to the interconnects with other ISPs and 1% to each internal hop (assuming no more than 10 internal hops).  Then all of the subpaths can be tested independently, and the spatial composition of passing subpaths would be expected to be within the end-to-end loss budget.

   Testing interconnects has generally been problematic: conventional performance tests run between Measurement Points adjacent to either side of the interconnect are not generally useful.  Unconstrained TCP tests, such as netperf tests [@@netperf], are typically overly aggressive because the RTT is so small (often less than 1 ms).  These tools are likely to report inflated numbers by pushing other traffic off of the network.  As a consequence they are useless for predicting actual user performance, and may themselves be quite disruptive.  Model Based Metrics solve this problem: the same test pattern as used on other links can be applied to the interconnect.  For our example, when apportioned 40% of the losses, 11 packet bursts sent every 50 ms should have fewer than one loss per 82 bursts (902 packets).
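   The table entries and burst schedule above can be reproduced with the following sketch (our arithmetic, using the reference model target_run_length = 3*(target_pipe_size)^2 from Section 5.2):

      import math

      target_rate_bps = 2.5e6
      target_rtt_s = 0.050
      target_mtu_bytes = 1500

      bytes_per_rtt = (target_rate_bps / 8.0) * target_rtt_s
      target_pipe_size = math.ceil(
          bytes_per_rtt / target_mtu_bytes)            # 11 packets
      target_run_length = 3 * target_pipe_size ** 2    # 363 packets

      bursts = target_run_length // target_pipe_size   # 33 bursts
      duration_s = bursts * target_rtt_s               # 1.65 seconds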
9.  Validation

   Since some aspects of the models are likely to be too conservative, Section 5.2 permits alternate protocol models and Section 5.3 permits test parameter derating.  If either of these techniques is used, we require demonstrations that such a TDS can robustly detect links that will prevent authentic applications using state-of-the-art protocol implementations from meeting the specified performance targets.  This correctness criterion is potentially difficult to prove, because it implicitly requires validating a TDS against all possible links and subpaths.  The procedures described here are still experimental.

   We suggest two approaches, both of which should be applied: first, publish a fully open description of the TDS, including what assumptions were used and how it was derived, such that the research community can evaluate the design decisions, test them and comment on their applicability; and second, demonstrate that authentic applications running over an infinitesimally passing testbed do meet the performance targets.

   An infinitesimally passing testbed resembles an epsilon-delta proof in calculus.  Construct a test network such that all of the individual tests of the TDS pass by only small (infinitesimal) margins, and demonstrate that a variety of authentic applications running over real TCP implementations (or other protocols as appropriate) meet the end-to-end target parameters over such a network.  The workloads should include multiple types of streaming media and transaction oriented short flows (e.g. synthetic web traffic).

   For example, for the HD streaming video TDS described in Section 8, the link layer bottleneck data rate should be exactly the header overhead above 2.5 Mb/s, the per packet random background loss probability should be 1/363 (for a run length of 363 packets), the bottleneck queue should be 11 packets and the front path should have just enough buffering to withstand 11 packet interface rate bursts.  We want every one of the TDS tests to fail if we slightly increase the relevant test parameter, so for example sending a 12 packet burst should cause excess (possibly deterministic) packet drops at the dominant queue at the bottleneck.  On this infinitesimally passing network it should be possible for a real application using a stock TCP implementation in the vendor's default configuration to attain 2.5 Mb/s over a 50 ms path.

   The most difficult part of setting up such a testbed is arranging for it to infinitesimally pass the individual tests.  Two approaches: constraining the network devices not to use all available resources (e.g. by limiting available buffer space or data rate); and preloading subpaths with cross traffic.  Note that it is important that a single environment be constructed which infinitesimally passes all tests at the same time, otherwise there is a chance that TCP can exploit extra latitude in some parameters (such as data rate) to partially compensate for constraints in other parameters (such as queue space), or vice versa.

   To the extent that a TDS is used to inform public dialog it should be fully publicly documented, including the details of the tests, what assumptions were used and how it was derived.  All of the details of the validation experiment should also be published with sufficient detail for the experiments to be replicated by other researchers.  All components should either be open source or fully described proprietary implementations that are available to the research community.

10.  Acknowledgements

   Ganga Maguluri suggested the statistical test for measuring loss probability in the target run length.  Alex Gilgur helped with the statistics and contributed the alternate model.

   Meredith Whittaker improved the clarity of the communications.

   This work was inspired by Measurement Lab: open tools running on an open platform, using open tools to collect open data.
   See http://www.measurementlab.net/

11.  Informative References

   [RFC0863]  Postel, J., "Discard Protocol", STD 21, RFC 863, May 1983.

   [RFC0864]  Postel, J., "Character Generator Protocol", STD 22, RFC 864, May 1983.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, "Framework for IP Performance Metrics", RFC 2330, May 1998.

   [RFC2861]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion Window Validation", RFC 2861, June 2000.

   [RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining Empirical Bulk Transfer Capacity Metrics", RFC 3148, July 2001.

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte Counting (ABC)", RFC 3465, February 2003.

   [RFC4015]  Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm for TCP", RFC 4015, February 2005.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov, S., and J. Perser, "Packet Reordering Metrics", RFC 4737, November 2006.

   [RFC4898]  Mathis, M., Heffner, J., and R. Raghunarayan, "TCP Extended Statistics MIB", RFC 4898, May 2007.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

   [RFC5835]  Morton, A. and S. Van den Berghe, "Framework for Metric Composition", RFC 5835, April 2010.

   [RFC6049]  Morton, A. and E. Stephan, "Spatial Composition of Metrics", RFC 6049, January 2011.

   [RFC6673]  Morton, A., "Round-Trip Packet Loss Metrics", RFC 6673, August 2012.

   [I-D.ietf-ippm-2330-update]
              Fabini, J. and A. Morton, "Advanced Stream and Sampling Framework for IPPM", draft-ietf-ippm-2330-update-05 (work in progress), May 2014.

   [I-D.ietf-ippm-lmap-path]
              Bagnulo, M., Burbridge, T., Crawford, S., Eardley, P., and A. Morton, "A Reference Path and Measurement Points for LMAP", draft-ietf-ippm-lmap-path-04 (work in progress), June 2014.

   [MSMO97]   Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", Computer Communications Review, volume 27, number 3, July 1997.

   [WPING]    Mathis, M., "Windowed Ping: An IP Level Performance Diagnostic", INET 94, June 1994.

   [mpingSource]
              Fan, X., Mathis, M., and D. Hamon, "Git Repository for mping: An IP Level Performance Diagnostic", Sept 2013.

   [MBMSource]
              Hamon, D., "Git Repository for Model Based Metrics", Sept 2013.

   [Pathdiag] Mathis, M., Heffner, J., O'Neil, P., and P. Siemsen, "Pathdiag: Automated TCP Diagnosis", Passive and Active Measurement, June 2008.

   [StatQC]   Montgomery, D., "Introduction to Statistical Quality Control - 2nd ed.", ISBN 0-471-51988-X, 1990.

   [Rtool]    R Development Core Team, "R: A language and environment for statistical computing", R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org/, 2011.

   [CVST]     Krueger, T. and M. Braun, "R package: Fast Cross-Validation via Sequential Testing", version 0.1, November 2012.

   [CUBIC]    Ha, S., Rhee, I., and L. Xu, "CUBIC: a new TCP-friendly high-speed TCP variant", SIGOPS Oper. Syst. Rev. 42, 5, July 2008.
   [LMCUBIC]  Ledesma Goyzueta, R. and Y. Chen, "A Deterministic Loss Model Based Analysis of CUBIC", IEEE International Conference on Computing, Networking and Communications (ICNC), E-ISBN: 978-1-4673-5286-4, January 2013.

   [AFD]      Pan, R., Breslau, L., Prabhakar, B., and S. Shenker, "Approximate fairness through differential dropping", SIGCOMM Comput. Commun. Rev. 33, 2, April 2003.

   [wikiBloat]
              Wikipedia, "Bufferbloat", http://en.wikipedia.org/w/index.php?title=Bufferbloat&oldid=608805474, June 2014.

   [CCscaling]
              Paganini, F., Doyle, J., and S. Low, "Scalable laws for stable network congestion control", Proceedings of Conference on Decision and Control, http://www.ee.ucla.edu/~paganini, December 2001.

Appendix A.  Model Derivations

   The reference target_run_length described in Section 5.2 is based on very conservative assumptions: that all window in excess of target_pipe_size contributes to a standing queue that raises the RTT, and that classic Reno congestion control with delayed ACKs is in effect.  In this section we provide two alternative calculations using different assumptions.

   It may seem out of place to allow such latitude in a measurement standard, but this section provides offsetting requirements.

   The estimates provided by these models make the most sense if network performance is viewed logarithmically.  In the operational Internet, data rates span more than 8 orders of magnitude, RTT spans more than 3 orders of magnitude, and loss probability spans at least 8 orders of magnitude.  When viewed logarithmically (as in decibels), these correspond to 80 dB of dynamic range.  On an 80 dB scale, a 3 dB error is less than 4% of the scale, even though it might represent a factor of 2 in the untransformed parameter.

   This document gives a lot of latitude for calculating target_run_length, however people designing a TDS should consider the effect of their choices on the ongoing tussle about the relevance of "TCP friendliness" as an appropriate model for Internet capacity allocation.  Choosing a target_run_length that is substantially smaller than the reference target_run_length specified in Section 5.2 strengthens the argument that it may be appropriate to abandon "TCP friendliness" as the Internet fairness model.  This gives developers incentive and permission to develop even more aggressive applications and protocols, for example by increasing the number of connections that they open concurrently.

A.1.  Queueless Reno

   In Section 5.2 it is assumed that the target rate is the same as the link rate, and any excess window causes a standing queue at the bottleneck.  This might be representative of a non-shared access link.  An alternative situation would be a heavily aggregated subpath where individual flows do not significantly contribute to the queueing delay, and losses are determined by monitoring the average data rate, for example by the use of a virtual queue as in [AFD].  In such a scheme the RTT is constant and TCP's AIMD congestion control causes the data rate to fluctuate in a sawtooth.  If the traffic is being controlled in a manner that is consistent with the metrics here, the goal would be to make the actual average rate equal to the target_data_rate.

   We can derive a model for Reno TCP and delayed ACK under the above set of assumptions: for some value of Wmin, the window will sweep from Wmin packets to 2*Wmin packets in 2*Wmin RTTs.  Unlike the queueing case where Wmin = target_pipe_size, we want the average of Wmin and 2*Wmin to be the target_pipe_size, so that the average rate is the target rate.  Thus we want Wmin = (2/3)*target_pipe_size.

   Between losses each sawtooth delivers (1/2)(Wmin+2*Wmin)(2*Wmin) = 3*Wmin^2 packets in 2*Wmin round trip times.

   Substituting these together we get:

      target_run_length = (4/3)(target_pipe_size^2)

   Note that this is 44% of the reference run length.  This makes sense because under the assumptions in Section 5.2 the AIMD sawtooth caused a queue at the bottleneck, which raised the effective RTT by 50%.
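   The 44% figure can be checked numerically (a sketch using the target_pipe_size from the example in Section 8):

      def reference_run_length(target_pipe_size):
          return 3 * target_pipe_size ** 2       # Section 5.2 model

      def queueless_reno_run_length(target_pipe_size):
          return (4.0 / 3.0) * target_pipe_size ** 2

      tps = 11   # from the Section 8 example
      print(queueless_reno_run_length(tps) /
            reference_run_length(tps))           # -> 0.444...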
A.2.  CUBIC

   CUBIC has three operating regions.  The model for the expected value of the window size derived in [LMCUBIC] assumes operation in the "concave" region only, which is a non-TCP friendly region for long-lived flows.  The authors make the following assumptions: the packet loss probability, p, is independent and periodic, losses occur one at a time, and they are true losses due to tail drop or corruption.  This definition of p aligns very well with our definition of target_run_length and the requirement for progressive loss (AQM).

   Although the CUBIC window increase depends on continuous time, the authors express the time to reach the maximum window size in terms of the RTT and a parameter for the multiplicative rate decrease on observing loss, beta (whose default value is 0.2 in CUBIC).  The expected value of the window size, E[W], is also dependent on C, a parameter of CUBIC that determines its window-growth aggressiveness (values from 0.01 to 4).

      E[W] = ( C*(RTT/p)^3 * ((4-beta)/beta) )^(1/4)

   and, further assuming Poisson arrivals, the mean throughput, x, is

      x = E[W]/RTT

   We note that under these conditions (deterministic single losses), the value of E[W] is always greater than 0.8 of the maximum window size ~= reference_run_length.  @@@@
Appendix B.  Complex Queueing

   For many network technologies simple queueing models do not apply: the network schedules, thins or otherwise alters the timing of ACKs and data, generally to raise the efficiency of the channel allocation process when confronted with relatively widely spaced small ACKs.  These efficiency strategies are ubiquitous for half duplex, wireless and broadcast media.

   Altering the ACK stream generally has two consequences: it raises the effective bottleneck data rate, making slowstart burst at higher rates (possibly as high as the sender's interface rate), and it effectively raises the RTT by the average time that the ACKs were delayed.  The first effect can be partially mitigated by reclocking ACKs once they are beyond the bottleneck on the return path to the sender, however this further raises the effective RTT.

   The most extreme example of this sort of behavior would be a half duplex channel that is not released as long as the end point currently holding the channel has queued traffic.  Such environments cause self clocked protocols under full load to revert to extremely inefficient stop and wait behavior, where they send an entire window of data as a single burst, followed by the entire window of ACKs on the return path.

   If a particular end-to-end path contains a link or device that alters the ACK stream, then the entire path from the sender up to the bottleneck must be tested at the burst parameters implied by the ACK scheduling algorithm.  The most important parameter is the Effective Bottleneck Data Rate, which is the average rate at which the ACKs advance snd.una.  Note that thinning the ACKs (relying on the cumulative nature of seg.ack to permit discarding some ACKs) implies an effectively infinite bottleneck data rate.  It is important to note that due to the self clock, ill conceived channel allocation mechanisms can increase the stress on upstream links in a long path.

   Holding data or ACKs for channel allocation or other reasons (such as error correction) always raises the effective RTT relative to the minimum delay for the path.  Therefore it may be necessary to replace target_RTT in the calculation in Section 5.2 by an effective_RTT, which includes the target_RTT reflecting the fixed part of the path plus a term to account for the extra delays introduced by these mechanisms.

Appendix C.  Version Control

   Formatted: Thu Jul 3 20:19:04 PDT 2014

Authors' Addresses

   Matt Mathis
   Google, Inc
   1600 Amphitheater Parkway
   Mountain View, California  94043
   USA

   Email: mattmathis@google.com

   Al Morton
   AT&T Labs
   200 Laurel Avenue South
   Middletown, NJ  07748
   USA

   Phone: +1 732 420 1571
   Email: acmorton@att.com
   URI:   http://home.comcast.net/~acmacm/