idnits 2.17.1 draft-irtf-tmrg-tests-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** The document seems to lack a License Notice according IETF Trust Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009 Section 6.b -- however, there's a paragraph with a matching beginning. Boilerplate error? (You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Feb 2009 rather than one of the newer Notices. See https://trustee.ietf.org/license-info/.) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 64 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 220 has weird spacing: '..., where c is...' == Line 232 has weird spacing: '... namely rho/(...' == Line 233 has weird spacing: '...C , and lambd...' == Line 250 has weird spacing: '...rt with rho/(...' == Line 259 has weird spacing: '...ts with rho>1...' == (6 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 9, 2009) is 5404 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Online' is mentioned on line 981, but not defined -- Obsolete informational reference (is this intentional?): RFC 1323 (Obsoleted by RFC 7323) Summary: 1 error (**), 0 flaws (~~), 9 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group L. Andrew, Ed. 3 Internet-Draft CAIA, Swinburne University of 4 Intended status: BCP Technology 5 Expires: January 10, 2010 S. Floyd, Ed. 6 ICSI Center for Internet Research 7 G. Wang, editor 8 NEC, China 9 July 9, 2009 11 Common TCP Evaluation Suite 12 draft-irtf-tmrg-tests-02.txt 14 Status of this Memo 16 This Internet-Draft is submitted to IETF in full conformance with the 17 provisions of BCP 78 and BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on January 10, 2010. 37 Copyright Notice 39 Copyright (c) 2009 IETF Trust and the persons identified as the 40 document authors. All rights reserved. 42 This document is subject to BCP 78 and the IETF Trust's Legal 43 Provisions Relating to IETF Documents in effect on the date of 44 publication of this document (http://trustee.ietf.org/license-info). 45 Please review these documents carefully, as they describe your 46 rights and restrictions with respect to this document. 48 Abstract 50 This document presents an evaluation test suite for the initial 51 evaluation of proposed TCP modifications. The goal of the test suite 52 is to allow researchers to quickly and easily evaluate their proposed 53 TCP extensions in simulators and testbeds using a common set of well- 54 defined, standard test cases, in order to compare and contrast 55 proposals against standard TCP as well as other proposed 56 modifications. This test suite is not intended to result in an 57 exhaustive evaluation of a proposed TCP modification or new 58 congestion control mechanism. Instead, the focus is on quickly and 59 easily generating an initial evaluation report that allows the 60 networking community to understand and discuss the behavioral aspects 61 of a new proposal, in order to guide further experimentation that 62 will be needed to fully investigate the specific aspects of a new 63 proposal. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 2. Traffic generation . . . . . . . . . . . . . . . . . . . . . . 4 69 2.1. Loads . . . . . . . . . . . . . . . . . . . . . . . . . . 5 70 2.2. Equilibrium . . . . . . . . . . . . . . . . . . . . . . . 6 71 2.3. Packet size distribution . . . . . . . . . . . . . . . . . 7 72 2.4. Round Trip Times . . . . . . . . . . . . . . . . . . . . . 7 73 3. Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . 8 74 3.1. Basic scenarios . . . . . . . . . . . . . . . . . . . . . 9 75 3.1.1. Topology and background traffic . . . . . . . . . . . 9 76 3.1.2. Flows under test . . . . . . . . . . . . . . . . . . . 11 77 3.1.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 12 78 3.2. Delay/throughput tradeoff as function of queue size . . . 12 79 3.2.1. Topology and background traffic . . . . . . . . . . . 12 80 3.2.2. Flows under test . . . . . . . . . . . . . . . . . . . 13 81 3.2.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 13 82 3.3. Ramp up time: completion time of one flow . . . . . . . . 13 83 3.3.1. Topology and background traffic . . . . . . . . . . . 13 84 3.3.2. Flows under test . . . . . . . . . . . . . . . . . . . 14 85 3.3.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 14 86 3.4. Transients: release of bandwidth, arrival of many flows . 14 87 3.4.1. Topology and background traffic . . . . . . . . . . . 14 88 3.4.2. Flows under test . . . . . . . . . . . . . . . . . . . 15 89 3.4.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 15 90 3.5. Impact on standard TCP traffic . . . . . . . . . . . . . . 15 91 3.5.1. Topology and background traffic . . . . . . . . . . . 15 92 3.5.2. Flows under test . . . . . . . . . . . . . . . . . . . 16 93 3.5.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 16 94 3.5.4. Suggestions . . . . . . . . . . . . . . . . . . . . . 16 95 3.6. Intra-protocol and inter-RTT fairness . . . . . . . . . . 17 96 3.6.1. Topology and background traffic . . . . . . . . . . . 17 97 3.6.2. Flows under test . . . . . . . . . . . . . . . . . . . 17 98 3.6.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 17 99 3.7. Multiple bottlenecks . . . . . . . . . . . . . . . . . . . 18 100 3.7.1. Topology and background traffic . . . . . . . . . . . 18 101 3.7.2. Flows under test . . . . . . . . . . . . . . . . . . . 19 102 3.7.3. Outputs . . . . . . . . . . . . . . . . . . . . . . . 20 103 3.8. Implementations . . . . . . . . . . . . . . . . . . . . . 20 104 3.9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 20 105 3.10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 20 106 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 107 5. Security Considerations . . . . . . . . . . . . . . . . . . . 21 108 6. Informative References . . . . . . . . . . . . . . . . . . . . 21 109 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 111 1. Introduction 113 This document describes a common test suite for the initial 114 evaluation of new TCP extensions. It defines a small number of 115 evaluation scenarios, including traffic and delay distributions, 116 network topologies, and evaluation parameters and metrics. The 117 motivation for such an evaluation suite is to help researchers in 118 evaluating their proposed modifications to TCP. The evaluation suite 119 will also enable independent duplication and verification of reported 120 results by others, which is an important aspect of the scientific 121 method that is not often put to use by the networking community. A 122 specific target is that the evaluations should be able to be 123 completed in three days of simulations, or in a reasonable amount of 124 effort in a testbed. 126 This document is an outcome of a ``round-table'' meeting on TCP 127 evaluation, held at Caltech on November 8-9, 2007. This document is 128 the first step in constructing the evaluation suite; the goal is for 129 the evaluation suite to be adapted in response from feedback from the 130 networking community. 132 2. Traffic generation 134 Congestion control concerns the response of flows to bandwidth 135 limitations or to the presence of other flows. For a realistic 136 testing of a congestion control protocol, we design scenarios to use 137 reasonably-typical traffic; most scenarios use traffic generated from 138 a traffic generator, with a range of start times for user sessions, 139 connection sizes, and the like, mimicking the traffic patterns 140 commonly observed in the Internet. Cross-traffic and reverse-path 141 traffic have the desirable effect of reducing the occurrence of 142 pathological conditions such as global synchronization among 143 competing flows that might otherwise be mis-interpreted as normal 144 average behaviours of those protocols [FK03], [MV06]. This traffic 145 must be reasonably realistic for the tests to predict the behaviour 146 of congestion control protocols in real networks, and also well- 147 defined so that statistical noise does not mask important effects. 149 It is important that the same ``amount'' of congestion or cross- 150 traffic be used for the testing scenarios of different congestion 151 control algorithms. This is complicated by the fact that packet 152 arrivals and even flow arrivals are influenced by the behavior of the 153 algorithms. For this reason, a pure packet-level generation of 154 traffic where generated traffic does not respond to the behaviour of 155 other present flows is not suitable. Instead, emulating application 156 or user behaviours at the end points using reactive protocols such as 157 TCP in a closed-loop fashion results in a closer approximation of 158 cross-traffic, where user behaviours are modeled by well-defined 159 parameters for source inputs (e.g., request sizes for HTTP), 160 destination inputs (e.g., response size), and think times between 161 pairs of source and destination inputs. By setting appropriate 162 parameters for the traffic generator, we can emulate non-greedy user- 163 interactive traffic (e.g., HTTP 1.1, SMTP and Telnet) as well as 164 greedy traffic (e.g., P2P and long file downloads). This approach 165 models protocol reactions to the congestion caused by other flows in 166 the common paths, although it fails to model the reactions of users 167 themselves to the presence of the congestion. 169 While the protocols being tested may differ, it is important that we 170 maintain the same ``load'' or level of congestion for the 171 experimental scenarios. To enable this, we use a hybrid of open-loop 172 and close-loop approaches. For this test suite, network traffic 173 consists of sessions corresponding to individual users. Because 174 users are independent, these session arrivals are well modeled by an 175 open-loop Poisson process. A session may consist of a single greedy 176 TCP flow, multiple greedy flows separated by user ``think'' times, or 177 a single non-greedy flow with embedded think times. The session 178 arrival process forms a Poisson process [HVA03]. Both the think 179 times and burst sizes have heavy-tailed distributions, with the exact 180 distribution based on empirical studies. The think times and burst 181 sizes will be chosen independently. This is unlikely to be the case 182 in practice, but we have not been able to find any measurements of 183 the joint distribution. We invite researchers to study this joint 184 distribution, and future revisions of this test suite will use such 185 statistics when they are available. 187 There are several traffic generators available that implement a 188 similar approach to that discussed above. For now, we are planning 189 to use the Tmix [Tmix] traffic generator. Tmix represents each TCP 190 connection by a connection vector consisting of a sequence of 191 (request-size, response-size, think-time) triples, thus representing 192 bi-directional traffic. Connection vectors used for traffic 193 generation can be obtained from Internet traffic traces. By taking 194 measurement from various points of the Internet such as campus 195 networks, DSL access links, and IPS core backbones, we can obtain 196 sets of connection vectors for different levels of congested links. 197 We plan to publish these connection vectors as part of this test 198 suite. A draft set of connection vectors is avilable at 199 . 201 2.1. Loads 203 For most current traffic generators, the traffic is specified by an 204 arrival rate for independent user sessions, along with specifications 205 of connection sizes, number of connections per sessions, user wait 206 times within sessions, and the like. For many of the scenarios, such 207 as the basic scenarios in Section 3.1, each scenario is run for a 208 range of loads, where the load is varied by varying the rate of 209 session arrivals. For a given congestion control mechanism, 210 experiments run with different loads are likely to have different 211 packet drop rates, and different levels of statistical multiplexing. 213 Because the session arrival times are specified independently of the 214 transfer times, one way to specify the load would be as A = 215 E[f]/E[t], where E[f] is the mean session size (in bits 216 transferred), E[t] is the mean session inter-arrival time in 217 seconds, and A is the load in bps. 219 It is important to test congestion control in ``overloaded'' 220 conditions. However, if A>c , where c is the capacity of the 221 bottleneck link, then the system has no equilibrium. Such cases are 222 studied in Section 3.4. In long-running experiments with A>c , the 223 expected number of flows would increase without bound. This means 224 that the measured results would be very sensitive to the duration of 225 the simulation. 227 Instead, for equilibrium experiments, we measure the load as the 228 ``mean number of jobs in an M/G/1 queue using processor sharing,'' 229 where a job is a user session. This reflects the fact that TCP aims 230 at processor sharing of variable sized files. Because processor 231 sharing is a symmetric discipline [Kelly79], the mean number of flows 232 is equal to that of an M/M/1 queue, namely rho/(1-rho) , where 233 rho=lambda S/C , and lambda [flows per second] is the arrival rate 234 of jobs/flows, S [bits] is the mean job size and C [bits per 235 second] is the bottleneck capacity. For small loads, say 10%, this 236 is essentially equal to the fraction of the capacity. However, for 237 overloaded systems, the fraction of the bandwidth used will be much 238 less than this measure of load. 240 In order to improve the traffic generators used in these scenarios, 241 we invite researchers to explore how the user behavior, as reflected 242 in the connection sizes, user wait times, and number of connections 243 per session, might be affected by the level of congestion experienced 244 within a session [RMC03]. 246 2.2. Equilibrium 248 In order to minimize the dependence of the results on the experiment 249 durations, scenarios should be as stationary as possible. To this 250 end, experiments will start with rho/(1-rho) active cross-traffic 251 flows, with traffic of the specified load. 253 * This is insufficient if the traces have very long pauses between 254 bursts, because this initial loading has all finished by the time the 255 number of actual tmix sessions builds up. It may be better to start 256 with several (many?) pre-existing connection vectors instead of 257 greedy sources. * 259 *It is still an open issue whether to use tests with rho>1 . If 260 such tests are used, the initial number of flows will need to be 261 defined.* 263 Note that the distribution of the durations of the active flows at a 264 given time is (often significantly) different from the distribution 265 of flow durations, skewed toward long flows. For simplicity, this 266 will be ignored and the initial flow sizes will be drawn from the 267 general flow size distribution. 269 2.3. Packet size distribution 271 For flows generated by the traffic generator, 10% use 536-byte 272 packets, and 90% 1500-byte packets. The packet size of each flow 273 will be specified along with the start time and duration, to maximize 274 the repeatability. 276 2.4. Round Trip Times 278 Most tests use a simple dumbbell topology with a central link that 279 connects two routers, as illustrated in Figure 1. Each router is 280 connected to three nodes by edge links. In order to generate a 281 typical range of round trip times, edge links have different delays. 282 On one side, the one-way propagation delays are: 0 ms, 12 ms and 283 25 ms; on the other: 2 ms, 37 ms, and 75 ms. Traffic is uniformly 284 shared among the nine source/destination pairs, giving a distribution 285 of per-flow RTTs in the absence of queueing delay shown in Figure 2. 286 These RTTs are computed for a dumbbell topology with a delay of 0 ms 287 for the central link. The delay for the central link is given in the 288 specific scenarios in the next section. 290 Node 1 Node 4 291 \_ _/ 292 \_ _/ 293 \_ __________ Intermediate __________ _/ 294 | | link | | 295 Node 2 ------| Router 1 |----------------| Router 2 |------ Node 5 296 _|__________| |__________|_ 297 _/ \_ 298 _/ \_ 299 Node 3 / \ Node 6 301 A dumbbell topology 303 Figure 1 305 For dummynet experiments, delays can be obtained by specifying the 306 delay of each flow. 308 ------------------------------------------ 309 | Path | RTT || Path | RTT || Path | RTT | 310 |------+-----++------+-----++------+-----| 311 | 1-4 | 4 || 1-5 | 74 || 1-6 | 150 | 312 | 2-4 | 28 || 2-5 | 98 || 2-6 | 174 | 313 | 3-4 | 54 || 3-5 | 124 || 3-6 | 200 | 314 ------------------------------------------ 316 RTTs of the paths between two nodes, in milliseconds. * These RTTs 317 are subject to change, based on comparison between the resulting 318 packet-weighted RTT distribution and measurements* * I'd like to 319 change the RTT 1-4 to 3ms or 5ms instead of 4ms... -- LA* 321 Figure 2 323 3. Scenarios 325 It is not possible to provide TCP researchers with a complete set of 326 scenarios for an exhaustive evaluation of a new TCP extension; 327 especially because the characteristics of a new extension will often 328 require experiments with specific scenarios that highlight its 329 behavior. On the other hand, an exhaustive evaluation of a TCP 330 extension will need to include several standard scenarios, and it is 331 the focus of the test suite described in this section to define this 332 initial set of test cases. 334 3.1. Basic scenarios 336 The purpose of the basic scenarios is to explore the behavior of a 337 TCP extension over different link types. The scenarios use the 338 dumbbell topology of Section 2.4, with the link delays modified as 339 specified below. 341 This basic topology is used to instantiate several basic scenarios, 342 by appropriately choosing capacity and delay parameters for the 343 individual links. Depending on the configuration, the bottleneck 344 link may be in one of the edge links or the central link. 346 3.1.1. Topology and background traffic 348 The basic scenarios are for a single topology, with a range of 349 capacities and RTTs. For each scenario, traffic levels of 350 uncongested, mild congestion, and moderate congestion are specified; 351 these are explained below. 353 *Data Center:* The data center scenario models a case where bandwidth 354 is plentiful and link delays are generally low. It uses the same 355 configuration for the central link and all of the edge links. All 356 links have a capacity of either 1 Gbps, 2.5 Gbps or 10 Gbps; links 357 from nodes 1, 2 and 4 have a one-way propagation delay of 1 ms, while 358 those from nodes 3, 5 and 6 have 10 ms [WCL05], and the common link 359 has 0 ms delay. 361 Uncongested: TBD Mild congestion: TBD Moderate congestion: 362 TBD 364 *Access Link:* The access link scenario models an access link 365 connecting an institution (e.g., a university or corporation) to an 366 ISP. The central and edge links are all 100 Mbps. The one-way 367 propagation delay of the central link is 2 ms, while the edge links 368 have the delays given in Section 2.4. Our goal in assigning delays 369 to edge links is only to give a realistic distribution of round-trip 370 times for traffic on the central link. 372 Uncongested: TBD Mild congestion: TBD Moderate congestion: 373 TBD 375 *Trans-Oceanic Link:* The trans-oceanic scenario models a test case 376 where mostly lower-delay edge links feed into a high-delay central 377 link. The central link is 1 Gbps, with a one-way propagation delay 378 of 65 ms. The edge links have the same bandwidth as the central 379 link, with the one-way delays given in Section 2.4. An alternative 380 would be to use smaller delays for the edge links, with one-way 381 delays for each set of three edge links of 5, 10, and 25 ms. 383 *Implementations may use a smaller bandwidth for the trans-oceanic 384 link, for example to run a simulation in a feasible amount of time. 385 In testbeds, one of the metrics should be the number of timeouts in 386 servers, due to implementation issues when running at high speed.* 388 Uncongested: TBD Mild congestion: TBD Moderate congestion: 389 TBD 391 *Geostationary Satellite:* The geostationary satellite scenario 392 models an asymmetric test case with a high-bandwidth downlink and a 393 low-bandwidth uplink [HK99], [GF04]. The capacity of the central 394 link is 40 Mbps with a one-way propagation delay of 300 ms. The 395 downlink capacity of the edge links is also 40 Mbps, but their uplink 396 capacity is only 4 Mbps. Edge one-way delays are as given in 397 Section 2.4. Note that ``downlink'' is towards the router for edge 398 links attached to the first router, and away from the router for edge 399 links on the other router. 401 Uncongested: TBD Mild congestion: TBD Moderate congestion: 402 TBD 404 *Wireless Access:* The wireless access scenario models wireless 405 access to the wired backbone. The capacity of the central link is 406 100 Mbps with 2 ms of one-way delay. All links to Router 1 are 407 wired. Router 2 has a shared wireless link of nominal bit rate 408 11 Mbps (to model IEEE 802.11b links) or 54 Mbps (IEEE 802.11a/g) 409 with a one-way delay of 1us connected to dummy nodes 4', 5' and 6', 410 which are then connected to nodes 4, 5 and 6 by wired links of delays 411 2, 37 and 75 ms. This is to achieve the same RTT distribution as the 412 other scenarios, while allowing a CSMA model to have realistic delay 413 for a WLAN. 415 Note that wireless links have many other unique properties not 416 captured by delay and bitrate. In particular, the physical layer 417 might suffer from propagation effects that result in packet losses, 418 and the MAC layer might add high jitter under contention or large 419 steps in bandwidth due to adaptive modulation and coding. Specifying 420 these properties is beyond the scope of the current first version of 421 this test suite. 423 Uncongested: TBD Mild congestion: TBD Moderate congestion: 424 TBD 426 *Dial-up Link:* The dial-up link scenario models a network with a 427 dial-up link of 64 kbps and a one-way delay of 5 ms for the central 428 link. *modems are asymmetric, 56k downlink and 33.6k or 48k uplink. 429 Should we change this?* This could be thought of as modeling a 430 scenario reported as typical in Africa, with many users sharing a 431 single low-bandwidth dial-up link. 433 Uncongested: TBD Mild congestion: TBD Moderate congestion: 434 TBD 436 *Traffic:* For each of the basic scenarios, three cases are tested: 437 uncongested; mild congestion, and moderate congestion. All cases 438 will use scaled versions of the traces available at 439 . *The exact traffic loads and run 440 times for each scenario still need to be agreed upon. There is 441 ongoing debate about whether rho>1 is needed to get moderate to 442 high congestion. If rho>1 is used, note that the results will 443 depend heavily on the run time, because congestion will progressively 444 build up. In those cases, metrics which consider this non- 445 stationarity may be more useful than average quantities.* In the 446 default case, the reverse path has a low level of traffic (10% load). 447 The buffer size at the two routers is set to the maximum bandwidth- 448 delay-product for a 100 ms flow (i.e., a maximum queueing delay of 449 100 ms), with drop-tail queues in units of packets. Each run will be 450 for at least a hundred seconds, and the metrics will not cover the 451 initial warm-up times of each run. (Testbeds might use longer run 452 times, as should simulations with smaller bandwidth-delay products.) 454 As with all of the scenarios in this document, the basic scenarios 455 could benefit from more measurement studies about characteristics of 456 congested links in the current Internet, and about trends that could 457 help predict the characteristics of congested links in the future. 458 This would include more measurements on typical packet drop rates, 459 and on the range of round-trip times for traffic on congested links. 461 For the access link scenario, more extensive simulations or 462 experiments will be run, with both drop-tail and RED queue 463 management, with drop-tail queues in units of both bytes and packets, 464 and with RED queue management both in byte mode and in packet mode. 465 Specific TCP extensions may require the evaluation of associated AQM 466 mechanisms. For the access link scenario, simulations or experiments 467 will also include runs with a reverse-path load equal to the forward- 468 path load. For the access link scenario, additional experiments will 469 use a range of buffer sizes, including 20% and 200% of the bandwidth- 470 delay product for a 100 ms flow. 472 3.1.2. Flows under test 474 For this basic scenario, there is no differentiation between ``cross- 475 traffic'' and the ``flows under test''. The aggregate traffic is 476 under test, with the metrics exploring both aggregate traffic and 477 distributions of flow-specific metrics. 479 3.1.3. Outputs 481 For each run, the following metrics will be collected, for the 482 central link in each direction: the aggregate link utilization, the 483 average packet drop rate, and the average queueing delay, all over 484 the second half of the run. *This metric could be difficult to gather 485 in emulated testbeds since routers statistics of queue utilization 486 are not always reliable and depend on time-scale.* Separate 487 statistics should be reported for each direction in the satellite and 488 wireless access scenarios, since those networks are asymmetric. 489 *Should "over the second half of the run" be "starting after 50s"? 490 Sally used the second half of the run, for 100s simulations, but we 491 to get non-random results, we should run for longer. The warm-up 492 time doesn't need to scale up with the run length.* 494 Other metrics of interest for general scenarios can be grouped in two 495 sets: flow-centric and stability. The flow-centric metrics include 496 the sending rate, good-put, cumulative loss and queueing delay 497 trajectory for each flow, over time, and the transfer time per flow 498 versus file size. *Testbeds could use monitors in the TCP layer 499 (e.g., Web100) to estimate the queueing delay and loss.* *NS2 flowmon 500 has problems, because it seems not to release memory associated with 501 terminated flows. * Stability properties of interest include the 502 standard deviation of the throughput and the queueing delay for the 503 bottleneck link and for flows [WCL05]. The worst case stability is 504 also considered. 506 3.2. Delay/throughput tradeoff as function of queue size 508 Different queue management mechanisms have different delay-throughput 509 tradeoffs. E.g., Adaptive Virtual Queue [KS01] gives low delay, at 510 the expense of lower throughput. Different congestion control 511 mechanisms may have different tradeoffs, which these tests aim to 512 illustrate. 514 3.2.1. Topology and background traffic 516 These tests use the topology of Section 2.4. This test is run for 517 the access link scenario in Section 3.1. 519 For each Drop-Tail scenario set, five tests are run, with buffer 520 sizes of 10%, 20%, 50%, 100%, and 200% of the Bandwidth Delay Product 521 (BDP) for a 100 ms flow. For each AQM scenario (if used), five tests 522 are run, with a target average queue size of 2.5%, 5%, 10%, 20%, and 523 50% of the BDP, with a buffer equal to the BDP. 525 3.2.2. Flows under test 527 The level of traffic from the traffic generator will be specified so 528 that when a buffer size of 100% of the BDP is used with Drop Tail 529 queue management, there is a moderate level of congestion (e.g., 1-2% 530 packet drop rates when Standard TCP is used). Alternately, a range 531 of traffic levels could be chosen, with a scenario set run for each 532 traffic level (as in the examples cited below). 534 3.2.3. Outputs 536 For each test, three figures are kept, the average throughput, the 537 average packet drop rate, and the average queueing delay over the 538 second half of the test. 540 For each set of scenarios, the output is two graphs. For the delay/ 541 bandwidth graph, the x-axis shows the average queueing delay, and the 542 y-axis shows the average throughput. For the drop-rate graph, the 543 x-axis shows the average queueing delay, and the y-axis shows the 544 average packet drop rate. Each pair of graphs illustrates the delay/ 545 throughput/drop-rate tradeoffs for this congestion control mechanism. 546 For an AQM mechanism, each pair of graphs also illustrates how the 547 throughput and average queue size vary (or don't vary) as a function 548 of the traffic load. Examples of delay/throughput tradeoffs appear 549 in Figures 1-3 of [FS01] and Figures 4-5 of [AHM08]. 551 3.3. Ramp up time: completion time of one flow 553 These tests aim to determine how quickly existing flows make room for 554 new flows. 556 3.3.1. Topology and background traffic 558 Dumbbell. At least three capacities should be used, as close as 559 possible to: 56 kbps, 10 Mbps and 1 Gbps. The 56 kbps case is 560 included to investigate the performance using mobile handsets. 562 For each capacity, three RTT scenarios should be tested, in which the 563 existing and newly arriving flow have RTTs of (74 ms, 74 ms), 564 (124 ms, 28 ms) and (28 ms, 124 ms). *Was (80,80), (120,30), 565 (30,120), but the above are taken from Table 1 to simplify 566 implementation. OK? * 568 Throughout the experiment, there is also 10% bidirectional cross 569 traffic, as described in Section 2, using the mix of RTTs described 570 in Section 2.4. All traffic is from the new TCP extension. 572 3.3.2. Flows under test 574 Traffic is dominated by two long lived flows, because we believe that 575 to be the worst case, in which convergence is slowest. 577 One flow starts in ``equilibrium'' (at least having finished normal 578 slow-start). A new flow then starts; slow-start is disabled by 579 setting the initial slow-start threshold to the initial CWND. Slow 580 start is disabled because this is the worst case, and could happen if 581 a loss occurred in the first RTT. * Roman Chertov has suggested doing 582 some tests with slow start enabled too. Will there be time? Wait 583 until initial NS2 implementation is available to test * 585 The experiment ends once the new flow has run for five minutes. Both 586 of the flows use 1500-byte packets. 588 3.3.3. Outputs 590 The output of these experiments are the time until the 1500(10^n) th 591 byte of the new flow is received, for n = 1,2,... . This measures 592 how quickly the existing flow releases capacity to the new flow, 593 without requiring a definition of when ``fairness'' has been 594 achieved. By leaving the upper limit on n unspecified, the test 595 remains applicable to very high-speed networks. 597 A single run of this test cannot achieve statistical reliability by 598 running for a long time. Instead, an average over at least three 599 runs should be taken. Each run must use different cross traffic, as 600 specified in Section 2. 602 3.4. Transients: release of bandwidth, arrival of many flows 604 These tests investigate the impact of a sudden change of congestion 605 level. They differ from the "Ramp up time" test in that the 606 congestion here is caused by unresponsive traffic. 608 3.4.1. Topology and background traffic 610 The network is a single bottleneck link, with bit rate 100 Mbps, with 611 a buffer of 1024 packets (120% BDP at 100 ms). 613 The transient traffic is generated using UDP, to avoid overlap with 614 the scenario of Section 3.3 and isolate the behavior of the flows 615 under study. Three transients are tested: 616 1. step decrease from 75 Mbps to 0 Mbps, 617 2. step increase from 0 Mbps to 75 Mbps, 618 3. 30 step increases of 2.5 Mbps at 1 s intervals, simulating a 619 ``flash crowd'' effect. 620 These transients occur after the flow under test has exited slow- 621 start, and remain until the end of the experiment. 623 There is no TCP cross traffic as described in Section 2 in this 624 experiment. because flow arrivals/departures occur on timescales long 625 compared with these effects. 627 3.4.2. Flows under test 629 There is one flow under test: a long-lived flow in the same direction 630 as the transient traffic, with a 100 ms RTT. 632 3.4.3. Outputs 634 For the decrease in cross traffic, the metrics are (i) the time taken 635 for the flow under test to increase its window to 60%, 80% and 90% of 636 its BDP, and (ii) the maximum change of the window in a single RTT 637 while the window is increasing to that value. 639 For cases with an increase in cross traffic, the metric is the number 640 of packets dropped by the cross traffic from the start of the 641 transient until 100 s after the transient. This measures the harm 642 caused by algorithms which reduce their rates too slowly on 643 congestion. 645 3.5. Impact on standard TCP traffic 647 Many new TCP proposals achieve a gain, G , in their own throughput 648 at the expense of a loss, L , in the throughput of standard TCP 649 flows sharing a bottleneck, as well as by increasing the link 650 utilization. In this context a "standard TCP flow" is defined as a 651 flow using SACK TCP [RFC2883], but without ECN [RFC3168]. *What 652 about: * Window scaling [RFC1323] (yes), FRTO [RFC4138] (yes), ABC 653 [RFC3465] (no)? The intention is for a "standard TCP flow" to 654 correspond to TCP as commonly deployed in the Internet today (with 655 the notable exception of CUBIC, which runs by default on the majority 656 of web servers). This scenario quantifies this tradeoff. 658 3.5.1. Topology and background traffic 660 The dumbbell of Section 2.4 is used with the same capacities as for 661 the convergence tests (Section 3.3). All traffic in this scenario 662 comes from the flows under test. 664 3.5.2. Flows under test 666 The scenario is performed by conducting pairs of experiments, with 667 identical flow arrival times and flow sizes. Within each experiment, 668 flows are divided into two camps. For every flow in camp A, there is 669 a flow with the same size, source and destination in camp B, and vice 670 versa. The start time of the two flows are within 2 s. 672 The file sizes and start times are as specified in Section 2, with 673 start times scaled to achieve loads of 50% and 100%. In addition, 674 both camps have a long-lived flow. The experiments last for 1200 675 seconds. 677 In the first experiment, called BASELINE, both camp A and camp B use 678 standard TCP. In the second, called MIX, camp A uses standard TCP 679 and camp B uses the new TCP extension. 681 The rationale for having paired camps is to remove the statistical 682 uncertainty which would come from randomly choosing half of the flows 683 to run each algorithm. This way, camp A and camp B have the same 684 loads. 686 3.5.3. Outputs 688 The gain achieved by the new algorithm and loss incurred by standard 689 TCP are given by G=T(B)_Mix/T(B)_Baseline and L=T(A)_Mix/ 690 T(A)_Baseline where T(x) is the throughput obtained by camp x , 691 measured as the amount of data acknowledged by the receivers (that 692 is, ``goodput''), and taken over the last 8000 seconds of the 693 experiment. 695 The loss, L , is analogous to the ``bandwidth stolen from TCP'' 696 in [SA03] and ``throughput degradation'' in [SSM07]. 698 A plot of G vs L represents the tradeoff between efficiency and 699 loss. 701 3.5.4. Suggestions 703 Other statistics of interest are the values of G and L for each 704 quartile of file sizes. This will reveal whether the new proposal is 705 more aggressive in starting up or more reluctant to release its share 706 of capacity. 708 As always, testing at other loads and averaging over multiple runs 709 are encouraged. 711 3.6. Intra-protocol and inter-RTT fairness 713 These tests aim to measure bottleneck bandwidth sharing among flows 714 of the same protocol with the same RTT, which represents the flows 715 going through the same routing path. The tests also measure inter- 716 RTT fairness, the bandwidth sharing among flows of the same protocol 717 where routing paths have a common bottleneck segment but might have 718 different overall paths with different RTTs. 720 3.6.1. Topology and background traffic 722 The topology, the capacity and cross traffic conditions of these 723 tests are the same as in Section 3.3. The bottleneck buffer is 724 varied from 25% to 200% BDP for a 100 ms flow, increasing by factors 725 of 2. 727 3.6.2. Flows under test 729 We use two flows of the same protocol for this experiment. The RTTs 730 of the flows range from 10 ms to 160 ms (10 ms, 20 ms, 40 ms, 80 ms, 731 and 160 ms) such that the ratio of the minimum RTT over the maximum 732 RTT is at most 1/16. *In case a testbed doesn't support up to 160 ms 733 RTT, the RTTs may be scaled down in proportion to the maximum RTT 734 supported in that environment.* 736 Intra-protocol fairness: For each run, two flows with the same RTT, 737 taken from the range of RTTs above start randomly within the first 738 10% of the experiment. The order in which these flows start doesn't 739 matter. An additional test of interest, but not part of this suite, 740 would involve two extreme cases - two flows with very short or long 741 RTTs (e.g., the delay less then 1-2 ms represents communication 742 happen in the data-center and the delay larger than 600 ms considers 743 communication over satellite). 745 Inter-RTT fairness: For each run, one flow with a fixed RTT of 160 ms 746 starts first, and another flow with a different RTT taken from the 747 range of RTTs above, joins afterward. The starting times of both two 748 flows are randomly chosen within the first 10% of the experiment as 749 before. 751 3.6.3. Outputs 753 The output of this experiment is the ratio of the average throughput 754 values of the two flows. The output also includes the packet drop 755 rate for the congested link. 757 3.7. Multiple bottlenecks 759 These experiments explore the relative bandwidth for a flow that 760 traverses multiple bottlenecks, and flows with the same round-trip 761 time that each traverse only one of the bottleneck links. 763 3.7.1. Topology and background traffic 765 The topology is a ``parking-lot'' topology with three (horizontal) 766 bottleneck links and four (vertical) access links. The bottleneck 767 links have a rate of 100 Mbps, and the access links have a rate of 768 1 Gbps. 770 All flows have a round-trip time of 60 ms, to enable the effect of 771 traversing multiple bottlenecks to be distinguished from that of 772 different round trip times. This can be achieved as in Figure 3 by 773 (a) the second access link having a one-way delay of 30 ms (b) the 774 bottleneck link to which it does not connect having a one-way delay 775 of 30 ms and (c) all other links having negligible delay. This can 776 be extended to more than three bottlenecks as shown in Figure 4, by 777 assigning a delay of 30 ms to every alternate access link, and to 778 zero or one of the bottleneck links.) For the special case of three 779 hops, an alternative is for all links to have a one-way delay of 780 10 ms, as shown in Figure 5. It is not clear whether there are 781 interesting performance differences between these two topologies, and 782 if so, which is more typical of the actual internet. 784 > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > 785 _____ 100M 0ms ____________ 100M 0ms _____________ 100M 30ms ____ 786 | ................ | ................ | ................ | 787 1G : : 1G : : 1G : : 1G 788 | : : | : : | : : | 789 0ms : : 30ms : : 0ms : : 0ms 790 | ^ V | ^ V | ^ V | 792 Basic multi-hop topology. 794 Figure 3 796 -------+------+------+------+------+-------###### 797 |......#......|......#......|......#......|......| 798 |: :#: :|: :#: :|: :#: :|: :| 799 |: :#: :|: :#: :|: :#: :|: :| 800 |: :#: :|: :#: :|: :#: :|: :| 801 |^ V#^ V|^ V#^ V|^ V#^ V|^ V| 803 ### 30ms one-way --- 0ms one-way 805 Extension to 7-hop parking lot. (Not part of the basic test suite.) 807 Figure 4 809 > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > 810 _____ 100M 10ms __________ 100M 10ms ___________ 100M 10ms ___ 811 | ............... | ............... | ............... | 812 1G : : 1G : : 1G : : 1G 813 | : : | : : | : : | 814 10ms : : 10ms : : 10ms : : 10ms 815 | ^ V | ^ V | ^ V | 817 Alternative highly symmetric multi-hop topology. 819 Figure 5 821 Throughout the experiment, there is 10% bidirectional cross traffic 822 on each of the three bottleneck links, as described in Section 2. 823 The cross-traffic flows all traverse two access links and a single 824 bottleneck link. 826 All traffic uses the new TCP extension. 828 3.7.2. Flows under test 830 In addition to the cross-traffic, there are four flows under test, 831 all with traffic in the same direction on the bottleneck links. The 832 multiple-bottleneck flow traverses no access links and all three 833 bottleneck links. The three single-bottleneck flows each traverse 834 two access links and a single bottleneck link, with one flow for each 835 bottleneck link. The flows start in quick succession, separated by 836 approximately 1 second. These flows last at least 5 minutes. 838 An additional test of interest would be to have a longer, multiple- 839 bottleneck flow competing against shorter single-bottleneck flows. 841 3.7.3. Outputs 843 The output for this experiment is the ratio between the average 844 throughput of the single-bottleneck flows and the throughput of the 845 multiple-bottleneck flow, measured over the second half of the 846 experiment. Output also includes the packet drop rate for the 847 congested link. 849 3.8. Implementations 851 There are two on-going implementation efforts. 853 A testbed implementation is jointly being developed by the Centre for 854 Advanced Internet Architectures (CAIA) at Swinburne University of 855 technology and by Netlab at Caltech. It will eventually be available 856 for public use through the web interface 857 . 859 A simulation implementation in ns is being developed by NEC Labs, 860 China. Contributions can be made via its source forge page, 861 . 863 3.9. Conclusions 865 An initial specification of an evaluation suite for TCP extensions 866 has been described. Future versions will include: detailed 867 specifications, with modifications for simulations and testbeds; more 868 measurement results about congested links in the current Internet; 869 alternate specifications; and specific sets of scenarios that can be 870 run in a plausible period of time in simulators and testbeds, 871 respectively. 873 Several software and hardware implementations of these tests are 874 being developed for use by the community. An implementation is being 875 developed on WAN-in-Lab [LATL07], which will allow users to upload 876 Linux kernels via the web and will run tests similar to those 877 described here. Some tests will be modified to suit the hardware 878 available in WAN-in-Lab. An NS-2 implementation is also being 879 developed at NEC. We invite others to contribute implementations on 880 other simulator platforms, such as OMNeT++ and OpNet. 882 3.10. Acknowledgements 884 This work is based on a paper by Lachlan Andrew, Cesar Marcondes, 885 Sally Floyd, Lawrence Dunn, Romaric Guillier, Wang Gang, Lars Eggert, 886 Sangtae Ha and Injong Rhee. 888 The authors would also like to thank Roman Chertov, Doug Leith, 889 Saverio Mascolo, Ihsan Qazi, Bob Shorten, David Wei and Michele 890 Weigle for valuable feedback. 892 4. IANA Considerations 894 None. 896 5. Security Considerations 898 None. 900 6. Informative References 902 [AHM08] Andrew, L., Hanly, S., and R. Mukhtar, "Active Queue 903 Management for Fair Resource Allocation in Wireless 904 Networks", IEEE Transactions on Mobile Computing vol. 7, 905 2008. 907 [FK03] Floyd, S. and E. Kohler, "Internet Research Needs Better 908 Models", SIGCOMM Computer Communication Review (CCR) vol. 909 33, no. 1, pp. 29-34, 2003. 911 [FS01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An 912 Algorithm for Increasing the Robustness of RED", ICIR, 913 Tech. Rep., 2001. [Online]. Available: 914 http://www.icir.org/floyd/papers/adaptiveRed.pdf . 916 [GF04] Gurtov, A. and S. Floyd, "Modeling Wireless Links for 917 Transport Protocols", SIGCOMM Computer Communication 918 Review (CCR) vol. 34, no. 2, pp. 85-96, 2004. 920 [HK99] Henderson, T. and R. Katz, "Transport Protocols for 921 Internet-Compatible Satellite Networks", IEEE Journal on 922 Selected Areas in Communications (JSAC) vol. 17, no. 2, 923 pp. 326-344, 1999. 925 [HVA03] Hohn, N., Veitch, D., and P. Abry, "The Impact of the Flow 926 Arrival Process in Internet Traffic", Proc. IEEE 927 International Conference on Acoustics, Speech, and Signal 928 Processing (ICASSP'03) vol. 6, pp. 37-40, 2003. 930 [KS01] Kunniyur, S. and R. Srikant, "Analysis and Design of an 931 Adaptive Virtual Queue (AVQ) Algorithm for Active Queue 932 Management", Proc. SIGCOMM'01 pp. 123-134, 2001. 934 [Kelly79] Kelly, F., "Reversibility and stochastic networks", 935 Wiley 1979. 937 [LATL07] Lee, G., Andrew, L., Tang, A., and S. Low, "A WAN-in-Lab 938 for protocol development", PFLDnet 2007. 940 [MV06] Mascolo, S. and F. Vacirca, "The Effect of Reverse Traffic 941 on the Performance of New TCP Congestion Control 942 Algorithms for Gigabit Networks", PFLDnet 2006. 944 [RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions 945 for High Performance", RFC 1323, May 1992. 947 [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 948 Extension to the Selective Acknowledgement (SACK) Option 949 for TCP", RFC 2883, July 2000. 951 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 952 of Explicit Congestion Notification (ECN) to IP", 953 RFC 3168, September 2001. 955 [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte 956 Counting (ABC)", RFC 3465, February 2003. 958 [RFC4138] Sarolahti, P. and M. Kojo, "Forward RTO-Recovery (F-RTO): 959 An Algorithm for Detecting Spurious Retransmission 960 Timeouts with TCP and the Stream Control Transmission 961 Protocol (SCTP)", RFC 4138, August 2005. 963 [RMC03] Rossi, D., Mellia, M., and C. Casetti, "User Patience and 964 the Web: a Hands-on Investigation", IEEE Globecom 2003. 966 [SA03] Souza, E. and D. Agarwal, "A HighSpeed TCP Study: 967 Characteristics and Deployment Issues", LBNL, Technical 968 Report LBNL-53215, 2003. 970 [SSM07] Shimonishi, H., Sanadidi, M., and T. Murase, "Assessing 971 Interactions among Legacy and High-Speed TCP Protocols", 972 Proc. Workshop on Protocols for Fast Long-Delay Networks 973 (PFLDNet) 2007. 975 [Tmix] Weigle, M., Adurthi, P., Hernandez-Campos, F., Jeffay, K., 976 and F. Smith, "Tmix: a tool for generating realistic TCP 977 application workloads in ns-2", SIGCOMM Computer 978 Communication Review (CCR) vol. 36, no. 3,pp. 65-76, 2006. 980 [WCL05] Wei, D., Cao, P., and S. Low, "Time for a TCP Benchmark 981 Suite?", [Online]. Available: 983 http://wil.cs.caltech.edu/pubs/ 984 DWei-TCPBenchmark06.ps 2006. 986 Authors' Addresses 988 Lachlan Andrew 989 CAIA, Swinburne University of Technology 990 PO Box 218 991 Hawthorn, Vic 3122 992 Australia 994 Email: landrew@swin.edu.au 996 Sally Floyd 997 ICSI Center for Internet Research 998 1947 Center Street, Suite 600 999 Berkeley, CA 94704 1000 USA 1002 Email: floyd@icir.org 1004 Gang Wang 1005 NEC, China 1006 Innovation Plaza, Tsinghua Science Park, 1 Zhongguancun East Road 1007 Beijing 100084 1008 China 1010 Email: wanggang@research.nec.com.cn