1 Network Working Group B. Constantine, Ed. 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: November 15, 2010 Bell Canada (Ext. Consultant) 5 L. Jorgenson 6 Apparent Networks 7 Reinhard Schrage 8 Schrage Consulting 9 Apr 15, 2010 11 TCP Throughput Testing Methodology 12 draft-ietf-ippm-tcp-throughput-tm-00.txt 14 Abstract 16 This memo describes a methodology for measuring sustained TCP 17 throughput performance in an end-to-end managed network environment.
18 This memo is intended to provide a practical approach to help users 19 validate the TCP layer performance of a managed network, which should 20 provide a better indication of end-user application level experience. 21 In the methodology, various TCP and network parameters are identified 22 that should be tested as part of the network verification at the TCP 23 layer. 25 Status of this Memo 27 This Internet-Draft is submitted to IETF in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF), its areas, and its working groups. Note that 32 other groups may also distribute working documents as Internet- 33 Drafts. Creation date Apr 15, 2010. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 The list of current Internet-Drafts can be accessed at 41 http://www.ietf.org/ietf/1id-abstracts.txt. 43 The list of Internet-Draft Shadow Directories can be accessed at 44 http://www.ietf.org/shadow.html. 46 This Internet-Draft will expire on November 15, 2010. 48 Copyright Notice 50 Copyright (c) 2010 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 66 2. Goals of this Methodology. . . . . . . . . . . . . . . . . . . 4 67 2.1 TCP Equilibrium State Throughput . . . . . . . . . . . . . 5 68 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 6 69 3.1. Baseline Round-trip Delay and Bandwidth. . . . . . . . . . 7 70 3.1.1 Techniques to Measure Round Trip Time . . . . . . . . 7 71 3.1.2 Techniques to Measure End-end Bandwidth . . . . . . . 8 72 3.2. Single TCP Connection Throughput Tests . . . . . . . . . . .9 73 3.2.1 Interpretation of the Single Connection TCP 74 Throughput Results . . . . . . . . . . . . . . . . . . 12 75 3.3. TCP MSS Throughput Testing . . . . . . . . . . . . . . . . 12 76 3.3.1 TCP Test for Network Path MTU . . . . . . . . . . . . 12 77 3.3.2 MSS Size Testing Method . . . . . . . . . . . . . . . 13 78 3.3.3 Interpretation of TCP MSS Throughput Results . . . . . 14 79 3.4. Multiple TCP Connection Throughput Tests. . . . . . . . . . 14 80 3.4.1 Multiple TCP Connections - below Link Capacity . . . . 14 81 3.4.2 Multiple TCP Connections - over Link Capacity. . . . . 15 82 4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17 83 5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 84 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 86 1. 
Introduction 88 Even though RFC2544 was meant to benchmark network equipment and 89 is used by network equipment manufacturers (NEMs), network providers 90 have used it to benchmark operational networks in order to 91 provide SLAs (Service Level Agreements) to their business customers. 92 Network providers are coming to the realization that RFC2544 testing 93 and TCP layer testing are required to more adequately ensure end-user 94 satisfaction. 96 Therefore, the network provider community desires to measure network 97 throughput performance at the TCP layer. Measuring TCP throughput 98 provides a meaningful measure with respect to the end user's 99 application SLA (and ultimately helps reach some level of TCP testing 100 interoperability, which does not exist today). 102 As the complexity of the network grows, the various queuing 103 mechanisms in the network greatly affect TCP layer performance (i.e. 104 improper default router settings for queuing, etc.), and devices such 105 as firewalls, proxies, and load-balancers can actively alter the TCP 106 settings as a TCP session traverses the network (such as window size, 107 MSS, etc.). Network providers (and NEMs) are wrestling with the end-end 108 complexities of the above, and there is a strong interest in the 109 standardization of a test methodology to validate end-to-end TCP 110 performance (as this is the precursor to acceptable end-user 111 application performance). 113 Before RFC2544 testing existed, network providers and NEMs deployed 114 a variety of ad hoc test techniques to verify the Layer 2/3 115 performance of the network. RFC2544 was a huge step forward in the 116 network test world, standardizing the Layer 2/3 test methodology, 117 which greatly improved the quality of the network and reduced 118 operational test expenses. These managed networks are intended to be 119 predictable, but therein lies the problem. It is difficult, if not 120 impossible, to extrapolate end user application layer performance 121 from RFC2544 results, and RFC2544 was never intended 122 to do so. 124 So the intent behind this draft TCP throughput work is to define 125 a methodology for testing sustained TCP layer performance. In this 126 document, sustained TCP throughput is the amount of data per unit 127 time that TCP transports during equilibrium (steady state), i.e. 128 after the initial slow start phase. We refer to this state as TCP 129 Equilibrium, and the equilibrium throughput is the maximum 130 achievable for the TCP connection(s). 132 One other important note: the precursor to conducting the TCP 133 test methodology is to perform RFC2544 related Layer 2/3 tests. It 134 is highly recommended to run traditional RFC2544 type tests to verify 135 the integrity of the network before conducting TCP testing. 137 2. Goals of this Methodology 139 Before defining the goals of this methodology, it is important to 140 clearly define the areas that are not intended to be measured or 141 analyzed by such a methodology. 143 - The methodology is not intended to predict TCP throughput 144 behavior during the transient stages of a TCP connection, such 145 as initial slow start.
147 - The methodology is not intended to definitively benchmark TCP 148 implementations of one OS to another, although some users may find 149 some value in conducting qualitative experiments 151 - The methodology is not intended to provide detailed diagnosis 152 of problems within end-points or the network itself as related to 153 non-optimal TCP performance, although a results interpretation 154 section for each test step may provide insight into potential 155 issues within the network 157 In contrast to the above exclusions, the goals of this methodology 158 are to define a method to conduct a structured, end-to-end 159 assessment of sustained TCP performance within a managed business 160 class IP network. A key goal is to establish a set of "best 161 practices" that an engineer should apply when validating the 162 ability of a managed network to carry end-user TCP applications. 164 Some specific goals are to: 166 - Provide a practical test approach that specifies the more well 167 understood (and end-user configurable) TCP parameters such as Window 168 size, MSS, # connections, and how these affect the outcome of TCP 169 performance over a network 171 - Provide specific test conditions (link speed, RTD, window size, 172 etc.) and maximum achievable TCP throughput under TCP Equilbrium 173 conditions. For guideline purposes, provide examples of these test 174 conditions and the maximum achievable TCP throughput during the 175 equilbrium state. Section 2.1 provides specific details concerning 176 the definition of TCP Equilibrium within the context of this draft. 178 - In test situations where the recommended procedure does not yield 179 the maximum achievable TCP throughput result, this draft provides some 180 possible areas within the end host or network that should be 181 considered for investigation (although again, this draft is not 182 intended to provide a detailed diagnosis of these issues) 184 2.1 TCP Equilibrium State Throughput 186 TCP connections have three (3) fundamental congestion window phases 187 as documented in RFC-TBD. These states are: 189 - Slow Start, which occurs during the beginning of a TCP transmission 190 or after a retransmission time out event 192 - Congestion avoidance, which is the phase during which TCP ramps up 193 to establish the maximum attainable throughput on an end-end network 194 path. Retransmissions are a natural by-product of the TCP congestion 195 avoidance algorithm as it seeks to achieve maximum throughput on 196 the network path. 198 - Retransmission phase, which include Fast Retransmit (Tahoe) and Fast 199 Recovery (Reno and New Reno). When a packet is lost, the Congestion 200 avoidance phase transitions to a Fast Retransmission or Recovery 201 Phase dependent upon the TCP implementation. 203 The following diagram depicts these states. 205 | ssthresh 206 TCP | | 207 Through- | | Equilibrium 208 put | |\ /\/\/\/\/\ Retransmit /\/\ ... 209 | | \ / | Time-out / 210 | | \ / | _______ _/ 211 | Slow _/ |/ | / | Slow _/ 212 | Start _/ Congestion |/ |Start_/ Congestion 213 | _/ Avoidance Loss | _/ Avoidance 214 | _/ Event | _/ 215 | _/ |/ 216 |/___________________________________________________________ 217 Time 219 This TCP methodology provides guidelines to measure the equilibrium 220 throughput which refers to the maximum sustained rate obtained by 221 congestion avoidance before packet loss conditions occur (which would 222 cause the state change from congestion avoidance to a retransmission 223 phase). 
All maximum achievable throughputs specified in Section 3 are 224 with respect to this Equilibrium state. 226 3. TCP Throughput Testing Methodology 228 This section summarizes the specific test methodology to achieve the 229 goals listed in Section 2. 231 As stated in Section 1, it is considered best practice to verify 232 the integrity of the network from a Layer2/3 perspective by first 233 conducting RFC2544 type testing. If the network is not performing 234 properly in terms of packet loss, jitter, etc. when running RFC2544 235 tests, then the TCP layer testing will not be meaningful since the 236 equalibrium throughput would be very difficult to achieve (in a 237 "dysfunctional" network). 239 The following provides the sequential order of steps to conduct the 240 TCP throughput testing methodology: 242 1. Baseline Round-trip Delay and Bandwidth. These measurements provide 243 estimates of the ideal TCP window size, which will be used in 244 subsequent test steps. 246 2. Single TCP Connection Throughput Tests. With baseline measurements 247 of round trip delay and bandwidth, a series of single connection TCP 248 throughput tests can be conducted to baseline the performance of the 249 network against expectations. 251 3. TCP MSS Throughput Testing. By varying the MSS size of the TCP 252 connection, the ability of the network to sustain expected TCP 253 throughput can be verified. 255 4. Multiple TCP Connection Throughput Tests. Single connection TCP 256 testing is a useful first step to measure expected versus actual TCP 257 performance. The multiple connection test more closely emulates 258 customer traffic, which comprise many TCP connections over a network 259 link. 261 Important to note are some of the key characteristics and 262 considerations for the TCP test instrument. The test host may be a 263 standard computer or dedicated communications test instrument 264 and these TCP test hosts be capable of emulating both a client and a 265 server. 267 Whether the TCP test host is a standard computer or dedicated test 268 instrument, the following areas should be considered when selecting 269 a test host: 271 - TCP implementation used by the test host OS, i.e. Linux OS kernel 272 using TCP Reno, TCP options supported, etc. This will obviously be 273 more important when using custom test equipment where the TCP 274 implementation may be customized or tuned to run in higher 275 performance hardware 277 - Most importantly, the TCP test host must be capable of generating 278 and receiving stateful TCP test traffic at the full link speed of the 279 network under test. This requirement is very serious and may require 280 custom test equipment, especially on 1 GigE and 10 GigE networks. 282 3.1. Baseline Round-trip Delay and Bandwidth 284 Before stateful TCP testing can begin, it is important to baseline 285 the round trip delay and bandwidth of the network to be tested. 286 These measurements provide estimates of the ideal TCP window size, 287 which will be used in subsequent test steps. 289 These latency and bandwidth tests should be run over a long enough 290 period of time to characterize the performance of the network over 291 the course of a meaningful time period. One example would 292 be to take samples during various times of the work day. The goal 293 would be to determine a representative minimum, average, and maximum 294 RTD and bandwidth for the network under test. Topology changes are 295 to be avoided during this time of initial convergence (e.g. in 296 crossing BGP4 boundaries). 
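As an illustration of how these baseline measurements feed the later test steps, the following minimal Python sketch (illustrative only, not part of the methodology; all sample values are hypothetical) reduces a set of RTT and bandwidth samples to minimum/average/maximum figures and derives the ideal TCP window from the bandwidth delay product described in Section 3.2:

   # Illustrative sketch: summarize baseline RTT and bandwidth samples taken
   # at various times of the work day, then estimate the ideal TCP window
   # from the bandwidth delay product (BDP).  All sample values are
   # hypothetical.

   rtt_samples_ms = [24.8, 25.2, 25.0, 24.9, 25.1]         # measured RTD/RTT (ms)
   bw_samples_mbps = [44.21, 44.21, 44.21, 44.21, 44.21]   # measured bandwidth (Mbps)

   rtt_min = min(rtt_samples_ms)
   rtt_avg = sum(rtt_samples_ms) / len(rtt_samples_ms)
   rtt_max = max(rtt_samples_ms)
   bw_min_mbps = min(bw_samples_mbps)                      # bottleneck estimate

   # BDP = RTT x Bandwidth (bits); dividing by 8 gives the ideal window (bytes)
   bdp_bits = (rtt_avg / 1000.0) * (bw_min_mbps * 1e6)
   ideal_window_bytes = bdp_bits / 8

   print("RTT min/avg/max (ms): %.1f / %.1f / %.1f" % (rtt_min, rtt_avg, rtt_max))
   print("BDP: %.0f bits, ideal TCP window: ~%.0f bytes"
         % (bdp_bits, ideal_window_bytes))

With these illustrative samples the output matches the T3 / 25 msec example given in Section 3.2: a BDP of approximately 1,105,250 bits and an ideal window of approximately 138,000 bytes.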
298 In some cases, baselining bandwidth may not be required, since a 299 network provider's end-to-end topology may be well enough defined. 301 3.1.1 Techniques to Measure Round Trip Time 303 We follow in the definitions used in the references of the appendix; 304 hence Round Trip Time (RTT) is the time elapsed between the clocking 305 in of the first bit of a payload packet to the receipt of the last 306 bit of the corresponding acknowledgement. Round Trip Delay (RTD) 307 is used synonymously to twice the Link Latency. 309 In any method used to baseline round trip delay between network 310 end-points, it is important to realize that network latency is the 311 sum of inherent network delay and congestion. The RTT should be 312 baselined during "off-peak" hours to obtain a reliable figure for 313 network latency (versus additional delay caused by congestion). 315 During the actual sustained TCP throughput tests, it is critical 316 to measure RTT along with measured TCP throughput. Congestive 317 effects can be isolated if RTT is concurrently measured. 319 This is not meant to provide an exhaustive list, but summarizes some 320 of the more common ways to determine round trip time (RTT) through 321 the network. The desired resolution of the measurement (i.e. msec 322 versus usec) may dictate whether the RTT measurement can be achieved 323 with standard tools such as ICMP ping techniques or whether 324 specialized test equipment would be required with high precision 325 timers. The objective in this section is to list several techniques 326 in order of decreasing accuracy. 328 - Use test equipment on each end of the network, "looping" the 329 far-end tester so that a packet stream can be measured end-end. This 330 test equipment RTT measurement may be compatible with delay 331 measurement protocols specified in RFC5357. 333 - Conduct packet captures of TCP test applications using for example 334 "iperf" or FTP, etc. By running multiple experiments, the packet 335 captures can be studied to estimate RTT based upon the SYN -> SYN-ACK 336 handshakes within the TCP connection set-up. 338 - ICMP Pings may also be adequate to provide round trip time 339 estimations. Some limitations of ICMP Ping are the msec resolution 340 and whether the network elements / end points respond to pings (or 341 block them). 343 3.1.2 Techniques to Measure End-end Bandwidth 345 There are many well established techniques available to provide 346 estimated measures of bandwidth over a network. This measurement 347 should be conducted in both directions of the network, especially for 348 access networks which are inherently asymmetrical. Some of the 349 asymmetric implications to TCP performance are documented in RFC-3449 350 and the results of this work will be further studied to determine 351 relevance to this draft. 353 The bandwidth measurement test must be run with stateless IP streams 354 (not stateful TCP) in order to determine the available bandwidth in 355 each direction. And this test should obviously be performed at 356 various intervals throughout a business day (or even across a week). 357 Ideally, the bandwidth test should produce a log output of the 358 bandwidth achieved across the test interval AND the round trip delay. 360 And during the actual TCP level performance measurements (Sections 361 3.2 - 3.5), the test tool must be able to track round trip time 362 of the TCP connection(s) during the test. 
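Whether the RTT figure comes from the baseline of Section 3.1.1 or is tracked during the throughput test itself, the packet-capture technique described above can be reduced to pairing SYN and SYN-ACK timestamps. The following minimal Python sketch is illustrative only; it assumes the capture has already been reduced (for example with a protocol analyzer) to simple (timestamp, source, flags) records, and all record values shown are hypothetical:

   # Illustrative sketch: estimate RTT from the SYN -> SYN-ACK exchange of
   # TCP connection set-ups.  Assumes the capture was already reduced to
   # (timestamp_seconds, source_ip, tcp_flags) records; values are
   # hypothetical.

   packets = [
       (10.000000, "10.1.1.10",  "SYN"),       # client SYN
       (10.025100, "192.0.2.50", "SYN-ACK"),   # server SYN-ACK, ~25.1 ms later
       (12.000000, "10.1.1.10",  "SYN"),
       (12.026300, "192.0.2.50", "SYN-ACK"),
   ]

   rtt_estimates_ms = []
   last_syn_time = None
   for timestamp, source, flags in packets:
       if flags == "SYN":
           last_syn_time = timestamp
       elif flags == "SYN-ACK" and last_syn_time is not None:
           rtt_estimates_ms.append((timestamp - last_syn_time) * 1000.0)
           last_syn_time = None

   print("RTT estimates (ms):", ", ".join("%.1f" % r for r in rtt_estimates_ms))
   # Prints approximately: 25.1, 26.3

Multiple such estimates taken across the test interval also give the round trip time variation discussed next.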
Measuring round trip time 363 variation (aka "jitter") provides insight into the effects of congestive 364 delay on the sustained throughput achieved for the TCP layer test. 366 3.2. Single TCP Connection Throughput Tests 368 This draft specifically defines TCP throughput techniques to verify 369 sustained TCP performance in a managed business network. As defined 370 in section 2.1, the equilibrium throughput reflects the maximum 371 rate achieved by a TCP connection within the congestion avoidance 372 phase on an end-end network path. This section and others will define 373 the method to conduct these sustained throughput tests and guidelines 374 for the predicted results. 376 With baseline measurements of round trip time and bandwidth 377 from section 3.1, a series of single connection TCP throughput tests 378 can be conducted to baseline the performance of the network against 379 expectations. The optimum TCP window size can be calculated from 380 the bandwidth delay product (BDP), which is: 382 BDP = RTT x Bandwidth 384 By dividing the BDP by 8, the "ideal" TCP window size is calculated. 385 An example would be a T3 link with 25 msec RTT. The BDP would equal 386 ~1,105,000 bits and the ideal TCP window would equal ~138,000 bytes. 388 The following table provides some representative network link speeds, 389 latency, BDP, and associated "optimum" TCP window size. Sustained 390 TCP transfers should reach nearly 100% throughput, minus the overhead 391 of Layers 1-3 and the divisor of the MSS into the window. 393 For this single connection baseline test, the MSS size will affect 394 the achieved throughput (especially for smaller TCP window sizes). 395 Table 3.2 provides the achievable, equilibrium TCP 396 throughput (at Layer 4) using 1000 byte MSS. Also in this table, 397 the case of 58 byte L1-L4 overhead including the Ethernet CRC32 is 398 used for simplicity. 400 Table 3.2: Link Speed, RTT and calculated BDP, TCP Throughput 402 Link Ideal TCP Maximum Achievable 403 Speed* RTT (ms) BDP (bits) Window (kbytes) TCP Throughput(Mbps) 404 ---------------------------------------------------------------------- 405 T1 20 30,720 3.84 1.20 406 T1 50 76,800 9.60 1.44 407 T1 100 153,600 19.20 1.44 408 T3 10 442,100 55.26 41.60 409 T3 15 663,150 82.89 41.13 410 T3 25 1,105,250 138.16 41.92 411 T3(ATM) 10 407,040 50.88 32.44 412 T3(ATM) 15 610,560 76.32 32.44 413 T3(ATM) 25 1,017,600 127.20 32.44 414 100M 1 100,000 12.50 90.699 415 100M 2 200,000 25.00 92.815 417 Link Ideal TCP Maximum Achievable 418 Speed* RTT (ms) BDP (bits) Window (kbytes) TCP Throughput (Mbps) 419 ---------------------------------------------------------------------- 420 100M 5 500,000 62.50 90.699 421 1Gig 0.1 100,000 12.50 906.991 422 1Gig 0.5 500,000 62.50 906.991 423 1Gig 1 1,000,000 125.00 906.991 424 10Gig 0.05 500,000 62.50 9,069.912 425 10Gig 0.3 3,000,000 375.00 9,069.912 427 * Note that the link speed is the minimum link speed throughout the 428 network; i.e. a WAN with a T1 link, etc. 430 Also, the following link speeds (available payload bandwidth) were 431 used for the WAN entries: 433 - T1 = 1.536 Mbits/sec (B8ZS line encoding facility) 434 - T3 = 44.21 Mbits/sec (C-Bit Framing) 435 - T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per 436 second) 438 The calculation method used in this document is a 3-step process: 440 1 - We determine what the optimal TCP Window size value should be 441 based on the optimal quantity of "in-flight" octets discovered by 442 the BDP calculation.
We take into consideration that the TCP 443 Window size has to be an exact multiple value of the MSS. 444 2 - Then we calculate the achievable layer 2 throughput by multiplying 445 the value determined in step 1 by the ratio (MSS + L2 + L3 + L4 446 Overheads) / MSS, divided by the RTT. 447 3 - Finally, we multiply the calculated value of step 2 by the MSS 448 versus (MSS + L2 + L3 + L4 Overheads) ratio. 450 This gives us the achievable TCP Throughput value. Sometimes, the 451 maximum achievable throughput is limited by the maximum achievable 452 quantity of Ethernet Frames per second on the physical media. Then 453 this value is used in step 2 instead of the calculated one. 455 There are several TCP tools that are commonly used in the network 456 provider world, and one of the most common is the "iperf" tool. With 457 this tool, hosts are installed at each end of the network segment; 458 one as client and the other as server. The TCP Window size of both 459 the client and the server can be manually set and the achieved 460 throughput is measured, either uni-directionally or bi-directionally. 461 For higher BDP situations in lossy networks (long fat networks or 462 satellite links, etc.), TCP options such as Selective Acknowledgment 463 should be considered and also become part of the window 464 size / throughput characterization. 466 The following diagram shows the achievable TCP throughput on a T3 with 467 the default Windows2000/XP TCP Window size of 17520 Bytes. 469 45| 470 | 471 40| 472 TCP | 473 Throughput 35| 474 in Mbps | 475 30| 476 | 477 25| 478 | 479 20| 480 | 481 15| _______ 14.48M 482 | | | 483 10| | | +-----+ 9.65M 484 | | | | | _______ 5.79M 485 5| | | | | | | 486 |_________+_____+_________+_____+________+____ +___________ 487 10 15 25 488 RTT in milliseconds 490 The following diagram shows the achievable TCP throughput on a 25ms T3 491 when the TCP Window size is increased and with the RFC1323 TCP Window 492 scaling option. 494 45| 495 | +-----+42.47M 496 40| | | 497 TCP | | | 498 Throughput 35| | | 499 in Mbps | | | 500 30| | | 501 | | | 502 25| | | 503 | ______ 21.23M | | 504 20| | | | | 505 | | | | | 506 15| | | | | 507 | | | | | 508 10| +----+10.62M | | | | 509 | _______5.31M | | | | | | 510 5| | | | | | | | | 511 |__+_____+______+____+___________+____+________+_____+___ 512 16 32 64 128 513 TCP Window size in KBytes 515 The single connection TCP throughput test must be run over 516 a long duration and results must be logged at the desired interval. 517 The test must record RTT and TCP retransmissions at each interval. 519 This correlation of retransmissions and RTT over the course of the 520 test will clearly identify which portions of the transfer reached 521 TCP Equilibrium state and to what extent increased RTT (congestive 522 effects) may have been the cause of reduced equilibrium performance. 524 Host hardware performance must be well understood before conducting 525 this TCP single connection test and other tests in this section. 526 Dedicated test equipment may be required, especially for line rates 527 of GigE and 10 GigE. 529 3.2.1 Interpretation of the Single Connection TCP Throughput Results 531 At the end of this step, the user will document the theoretical BDP 532 and a set of Window size experiments with measured TCP throughput for 533 each TCP window size setting; a simple way to organize this comparison is sketched below.
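As a simple illustration of this interpretation step, the following Python sketch (illustrative only; the result records and the 95% acceptance threshold are hypothetical) compares the measured throughput for each window size against the throughput predicted from the window size and the baseline RTT, and flags experiments that fall short, together with their logged RTT and retransmission counts:

   # Illustrative sketch: compare measured single connection results against
   # the predicted equilibrium throughput for each window size.  The result
   # records and the 95% acceptance threshold are hypothetical.

   link_speed_mbps = 44.21          # bottleneck link payload rate (e.g. T3)
   baseline_rtt_ms = 25.0           # from the baseline of Section 3.1

   results = [
       # (window_bytes, measured_mbps, avg_rtt_ms, retransmissions)
       (16384,   5.1, 25.2,   0),
       (65536,  20.4, 25.4,   0),
       (131072, 33.0, 31.8, 142),
   ]

   for window_bytes, measured_mbps, avg_rtt_ms, retrans in results:
       # Predicted equilibrium throughput: window / RTT, capped at link speed
       predicted_mbps = min(window_bytes * 8 / (baseline_rtt_ms / 1000.0) / 1e6,
                            link_speed_mbps)
       flag = "<-- investigate" if measured_mbps < 0.95 * predicted_mbps else ""
       print("window %6d B: measured %5.1f Mbps, predicted %5.1f Mbps, "
             "avg RTT %4.1f ms, retransmissions %4d %s"
             % (window_bytes, measured_mbps, predicted_mbps,
                avg_rtt_ms, retrans, flag))

In this hypothetical run, only the largest window falls noticeably short of its prediction, and the elevated RTT and retransmission count point toward congestive effects.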
For cases where the sustained TCP 534 throughput does not equal the predicted value, some possible causes 535 are listed: 537 - Network congestion causing packet loss 538 - Network congestion not causing packet loss, but effectively 539 increasing the size of the required TCP window during the transfer 540 - Network fragmentation at the IP layer 541 - Intermediate network devices which actively regenerate the TCP 542 connection and can alter window size, MSS, etc. 544 3.3. TCP MSS Throughput Testing 546 This test setup should be conducted as a single TCP connection test. 547 By varying the MSS size of the TCP connection, the ability of the 548 network to sustain expected TCP throughput can be verified. This is 549 similar to frame and packet size techniques within RFC2-2544, which 550 aim to determine the ability of the routing/switching devices to 551 handle loads in term of packets/frames per second at various frame 552 and packet sizes. This test can also further characterize the 553 performance of a network in the presence of active TCP elements 554 (proxies, etc.), devices that fragment IP packets, and the actual 555 end hosts themselves (servers, etc.). 557 3.3.1 TCP Test for Network Path MTU 559 TCP implementations should use Path MTU Discovery techniques (PMTUD), 560 but this technique does not always prove reliable in real world 561 situations. Since PMTUD relies on ICMP messages (to inform the host 562 that unfragmented transmission cannot occur), PMTUD is not always 563 reliable since many network managers completely disable ICMP. 565 Increasingly network providers and enterprises are instituting fixed 566 MTU sizes on the hosts to eliminate TCP fragmentation issues in the 567 application. 569 Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should 570 be conducted to verify the minimum network path MTU. Conducting 571 the PLPMTUD test establishes the upper limit upon the MTU, which in 572 turn establishes the upper limit for the MSS testing of section 3.3.2. 573 MSS refers specifically to the payload size of the TCP packet and does 574 not include TCP or IP headers. 576 3.3.2 MSS Size Testing Method 578 The single connection testing listed in Section 3.2 should be 579 repeated, using the appropriate window size and collecting 580 throughput measurements per various MSS sizes. 582 The following are the typical sizes of MSS settings for various 583 link speeds: 585 - 256 bytes for very low speed links such as 9.6Kbps (per RFC1144). 586 - 536 bytes for low speed links (per RFC879) . 587 - 966 bytes for SLIP high speed (per RFC1055). 588 - 1380 bytes for IPSec VPN Tunnel testing 589 - 1452 bytes for PPPoE connectivity (per RFC2516) 590 - 1460 for Ethernet and Fast Ethernet (per RFC895). 591 - 8960 byte jumbo frames for GigE 593 Using the optimum window size determined by conducting steps 3.1 and 594 3.2, a variety of window sizes should be tested according to the link 595 speed under test. Using Fast Ethernet with 5 msec RTT as an example, 596 the optimum TCP window size would be 62.5 kbytes and the recommended 597 MSS for Fast Ethernet is 1460 bytes. 
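As a rough illustration of how the values in the following tables can be approximated, the sketch below (Python, illustrative only) applies the 3-step calculation of Section 3.2 across several MSS values: the window is taken as an exact multiple of the MSS, the layer 2/3/4 overhead is added back, and the result is capped by the number of frames per second the link itself can carry. The 58 byte L1-L4 overhead and the T3 payload rate are the values used earlier in this document; small differences from the published tables are expected due to rounding:

   # Illustrative sketch of the 3-step calculation from Section 3.2,
   # applied across MSS values.  Overhead and link rate follow the earlier
   # examples (58 bytes of L1-L4 overhead, T3 = 44.21 Mbits/sec); exact
   # agreement with the tables in this section is not expected.

   L1_L4_OVERHEAD = 58          # bytes per frame, including Ethernet CRC32

   def achievable_tcp_throughput_mbps(link_mbps, rtt_ms, mss):
       rtt_s = rtt_ms / 1000.0
       # Step 1: ideal window from the BDP, as an exact multiple of the MSS
       bdp_bytes = link_mbps * 1e6 * rtt_s / 8
       segments_in_window = int(bdp_bytes // mss)
       # Step 2: frame rate for that window, capped by the maximum number of
       # frames per second the physical link can carry
       frame_bits = (mss + L1_L4_OVERHEAD) * 8
       frames_per_s = min(segments_in_window / rtt_s, link_mbps * 1e6 / frame_bits)
       # Step 3: keep only the TCP payload portion of each frame
       return frames_per_s * mss * 8 / 1e6

   for mss in (1000, 1260, 1300, 1380, 1420, 1460):
       print("T3, 25 ms RTT, MSS %4d: %5.2f Mbps"
             % (mss, achievable_tcp_throughput_mbps(44.21, 25, mss)))

For example, for a T3 with 25 msec RTT and a 1460 byte MSS this yields approximately 42.5 Mbps.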
599 Link Achievable TCP Throughput (Mbps) for 600 Speed RTT(ms) MSS=1000 MSS=1260 MSS=1300 MSS=1380 MSS=1420 MSS=1460 601 ---------------------------------------------------------------------- 602 T1 20 | 1.20 1.008 1.040 1.104 1.136 1.168 603 T1 50 | 1.44 1.411 1.456 1.335 1.363 1.402 604 T1 100 | 1.44 1.512 1.456 1.435 1.477 1.402 605 T3 10 | 41.60 42.336 42.640 41.952 40.032 42.048 606 T3 15 | 42.13 42.336 42.293 42.688 42.411 42.048 607 T3 25 | 41.92 42.336 42.432 42.394 42.714 42.515 608 T3(ATM) 10 | 32.44 33.815 34.477 35.482 36.022 36.495 609 T3(ATM) 15 | 32.44 34.120 34.477 35.820 36.022 36.127 610 T3(ATM) 25 | 32.44 34.363 34.860 35.684 36.022 36.274 611 100M 1 | 90.699 89.093 91.970 86.866 89.424 91.982 612 100M 2 | 92.815 93.226 93.275 88.505 90.973 93.442 613 100M 5 | 90.699 92.481 92.697 88.245 90.844 93.442 614 For GigE and 10GigE, Jumbo frames (9000 bytes) are becoming more 615 common. The following table adds jumbo frames to the possible MSS 616 values. 618 Link Achievable TCP Throughput (Mbps) for 619 Speed RTT(ms) MSS=1260 MSS=1300 MSS=1380 MSS=1420 MSS=1460 MSS=8960 620 ---------------------------------------------------------------------- 621 1Gig 0.1 | 924.812 926.966 882.495 894.240 919.819 713.786 622 1Gig 0.5 | 924.812 926.966 930.922 932.743 934.467 856.543 623 1Gig 1.0 | 924.812 926.966 930.922 932.743 934.467 927.922 624 10Gig 0.05| 9248.125 9269.655 9309.218 9839.790 9344.671 8565.435 625 10Gig 0.3 | 9248.125 9269.655 9309.218 9839.790 9344.671 9755.079 627 Each row in the table is a separate test that should be conducted 628 over a predetermined test interval, with the throughput, retransmissions, 629 and RTT logged during the entire test interval. 631 3.3.3 Interpretation of TCP MSS Throughput Results 633 For cases where the achieved TCP throughput does not equal the 634 throughput predicted for a given MSS, some possible causes 635 are listed: 637 - TBD 639 3.4. Multiple TCP Connection Throughput Tests 641 After baselining the network under test with a single TCP connection 642 (Section 3.2), the nominal capacity of the network has been 643 determined. The capacity measured in section 3.2 may be a capacity 644 range, and it is reasonable that some level of tuning may have been 645 required (i.e. router shaping techniques employed, intermediary 646 proxy-like devices tuned, etc.). 648 Single connection TCP testing is a useful first step to measure 649 expected versus actual TCP performance and as a means to diagnose 650 and tune issues in the network and active elements. However, the 651 ultimate goal of this methodology is to more closely emulate customer 652 traffic, which comprises many TCP connections over a network link. 653 This methodology also seeks to provide the framework for 654 testing stateful TCP connections in concurrence with stateless 655 traffic streams, and this is described in Section 3.5. 657 3.4.1 Multiple TCP Connections - below Link Capacity 659 First, the ability of the network to carry multiple TCP connections 660 to full network capacity should be tested. Prioritization and QoS 661 settings are not considered during this step, since the network 662 capacity is not to be exceeded by the test traffic (section 3.4.2 663 covers the over-capacity test case). 665 For this multiple connection TCP throughput test, the number of 666 connections will more than likely be limited by the test tool (host 667 vs. dedicated test equipment). As an example, for a GigE link with 668 1 msec RTT, the optimum TCP window would equal ~128 KBytes.
So under 669 this condition, 8 concurrent connections with window size equal to 670 16KB would fill the GigE link. For 10G, 80 connections would be 671 required to accomplish the same. 673 Just as in section 3.2, the end host or test tool can not be the 674 processing bottleneck or the throughput measurements will not be 675 valid. The test tool must be benchmarked in ideal lab conditions to 676 verify it's ability to transfer stateful TCP traffic at the given 677 network line rate. 679 For this test step, it should be conducted over a reasonable test 680 duration and results should be logged per interval such as throughput 681 per connection, RTT, and retransmissions. 683 Since the network is not to be driven into over capacity (by nature 684 of the BDP allocated evenly to each connection), this test verifies 685 the ability of the network to carry multiple TCP connections up to 686 the link speed of the network. 688 3.4.2 Multiple TCP Connections - over Link Capacity 690 In this step, the network bandwidth is intentionally exceeded with 691 multiple TCP connections to test expected prioritization and queuing 692 within the network. 694 All conditions related to Section 3.3 set-up apply, especially the 695 ability of the test hosts to transfer stateful TCP traffic at network 696 line rates. 698 Using the same example from Section 3.2, a GigE link with 1 msec 699 RTT would require a window size of 128 KB to fill the link (with 700 one TCP connection). Assuming a 16KB window, 8 concurrent 701 connections would fill the GigE link capacity and values higher than 702 8 would over-subscribe the network capacity. The user would select 703 values to over-subscribe the network (i.e. possibly 10 15, 20, etc.) 704 to conduct experiments to verify proper prioritization and queuing 705 within the network. 707 Without any prioritization in the network, the over subscribed test 708 results could assist in the queuing studies. With proper queuing, 709 the bandwidth should be shared in a reasonable manner. The author 710 understands that the term "reasonable" is too wide open, and future 711 draft versions of this memo would attempt to quantify this sharing 712 in more tangible terms. It is known that if a network element 713 is not set for proper queuing (i.e. FIFO), then an oversubscribed 714 TCP connection test will generally show a very uneven distribution of 715 bandwidth. 717 With prioritization in the network, different TCP connections can be 718 assigned various QoS settings via the various mechanisms (i.e. per 719 VLAN, DSCP, etc.), and the higher priority connections must be 720 verified to achieve the expected throughput. 722 4. Acknowledgements 724 The author would like to thank Gilles Forget, Loki Jorgenson, 725 and Reinhard Schrage for technical review and contributions to this 726 draft-00 memo. 728 Also thanks to Matt Mathis and Matt Zekauskas for many good comments 729 through email exchange and for pointing me to great sources of 730 information pertaining to past works in the TCP capacity area. 732 5. References 734 [RFC2581] Allman, M., Paxson, V., Stevens W., "TCP Congestion 735 Control", RFC 2581, April 1999. 737 [RFC3148] Mathis M., Allman, M., "A Framework for Defining 738 Empirical Bulk Transfer Capacity Metrics", RFC 3148, July 739 2001. 741 [RFC2544] Bradner, S., McQuaid, J., "Benchmarking Methodology for 742 Network Interconnect Devices", RFC 2544, April 1999 744 [RFC3449] Balakrishnan, H., Padmanabhan, V. 
N., Fairhurst, G., 745 Sooriyabandara, M., "TCP Performance Implications of 746 Network Path Asymmetry", RFC 3449, December 2002 748 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., Babiarz, 749 J., "A Two-Way Active Measurement Protocol (TWAMP)", 750 RFC 5357, October 2008 752 [RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU 753 Discovery", RFC 4821, April 2007 755 draft-ietf-ippm-btc-cap-00.txt Allman, M., "A Bulk 756 Transfer Capacity Methodology for Cooperating Hosts", 757 August 2001 759 [MSMO] The Macroscopic Behavior of the TCP Congestion Avoidance 760 Algorithm Mathis, M.,Semke, J, Mahdavi, J, Ott, T 761 July 1997 SIGCOMM Computer Communication Review, 762 Volume 27 Issue 3 764 [Stevens Vol1] TCP/IP Illustrated, Vol1, The Protocols 765 Addison-Wesley 767 Authors' Addresses 769 Barry Constantine 770 JDSU, Test and Measurement Division 771 One Milesone Center Court 772 Germantown, MD 20876-7100 773 USA 775 Phone: +1 240 404 2227 776 Email: barry.constantine@jdsu.com 778 Gilles Forget 779 Independent Consultant to Bell Canada. 780 308, rue de Monaco, St-Eustache 781 Qc. CANADA, Postal Code : J7P-4T5 783 Phone: (514) 895-8212 784 gilles.forget@sympatico.ca 786 Loki Jorgenson 787 Apparent Networks 789 Phone: (604) 433-2333 ext 105 790 ljorgenson@apparentnetworks.com 792 Reinhard Schrage 793 Schrage Consulting 795 Phone: +49 (0) 5137 909540 796 reinhard@schrageconsult.com