Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: November 18, 2010                 Bell Canada (Ext. Consultant)
                                                            L. Jorgenson
                                                       Apparent Networks
                                                        Reinhard Schrage
                                                      Schrage Consulting
                                                            May 18, 2010

                   TCP Throughput Testing Methodology
                draft-ietf-ippm-tcp-throughput-tm-02.txt

Abstract

This memo describes a methodology for measuring sustained TCP
throughput performance in an end-to-end managed network environment.
This memo is intended to provide a practical approach to help users
validate the TCP layer performance of a managed network, which should
provide a better indication of end-user application level experience.
In the methodology, various TCP and network parameters are identified
that should be tested as part of the network verification at the TCP
layer.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as Internet-
Drafts.  Creation date May 18, 2010.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on November 18, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . .  3
   2. Goals of this Methodology. . . . . . . . . . . . . . . . . .  4
      2.1 TCP Equilibrium State Throughput . . . . . . . . . . . .  5
   3. TCP Throughput Testing Methodology . . . . . . . . . . . . .  6
      3.1 Determine Network Path MTU . . . . . . . . . . . . . . .  7
      3.2 Baseline Round-trip Delay and Bandwidth. . . . . . . . .  8
          3.2.1 Techniques to Measure Round Trip Time . . . . . . .  9
          3.2.2 Techniques to Measure End-end Bandwidth . . . . . . 10
      3.3 Single TCP Connection Throughput Tests . . . . . . . . . 10
          3.3.1 Interpretation of the Single Connection TCP
                Throughput Results . . . . . . . . . . . . . . . . 14
      3.4 TCP MSS Throughput Testing . . . . . . . . . . . . . . . 14
          3.4.1 MSS Size Testing Method . . . . . . . . . . . . . . 14
          3.4.2 Interpretation of TCP MSS Throughput Results. . . . 15
      3.5 Multiple TCP Connection Throughput Tests . . . . . . . . 16
          3.5.1 Multiple TCP Connections - below Link Capacity . . 16
          3.5.2 Multiple TCP Connections - over Link Capacity. . . 17
          3.5.3 Interpretation of Multiple TCP Connection Results. 17
   4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18
   5. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19

1. Introduction

Even though RFC2544 was meant to benchmark network equipment and to be
used by network equipment manufacturers (NEMs), network providers have
also used it to benchmark operational networks in order to verify SLAs
(Service Level Agreements) before turning on a service to their
business customers.  Testing an operational network prior to customer
activation is referred to as "turn-up" testing, and the SLA is
generally specified in terms of Layer 2/3 packet throughput, delay,
loss, and jitter.

Network providers are coming to the realization that both RFC2544
testing and TCP layer testing are required to more adequately ensure
end-user satisfaction.  Therefore, the network provider community
desires to measure network throughput performance at the TCP layer.
Measuring TCP throughput provides a meaningful measure with respect to
the end user's application SLA (and can ultimately help reach some
level of TCP testing interoperability, which does not exist today).

As the complexity of the network grows, the various queuing mechanisms
in the network greatly affect TCP layer performance (e.g. improper
default router settings for queuing), and devices such as firewalls,
proxies, and load-balancers can actively alter TCP settings (such as
window size, MSS, etc.) as a TCP session traverses the network.
Network providers (and NEMs) are wrestling with the end-to-end
complexities of the above, and there is a strong interest in the
standardization of a test methodology to validate end-to-end TCP
performance (as this is the precursor to acceptable end-user
application performance).

So the intent behind this TCP throughput work is to define a
methodology for testing sustained TCP layer performance.  In this
document, sustained TCP throughput is the amount of data per unit time
that TCP transports during equilibrium (steady state), i.e. after the
initial slow start phase.  We refer to this state as TCP Equilibrium,
and the equilibrium throughput is the maximum throughput achievable
for the TCP connection(s).

One other important note: the precursor to conducting this TCP test
methodology is to perform "network stress tests" such as RFC2544
Layer 2/3 tests or other conventional tests (OWAMP, etc.).  It is
highly recommended to run traditional Layer 2/3 type tests to verify
the integrity of the network before conducting TCP testing.

2. Goals of this Methodology

Before defining the goals of this methodology, it is important to
clearly define the areas that are not intended to be measured or
analyzed by such a methodology.

- The methodology is not intended to predict TCP throughput behavior
  during the transient stages of a TCP connection, such as initial
  slow start.

- The methodology is not intended to definitively benchmark TCP
  implementations of one OS against another, although some users may
  find some value in conducting qualitative experiments.

- The methodology is not intended to provide detailed diagnosis of
  problems within end-points or the network itself as related to
  non-optimal TCP performance, although a results interpretation
  section for each test step may provide insight into potential
  issues within the network.

In contrast to the above exclusions, the goals of this methodology are
to define a method to conduct a structured, end-to-end assessment of
sustained TCP performance within a managed business class IP network.
A key goal is to establish a set of "best practices" that an engineer
should apply when validating the ability of a managed network to carry
end-user TCP applications.

Some specific goals are to:

- Provide a practical test approach that specifies the more well
  understood (and end-user configurable) TCP parameters such as window
  size, MSS, and number of connections, and how these affect the
  outcome of TCP performance over a network.

- Provide specific test conditions (link speed, RTT, window size,
  etc.) and the maximum achievable TCP throughput under TCP
  Equilibrium conditions.  For guideline purposes, provide examples of
  these test conditions and the maximum achievable TCP throughput
  during the equilibrium state.  Section 2.1 provides specific details
  concerning the definition of TCP Equilibrium within the context of
  this draft.

- In test situations where the recommended procedure does not yield
  the maximum achievable TCP throughput result, provide some possible
  areas within the end host or network that should be considered for
  investigation (although again, this draft is not intended to provide
  a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

TCP connections have three (3) fundamental congestion window phases,
as documented in RFC2581.  These phases are:

- Slow Start, which occurs during the beginning of a TCP transmission
  or after a retransmission time-out event.

- Congestion Avoidance, which is the phase during which TCP ramps up
  to establish the maximum attainable throughput on an end-to-end
  network path.  Retransmissions are a natural by-product of the TCP
  congestion avoidance algorithm as it seeks to achieve maximum
  throughput on the network path.

- Retransmission phase, which includes Fast Retransmit (Tahoe) and
  Fast Recovery (Reno and New Reno).  When a packet is lost, the
  Congestion Avoidance phase transitions to a Fast Retransmit or Fast
  Recovery phase, depending upon the TCP implementation.

The following diagram depicts these phases.

   [Figure: TCP throughput versus time.  Throughput ramps up during
   Slow Start until ssthresh is reached, oscillates around the maximum
   during Congestion Avoidance (Equilibrium), drops at a Loss Event
   with a Retransmit Time-out, and then recovers through Slow Start
   and Congestion Avoidance again.]

This TCP methodology provides guidelines to measure the equilibrium
throughput, which refers to the maximum sustained rate obtained by
congestion avoidance before packet loss conditions occur (which would
cause the state change from congestion avoidance to a retransmission
phase).
All maximum achievable throughputs specified in Section 3 are with
respect to this Equilibrium state.

3. TCP Throughput Testing Methodology

This section summarizes the specific test methodology to achieve the
goals listed in Section 2.

As stated in Section 1, it is considered best practice to verify the
integrity of the network by conducting Layer 2/3 stress tests such as
RFC2544 or other methods of network stress testing.  If the network is
not performing properly in terms of packet loss, jitter, etc., then
the TCP layer testing will not be meaningful, since the equilibrium
throughput would be very difficult to achieve (in a "dysfunctional"
network).

The following provides the sequential order of steps to conduct the
TCP throughput testing methodology:

1. Identify the Path MTU.  Packetization Layer Path MTU Discovery or
   PLPMTUD (RFC4821) should be conducted to verify the minimum network
   path MTU.  Conducting PLPMTUD establishes the upper limit for the
   MSS to be used in subsequent steps.

2. Baseline Round-trip Delay and Bandwidth.  These measurements
   provide estimates of the ideal TCP window size, which will be used
   in subsequent test steps.

3. Single TCP Connection Throughput Tests.  With baseline measurements
   of round trip delay and bandwidth, a series of single connection
   TCP throughput tests can be conducted to baseline the performance
   of the network against expectations.

4. TCP MSS Throughput Testing.  By varying the MSS size of the TCP
   connection, the ability of the network to sustain expected TCP
   throughput can be verified.

5. Multiple TCP Connection Throughput Tests.  Single connection TCP
   testing is a useful first step to measure expected versus actual
   TCP performance.  The multiple connection test more closely
   emulates customer traffic, which comprises many TCP connections
   over a network link.

Important to note are some of the key characteristics and
considerations for the TCP test instrument.  The test host may be a
standard computer or a dedicated communications test instrument, and
these TCP test hosts must be capable of emulating both a client and a
server.  As a general rule of thumb, testing TCP throughput at rates
greater than 250-500 Mbit/sec requires high performance server
hardware or dedicated hardware based test tools.

Whether the TCP test host is a standard computer or a dedicated test
instrument, the following areas should be considered when selecting a
test host:

- TCP implementation used by the test host OS, e.g. a Linux OS kernel
  using TCP Reno, TCP options supported, etc.  This will obviously be
  more important when using custom test equipment where the TCP
  implementation may be customized or tuned to run in higher
  performance hardware.

- Most importantly, the TCP test host must be capable of generating
  and receiving stateful TCP test traffic at the full link speed of
  the network under test.  This is a strict requirement and may call
  for custom test equipment, especially on 1 GigE and 10 GigE
  networks.

3.1. Determine Network Path MTU

TCP implementations should use Path MTU Discovery techniques (PMTUD),
but these techniques do not always prove reliable in real world
situations.  Since PMTUD relies on ICMP messages (to inform the host
that unfragmented transmission cannot occur), it is not always
dependable because many network managers completely disable ICMP.

Increasingly, network providers and enterprises are instituting fixed
MTU sizes on the hosts to eliminate TCP fragmentation issues.

Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should be
conducted to verify the minimum network path MTU.  PLPMTUD can be used
with or without ICMP.  The following provides a summary of the PLPMTUD
approach and an example using the TCP protocol.

RFC4821 specifies a search_high and a search_low parameter for the
MTU.  As specified in RFC4821, a value of 1024 is a generally safe
value to choose for search_low in modern networks.

It is important to determine the overhead of the links in the path,
and then to select a TCP MSS size corresponding to the Layer 3 MTU.
For example, if the MTU is 1024 bytes and the TCP/IP headers are 40
bytes, then the MSS would be set to 984 bytes.

An example scenario is a network where the actual path MTU is 1240
bytes.  The TCP client probe MUST be capable of setting the MSS for
the probe packets and could start at MSS = 984 (which corresponds to
an MTU size of 1024 bytes).

The TCP client probe would open a TCP connection and advertise the MSS
as 984.  Note that the client probe MUST generate these packets with
the DF bit set.  The TCP client probe then sends test traffic per a
nominal window size (8KB, etc.).  The window size should be kept small
to minimize the possibility of congesting the network, which could
induce congestive loss.  The duration of the test should also be short
(10-30 seconds), again to minimize congestive effects during the test.

In the example of a 1240 byte path MTU, probing with an MSS equal to
984 would yield a successful probe, and the test client packets would
be successfully transferred to the test server.

Also note that the test client MUST verify that the advertised MSS is
indeed negotiated.  Network devices with built-in Layer 4 capabilities
can intercede during the connection establishment process and reduce
the advertised MSS to avoid fragmentation.  This is certainly a
desirable feature from a network perspective, but it can yield
erroneous test results if the client test probe does not confirm the
negotiated MSS.

The next test probe would use the search_high value, and this would be
set to MSS = 1460 to correspond to a 1500 byte MTU.  In this example,
the test client would retransmit based upon time-outs (since no ACKs
will be received from the test server).  This test probe is marked as
a conclusive failure if none of the test packets are ACK'ed.  If any
of the test packets are ACK'ed, congestive network loss may be the
cause and the test probe is not conclusive.  Re-testing at other times
of the day is recommended to further isolate the cause.

The test is repeated until the desired granularity of the MTU is
discovered.  The method can yield precise results at the expense of
probing time.  One approach would be to probe halfway between the
unsuccessful search_high and the successful search_low values, and to
continue halving the remaining interval when seeking the upper limit,
as illustrated in the sketch below.
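
The following is a minimal sketch of that halving search, assuming the
test tool supplies a probe_mss() routine that opens a TCP connection
clamped to the given MSS, sends DF-marked test traffic for a short
interval, and reports whether the segments were ACK'ed.  The function
and parameter names are illustrative only and are not defined by
RFC4821.

   def find_path_mss(probe_mss, search_low=984, search_high=1460):
       """Return the largest MSS in [search_low, search_high] that
       probes successfully."""
       if not probe_mss(search_low):
           raise RuntimeError("search_low failed; choose a smaller MSS")
       if probe_mss(search_high):
           return search_high
       good, bad = search_low, search_high
       while bad - good > 1:
           mid = (good + bad) // 2
           if probe_mss(mid):
               good = mid     # mid fits the path; raise the floor
           else:
               bad = mid      # mid was dropped; lower the ceiling
       return good

   # Dry-run stand-in for the 1240 byte path MTU example (MTU = MSS + 40):
   # find_path_mss(lambda mss: mss + 40 <= 1240)  ->  1200

In the 1240 byte path MTU example, the search converges on an MSS of
1200, i.e. a 1240 byte path MTU.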

3.2. Baseline Round-trip Delay and Bandwidth

Before stateful TCP testing can begin, it is important to baseline the
round trip delay and bandwidth of the network to be tested.  These
measurements provide estimates of the ideal TCP window size, which
will be used in subsequent test steps.  These latency and bandwidth
tests should be run over a long enough period of time to adequately
characterize the performance of the network.

One example would be to take samples during various times of the work
day.  The goal would be to determine a representative minimum,
average, and maximum RTD and bandwidth for the network under test.
Topology changes (e.g. routing reconvergence across BGP4 boundaries)
should be avoided during this baselining period.

In some cases, baselining bandwidth may not be required, since a
network provider's end-to-end topology may be well enough defined.

3.2.1 Techniques to Measure Round Trip Time

We follow the definitions used in the references; hence Round Trip
Time (RTT) is the time elapsed between the clocking in of the first
bit of a payload packet and the receipt of the last bit of the
corresponding acknowledgement.  Round Trip Delay (RTD) is used
synonymously with twice the link latency.

In any method used to baseline round trip delay between network
end-points, it is important to realize that network latency is the sum
of inherent network delay and congestion.  The RTT should be baselined
during "off-peak" hours to obtain a reliable figure for network
latency (versus additional delay caused by congestion).

During the actual sustained TCP throughput tests, it is critical to
measure RTT along with the measured TCP throughput.  Congestive
effects can be isolated if RTT is concurrently measured.

The following is not meant to be an exhaustive list, but it summarizes
some of the more common ways to determine round trip time (RTT)
through the network.  The desired resolution of the measurement (i.e.
msec versus usec) may dictate whether the RTT measurement can be
achieved with standard tools such as ICMP ping techniques or whether
specialized test equipment with high precision timers would be
required.  The techniques are listed in order of decreasing accuracy.

- Use test equipment on each end of the network, "looping" the far-end
  tester so that a packet stream can be measured end-to-end.  This
  test equipment RTT measurement may be compatible with the delay
  measurement protocols specified in RFC5357.

- Conduct packet captures of TCP test applications (for example
  "iperf" or FTP sessions).  By running multiple experiments, the
  packet captures can be studied to estimate RTT based upon the
  SYN -> SYN-ACK handshakes within the TCP connection set-up.

- ICMP pings may also be adequate to provide round trip time
  estimations.  Some limitations of ICMP ping are the msec resolution
  and whether the network elements respond to pings (or block them).
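
As a lightweight complement to the packet-capture approach above, the
time for a TCP three-way handshake to complete can itself approximate
one RTT, since a blocking connect() returns once the SYN-ACK arrives.
The sketch below is illustrative only (host and port are placeholders)
and carries the same resolution caveats as ICMP ping.

   import socket
   import time

   def handshake_rtt_ms(host, port=5001, samples=5):
       """Estimate RTT (ms) by timing several TCP handshakes."""
       results = []
       for _ in range(samples):
           start = time.perf_counter()
           with socket.create_connection((host, port), timeout=2.0):
               pass                     # only the handshake is timed
           results.append((time.perf_counter() - start) * 1000.0)
       return min(results), sum(results) / len(results)

   # Example against a hypothetical far-end test server:
   # print(handshake_rtt_ms("192.0.2.10"))

The minimum of several samples is usually the better estimate of
inherent path delay, with the average reflecting congestive variation.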

3.2.2 Techniques to Measure End-end Bandwidth

There are many well established techniques available to provide
estimated measures of bandwidth over a network.  This measurement
should be conducted in both directions of the network, especially for
access networks, which are inherently asymmetrical.  Some of the
asymmetric implications to TCP performance are documented in RFC3449,
and the results of this work will be further studied to determine
relevance to this draft.

The bandwidth measurement test must be run with stateless IP streams
(not stateful TCP) in order to determine the available bandwidth in
each direction.  This test should be performed at various intervals
throughout a business day (or even across a week).  Ideally, the
bandwidth test should produce a log of the bandwidth achieved across
the test interval AND the round trip delay.

Also, during the actual TCP level performance measurements (Sections
3.3 - 3.5), the test tool must be able to track the round trip time of
the TCP connection(s) during the test.  Measuring round trip time
variation (aka "jitter") provides insight into the effects of
congestive delay on the sustained throughput achieved for the TCP
layer test.

3.3. Single TCP Connection Throughput Tests

This draft specifically defines TCP throughput techniques to verify
sustained TCP performance in a managed business network.  As defined
in Section 2.1, the equilibrium throughput reflects the maximum rate
achieved by a TCP connection within the congestion avoidance phase on
an end-to-end network path.  This section and the following sections
define the method to conduct these sustained throughput tests and
provide guidelines for the expected results.

With baseline measurements of round trip time and bandwidth from
Section 3.2, a series of single connection TCP throughput tests can be
conducted to baseline the performance of the network against
expectations.  The optimum TCP window size can be calculated from the
bandwidth delay product (BDP), which is:

   BDP = RTT x Bandwidth

By dividing the BDP (in bits) by 8, the "ideal" TCP window size in
bytes is calculated.  An example would be a T3 link with 25 msec RTT.
The BDP would equal ~1,105,000 bits and the ideal TCP window would
equal ~138,000 bytes.
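
The arithmetic is summarized in the short sketch below; it simply
restates the BDP formula above, with bandwidth taken as the bottleneck
(payload) rate in bits per second.  The function name is illustrative.

   def ideal_window_bytes(bandwidth_bps, rtt_s):
       """BDP = RTT x Bandwidth (bits); divide by 8 for bytes."""
       return (bandwidth_bps * rtt_s) / 8.0

   # T3 payload rate of 44.21 Mbit/s at 25 msec RTT:
   # ideal_window_bytes(44.21e6, 0.025)  ->  ~138,156 bytes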

The following table provides some representative network link speeds,
latency, BDP, and associated "optimum" TCP window size.  Sustained TCP
transfers should reach nearly 100% throughput, minus the overhead of
Layers 1-3 and the effect of rounding the window down to a whole
number of MSS-sized segments.

For this single connection baseline test, the MSS size will affect the
achieved throughput (especially for smaller TCP window sizes).  Table
3.2 provides the achievable, equilibrium TCP throughput (at Layer 4)
using a 1000 byte MSS.  Also in this table, an L1-L4 overhead of 58
bytes (including the Ethernet CRC32) is used for simplicity.

Table 3.2: Link Speed, RTT and calculated BDP, TCP Throughput

   Link                             Ideal TCP        Maximum Achievable
   Speed*  RTT (ms)   BDP (bits)    Window (kbytes)  TCP Throughput (Mbps)
   ----------------------------------------------------------------------
   T1         20          30,720        3.84              1.20
   T1         50          76,800        9.60              1.44
   T1        100         153,600       19.20              1.44
   T3         10         442,100       55.26             41.60
   T3         15         663,150       82.89             41.13
   T3         25       1,105,250      138.16             41.92
   T3(ATM)    10         407,040       50.88             32.44
   T3(ATM)    15         610,560       76.32             32.44
   T3(ATM)    25       1,017,600      127.20             32.44
   100M        1         100,000       12.50             90.699
   100M        2         200,000       25.00             92.815
   100M        5         500,000       62.50             90.699
   1Gig        0.1       100,000       12.50            906.991
   1Gig        0.5       500,000       62.50            906.991
   1Gig        1       1,000,000      125.00            906.991
   10Gig       0.05      500,000       62.50          9,069.912
   10Gig       0.3     3,000,000      375.00          9,069.912

   * Note that the link speed is the minimum link speed throughout the
     network path; i.e. a WAN with a T1 access link, etc.

Also, the following link speeds (available payload bandwidth) were
used for the WAN entries:

- T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
- T3 = 44.21 Mbits/sec (C-Bit Framing)
- T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per
  second)

The calculation method used in this document is a 3 step process:

1 - We determine what should be the optimal TCP Window size value,
    based on the optimal quantity of "in-flight" octets discovered by
    the BDP calculation.  We take into consideration that the TCP
    Window size has to be an exact multiple of the MSS.

2 - Then we calculate the achievable Layer 2 throughput by multiplying
    the value determined in step 1 by the (MSS + L2 + L3 + L4
    Overheads) / MSS ratio and dividing by the RTT.

3 - Finally, we multiply the calculated value of step 2 by the MSS
    versus (MSS + L2 + L3 + L4 Overheads) ratio.

This gives us the achievable TCP Throughput value.  Sometimes, the
maximum achievable throughput is limited by the maximum achievable
quantity of Ethernet Frames per second on the physical media.  In that
case, this value is used in step 2 instead of the calculated one.  (A
small sketch of this calculation follows at the end of this section.)

There are several TCP tools that are commonly used in the network
provider world, and one of the most common is the "iperf" tool.  With
this tool, hosts are installed at each end of the network segment; one
as client and the other as server.  The TCP Window size of both the
client and the server can be manually set, and the achieved throughput
is measured, either uni-directionally or bi-directionally.  For higher
BDP situations in lossy networks (long fat networks, satellite links,
etc.), TCP options such as Selective Acknowledgment should be
considered and also become part of the window size / throughput
characterization.

The following shows the achievable TCP throughput on a T3 with the
default Windows2000/XP TCP Window size of 17520 Bytes.

   [Bar chart: TCP Throughput in Mbps versus RTT in milliseconds;
   approximately 14.48 Mbps at 10 ms, 9.65 Mbps at 15 ms, and
   5.79 Mbps at 25 ms.]

The following shows the achievable TCP throughput on a 25 ms T3 when
the TCP Window size is increased and with the RFC1323 TCP Window
scaling option.

   [Bar chart: TCP Throughput in Mbps versus TCP Window size in
   KBytes; approximately 5.31 Mbps at 16 KB, 10.62 Mbps at 32 KB,
   21.23 Mbps at 64 KB, and 42.47 Mbps at 128 KB.]

The single connection TCP throughput test must be run over a long
duration, and results must be logged at the desired interval.  The
test must record RTT and TCP retransmissions at each interval.

This correlation of retransmissions and RTT over the course of the
test will clearly identify which portions of the transfer reached TCP
Equilibrium state and to what extent increased RTT (congestive
effects) may have been the cause of reduced equilibrium performance.

Host hardware performance must be well understood before conducting
this TCP single connection test and the other tests in this section.
Dedicated test equipment may be required, especially for line rates of
GigE and 10 GigE.
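
The following is a minimal sketch of the 3 step calculation referenced
earlier in this section, using the same assumptions as Table 3.2
(1000 byte MSS, 58 bytes of L1-L4 overhead, bandwidth expressed as the
available payload rate).  The function and parameter names are
illustrative only, and different overhead assumptions will shift the
results slightly.

   def achievable_tcp_throughput_bps(link_bps, rtt_s, mss=1000,
                                     overhead=58):
       # Step 1: ideal window from the BDP, rounded down to an exact
       # multiple of the MSS.
       bdp_bytes = (link_bps * rtt_s) / 8.0
       window_bytes = int(bdp_bytes // mss) * mss
       # Step 2: achievable Layer 2 throughput for that window (frames
       # per RTT times full frame size), capped at the link rate when
       # the media frame rate is the limit.
       frame_bytes = mss + overhead
       l2_bps = (window_bytes / mss) * frame_bytes * 8.0 / rtt_s
       l2_bps = min(l2_bps, link_bps)
       # Step 3: scale by MSS / (MSS + overheads) to obtain the TCP
       # (Layer 4) throughput.
       return l2_bps * mss / frame_bytes

   # T1 (1.536 Mbit/s payload) at 20 msec RTT -> ~1.20 Mbit/s, in line
   # with Table 3.2; other rows may differ slightly depending on the
   # exact overhead treatment.
   # print(achievable_tcp_throughput_bps(1.536e6, 0.020) / 1e6)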

3.3.1 Interpretation of the Single Connection TCP Throughput Results

At the end of this step, the user will document the theoretical BDP
and a set of window size experiments with measured TCP throughput for
each TCP window size setting.  For cases where the sustained TCP
throughput does not equal the predicted value, some possible causes
are listed:

- Network congestion causing packet loss

- Network congestion not causing packet loss, but effectively
  increasing the size of the required TCP window during the transfer

- Intermediate network devices which actively regenerate the TCP
  connection and can alter window size, MSS, etc.

3.4. TCP MSS Throughput Testing

This test setup should be conducted as a single TCP connection test.
By varying the MSS size of the TCP connection, the ability of the
network to sustain expected TCP throughput can be verified.  This is
similar to the frame and packet size techniques within RFC2544, which
aim to determine the ability of the routing/switching devices to
handle loads in terms of packets/frames per second at various frame
and packet sizes.  This test can also further characterize the
performance of a network in the presence of active TCP elements
(proxies, etc.), devices that fragment IP packets, and the actual end
hosts themselves (servers, etc.).

3.4.1 MSS Size Testing Method

The single connection testing listed in Section 3.3 should be
repeated, using the appropriate window size and collecting throughput
measurements for various MSS sizes.

The following are the typical sizes of MSS settings for various link
speeds:

- 256 bytes for very low speed links such as 9.6Kbps (per RFC1144).
- 536 bytes for low speed links (per RFC879).
- 966 bytes for SLIP high speed (per RFC1055).
- 1380 bytes for IPSec VPN Tunnel testing.
- 1452 bytes for PPPoE connectivity (per RFC2516).
- 1460 bytes for Ethernet and Fast Ethernet (per RFC895).
- 8960 byte jumbo frames for GigE.

Using the optimum window size determined by conducting steps 3.2 and
3.3, a variety of MSS sizes should be tested according to the link
speed under test.  Using Fast Ethernet with 5 msec RTT as an example,
the optimum TCP window size would be 62.5 kbytes and the recommended
MSS for Fast Ethernet is 1460 bytes.

   Link              Achievable TCP Throughput (Mbps) for
   Speed  RTT(ms) MSS=1000 MSS=1260 MSS=1300 MSS=1380 MSS=1420 MSS=1460
   ----------------------------------------------------------------------
   T1        20  |   1.20    1.008    1.040    1.104    1.136    1.168
   T1        50  |   1.44    1.411    1.456    1.335    1.363    1.402
   T1       100  |   1.44    1.512    1.456    1.435    1.477    1.402
   T3        10  |  41.60   42.336   42.640   41.952   40.032   42.048
   T3        15  |  42.13   42.336   42.293   42.688   42.411   42.048
   T3        25  |  41.92   42.336   42.432   42.394   42.714   42.515
   T3(ATM)   10  |  32.44   33.815   34.477   35.482   36.022   36.495
   T3(ATM)   15  |  32.44   34.120   34.477   35.820   36.022   36.127
   T3(ATM)   25  |  32.44   34.363   34.860   35.684   36.022   36.274
   100M       1  |  90.699  89.093   91.970   86.866   89.424   91.982
   100M       2  |  92.815  93.226   93.275   88.505   90.973   93.442
   100M       5  |  90.699  92.481   92.697   88.245   90.844   93.442

For GigE and 10GigE, Jumbo frames (9000 bytes) are becoming more
common.  The following table adds jumbo frames to the possible MSS
values.

   Link              Achievable TCP Throughput (Mbps) for
   Speed  RTT(ms) MSS=1260 MSS=1300 MSS=1380 MSS=1420 MSS=1460 MSS=8960
   ----------------------------------------------------------------------
   1Gig    0.1  |  924.812  926.966  882.495  894.240  919.819  713.786
   1Gig    0.5  |  924.812  926.966  930.922  932.743  934.467  856.543
   1Gig    1.0  |  924.812  926.966  930.922  932.743  934.467  927.922
   10Gig   0.05 | 9248.125 9269.655 9309.218 9839.790 9344.671 8565.435
   10Gig   0.3  | 9248.125 9269.655 9309.218 9839.790 9344.671 9755.079

Each row in the table is a separate test that should be conducted over
a predetermined test interval, with the throughput, retransmissions,
and RTT logged during the entire test interval.
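
One way a test client might pin the MSS for each row of such a sweep
is shown below, on platforms that expose the TCP_MAXSEG socket option
(commonly Linux).  This is an illustrative sketch rather than part of
the methodology; the server address, port, and sweep values are
placeholders, and reading the option back after connecting covers the
case where an intermediate device reduces the negotiated MSS.

   import socket

   def open_test_connection(server, port, mss):
       s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       # Clamp the MSS before connecting so it is reflected in the SYN.
       s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
       s.connect((server, port))
       # Effective value after the handshake; may be lower than requested.
       effective = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
       return s, effective

   # for mss in (536, 966, 1380, 1452, 1460):
   #     conn, effective_mss = open_test_connection("192.0.2.10", 5001, mss)
   #     ...send a window's worth of data, record throughput, close...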

3.4.2 Interpretation of TCP MSS Throughput Results

For cases where the measured TCP throughput does not equal the
predicted throughput for a given MSS, some possible causes are listed:

- TBD

3.5. Multiple TCP Connection Throughput Tests

After baselining the network under test with a single TCP connection
(Section 3.3), the nominal capacity of the network has been
determined.  The capacity measured in Section 3.3 may be a capacity
range, and it is reasonable that some level of tuning may have been
required (i.e. router shaping techniques employed, intermediary
proxy-like devices tuned, etc.).

Single connection TCP testing is a useful first step to measure
expected versus actual TCP performance and as a means to diagnose /
tune issues in the network and active elements.  However, the ultimate
goal of this methodology is to more closely emulate customer traffic,
which comprises many TCP connections over a network link.  This
methodology also seeks to provide the framework for testing stateful
TCP connections in concurrence with stateless traffic streams, which
is described in Section 3.5.

3.5.1 Multiple TCP Connections - below Link Capacity

First, the ability of the network to carry multiple TCP connections to
full network capacity should be tested.  Prioritization and QoS
settings are not considered during this step, since the network
capacity is not to be exceeded by the test traffic (Section 3.5.2
covers the over capacity test case).

For this multiple connection TCP throughput test, the number of
connections will more than likely be limited by the test tool (host
vs. dedicated test equipment).  As an example, for a GigE link with
1 msec RTT, the optimum TCP window would equal ~128 KBytes.  So under
this condition, 8 concurrent connections with a window size equal to
16KB would fill the GigE link.  For 10G, 80 connections would be
required to accomplish the same.

Just as in Section 3.3, the end host or test tool cannot be the
processing bottleneck, or the throughput measurements will not be
valid.  The test tool must be benchmarked in ideal lab conditions to
verify its ability to transfer stateful TCP traffic at the given
network line rate.

This test step should be conducted over a reasonable test duration,
and results should be logged per interval, such as throughput per
connection, RTT, and retransmissions.

Since the network is not to be driven into over capacity (by nature of
the BDP allocated evenly to each connection), this test verifies the
ability of the network to carry multiple TCP connections up to the
link speed of the network.
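
The connection-count arithmetic used in the example above is simply
the BDP divided by the per-connection window.  The sketch below is
illustrative; real tests must also respect the connection limits of
the test tool.

   import math

   def connections_to_fill_link(link_bps, rtt_s, window_bytes):
       bdp_bytes = link_bps * rtt_s / 8.0
       return math.ceil(bdp_bytes / window_bytes)

   # GigE at 1 msec RTT with 16 KByte windows -> 8 connections
   # print(connections_to_fill_link(1e9, 0.001, 16 * 1024))
   # 10GigE at 1 msec RTT with 16 KByte windows -> 77-80 connections,
   # depending on how the window and BDP are rounded.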

3.5.2 Multiple TCP Connections - over Link Capacity

In this step, the network bandwidth is intentionally exceeded with
multiple TCP connections to test expected prioritization and queuing
within the network.

All conditions related to the Section 3.3 set-up apply, especially the
ability of the test hosts to transfer stateful TCP traffic at network
line rates.

Using the same example from Section 3.3, a GigE link with 1 msec RTT
would require a window size of 128 KB to fill the link (with one TCP
connection).  Assuming a 16KB window, 8 concurrent connections would
fill the GigE link capacity, and values higher than 8 would
over-subscribe the network capacity.  The user would select values to
over-subscribe the network (i.e. possibly 10, 15, 20, etc.) to conduct
experiments to verify proper prioritization and queuing within the
network.

3.5.3 Interpretation of Multiple TCP Connection Test Results

Without any prioritization in the network, the over-subscribed test
results could assist in queuing studies.  With proper queuing, the
bandwidth should be shared in a reasonable manner.  The author
understands that the term "reasonable" is too open-ended, and future
draft versions of this memo will attempt to quantify this sharing in
more tangible terms.  It is known that if a network element is not
configured for proper queuing (i.e. simple FIFO), then an
over-subscribed TCP connection test will generally show a very uneven
distribution of bandwidth.

With prioritization in the network, different TCP connections can be
assigned various QoS settings via various mechanisms (i.e. per VLAN,
DSCP, etc.), and the higher priority connections must be verified to
achieve the expected throughput.
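
Although this memo does not yet quantify "reasonable" sharing, one
possible (non-normative) way to summarize how evenly an
over-subscribed test distributed bandwidth is Jain's fairness index,
sketched below: a value of 1.0 means all connections received an equal
share, while values approaching 1/n indicate that one connection
dominated.  The sample figures are invented for illustration.

   def jain_fairness(throughputs):
       n = len(throughputs)
       total = sum(throughputs)
       return (total * total) / (n * sum(t * t for t in throughputs))

   # per_connection_mbps = [92.0, 95.5, 90.1, 3.2]   # hypothetical run
   # jain_fairness(per_connection_mbps)  ->  ~0.77, noticeably uneven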

4. Acknowledgements

The author would like to thank Gilles Forget, Loki Jorgenson, and
Reinhard Schrage for technical review and contributions to this
draft-00 memo.

Also thanks to Matt Mathis and Matt Zekauskas for many good comments
through email exchange and for pointing me to great sources of
information pertaining to past works in the TCP capacity area.

5. References

[RFC2581]  Allman, M., Paxson, V., Stevens, W., "TCP Congestion
           Control", RFC 2581, May 1999.

[RFC3148]  Mathis, M., Allman, M., "A Framework for Defining
           Empirical Bulk Transfer Capacity Metrics", RFC 3148,
           July 2001.

[RFC2544]  Bradner, S., McQuaid, J., "Benchmarking Methodology for
           Network Interconnect Devices", RFC 2544, May 1999.

[RFC3449]  Balakrishnan, H., Padmanabhan, V. N., Fairhurst, G.,
           Sooriyabandara, M., "TCP Performance Implications of
           Network Path Asymmetry", RFC 3449, December 2002.

[RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K.,
           Babiarz, J., "A Two-Way Active Measurement Protocol
           (TWAMP)", RFC 5357, October 2008.

[RFC4821]  Mathis, M., Heffner, J., "Packetization Layer Path MTU
           Discovery", RFC 4821, May 2007.

draft-ietf-ippm-btc-cap-00.txt  Allman, M., "A Bulk Transfer Capacity
           Methodology for Cooperating Hosts", August 2001.

[MSMO]     Mathis, M., Semke, J., Mahdavi, J., Ott, T., "The
           Macroscopic Behavior of the TCP Congestion Avoidance
           Algorithm", SIGCOMM Computer Communication Review,
           Volume 27, Issue 3, July 1997.

[Stevens Vol1]  TCP/IP Illustrated, Volume 1, The Protocols,
           Addison-Wesley.

Authors' Addresses

Barry Constantine
JDSU, Test and Measurement Division
One Milestone Center Court
Germantown, MD 20876-7100
USA

Phone: +1 240 404 2227
Email: barry.constantine@jdsu.com

Gilles Forget
Independent Consultant to Bell Canada.
308, rue de Monaco, St-Eustache
Qc. CANADA, Postal Code: J7P-4T5

Phone: (514) 895-8212
gilles.forget@sympatico.ca

Loki Jorgenson
Apparent Networks

Phone: (604) 433-2333 ext 105
ljorgenson@apparentnetworks.com

Reinhard Schrage
Schrage Consulting

Phone: +49 (0) 5137 909540
reinhard@schrageconsult.com