Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: February 12, 2011                 Bell Canada (Ext. Consultant)
                                                            L. Jorgenson
                                                                 nooCore
                                                        Reinhard Schrage
                                                      Schrage Consulting
                                                         August 12, 2010

                   TCP Throughput Testing Methodology
                draft-ietf-ippm-tcp-throughput-tm-05.txt

Abstract

This memo describes a methodology for measuring sustained TCP throughput performance in an end-to-end managed network environment.  This memo is intended to provide a practical approach to help users validate the TCP layer performance of a managed network, which should provide a better indication of end-user application level experience.  In the methodology, various TCP and network parameters are identified that should be tested as part of the network verification at the TCP layer.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.  Creation date August 12, 2010.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on February 12, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Table of Contents

   1. Introduction
   2. Goals of this Methodology
      2.1 TCP Equilibrium State Throughput
      2.2 Metrics for TCP Throughput Tests
   3. TCP Throughput Testing Methodology
      3.1 Determine Network Path MTU
      3.2 Baseline Round-trip Delay and Bandwidth
         3.2.1 Techniques to Measure Round Trip Time
         3.2.2 Techniques to Measure End-end Bandwidth
      3.3 TCP Throughput Tests
         3.3.1 Calculate Optimum TCP Window Size
         3.3.2 Conducting the TCP Throughput Tests
         3.3.3 Single vs. Multiple TCP Connection Testing
         3.3.4 Interpretation of the TCP Throughput Results
      3.4 Traffic Management Tests
         3.4.1 Traffic Shaping Tests
            3.4.1.1 Interpretation of Traffic Shaping Test Results
         3.4.2 RED Tests
            3.4.2.1 Interpretation of RED Results
   4. Acknowledgements
   5. References
   Authors' Addresses

1. Introduction

Testing an operational network prior to customer activation is referred to as "turn-up" testing, and the associated SLA is generally specified in terms of Layer 2/3 packet throughput, delay, loss, and jitter.

Network providers are coming to the realization that Layer 2/3 testing and TCP layer testing are both required to more adequately ensure end-user satisfaction.  Therefore, the network provider community desires to measure network throughput performance at the TCP layer.  Measuring TCP throughput provides a meaningful measure with respect to the end user's application SLA (and would ultimately help reach a level of TCP testing interoperability which does not exist today).

Additionally, end-users (business enterprises) seek to conduct repeatable TCP throughput tests between enterprise locations.  Since these enterprises rely on the networks of the providers, a common test methodology (and common metrics) would be equally beneficial to both parties.

The intent behind this TCP throughput draft is to define a methodology for testing sustained TCP layer performance.  In this document, sustained TCP throughput is the amount of data per unit time that TCP transports during equilibrium (steady state), i.e. after the initial slow start phase.  We refer to this state as TCP Equilibrium; the equilibrium throughput is the maximum achievable for the TCP connection(s).

There are many variables to consider when conducting a TCP throughput test, and this methodology focuses on the most common parameters that should be considered, such as:

- Path MTU and Maximum Segment Size (MSS)
- RTT and Bottleneck BW
- Ideal TCP Window (Bandwidth Delay Product)
- Single Connection and Multiple Connection testing

One other important note: it is highly recommended that traditional Layer 2/3 tests be conducted to verify the integrity of the network before conducting TCP tests.  Examples include RFC2544, iperf (UDP mode), or manual packet layer test techniques where packet throughput, loss, and delay measurements are conducted.

2. Goals of this Methodology

Before defining the goals of this methodology, it is important to clearly define the areas that are not intended to be measured or analyzed by such a methodology.

- The methodology is not intended to predict TCP throughput behavior during the transient stages of a TCP connection, such as initial slow start.

- The methodology is not intended to definitively benchmark TCP implementations of one OS against another, although some users may find some value in conducting qualitative experiments.

- The methodology is not intended to provide detailed diagnosis of problems within end-points or the network itself as related to non-optimal TCP performance, although a results interpretation section for each test step may provide insight into potential issues within the network.

In contrast to the above exclusions, the goal of this methodology is to define a method to conduct a structured, end-to-end assessment of sustained TCP performance within a managed business class IP network.
A key goal is to establish a set of "best practices" that an engineer should apply when validating the ability of a managed network to carry end-user TCP applications.

Some specific goals are to:

- Provide a practical test approach that specifies the better understood (and end-user configurable) TCP parameters such as window size, MSS (Maximum Segment Size), and number of connections, and how these affect the outcome of TCP performance over a network.

- Provide specific test conditions (link speed, RTT, window size, etc.) and the maximum achievable TCP throughput under TCP Equilibrium conditions.  For guideline purposes, provide examples of these test conditions and the maximum achievable TCP throughput during the equilibrium state.  Section 2.1 provides specific details concerning the definition of TCP Equilibrium within the context of this draft.

- Define two (2) basic metrics that can be used to compare the performance of TCP connections under various network conditions.

- In test situations where the recommended procedure does not yield the maximum achievable TCP throughput result, provide some possible areas within the end host or network that should be considered for investigation (although again, this draft is not intended to provide a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

TCP connections have three (3) fundamental congestion window phases, as documented in RFC2581.  These phases are:

- Slow Start, which occurs at the beginning of a TCP transmission or after a retransmission time-out event.

- Congestion Avoidance, which is the phase during which TCP ramps up to establish the maximum attainable throughput on an end-to-end network path.  Retransmissions are a natural by-product of the TCP congestion avoidance algorithm as it seeks to achieve maximum throughput on the network path.

- Retransmission phase, which includes Fast Retransmit (Tahoe) and Fast Recovery (Reno and New Reno).  When a packet is lost, the Congestion Avoidance phase transitions to a Fast Retransmission or Fast Recovery phase, dependent upon the TCP implementation.

The following diagram depicts these phases.

   [Diagram: TCP Throughput versus Time.  During Slow Start, throughput ramps up toward ssthresh; Congestion Avoidance then climbs to the Equilibrium (maximum sustained) rate; a Loss Event triggers the Retransmit Time-out phase, after which Slow Start and Congestion Avoidance repeat.]

This TCP methodology provides guidelines to measure the equilibrium throughput, which refers to the maximum sustained rate obtained by congestion avoidance before packet loss conditions occur (which would cause the state change from congestion avoidance to a retransmission phase).  All maximum achievable throughputs specified in Section 3 are with respect to this Equilibrium state.

2.2 Metrics for TCP Throughput Tests

This draft focuses on a TCP throughput methodology and also provides two basic metrics to compare the results of various throughput tests.  It is recognized that the complexity and unpredictability of TCP makes it impossible to develop a complete set of metrics that account for the myriad of variables (e.g., RTT variation, loss conditions, TCP implementation, etc.).
However, these two basic metrics facilitate TCP throughput comparisons under varying network conditions and between network traffic management techniques.

The TCP Efficiency metric is the percentage of bytes that were not retransmitted and is defined as:

   Transmitted Bytes - Retransmitted Bytes
   ---------------------------------------  x 100
              Transmitted Bytes

This metric provides a comparative measure between various QoS mechanisms such as traffic management and congestion avoidance, and also between various TCP implementations (e.g., Reno, Vegas, etc.).

As an example, if 100,000 bytes were sent and 2,000 had to be retransmitted, the TCP Efficiency would be calculated as:

   100,000 - 2,000
   ---------------  x 100 = 98%
       100,000

Note that a given byte may be retransmitted more than once, and each retransmission is added to the retransmitted byte count.

The second metric is the TCP Transfer Time, which is simply the time it takes to transfer a block of data across simultaneous TCP connections.  This concept is useful when benchmarking traffic management techniques, where multiple connections are generally required.

The TCP Transfer Time can also be used to provide a normalized ratio of the actual TCP Transfer Time versus the ideal Transfer Time.  This ratio is called the TCP Transfer Index and is defined as:

   Actual TCP Transfer Time
   -------------------------
   Ideal TCP Transfer Time

An example would be the bulk transfer of 100 MB across each of 5 simultaneous TCP connections over a 500 Mbit/s Ethernet service (each connection uploading 100 MB).  Each connection may achieve a different throughput during a test, and the overall throughput rate is not always easy to determine (especially as the number of connections increases).

The ideal TCP Transfer Time would be ~8 seconds.  If in this example the actual TCP Transfer Time was 12 seconds, the TCP Transfer Index would be 12/8 = 1.5, which indicates that the transfer across all connections took 1.5 times longer than the ideal.

Note that both the TCP Efficiency and TCP Transfer Time metrics must be measured during each throughput test.  The correlation of TCP Transfer Time with TCP Efficiency can help to diagnose whether the TCP Transfer Time was negatively impacted by retransmissions (poor TCP Efficiency).
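As an illustration only (not part of the methodology itself), the two metrics above could be computed from measured byte counts and transfer times along the following lines; this Python sketch uses hypothetical function names and example values:

   def tcp_efficiency(transmitted_bytes, retransmitted_bytes):
       """TCP Efficiency (%): percentage of bytes that were not retransmitted."""
       return (transmitted_bytes - retransmitted_bytes) / transmitted_bytes * 100.0

   def tcp_transfer_index(actual_transfer_time_s, ideal_transfer_time_s):
       """TCP Transfer Index: ratio of actual to ideal TCP Transfer Time."""
       return actual_transfer_time_s / ideal_transfer_time_s

   # Examples from this section:
   print(tcp_efficiency(100000, 2000))       # 98.0 (%)
   print(tcp_transfer_index(12.0, 8.0))      # 1.5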
3. TCP Throughput Testing Methodology

As stated in Section 1, it is considered best practice to verify the integrity of the network by conducting Layer 2/3 stress tests such as RFC2544 (or other methods of network stress testing).  If the network is not performing properly in terms of packet loss, jitter, etc., then the TCP layer testing will not be meaningful, since the equilibrium throughput would be very difficult to achieve in a "dysfunctional" network.

The following represents the sequential order of steps to conduct the TCP throughput testing methodology:

1. Identify the Path MTU.  Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should be conducted to verify the minimum network path MTU.  Conducting PLPMTUD establishes the upper limit for the MSS to be used in subsequent steps.

2. Baseline Round-trip Delay and Bandwidth.  These measurements provide estimates of the ideal TCP window size, which will be used in subsequent test steps.

3. TCP Connection Throughput Tests.  With baseline measurements of round trip delay and bandwidth, a series of single and multiple TCP connection throughput tests can be conducted to baseline the network performance expectations.

4. Traffic Management Tests.  Various traffic management and queuing techniques are tested in this step, using multiple TCP connections.  Multiple connection testing can verify that the network is configured properly for traffic shaping versus policing, various queuing implementations, and RED.

Some key characteristics and considerations for the TCP test instrument are important to note.  The test host may be a standard computer or a dedicated communications test instrument, and these TCP test hosts must be capable of emulating both a client and a server.

Whether the TCP test host is a standard computer or a dedicated test instrument, the following areas should be considered when selecting a test host:

- The TCP implementation used by the test host OS, e.g., Linux OS kernel using TCP Reno, TCP options supported, etc.  This will obviously be more important when using custom test equipment where the TCP implementation may be customized or tuned to run on higher performance hardware.

- Most importantly, the TCP test host must be capable of generating and receiving stateful TCP test traffic at the full link speed of the network under test.  As a general rule of thumb, testing TCP throughput at rates greater than 100 Mbit/sec generally requires high performance server hardware or dedicated hardware based test tools.

- Measuring RTT and TCP Efficiency per connection will generally require dedicated hardware based test tools.  In the absence of dedicated hardware based test tools, these measurements may need to be conducted with packet capture tools (conduct TCP throughput tests and analyze RTT and retransmission results from the packet captures).

3.1. Determine Network Path MTU

TCP implementations should use Path MTU Discovery techniques (PMTUD).  PMTUD relies on ICMP 'need to frag' messages to learn the path MTU.  When a device has a packet to send which has the Don't Fragment (DF) bit in the IP header set and the packet is larger than the Maximum Transmission Unit (MTU) of the next hop link, the packet is dropped and the device sends an ICMP 'need to frag' message back to the host that originated the packet.  The ICMP 'need to frag' message includes the next hop MTU, which PMTUD uses to tune the TCP Maximum Segment Size (MSS).  Unfortunately, because many network managers completely disable ICMP, this technique does not always prove reliable in real world situations.

Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should therefore be conducted to verify the minimum network path MTU.  PLPMTUD can be used with or without ICMP.  The following provides a summary of the PLPMTUD approach and an example using the TCP protocol.  RFC4821 specifies a search_high and a search_low parameter for the MTU.  As specified in RFC4821, a value of 1024 is a generally safe value to choose for search_low in modern networks.

It is important to determine the overhead of the links in the path, and then to select a TCP MSS size corresponding to the Layer 3 MTU.  For example, if the MTU is 1024 bytes and the TCP/IP headers are 40 bytes, then the MSS would be set to 984 bytes.
An example scenario is a network where the actual path MTU is 1240 bytes.  The TCP client probe MUST be capable of setting the MSS for the probe packets and could start at MSS = 984 (which corresponds to an MTU size of 1024 bytes).

The TCP client probe would open a TCP connection and advertise the MSS as 984.  Note that the client probe MUST generate these packets with the DF bit set.  The TCP client probe then sends test traffic using a nominal window size (8 KB, etc.).  The window size should be kept small to minimize the possibility of congesting the network, which could induce congestive loss.  The duration of the test should also be short (10-30 seconds), again to minimize congestive effects during the test.

In the example of a 1240 byte path MTU, probing with an MSS equal to 984 would yield a successful probe, and the test client packets would be successfully transferred to the test server.

Also note that the test client MUST verify that the MSS advertised is indeed negotiated.  Network devices with built-in Layer 4 capabilities can intercede during the connection establishment process and reduce the advertised MSS to avoid fragmentation.  This is certainly a desirable feature from a network perspective, but it can yield erroneous test results if the client test probe does not confirm the negotiated MSS.

The next test probe would use the search_high value, and this would be set to MSS = 1460 to correspond to a 1500 byte MTU.  In this example, the test client would retransmit based upon time-outs (since no ACKs will be received from the test server).  This test probe is marked as a conclusive failure if none of the test packets are ACK'ed.  If any of the test packets are ACK'ed, congestive network loss may be the cause and the test probe is not conclusive.  Re-testing at other times of the day is recommended to further isolate the cause.

The test is repeated until the desired granularity of the MTU is discovered.  The method can yield precise results at the expense of probing time.  One approach would be a binary search: probe with an MSS halfway between the last unsuccessful (search_high) and last successful (search_low) values, and halve the remaining interval at each step until the desired granularity is reached.  A rough sketch of this search procedure is shown below.
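The following is a minimal, illustrative sketch of the search logic described above.  It assumes a hypothetical helper, probe_succeeds(mss), that opens a TCP connection advertising the given MSS, sends a small window of traffic with the DF bit set, and reports whether the segments were ACK'ed; it is not a complete PLPMTUD implementation (see RFC4821 for the full algorithm).

   TCP_IP_HEADER_BYTES = 40   # IPv4 + TCP headers without options

   def mss_for_mtu(mtu):
       """Derive the TCP MSS corresponding to a given Layer 3 MTU."""
       return mtu - TCP_IP_HEADER_BYTES

   def find_path_mss(probe_succeeds, search_low_mtu=1024, search_high_mtu=1500,
                     granularity=8):
       """Binary search for the largest MSS the path carries without
       fragmentation.  probe_succeeds(mss) is a hypothetical helper that
       returns True if probe segments of that size are ACK'ed."""
       low = mss_for_mtu(search_low_mtu)     # assumed to succeed (e.g. 984)
       high = mss_for_mtu(search_high_mtu)   # first candidate to test (e.g. 1460)
       if probe_succeeds(high):
           return high
       while high - low > granularity:
           mid = (low + high) // 2
           if probe_succeeds(mid):
               low = mid                     # mid works; search upward
           else:
               high = mid                    # mid fails; search downward
       return low

   # For the 1240-byte path MTU example above, the search converges on an
   # MSS of approximately 1200 bytes (1240 - 40).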
3.2. Baseline Round-trip Delay and Bandwidth

Before stateful TCP testing can begin, it is important to baseline the round trip delay and bandwidth of the network to be tested.  These measurements provide estimates of the ideal TCP window size, which will be used in subsequent test steps.  These latency and bandwidth tests should be run during the time of day for which the TCP throughput tests will occur.

The baseline RTT is used to predict the bandwidth delay product and the TCP Transfer Time for the subsequent throughput tests.  Since this methodology requires that RTT be measured during the entire throughput test, the extent to which the RTT varied during the throughput test can be quantified.

3.2.1 Techniques to Measure Round Trip Time

Following the definitions used in the referenced documents, Round Trip Time (RTT) is the time elapsed between the clocking in of the first bit of a payload packet and the receipt of the last bit of the corresponding acknowledgement.  Round Trip Delay (RTD) is used synonymously and is equal to twice the one-way link latency.

In any method used to baseline round trip delay between network end-points, it is important to realize that network latency is the sum of inherent network delay and congestion.  The RTT should be baselined during "off-peak" hours to obtain a reliable figure for network latency (versus additional delay caused by congestion).

During the actual sustained TCP throughput tests, it is critical to measure RTT along with the measured TCP throughput.  Congestive effects can be isolated if RTT is concurrently measured.

The following is not meant to be an exhaustive list, but it summarizes some of the more common ways to determine round trip time (RTT) through the network.  The desired resolution of the measurement (i.e., msec versus usec) may dictate whether the RTT measurement can be achieved with standard tools such as ICMP ping techniques or whether specialized test equipment with high precision timers would be required.  The objective in this section is to list several techniques in order of decreasing accuracy.

- Use test equipment on each end of the network, "looping" the far-end tester so that a packet stream can be measured end-to-end.  This test equipment RTT measurement may be compatible with the delay measurement protocols specified in RFC5357.

- Conduct packet captures of TCP test applications (for example "iperf" or FTP).  By running multiple experiments, the packet captures can be studied to estimate RTT based upon the SYN -> SYN-ACK handshakes within the TCP connection set-up.

- ICMP Pings may also be adequate to provide round trip time estimations.  Some limitations of ICMP Ping are the msec resolution and whether the network elements respond to pings (or block them).

3.2.2 Techniques to Measure End-end Bandwidth

There are many well established techniques available to provide estimated measures of bandwidth over a network.  This measurement should be conducted in both directions of the network, especially for access networks, which are inherently asymmetrical.  Some of the asymmetric implications to TCP performance are documented in RFC3449, and the results of this work will be further studied to determine relevance to this draft.

The bandwidth measurement test must be run with stateless IP streams (not stateful TCP) in order to determine the available bandwidth in each direction.  This test should be performed at various intervals throughout a business day (or even across a week).  Ideally, the bandwidth test should produce a log output of the bandwidth achieved across the test interval AND the round trip delay.

During the actual TCP level performance measurements (Sections 3.3 and 3.4), the test tool must be able to track the round trip time of the TCP connection(s) during the test.  Measuring round trip time variation (aka "jitter") provides insight into the effects of congestive delay on the sustained throughput achieved for the TCP layer test.
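As a rough illustration of the RTT baselining techniques in Section 3.2.1, the time for a TCP three-way handshake to complete (the socket connect time) approximates one SYN -> SYN-ACK round trip.  This is only a sketch with millisecond-level accuracy and does not replace test equipment or packet-capture based measurements; the address below is a documentation placeholder.

   import socket
   import time

   def handshake_rtt(host, port=80, samples=5):
       """Rough RTT estimate (seconds): time for TCP connect() to complete,
       i.e. approximately one SYN -> SYN-ACK round trip."""
       results = []
       for _ in range(samples):
           start = time.perf_counter()
           with socket.create_connection((host, port), timeout=5):
               results.append(time.perf_counter() - start)
       return min(results)   # the minimum approximates the uncongested RTT

   # Example (placeholder address):
   # print("baseline RTT ~ %.1f ms" % (handshake_rtt("192.0.2.10") * 1000))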
3.3. TCP Throughput Tests

This draft specifically defines TCP throughput techniques to verify sustained TCP performance in a managed business network.  As defined in Section 2.1, the equilibrium throughput reflects the maximum rate achieved by a TCP connection within the congestion avoidance phase on an end-to-end network path.  This section and the following sections define the method to conduct these sustained throughput tests and provide guidelines for the predicted results.

With the baseline measurements of round trip time and bandwidth from Section 3.2, a series of single and multiple TCP connection throughput tests can be conducted to baseline network performance against expectations.

It is recommended to run the tests in each direction independently first, and then to run them in both directions simultaneously.  In each case, the TCP Efficiency and TCP Transfer Time metrics must be measured in each direction.

3.3.1 Calculate Optimum TCP Window Size

The optimum TCP window size can be calculated from the bandwidth delay product (BDP), which is:

   BDP (bits) = RTT (sec) x Bandwidth (bps)

By dividing the BDP by 8, the "ideal" TCP window size is calculated.  An example would be a T3 link with 25 msec RTT.  The BDP would equal ~1,105,000 bits and the ideal TCP window would equal ~138,000 bytes.

The following table provides some representative network link speeds, latency, BDP, and the associated "optimum" TCP window size.  Sustained TCP transfers should reach nearly 100% throughput, minus the overhead of Layers 1-3 and the effect of the window not being an exact multiple of the MSS.

For this single connection baseline test, the MSS size will affect the achieved throughput (especially for smaller TCP window sizes).  Table 3.2 provides the achievable, equilibrium TCP throughput (at Layer 4) using a 1460 byte MSS.  Also in this table, the case of 58 bytes of L1-L4 overhead including the Ethernet CRC32 is used for simplicity.

Table 3.2: Link Speed, RTT, calculated BDP, and TCP Throughput

   Link                             Ideal TCP        Maximum Achievable
   Speed*   RTT (ms)   BDP (bits)   Window (kbytes)  TCP Throughput (Mbps)
   -----------------------------------------------------------------------
   T1          20          30,720         3.84              1.17
   T1          50          76,800         9.60              1.40
   T1         100         153,600        19.20              1.40
   T3          10         442,100        55.26             42.05
   T3          15         663,150        82.89             42.05
   T3          25       1,105,250       138.16             41.52
   T3(ATM)     10         407,040        50.88             36.50
   T3(ATM)     15         610,560        76.32             36.23
   T3(ATM)     25       1,017,600       127.20             36.27
   100M         1         100,000        12.50             91.98
   100M         2         200,000        25.00             93.44
   100M         5         500,000        62.50             93.44
   1Gig         0.1       100,000        12.50            919.82
   1Gig         0.5       500,000        62.50            934.47
   1Gig         1       1,000,000       125.00            934.47
   10Gig        0.05      500,000        62.50          9,344.67
   10Gig        0.3     3,000,000       375.00          9,344.67

   * Note that the link speed is the minimum link speed throughout the network path (i.e., a WAN with a T1 link as the bottleneck, etc.).

Also, the following link speeds (available payload bandwidth) were used for the WAN entries:

- T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
- T3 = 44.21 Mbits/sec (C-Bit Framing)
- T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per second)

The calculation method used in this document is a 3 step process:

1 - Determine the optimal TCP window size based on the optimal quantity of "in-flight" octets given by the BDP calculation, taking into consideration that the TCP window size has to be an exact multiple of the MSS.

2 - Calculate the achievable Layer 2 throughput by multiplying the number of in-flight MSS-sized segments from step 1 by (MSS + L2 + L3 + L4 overheads), and dividing by the RTT.

3 - Finally, multiply the value of step 2 by the ratio of the MSS to (MSS + L2 + L3 + L4 overheads).

This gives the achievable TCP throughput value.  Sometimes, the maximum achievable throughput is limited by the maximum achievable quantity of Ethernet frames per second on the physical media; in that case, this frame-rate-limited value is used in step 2 instead of the calculated one.  An illustrative sketch of this calculation is shown below.
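The following sketch illustrates the three-step calculation; it is not a normative formula, and because the L1-L4 overhead accounting can differ, the results may deviate slightly from Table 3.2.

   def ideal_window_bytes(bandwidth_bps, rtt_s, mss=1460):
       """Step 1: ideal window = BDP rounded down to a whole number of MSS."""
       bdp_bits = bandwidth_bps * rtt_s
       segments = int(bdp_bits / 8 // mss)
       return segments * mss

   def achievable_tcp_throughput_bps(bandwidth_bps, rtt_s, mss=1460, overhead=58):
       """Steps 2 and 3: Layer 4 throughput from the window, capped by the
       frame rate the link can carry (58 bytes of L1-L4 overhead assumed)."""
       window = ideal_window_bytes(bandwidth_bps, rtt_s, mss)
       window_limited = window * 8 / rtt_s                   # bps at Layer 4
       frame_limited = bandwidth_bps * mss / (mss + overhead)
       return min(window_limited, frame_limited)

   # T3 (44.21 Mbps) with 25 ms RTT, as in the example above:
   print(ideal_window_bytes(44.21e6, 0.025))         # ~137,240 bytes (~138 KB)
   print(achievable_tcp_throughput_bps(44.21e6, 0.025) / 1e6)
   # roughly 42 Mbps (compare 41.52 in Table 3.2, which uses a slightly
   # different overhead accounting)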
The following table compares the achievable TCP throughput on a T3 link for Windows 2000/XP TCP window sizes of 16 KB versus 64 KB:

   RTT (ms)   16 KB Window   64 KB Window
   ----------------------------------------
      10        14.5 Mbps      42.1 Mbps
      15         9.6 Mbps      34.3 Mbps
      25         5.8 Mbps      20.5 Mbps

The following table shows the achievable TCP throughput on a 25 ms T3 link as the TCP window size is increased, using the RFC1323 TCP window scaling option:

   TCP Window Size   Achievable TCP Throughput
   -------------------------------------------
        16 KB               5.31 Mbps
        32 KB              10.62 Mbps
        64 KB              21.23 Mbps
       128 KB              42.47 Mbps

3.3.2 Conducting the TCP Throughput Tests

There are several TCP test tools in common use, and one of the most common is the "iperf" tool.  With this tool, hosts are installed at each end of the network segment; one as the client and the other as the server.  The TCP window size of both the client and the server can be manually set, and the achieved throughput is measured either uni-directionally or bi-directionally.  For higher BDP situations in lossy networks (long fat networks, satellite links, etc.), TCP options such as Selective Acknowledgment should be considered and also become part of the window size / throughput characterization.

Host hardware performance must be well understood before conducting the TCP throughput tests and the other tests in the following sections.  Dedicated test equipment will generally be required, especially for line rates of GigE and 10 GigE.

The TCP throughput test should be run over a long enough duration to properly exercise network buffers and also to characterize performance during different time periods of the day.  The results must be logged at the desired interval, and the test must record RTT and TCP retransmissions at each interval.

This correlation of retransmissions and RTT over the course of the test will clearly identify which portions of the transfer reached the TCP Equilibrium state, and to what extent increased RTT (congestive effects) may have caused reduced equilibrium performance.

Additionally, the TCP Efficiency and TCP Transfer Time metrics should be logged in order to further characterize the window size tests.

3.3.3 Single vs. Multiple TCP Connection Testing

The decision whether to conduct single or multiple TCP connection tests depends upon the size of the BDP in relation to the window sizes configured in the end-user environment.  For example, if the BDP for a long fat pipe turns out to be 2 MB, then it is probably more realistic to test this pipe with multiple connections.  Assuming typical host computer window settings of 64 KB, using 32 connections would realistically test this pipe (see the sketch below).
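As a simple illustration, the number of parallel connections needed to fill a given BDP with a fixed per-connection window could be estimated as follows; this is a sketch only, and a real test plan must also account for MSS rounding and host limitations.

   import math

   def connections_to_fill_link(bandwidth_bps, rtt_s, window_bytes):
       """Approximate number of TCP connections needed so that the sum of
       the per-connection windows covers the bandwidth delay product."""
       bdp_bytes = bandwidth_bps * rtt_s / 8
       return math.ceil(bdp_bytes / window_bytes)

   # 500 Mbps path with 5 ms RTT (BDP ~312 KB), as in the table below:
   for window_kb in (16, 32, 64, 128):
       print(window_kb, connections_to_fill_link(500e6, 0.005, window_kb * 1024))
   # -> 16: 20, 32: 10, 64: 5, 128: 3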
The following table illustrates the relationship of the BDP, window size, and the number of connections required to utilize the available capacity.  For this example, the network bandwidth is 500 Mbps, the RTT is 5 ms, and the BDP equates to 312 KBytes.

   Window     #Connections to Fill Link
   ------------------------------------
    16 KB                20
    32 KB                10
    64 KB                 5
   128 KB                 3

The TCP Transfer Time metric is useful for conducting multiple connection tests.  Each connection should be configured to transfer a certain payload (e.g., 100 MB), and the TCP Transfer Time provides a simple metric to verify the actual versus expected results.

Note that the TCP Transfer Time is the time for all connections to complete the transfer of the configured payload size.  From the example table listed above, consider the 64 KB window case.  Each of the 5 connections would be configured to transfer 100 MB, and each TCP connection should achieve a maximum of 100 Mb/sec.  So for this example, the 100 MB payload should be transferred across the connections in approximately 8 seconds (which would be the ideal TCP Transfer Time for these conditions).

Additionally, the TCP Efficiency metric (defined in Section 2.2) should be computed for each connection tested.

3.3.4 Interpretation of the TCP Throughput Results

At the end of this step, the user will document the theoretical BDP and a set of window size experiments with the measured TCP throughput for each TCP window size setting.  For cases where the sustained TCP throughput does not equal the predicted value, some possible causes are listed:

- Network congestion causing packet loss; the TCP Efficiency metric is a useful gauge to compare network performance.

- Network congestion not causing packet loss, but increasing RTT.

- Intermediate network devices which actively regenerate the TCP connection and can alter the window size, MSS, etc.

- Over-utilization of the available link, or rate limiting (policing).  More discussion of traffic management tests follows in Section 3.4.

3.4. Traffic Management Tests

In most cases, the network connection between two geographic locations (branch offices, etc.) has lower bandwidth than the network connections of the host computers.  An example would be LAN connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN connectivity may be physically 100 Mbps or logically 100 Mbps (over a GigE WAN connection).  In the latter case, rate limiting is used to provide the WAN bandwidth per the SLA.

Traffic management techniques are employed to provide various forms of QoS; the more common techniques include:

- Traffic Shaping
- Priority Queuing
- Random Early Discard (RED, etc.)

Configuring the end-to-end network with these various traffic management mechanisms is a complex undertaking.  For traffic shaping and RED techniques, the end goal is to provide better performance for bursty traffic such as TCP (RED is specifically intended for TCP).

This section of the methodology provides guidelines to test traffic shaping and RED implementations.  As in Section 3.3, host hardware performance must be well understood before conducting the traffic shaping and RED tests.  Dedicated test equipment will generally be required, especially for line rates of GigE and 10 GigE.
3.4.1 Traffic Shaping Tests

For services where the available bandwidth is rate limited, there are two (2) techniques used to implement rate limiting: traffic policing and traffic shaping.

Simply stated, traffic policing marks and/or drops packets which exceed the SLA bandwidth (in most cases, excess traffic is dropped).  Traffic shaping employs the use of queues to smooth the bursty traffic and then send it out within the SLA bandwidth limit (without dropping packets unless the traffic shaping queue is exceeded).

Traffic shaping is generally configured for TCP data services and can provide improved TCP performance since retransmissions are reduced, which in turn optimizes TCP throughput for the given available bandwidth.  Throughout this section, the available rate-limited bandwidth shall be referred to as the "bottleneck bandwidth".

Proper traffic shaping is more easily detected when conducting a multiple TCP connection test.  Proper shaping will provide a fair distribution of the available bottleneck bandwidth, while traffic policing will not.

The traffic shaping tests build upon the concepts of multiple connection testing as defined in Section 3.3.3.  Calculating the BDP for the bottleneck bandwidth is first required, followed by selecting the number of connections and the window size per connection.

Similar to the example in Section 3.3, a typical test scenario might be: a GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited logical interface) and 5 msec RTT.  This would require five (5) TCP connections with a 64 KB window size to evenly fill the bottleneck bandwidth (about 100 Mbps per connection).

The traffic shaping tests should be run over a long enough duration to properly exercise network buffers and also to characterize performance during different time periods of the day.  The throughput of each connection must be logged during the entire test, along with the TCP Efficiency and TCP Transfer Time metrics.  Additionally, it is recommended to log RTT and retransmissions per connection over the test interval.

3.4.1.1 Interpretation of Traffic Shaping Test Results

By plotting the throughput achieved by each TCP connection, the fair sharing of the bandwidth is generally very obvious when traffic shaping is properly configured for the bottleneck interface.  For the previous example of 5 connections sharing 500 Mbps, each connection would consume ~100 Mbps with smooth variation.  If traffic policing was present on the bottleneck interface, the bandwidth sharing would not be fair, and the resulting throughput plot would reveal "spikey" throughput consumption by the competing TCP connections (due to the retransmissions).
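To make the "fair sharing" check concrete, one illustrative approach (not prescribed by this methodology) is to compute Jain's fairness index over the per-connection throughput averages; values near 1.0 indicate the even sharing expected from a properly shaped bottleneck, while markedly lower values suggest policing-induced unfairness.  The throughput values below are hypothetical.

   def jain_fairness_index(throughputs):
       """Jain's fairness index: 1.0 = perfectly even sharing, approaching
       1/n as the sharing becomes completely uneven."""
       n = len(throughputs)
       return sum(throughputs) ** 2 / (n * sum(x * x for x in throughputs))

   # Hypothetical per-connection averages (Mbps) for 5 connections on a
   # 500 Mbps shaped vs. policed bottleneck:
   shaped  = [101, 99, 100, 98, 102]
   policed = [160, 45, 120, 30, 95]
   print(round(jain_fairness_index(shaped), 3))    # ~1.0
   print(round(jain_fairness_index(policed), 3))   # noticeably lower (~0.78)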
3.4.2 RED Tests

Random Early Discard (RED) techniques are specifically targeted to provide congestion avoidance for TCP traffic.  Before the network element queue "fills" and enters the tail drop state, RED drops packets at configurable queue depth thresholds.  This action causes TCP connections to back off, which helps to prevent tail drop and, in turn, helps to prevent global TCP synchronization.

Again, rate limited interfaces can benefit greatly from RED based techniques.  Without RED, TCP is generally not able to achieve the full bandwidth of the bottleneck interface.  With RED enabled, TCP congestion avoidance throttles the connections on the higher speed interface (i.e., the LAN) and can reach equilibrium with the bottleneck bandwidth (achieving closer to full throughput).

Proper RED configuration is more easily detected when conducting a multiple TCP connection test.  Multiple TCP connections provide the multiple bursty sources that emulate the real-world conditions for which RED was intended.

The RED tests also build upon the concepts of multiple connection testing as defined in Section 3.3.3.  Calculating the BDP for the bottleneck bandwidth is first required, followed by selecting the number of connections and the window size per connection.

For RED testing, the desired effect is to cause the TCP connections to burst beyond the bottleneck bandwidth so that queue drops will occur.  Using the same example from Section 3.4.1 (traffic shaping), the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with a window size of 64 KB) to fill the capacity.  Some experimentation is required, but it is recommended to start with double the number of connections in order to stress the network element buffers / queues.  In this example, 10 connections would produce TCP bursts of 64 KB for each connection.  If the timing of the TCP tester permits, these TCP bursts could stress queue sizes in the 512 KB range.  Again, experimentation will be required, and the proper number of TCP connections / window size will be dictated by the size of the network element queue.

3.4.2.1 Interpretation of RED Results

The default queuing technique for most network devices is FIFO based.  Without RED, the FIFO based queue will cause excessive loss to all of the TCP connections and, in the worst case, global TCP synchronization.

By plotting the aggregate throughput achieved on the bottleneck interface, proper RED operation can be determined if the bottleneck bandwidth is fully utilized.  For the previous example of 10 connections (window = 64 KB) sharing 500 Mbps, each connection should consume ~50 Mbps.  If RED was not properly enabled on the interface, then the TCP connections will retransmit at a higher rate, and the net effect is that the bottleneck bandwidth is not fully utilized.

Another means to study a non-RED versus RED implementation is to use the TCP Transfer Time metric for all of the connections.  In this example, a 100 MB payload transfer per connection should ideally take 16 seconds across all 10 connections (with RED enabled).  With RED not enabled, the throughput across the bottleneck bandwidth would be greatly reduced (generally by 20-40%), and the TCP Transfer Time would be proportionally longer than the ideal transfer time.

Additionally, the TCP Efficiency metric is useful, since non-RED implementations will exhibit a lower TCP Efficiency than RED implementations.
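For reference, the ideal TCP Transfer Time used in the comparison above follows directly from the payload size, the number of connections, and the bottleneck bandwidth; a minimal sketch (illustrative only, ignoring protocol overhead and slow start):

   def ideal_transfer_time_s(payload_bytes_per_conn, connections, bottleneck_bps):
       """Ideal TCP Transfer Time: total payload divided by the bottleneck
       bandwidth."""
       total_bits = payload_bytes_per_conn * connections * 8
       return total_bits / bottleneck_bps

   # 10 connections x 100 MB over a 500 Mbps bottleneck:
   print(ideal_transfer_time_s(100e6, 10, 500e6))   # ~16 seconds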
4. Acknowledgements

The author would like to thank Gilles Forget, Loki Jorgenson, and Reinhard Schrage for technical review and original contributions to draft-03 of this document.

Also, thanks to Matt Mathis and Matt Zekauskas for many good comments through email exchange and for pointing us to great sources of information pertaining to past works in the TCP capacity area.

5. References

[RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining Empirical Bulk Transfer Capacity Metrics", RFC 3148, July 2001.

[RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, March 1999.

[RFC3449]  Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. Sooriyabandara, "TCP Performance Implications of Network Path Asymmetry", RFC 3449, December 2002.

[RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, October 2008.

[RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

[draft-ietf-ippm-btc-cap-00]  Allman, M., "A Bulk Transfer Capacity Methodology for Cooperating Hosts", Work in Progress, August 2001.

[MSMO]     Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", SIGCOMM Computer Communication Review, Volume 27, Issue 3, July 1997.

[Stevens Vol1]  Stevens, W. R., "TCP/IP Illustrated, Volume 1: The Protocols", Addison-Wesley.

Authors' Addresses

Barry Constantine
JDSU, Test and Measurement Division
One Milestone Center Court
Germantown, MD 20876-7100
USA

Phone: +1 240 404 2227
barry.constantine@jdsu.com

Gilles Forget
Independent Consultant to Bell Canada
308, rue de Monaco, St-Eustache
Qc. CANADA, Postal Code: J7P-4T5

Phone: (514) 895-8212
gilles.forget@sympatico.ca

Loki Jorgenson
nooCore

Phone: (604) 908-5833
ljorgenson@nooCore.com

Reinhard Schrage
Schrage Consulting

Phone: +49 (0) 5137 909540
reinhard@schrageconsult.com