Network Working Group                                                 B.
Constantine 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: January 9, 2011 Bell Canada (Ext. Consultant) 5 L. Jorgenson 6 nooCore 7 Reinhard Schrage 8 Schrage Consulting 9 July 9, 2010 11 TCP Throughput Testing Methodology 12 draft-ietf-ippm-tcp-throughput-tm-04.txt 14 Abstract 16 This memo describes a methodology for measuring sustained TCP 17 throughput performance in an end-to-end managed network environment. 18 This memo is intended to provide a practical approach to help users 19 validate the TCP layer performance of a managed network, which should 20 provide a better indication of end-user application level experience. 21 In the methodology, various TCP and network parameters are identified 22 that should be tested as part of the network verification at the TCP 23 layer. 25 Status of this Memo 27 This Internet-Draft is submitted to IETF in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF), its areas, and its working groups. Note that 32 other groups may also distribute working documents as Internet- 33 Drafts. Creation date July 9, 2010. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 The list of current Internet-Drafts can be accessed at 41 http://www.ietf.org/ietf/1id-abstracts.txt. 43 The list of Internet-Draft Shadow Directories can be accessed at 44 http://www.ietf.org/shadow.html. 46 This Internet-Draft will expire on January 9, 2011. 48 Copyright Notice 50 Copyright (c) 2010 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 66 2. Goals of this Methodology. . . . . . . . . . . . . . . . . . . 4 67 2.1 TCP Equilibrium State Throughput . . . . . . . . . . . . . 5 68 2.2 Metrics for TCP Throughput Tests . . . . . . . . . . . . . 6 69 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 6 70 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 8 71 3.2. Baseline Round-trip Delay and Bandwidth. . . . . . . . . . 9 72 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 9 73 3.2.2 Techniques to Measure End-end Bandwidth . . . . . . . 10 74 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 10 75 3.3.1 Calculate Optimum TCP Window Size. . . . . . . . . . . 11 76 3.3.2 Conducting the TCP Throughput Tests. . . . . . . . . . 14 77 3.3.3 Single vs. Multiple TCP Connection Testing . . . . . . 14 78 3.3.4 Interpretation of the TCP Throughput Results . . . . . 15 79 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 15 80 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 
                                                                      16
       3.4.1.1 Interpretation of Traffic Shaping Test Results . . .  17
     3.4.2 RED Tests . . . . . . . . . . . . . . . . . . . . . . . . 17
       3.4.2.1 Interpretation of RED Results  . . . . . . . . . . .  18
   4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
   5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 20

1. Introduction

   Even though RFC2544 was meant to benchmark network equipment and to
   be used by network equipment manufacturers (NEMs), network providers
   have also used it to benchmark operational networks in order to
   verify SLAs (Service Level Agreements) before turning up a service
   for their business customers.  Testing an operational network prior
   to customer activation is referred to as "turn-up" testing, and the
   SLA is generally specified in terms of Layer 2/3 packet throughput,
   delay, loss and jitter.

   Network providers are coming to the realization that both Layer 2/3
   testing and TCP layer testing are required to more adequately ensure
   end-user satisfaction.  Therefore, the network provider community
   desires to measure network throughput performance at the TCP layer.
   Measuring TCP throughput provides a meaningful measure with respect
   to the end user's application SLA (and ultimately helps to reach
   some level of TCP testing interoperability, which does not exist
   today).

   Additionally, end-users (business enterprises) seek to conduct
   repeatable TCP throughput tests between enterprise locations.  Since
   these enterprises rely on the networks of the providers, a common
   test methodology (and metrics) would be equally beneficial to both
   parties.

   So the intent behind this draft TCP throughput work is to define a
   methodology for testing sustained TCP layer performance.  In this
   document, sustained TCP throughput is that amount of data per unit
   time that TCP transports during equilibrium (steady state), i.e.
   after the initial slow start phase.  We refer to this state as TCP
   Equilibrium, and the equilibrium throughput is the maximum
   achievable for the TCP connection(s).

   One other important note: the precursor to conducting this TCP test
   methodology is to perform "network stress tests" such as RFC2544
   Layer 2/3 tests or other conventional tests.  Examples include OWAMP
   or manual packet layer test techniques where packet throughput,
   loss, and delay measurements are conducted.  It is highly
   recommended to run traditional Layer 2/3 type tests to verify the
   integrity of the network before conducting TCP tests.

2. Goals of this Methodology

   Before defining the goals of this methodology, it is important to
   clearly define the areas that are not intended to be measured or
   analyzed by such a methodology.

   - The methodology is not intended to predict TCP throughput
     behavior during the transient stages of a TCP connection, such
     as initial slow start.
   - The methodology is not intended to definitively benchmark TCP
     implementations of one OS against another, although some users
     may find some value in conducting qualitative experiments.

   - The methodology is not intended to provide detailed diagnosis of
     problems within end-points or the network itself as related to
     non-optimal TCP performance, although a results interpretation
     section for each test step may provide insight into potential
     issues within the network.

   In contrast to the above exclusions, the goals of this methodology
   are to define a method to conduct a structured, end-to-end
   assessment of sustained TCP performance within a managed business
   class IP network.  A key goal is to establish a set of "best
   practices" that an engineer should apply when validating the
   ability of a managed network to carry end-user TCP applications.

   Some specific goals are to:

   - Provide a practical test approach that specifies the more well
     understood (and end-user configurable) TCP parameters such as
     Window size, MSS (Maximum Segment Size), # connections, and how
     these affect the outcome of TCP performance over a network.

   - Provide specific test conditions (link speed, RTT, window size,
     etc.) and maximum achievable TCP throughput under TCP Equilibrium
     conditions.  For guideline purposes, provide examples of these
     test conditions and the maximum achievable TCP throughput during
     the equilibrium state.  Section 2.1 provides specific details
     concerning the definition of TCP Equilibrium within the context
     of this draft.

   - Define two (2) basic metrics that can be used to compare the
     performance of TCP connections under various network conditions.

   - In test situations where the recommended procedure does not yield
     the maximum achievable TCP throughput result, this draft provides
     some possible areas within the end host or network that should be
     considered for investigation (although again, this draft is not
     intended to provide a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

   TCP connections have three (3) fundamental congestion window phases
   as documented in RFC2581.  These phases are:

   - Slow Start, which occurs during the beginning of a TCP
     transmission or after a retransmission time out event.

   - Congestion avoidance, which is the phase during which TCP ramps up
     to establish the maximum attainable throughput on an end-to-end
     network path.  Retransmissions are a natural by-product of the TCP
     congestion avoidance algorithm as it seeks to achieve maximum
     throughput on the network path.

   - Retransmission phase, which includes Fast Retransmit (Tahoe) and
     Fast Recovery (Reno and New Reno).  When a packet is lost, the
     congestion avoidance phase transitions to a Fast Retransmission or
     Fast Recovery phase, dependent upon the TCP implementation.

   The following diagram depicts these phases.

            |            ssthresh
   TCP      |           |
   Through- |           | Equilibrium
   put      |           |\ /\/\/\/\/\  Retransmit          /\/\ ...
            |           | \/         | Time-out           /
            |           |            |       _______    _/
            | Slow     _/            |      |  Slow   _/
            | Start  _/  Congestion  |      | Start _/    Congestion
            |      _/    Avoidance  Loss    |    _/       Avoidance
            |    _/                 Event   |  _/
            |  _/                           | _/
            |/______________________________|/_____________________
                                  Time

   This TCP methodology provides guidelines to measure the equilibrium
   throughput, which refers to the maximum sustained rate obtained by
   congestion avoidance before packet loss conditions occur (which
   would cause the state change from congestion avoidance to a
   retransmission phase).  All maximum achievable throughputs specified
   in Section 3 are with respect to this Equilibrium state.

2.2 Metrics for TCP Throughput Tests

   This draft focuses on a TCP throughput methodology and also provides
   two basic metrics to compare the results of various throughput
   tests.  It is recognized that the complexity and unpredictability of
   TCP make it impossible to develop a complete set of metrics that
   accounts for the myriad of variables (e.g. RTT variation, loss
   conditions, TCP implementation, etc.).  However, these two basic
   metrics facilitate TCP throughput comparisons under varying network
   conditions and between network traffic management techniques.

   The TCP Efficiency metric is the percentage of bytes that were not
   retransmitted and is defined as:

      Transmitted Bytes - Retransmitted Bytes
      ---------------------------------------  x 100
                 Transmitted Bytes

   This metric provides a comparative measure between various QoS
   mechanisms such as traffic management and congestion avoidance, and
   also between various TCP implementations (e.g. Reno, Vegas, etc.).

   As an example, if 1000 TCP segments were sent and 20 had to be
   retransmitted, the TCP Efficiency would be calculated as:

              1000 - 20
              ---------  x 100 = 98%
                1000

   The second metric is the TCP Transfer Time, which is simply the time
   it takes to transfer a block of data across simultaneous TCP
   connections.  The concept is useful when benchmarking traffic
   management techniques, where multiple connections are generally
   required.  An example would be the bulk transfer of 10 MB over 8
   separate TCP connections (each connection uploading 10 MB).  Each
   connection may achieve a different throughput during a test, and the
   overall throughput rate is not always easy to determine (especially
   as the number of connections increases).  But by defining the TCP
   Transfer Time as the time for all 8 connections to complete their
   10 MB transfers, a single transfer time metric provides a useful
   means to compare various traffic management techniques (e.g. FIFO
   queuing, WFQ, WRED, etc.).
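   As an informal illustration (not part of the methodology itself),
   the following Python sketch shows how these two metrics could be
   computed from counters that a TCP test tool is assumed to export;
   the counter values and per-connection completion times used here
   are hypothetical placeholders.

   <CODE BEGINS>
   # Illustrative sketch: the two metrics defined in Section 2.2,
   # computed from counters assumed to be exported by the test tool.

   def tcp_efficiency(transmitted, retransmitted):
       """TCP Efficiency (%) =
          (Transmitted - Retransmitted) / Transmitted x 100"""
       return (transmitted - retransmitted) / float(transmitted) * 100.0

   def tcp_transfer_time(completion_times_sec):
       """TCP Transfer Time = time for ALL connections to complete
       the transfer of their configured block of data."""
       return max(completion_times_sec)

   if __name__ == "__main__":
       # Example from the text: 1000 segments sent, 20 retransmitted
       print("TCP Efficiency: %.0f%%" % tcp_efficiency(1000, 20))

       # 8 connections, each uploading 10 MB; hypothetical per-
       # connection completion times (seconds) from the tool's log.
       times = [7.9, 8.4, 8.1, 9.2, 8.0, 8.7, 8.3, 8.5]
       print("TCP Transfer Time: %.1f s" % tcp_transfer_time(times))
   <CODE ENDS>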
3. TCP Throughput Testing Methodology

   This section summarizes the specific test methodology to achieve the
   goals listed in Section 2.

   As stated in Section 1, it is considered best practice to verify the
   integrity of the network by conducting Layer 2/3 stress tests such
   as RFC2544 (or other methods of network stress testing).  If the
   network is not performing properly in terms of packet loss, jitter,
   etc., then the TCP layer testing will not be meaningful, since the
   equilibrium throughput would be very difficult to achieve (in a
   "dysfunctional" network).

   The following represents the sequential order of steps to conduct
   the TCP throughput testing methodology:

   1. Identify the Path MTU.
Packetization Layer Path MTU Discovery 278 or PLPMTUD (RFC4821) should be conducted to verify the minimum network 279 path MTU. Conducting PLPMTUD establishes the upper limit for the MSS 280 to be used in subsequent steps. 282 2. Baseline Round-trip Delay and Bandwidth. These measurements provide 283 estimates of the ideal TCP window size, which will be used in 284 subsequent test steps. 286 3. TCP Connection Throughput Tests. With baseline measurements 287 of round trip delay and bandwidth, a series of single and multiple TCP 288 connection throughput tests can be conducted to baseline the network 289 performance expectations. 291 4. Traffic Management Tests. Various traffic management and queuing 292 techniques are tested in this step, using multiple TCP connections. 293 Multiple connection testing can verify that the network is configured 294 properly for traffic shaping versus policing, various queuing 295 implementations, and RED. 297 Important to note are some of the key characteristics and 298 considerations for the TCP test instrument. The test host may be a 299 standard computer or dedicated communications test instrument 300 and these TCP test hosts be capable of emulating both a client and a 301 server. 303 Whether the TCP test host is a standard computer or dedicated test 304 instrument, the following areas should be considered when selecting 305 a test host: 307 - TCP implementation used by the test host OS, i.e. Linux OS kernel 308 using TCP Reno, TCP options supported, etc. This will obviously be 309 more important when using custom test equipment where the TCP 310 implementation may be customized or tuned to run in higher 311 performance hardware 312 - Most importantly, the TCP test host must be capable of generating 313 and receiving stateful TCP test traffic at the full link speed of the 314 network under test. As a general rule of thumb, testing TCP throughput 315 at rates greater than 100 Mbit/sec generally requires high 316 performance server hardware or dedicated hardware based test tools. 318 3.1. Determine Network Path MTU 320 TCP implementations should use Path MTU Discovery techniques (PMTUD). 321 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 322 When a device has a packet to send which has the Don't Fragment (DF) 323 bit in the IP header set and the packet is larger than the Maximum 324 Transmission Unit (MTU) of the next hop link, the packet is dropped 325 and the device sends an ICMP 'need to frag' message back to the host 326 that originated the packet. The ICMP 'need to frag' message includes 327 the next hop MTU which PMTUD uses to tune the TCP Maximum Segment 328 Size (MSS). Unfortunately, because many network managers completely 329 disable ICMP, this technique does not always prove reliable in real 330 world situations. 332 Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should 333 be conducted to verify the minimum network path MTU. PLPMTUD can 334 be used with or without ICMP. The following sections provide a 335 summary of the PLPMTUD approach and an example using the TCP 336 protocol. RFC4821 specifies a search_high and search_low parameter 337 for the MTU. As specified in RFC4821, a value of 1024 is a generally 338 safe value to choose for search_low in modern networks. 340 It is important to determine the overhead of the links in the path, 341 and then to select a TCP MSS size corresponding to the Layer 3 MTU. 
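   As a small illustration of this selection (assuming plain IPv4 and
   TCP headers with no options, i.e. 40 bytes of Layer 3/4 overhead),
   the following Python sketch derives the MSS from a discovered path
   MTU; the numeric example in the next paragraph can be reproduced
   with it.

   <CODE BEGINS>
   # Illustrative sketch: derive the TCP MSS from a discovered path
   # MTU, assuming 20 bytes of IPv4 header and 20 bytes of TCP header
   # (no IP or TCP options).

   IP_HEADER_BYTES = 20
   TCP_HEADER_BYTES = 20

   def mss_from_mtu(path_mtu_bytes):
       return path_mtu_bytes - IP_HEADER_BYTES - TCP_HEADER_BYTES

   if __name__ == "__main__":
       for mtu in (1024, 1240, 1500):
           print("MTU %4d -> MSS %4d" % (mtu, mss_from_mtu(mtu)))
   <CODE ENDS>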
342 For example, if the MTU is 1024 bytes and the TCP/IP headers are 40 343 bytes, then the MSS would be set to 984 bytes. 345 An example scenario is a network where the actual path MTU is 1240 346 bytes. The TCP client probe MUST be capable of setting the MSS for 347 the probe packets and could start at MSS = 984 (which corresponds 348 to an MTU size of 1024 bytes). 350 The TCP client probe would open a TCP connection and advertise the 351 MSS as 984. Note that the client probe MUST generate these packets 352 with the DF bit set. The TCP client probe then sends test traffic 353 per a nominal window size (8KB, etc.). The window size should be 354 kept small to minimize the possibility of congesting the network, 355 which could induce congestive loss. The duration of the test should 356 also be short (10-30 seconds), again to minimize congestive effects 357 during the test. 359 In the example of a 1240 byte path MTU, probing with an MSS equal to 360 984 would yield a successful probe and the test client packets would 361 be successfully transferred to the test server. 363 Also note that the test client MUST verify that the MSS advertised 364 is indeed negotiated. Network devices with built-in Layer 4 365 capabilities can intercede during the connection establishment 366 process and reduce the advertised MSS to avoid fragmentation. This 367 is certainly a desirable feature from a network perspective, but 368 can yield erroneous test results if the client test probe does not 369 confirm the negotiated MSS. 371 The next test probe would use the search_high value and this would 372 be set to MSS = 1460 to correspond to a 1500 byte MTU. In this 373 example, the test client would retransmit based upon time-outs (since 374 no ACKs will be received from the test server). This test probe is 375 marked as a conclusive failure if none of the test packets are 376 ACK'ed. If any of the test packets are ACK'ed, congestive network 377 may be the cause and the test probe is not conclusive. Re-testing 378 at other times of the day is recommended to further isolate. 380 The test is repeated until the desired granularity of the MTU is 381 discovered. The method can yield precise results at the expense of 382 probing time. One approach would be to reduce the probe size to 383 half between the unsuccessful search_high and successful search_low 384 value, and increase by increments of 1/2 when seeking the upper 385 limit. 387 3.2. Baseline Round-trip Delay and Bandwidth 389 Before stateful TCP testing can begin, it is important to baseline 390 the round trip delay and bandwidth of the network to be tested. 391 These measurements provide estimates of the ideal TCP window size, 392 which will be used in subsequent test steps. These latency and 393 bandwidth tests should be run over a long enough period of time to 394 characterize the performance of the network over the course of a 395 meaningful time period. 397 One example would be to take samples during various times of the work 398 day. The goal would be to determine a representative minimum, average, 399 and maximum RTD and bandwidth for the network under test. Topology 400 changes are to be avoided during this time of initial convergence 401 (e.g. in crossing BGP4 boundaries). 403 In some cases, baselining bandwidth may not be required, since a 404 network provider's end-to-end topology may be well enough defined. 
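   As a simple illustration of this baselining step, the sketch below
   periodically estimates the RTT by timing TCP connection
   establishment (one of the techniques discussed in Section 3.2.1) and
   summarizes the samples as minimum / average / maximum.  It is a
   sketch only: the target host, port, sample count, and interval are
   hypothetical, and a production test would use the more accurate
   techniques listed in the next section.

   <CODE BEGINS>
   # Illustrative sketch: baseline RTT by timing the TCP three-way
   # handshake to a target host, then summarize min/avg/max.
   # The host, port, sample count, and interval are placeholders.

   import socket
   import time

   def connect_rtt_ms(host, port, timeout=2.0):
       """Rough RTT estimate: time to complete a TCP connect()."""
       start = time.time()
       with socket.create_connection((host, port), timeout=timeout):
           pass
       return (time.time() - start) * 1000.0

   def baseline_rtt(host, port, samples=10, interval_sec=60):
       rtts = []
       for _ in range(samples):
           try:
               rtts.append(connect_rtt_ms(host, port))
           except OSError:
               pass                     # skip failed samples
           time.sleep(interval_sec)
       if not rtts:
           raise RuntimeError("no successful RTT samples")
       return min(rtts), sum(rtts) / len(rtts), max(rtts)

   if __name__ == "__main__":
       lo, avg, hi = baseline_rtt("test-server.example.net", 5001,
                                  samples=5, interval_sec=10)
       print("RTT (ms): min=%.1f avg=%.1f max=%.1f" % (lo, avg, hi))
   <CODE ENDS>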
406 3.2.1 Techniques to Measure Round Trip Time 408 Following the definitions used in the references of the appendix; 409 Round Trip Time (RTT) is the time elapsed between the clocking in of 410 the first bit of a payload packet to the receipt of the last bit of the 411 corresponding acknowledgement. Round Trip Delay (RTD) is used 412 synonymously to twice the Link Latency. 414 In any method used to baseline round trip delay between network 415 end-points, it is important to realize that network latency is the 416 sum of inherent network delay and congestion. The RTT should be 417 baselined during "off-peak" hours to obtain a reliable figure for 418 network latency (versus additional delay caused by congestion). 420 During the actual sustained TCP throughput tests, it is critical 421 to measure RTT along with measured TCP throughput. Congestive 422 effects can be isolated if RTT is concurrently measured. 424 This is not meant to provide an exhaustive list, but summarizes some 425 of the more common ways to determine round trip time (RTT) through 426 the network. The desired resolution of the measurement (i.e. msec 427 versus usec) may dictate whether the RTT measurement can be achieved 428 with standard tools such as ICMP ping techniques or whether 429 specialized test equipment would be required with high precision 430 timers. The objective in this section is to list several techniques 431 in order of decreasing accuracy. 433 - Use test equipment on each end of the network, "looping" the 434 far-end tester so that a packet stream can be measured end-end. This 435 test equipment RTT measurement may be compatible with delay 436 measurement protocols specified in RFC5357. 438 - Conduct packet captures of TCP test applications using for example 439 "iperf" or FTP, etc. By running multiple experiments, the packet 440 captures can be studied to estimate RTT based upon the SYN -> SYN-ACK 441 handshakes within the TCP connection set-up. 443 - ICMP Pings may also be adequate to provide round trip time 444 estimations. Some limitations of ICMP Ping are the msec resolution 445 and whether the network elements respond to pings (or block them). 447 3.2.2 Techniques to Measure End-end Bandwidth 449 There are many well established techniques available to provide 450 estimated measures of bandwidth over a network. This measurement 451 should be conducted in both directions of the network, especially for 452 access networks which are inherently asymmetrical. Some of the 453 asymmetric implications to TCP performance are documented in RFC-3449 454 and the results of this work will be further studied to determine 455 relevance to this draft. 457 The bandwidth measurement test must be run with stateless IP streams 458 (not stateful TCP) in order to determine the available bandwidth in 459 each direction. And this test should obviously be performed at 460 various intervals throughout a business day (or even across a week). 461 Ideally, the bandwidth test should produce a log output of the 462 bandwidth achieved across the test interval AND the round trip delay. 464 And during the actual TCP level performance measurements (Sections 465 3.3 - 3.5), the test tool must be able to track round trip time 466 of the TCP connection(s) during the test. Measuring round trip time 467 variation (aka "jitter") provides insight into effects of congestive 468 delay on the sustained throughput achieved for the TCP layer test. 470 3.3. 
TCP Throughput Tests

   This draft specifically defines TCP throughput techniques to verify
   sustained TCP performance in a managed business network.  As defined
   in Section 2.1, the equilibrium throughput reflects the maximum rate
   achieved by a TCP connection within the congestion avoidance phase
   on an end-to-end network path.  This section and others will define
   the method to conduct these sustained throughput tests and provide
   guidelines for the predicted results.

   With baseline measurements of round trip time and bandwidth from
   Section 3.2, a series of single and multiple TCP connection
   throughput tests can be conducted to baseline network performance
   against expectations.

3.3.1 Calculate Optimum TCP Window Size

   The optimum TCP window size can be calculated from the bandwidth
   delay product (BDP), which is:

      BDP (bits) = RTT (sec) x Bandwidth (bps)

   By dividing the BDP by 8, the "ideal" TCP window size is calculated.
   An example would be a T3 link with 25 msec RTT.  The BDP would equal
   ~1,105,000 bits and the ideal TCP window would equal ~138,000 bytes.

   The following table provides some representative network link
   speeds, latency, BDP, and the associated "optimum" TCP window size.
   Sustained TCP transfers should reach nearly 100% throughput, minus
   the overhead of Layers 1-3 and the divisor of the MSS into the
   window.

   For this single connection baseline test, the MSS size will affect
   the achieved throughput (especially for smaller TCP window sizes).
   Table 3.2 provides the achievable, equilibrium TCP throughput (at
   Layer 4) using a 1460 byte MSS.  Also in this table, the case of 58
   bytes of L1-L4 overhead including the Ethernet CRC32 is used for
   simplicity.

   Table 3.2: Link Speed, RTT and calculated BDP, TCP Throughput

   Link                               Ideal TCP       Maximum Achievable
   Speed*   RTT (ms)   BDP (bits)  Window (kbytes)  TCP Throughput (Mbps)
   ----------------------------------------------------------------------
   T1          20          30,720        3.84              1.17
   T1          50          76,800        9.60              1.40
   T1         100         153,600       19.20              1.40
   T3          10         442,100       55.26             42.05
   T3          15         663,150       82.89             42.05
   T3          25       1,105,250      138.16             41.52
   T3(ATM)     10         407,040       50.88             36.50
   T3(ATM)     15         610,560       76.32             36.23
   T3(ATM)     25       1,017,600      127.20             36.27
   100M         1         100,000       12.50             91.98
   100M         2         200,000       25.00             93.44
   100M         5         500,000       62.50             93.44
   1Gig         0.1       100,000       12.50            919.82
   1Gig         0.5       500,000       62.50            934.47
   1Gig         1       1,000,000      125.00            934.47
   10Gig        0.05      500,000       62.50          9,344.67
   10Gig        0.3     3,000,000      375.00          9,344.67

   * Note that the link speed is the bottleneck (minimum) link speed
     within the network path; e.g. a WAN with a T1 link, etc.

   Also, the following link speeds (available payload bandwidth) were
   used for the WAN entries:

   - T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
   - T3 = 44.21 Mbits/sec (C-Bit Framing)
   - T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per
     second)

   The calculation method used in this document is a 3 step process:

   1 - We determine what the optimal TCP Window size value should be,
       based on the optimal quantity of "in-flight" octets discovered
       by the BDP calculation.  We take into consideration that the
       TCP Window size has to be an exact multiple of the MSS.
   2 - Then we calculate the achievable Layer 2 throughput by
       multiplying the number of MSS-sized segments determined in step
       1 by (MSS + L2 + L3 + L4 overheads), divided by the RTT.
551 3 - Finally, we multiply the calculated value of step 2 by the MSS 552 versus (MSS + L2 + L3 + L4 Overheads) ratio. 554 This gives us the achievable TCP Throughput value. Sometimes, the 555 maximum achievable throughput is limited by the maximum achievable 556 quantity of Ethernet Frames per second on the physical media. Then 557 this value is used in step 2 instead of the calculated one. 559 The following diagram compares achievable TCP throughputs on a T3 link 560 with Windows 2000/XP TCP window sizes of 16KB versus 64KB. 562 45| 563 | _____42.1M 564 40| |64K| 565 TCP | | | 566 Throughput 35| | | _____34.3M 567 in Mbps | | | |64K| 568 30| | | | | 569 | | | | | 570 25| | | | | 571 | | | | | 572 20| | | | | _____20.5M 573 | | | | | |64K| 574 15| 14.5M____| | | | | | 575 | |16K| | | | | | 576 10| | | | 9.6M+---+ | | | 577 | | | | |16K| | 5.8M____+ | 578 5| | | | | | | |16K| | 579 |______+___+___+_______+___+___+_______+__ +___+_______ 580 10 15 25 581 RTT in milliseconds 583 The following diagram shows the achievable TCP throughput on a 25ms T3 584 when the TCP Window size is increased and with the RFC1323 TCP Window 585 scaling option. 587 45| 588 | +-----+42.47M 589 40| | | 590 TCP | | | 591 Throughput 35| | | 592 in Mbps | | | 593 30| | | 594 | | | 595 25| | | 596 | ______ 21.23M | | 597 20| | | | | 598 | | | | | 599 15| | | | | 600 | | | | | 601 10| +----+10.62M | | | | 602 | _______5.31M | | | | | | 603 5| | | | | | | | | 604 |__+_____+______+____+___________+____+________+_____+___ 605 16 32 64 128 606 TCP Window size in KBytes 608 3.3.2 Conducting the TCP Throughput Tests 610 There are several TCP tools that are commonly used in the network 611 world and one of the most common is the "iperf" tool. With this tool, 612 hosts are installed at each end of the network segment; one as client 613 and the other as server. The TCP Window size of both the client and 614 the server can be maunally set and the achieved throughput is measured, 615 either uni-directionally or bi-directionally. For higher BDP 616 situations in lossy networks (long fat networks or satellite links, 617 etc.), TCP options such as Selective Acknowledgment should be 618 considered and also become part of the window size / throughput 619 characterization. 621 Host hardware performance must be well understood before conducting 622 the TCP throughput tests and other tests in the following sections. 623 Dedicated test equipment will generally be required, especially for 624 line rates of GigE and 10 GigE. 626 The TCP throughput test should be run over a a long enough duration 627 to properly exercise network buffers and also characterize performance 628 during different time periods of the day. The results must be logged 629 at the desired interval and the test must record RTT and TCP 630 retransmissions at each interval. 632 This correlation of retransmissions and RTT over the course of the 633 test will clearly identify which portions of the transfer reached 634 TCP Equilbrium state and to what effect increased RTT (congestive 635 effects) may have been the cause of reduced equilibrium performance. 637 Additionally, the TCP Efficiency and TCP Transfer time metrics should 638 be logged in order to further characterize the window size tests. 640 3.3.3 Single vs. Multiple TCP Connection Testing 642 The decision whether to conduct single or multiple TCP connection 643 tests depends upon the size of the BDP in relation to the window sizes 644 configured in the end-user environment. 
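   The arithmetic behind this decision (and behind the window size
   calculation of Section 3.3.1) can be restated in a few lines.  The
   Python sketch below is illustrative only: it simply applies the BDP
   formula and then rounds the connection count up for a given host
   window size.

   <CODE BEGINS>
   # Illustrative sketch: BDP, "ideal" TCP window, and the number of
   # parallel connections needed when hosts are limited to a smaller
   # window (Sections 3.3.1 and 3.3.3).

   import math

   def bdp_bits(bandwidth_bps, rtt_sec):
       return bandwidth_bps * rtt_sec

   def ideal_window_bytes(bandwidth_bps, rtt_sec):
       return bdp_bits(bandwidth_bps, rtt_sec) / 8.0

   def connections_to_fill(bandwidth_bps, rtt_sec, host_window_bytes):
       return math.ceil(ideal_window_bytes(bandwidth_bps, rtt_sec)
                        / host_window_bytes)

   if __name__ == "__main__":
       # Example from Section 3.3.3: 500 Mbps and 5 ms RTT
       bw, rtt = 500e6, 0.005
       print("BDP          : %.0f bits" % bdp_bits(bw, rtt))
       print("Ideal window : %.1f KBytes"
             % (ideal_window_bytes(bw, rtt) / 1000))
       for window_kb in (16, 32, 64, 128):
           n = connections_to_fill(bw, rtt, window_kb * 1024)
           print("%4d KB window -> %2d connections" % (window_kb, n))
   <CODE ENDS>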
   For example, if the BDP for a long-fat pipe turns out to be 2MB,
   then it is probably more realistic to test this pipe with multiple
   connections.  Assuming typical host computer window settings of
   64 KB, using 32 connections would realistically test this pipe.

   The following table is provided to illustrate the relationship
   between the BDP, window size, and the number of connections required
   to utilize the available capacity.  For this example, the network
   bandwidth is 500 Mbps, the RTT is equal to 5 ms, and the BDP equates
   to 312 KBytes.

                  #Connections
      Window      to Fill Link
      ----------------------------
      16KB             20
      32KB             10
      64KB              5
      128KB             3

   The TCP Transfer Time metric is useful for conducting multiple
   connection tests.  Each connection should be configured to transfer
   a certain payload (e.g. 100 MB), and the TCP Transfer Time provides
   a simple metric to verify the actual versus expected results.

   Note that the TCP Transfer Time is the time for all connections to
   complete the transfer of the configured payload size.  From the
   example table listed above, the 64KB window case is considered.
   Each of the 5 connections would be configured to transfer 100MB,
   and each TCP connection should achieve a maximum of 100 Mbit/sec.
   So for this example, the 100MB payload should be transferred across
   the connections in approximately 8 seconds (which would be the ideal
   TCP Transfer Time for these conditions).

   Additionally, the TCP Efficiency metric should be computed for each
   connection tested (defined in Section 2.2).
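   To make the "actual versus expected" comparison concrete, the sketch
   below computes the ideal TCP Transfer Time for a multiple connection
   test (the 5 x 100 MB example above works out to roughly 8 seconds)
   and compares it against measured completion times.  It is
   illustrative only; the measured values shown are placeholders that
   would come from the test tool's logs.

   <CODE BEGINS>
   # Illustrative sketch: ideal vs. measured TCP Transfer Time for a
   # multiple connection test (Section 3.3.3).

   def ideal_transfer_time_sec(payload_bytes, bottleneck_bps,
                               num_connections):
       """Time for every connection to move its payload, assuming the
       bottleneck bandwidth is shared evenly (ignores slow start and
       L1-L4 overhead)."""
       per_connection_bps = bottleneck_bps / num_connections
       return payload_bytes * 8 / per_connection_bps

   if __name__ == "__main__":
       # 5 connections, 100 MB each, 500 Mbps bottleneck -> ~8 seconds
       ideal = ideal_transfer_time_sec(100e6, 500e6, 5)
       print("Ideal TCP Transfer Time   : %.1f s" % ideal)

       # Placeholder measured completion times, one per connection
       measured = [8.6, 8.9, 9.4, 8.7, 9.1]
       actual = max(measured)
       print("Measured TCP Transfer Time: %.1f s (ideal/actual = %.0f%%)"
             % (actual, 100.0 * ideal / actual))
   <CODE ENDS>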
3.3.4 Interpretation of the TCP Throughput Results

   At the end of this step, the user will document the theoretical BDP
   and a set of Window size experiments with measured TCP throughput
   for each TCP window size setting.  For cases where the sustained TCP
   throughput does not equal the predicted value, some possible causes
   are listed:

   - Network congestion causing packet loss; the TCP Efficiency metric
     is a useful gauge to compare network performance.
   - Network congestion not causing packet loss but increasing RTT.
   - Intermediate network devices which actively regenerate the TCP
     connection and can alter window size, MSS, etc.
   - Over utilization of the available link, or rate limiting
     (policing).  More discussion of traffic management tests follows
     in Section 3.4.

3.4. Traffic Management Tests

   In most cases, the network connection between two geographic
   locations (branch offices, etc.) is lower in capacity than the
   network connection of the host computers.  An example would be LAN
   connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN
   connectivity may be physically 100 Mbps or logically 100 Mbps (over
   a GigE WAN connection).  In the latter case, rate limiting is used
   to provide the WAN bandwidth per the SLA.

   Traffic management techniques are employed to provide various forms
   of QoS, the more common of which include:

   - Traffic Shaping
   - Priority Queuing
   - Random Early Discard (RED, etc.)

   Configuring the end-to-end network with these various traffic
   management mechanisms is a complex undertaking.  For traffic shaping
   and RED techniques, the end goal is to provide better performance
   for bursty traffic such as TCP (RED is specifically intended for
   TCP).

   This section of the methodology provides guidelines to test traffic
   shaping and RED implementations.  As in Section 3.3, host hardware
   performance must be well understood before conducting the traffic
   shaping and RED tests.  Dedicated test equipment will generally be
   required, especially for line rates of GigE and 10 GigE.

3.4.1 Traffic Shaping Tests

   For services where the available bandwidth is rate limited, there
   are two (2) techniques used to implement rate limiting: traffic
   policing and traffic shaping.

   Simply stated, traffic policing marks and/or drops packets which
   exceed the SLA bandwidth (in most cases, excess traffic is dropped).
   Traffic shaping employs the use of queues to smooth the bursty
   traffic and then send it out within the SLA bandwidth limit (without
   dropping packets unless the traffic shaping queue is exceeded).

   Traffic shaping is generally configured for TCP data services and
   can provide improved TCP performance since retransmissions are
   reduced, which in turn optimizes TCP throughput for the given
   available bandwidth.  Throughout this section, the available
   rate-limited bandwidth shall be referred to as the "bottleneck
   bandwidth".

   The ability to detect proper traffic shaping is more easily
   diagnosed when conducting a multiple TCP connection test.  Proper
   shaping will provide a fair distribution of the available bottleneck
   bandwidth, while traffic policing will not.

   The traffic shaping tests build upon the concepts of multiple
   connection testing as defined in Section 3.3.3.  Calculating the BDP
   for the bottleneck bandwidth is first required, followed by
   selecting the number of connections and the window size per
   connection.

   Similar to the example in Section 3.3, a typical test scenario might
   be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited
   logical interface), and 5 msec RTT.  This would require five (5) TCP
   connections with a 64 KB window size each to evenly fill the
   bottleneck bandwidth (about 100 Mbps per connection).

   The traffic shaping test should be run over a long enough duration
   to properly exercise network buffers and also to characterize
   performance during different time periods of the day.  The
   throughput of each connection must be logged during the entire test,
   along with the TCP Efficiency and TCP Transfer Time metrics.
   Additionally, it is recommended to log RTT and retransmissions per
   connection over the test interval.

3.4.1.1 Interpretation of Traffic Shaping Test Results

   By plotting the throughput achieved by each TCP connection, the fair
   sharing of the bandwidth is generally very obvious when traffic
   shaping is properly configured for the bottleneck interface.  For
   the previous example of 5 connections sharing 500 Mbps, each
   connection would consume ~100 Mbps with a smooth variation.  If
   traffic policing was present on the bottleneck interface, the
   bandwidth sharing would not be fair and the resulting throughput
   plot would reveal "spikey" throughput consumption by the competing
   TCP connections (due to the retransmissions).

3.4.2 RED Tests

   Random Early Discard techniques are specifically targeted to provide
   congestion avoidance for TCP traffic.  Before the network element
   queue "fills" and enters the tail drop state, RED drops packets at
   configurable queue depth thresholds.  This action causes TCP
   connections to back off, which helps to prevent tail drop, which in
   turn helps to prevent global TCP synchronization.
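   For readers unfamiliar with the mechanism, the sketch below shows
   the basic drop decision of the classic RED algorithm (Floyd and
   Jacobson): an exponentially weighted moving average of the queue
   depth is compared against a minimum and a maximum threshold, and
   packets are dropped with increasing probability between the two.  It
   is a simplified illustration; the parameter values are arbitrary
   examples, not recommendations of this document, and real
   implementations add further refinements.

   <CODE BEGINS>
   # Illustrative sketch: simplified RED drop decision.  Thresholds,
   # max_p, and the EWMA weight are arbitrary example values.

   import random

   class SimpleRed:
       def __init__(self, min_th=50, max_th=150, max_p=0.10,
                    weight=0.002):
           self.min_th = min_th    # avg depth where early drops begin
           self.max_th = max_th    # avg depth where drop prob = max_p
           self.max_p = max_p      # maximum early-drop probability
           self.weight = weight    # EWMA weight for average queue depth
           self.avg = 0.0

       def should_drop(self, queue_depth):
           # Exponentially weighted moving average of the queue depth
           self.avg += self.weight * (queue_depth - self.avg)
           if self.avg < self.min_th:
               return False        # below min threshold: never drop
           if self.avg >= self.max_th:
               return True         # above max threshold: always drop
           # Between thresholds: probability rises linearly to max_p
           p = (self.max_p * (self.avg - self.min_th)
                / (self.max_th - self.min_th))
           return random.random() < p

   if __name__ == "__main__":
       red = SimpleRed()
       # Feed a steadily growing queue depth and count early drops
       drops = sum(red.should_drop(depth) for depth in range(400))
       print("Early drops during ramp-up example:", drops)
   <CODE ENDS>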
   Again, rate limited interfaces can benefit greatly from RED based
   techniques.  Without RED, TCP is generally not able to achieve the
   full bandwidth of the bottleneck interface.  With RED enabled, TCP
   congestion avoidance throttles the connections on the higher speed
   interface (i.e. the LAN) and can reach equilibrium with the
   bottleneck bandwidth (achieving closer to full throughput).

   The ability to detect proper RED configuration is more easily
   diagnosed when conducting a multiple TCP connection test.  Multiple
   TCP connections provide the multiple bursty sources that emulate the
   real-world conditions for which RED was intended.

   The RED tests also build upon the concepts of multiple connection
   testing as defined in Section 3.3.3.  Calculating the BDP for the
   bottleneck bandwidth is first required, followed by selecting the
   number of connections and the window size per connection.

   For RED testing, the desired effect is to cause the TCP connections
   to burst beyond the bottleneck bandwidth so that queue drops will
   occur.  Using the same example from Section 3.4.1 (traffic shaping),
   the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with a
   window size of 64 KB) to fill the capacity.  Some experimentation is
   required, but it is recommended to start with double the number of
   connections in order to stress the network element buffers / queues.
   In this example, 10 connections would produce TCP bursts of 64 KB
   from each connection.  If the timing of the TCP tester permits,
   these TCP bursts could stress queue sizes in the 512 KB range.
   Again, experimentation will be required, and the proper number of
   TCP connections / window size will be dictated by the size of the
   network element queue.

3.4.2.1 Interpretation of RED Results

   The default queuing technique for most network devices is FIFO
   based.  Without RED, the FIFO based queue will cause excessive loss
   to all of the TCP connections and, in the worst case, global TCP
   synchronization.

   By plotting the aggregate throughput achieved on the bottleneck
   interface, proper RED operation can be determined if the bottleneck
   bandwidth is fully utilized.  For the previous example of 10
   connections (window = 64 KB) sharing 500 Mbps, each connection
   should consume ~50 Mbps.  If RED was not properly enabled on the
   interface, then the TCP connections will retransmit at a higher rate
   and the net effect is that the bottleneck bandwidth is not fully
   utilized.

   Another means to study non-RED versus RED implementations is to use
   the TCP Transfer Time metric for all of the connections.  In this
   example, a 100 MB payload transfer should ideally take 16 seconds
   across all 10 connections (with RED enabled).  With RED not enabled,
   the throughput across the bottleneck bandwidth would be greatly
   reduced (generally 20-40%) and the TCP Transfer Time would be
   proportionally longer than the ideal transfer time.

   Additionally, the TCP Efficiency metric is useful, since non-RED
   implementations will exhibit a lower TCP Efficiency than RED
   implementations.
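   The comparison described above can be reduced to a few lines of
   arithmetic.  The sketch below uses the 10 connection example
   (expected ~50 Mbps per connection) and computes the aggregate
   utilization of the bottleneck from per-connection throughput
   measurements; the measured values are hypothetical placeholders,
   with the second set illustrating the under-utilization that may be
   seen without RED.

   <CODE BEGINS>
   # Illustrative sketch: bottleneck utilization from per-connection
   # throughput measurements (Section 3.4.2.1).  Measured values are
   # hypothetical placeholders.

   BOTTLENECK_BPS = 500e6
   CONNECTIONS = 10

   def aggregate_utilization_pct(per_connection_mbps):
       aggregate_bps = sum(per_connection_mbps) * 1e6
       return 100.0 * aggregate_bps / BOTTLENECK_BPS

   if __name__ == "__main__":
       expected = BOTTLENECK_BPS / CONNECTIONS / 1e6
       print("Expected per connection : ~%.0f Mbps" % expected)

       red_like    = [48, 51, 49, 50, 52, 47, 50, 49, 51, 48]
       no_red_like = [31, 28, 35, 25, 30, 33, 27, 29, 34, 26]
       print("RED-like run            : %.0f%% of bottleneck"
             % aggregate_utilization_pct(red_like))
       print("Non-RED-like run        : %.0f%% of bottleneck"
             % aggregate_utilization_pct(no_red_like))
   <CODE ENDS>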
4. Acknowledgements

   The authors would like to thank Gilles Forget, Loki Jorgenson, and
   Reinhard Schrage for technical review and contributions to this
   memo.

   Also thanks to Matt Mathis and Matt Zekauskas for many good comments
   through email exchange and for pointing us to great sources of
   information pertaining to past works in the TCP capacity area.

5. References

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining
              Empirical Bulk Transfer Capacity Metrics", RFC 3148,
              July 2001.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, March 1999.

   [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC3449]  Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M.
              Sooriyabandara, "TCP Performance Implications of Network
              Path Asymmetry", RFC 3449, December 2002.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [draft-ietf-ippm-btc-cap]
              Allman, M., "A Bulk Transfer Capacity Methodology for
              Cooperating Hosts", draft-ietf-ippm-btc-cap-00.txt (work
              in progress), August 2001.

   [MSMO]     Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The
              Macroscopic Behavior of the TCP Congestion Avoidance
              Algorithm", SIGCOMM Computer Communication Review,
              Volume 27, Issue 3, July 1997.

   [Stevens Vol1]
              Stevens, W., "TCP/IP Illustrated, Volume 1: The
              Protocols", Addison-Wesley.

Authors' Addresses

   Barry Constantine
   JDSU, Test and Measurement Division
   One Milestone Center Court
   Germantown, MD 20876-7100
   USA

   Phone: +1 240 404 2227
   barry.constantine@jdsu.com

   Gilles Forget
   Independent Consultant to Bell Canada.
   308, rue de Monaco, St-Eustache
   Qc. CANADA, Postal Code: J7P-4T5

   Phone: (514) 895-8212
   gilles.forget@sympatico.ca

   Loki Jorgenson
   nooCore

   Phone: (604) 908-5833
   ljorgenson@nooCore.com

   Reinhard Schrage
   Schrage Consulting

   Phone: +49 (0) 5137 909540
   reinhard@schrageconsult.com