Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: February 27, 2011                 Bell Canada (Ext. Consultant)
                                                            L. Jorgenson
                                                                 nooCore
                                                         Reinhard Schrage
                                                       Schrage Consulting
                                                          August 27, 2010

                   TCP Throughput Testing Methodology
                 draft-ietf-ippm-tcp-throughput-tm-06.txt

Abstract

   This memo describes a methodology for measuring sustained TCP
   throughput performance in an end-to-end managed network environment.
   This memo is intended to provide a practical approach to help users
   validate the TCP layer performance of a managed network, which
   should provide a better indication of end-user application level
   experience.  In the methodology, various TCP and network parameters
   are identified that should be tested as part of the network
   verification at the TCP layer.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 27, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Goals of this Methodology
       2.1   TCP Equilibrium State Throughput
       2.2   Metrics for TCP Throughput Tests
   3.  TCP Throughput Testing Methodology
       3.1   Determine Network Path MTU
       3.2   Baseline Round-trip Delay and Bandwidth
             3.2.1  Techniques to Measure Round Trip Time
             3.2.2  Techniques to Measure End-end Bandwidth
       3.3   TCP Throughput Tests
             3.3.1  Calculate Optimum TCP Window Size
             3.3.2  Conducting the TCP Throughput Tests
             3.3.3  Single vs. Multiple TCP Connection Testing
             3.3.4  Interpretation of the TCP Throughput Results
       3.4   Traffic Management Tests
             3.4.1  Traffic Shaping Tests
                    3.4.1.1  Interpretation of Traffic Shaping Test
                             Results
             3.4.2  RED Tests
                    3.4.2.1  Interpretation of RED Results
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  References
       7.1   Normative References
       7.2   Informative References

   Authors' Addresses

1. Introduction

   Testing an operational network prior to customer activation is
   referred to as "turn-up" testing, and the associated SLA (Service
   Level Agreement) is generally based upon Layer 2/3 packet
   throughput, delay, loss, and jitter.

   Network providers are coming to the realization that both Layer 2/3
   testing and TCP layer testing are required to more adequately ensure
   end-user satisfaction.  Therefore, the network provider community
   desires to measure network throughput performance at the TCP layer.
   Measuring TCP throughput provides a meaningful measure of the
   end-user experience (and can ultimately help reach a level of TCP
   testing interoperability which does not exist today).

   Additionally, end-users (business enterprises) seek to conduct
   repeatable TCP throughput tests between enterprise locations.  Since
   these enterprises rely on the networks of the providers, a common
   test methodology (and common metrics) would be equally beneficial to
   both parties.
   The intent behind this TCP throughput draft is to define a
   methodology for testing sustained TCP layer performance.  In this
   document, sustained TCP throughput is that amount of data per unit
   time that TCP transports during equilibrium (steady state), i.e.
   after the initial slow start phase.  We refer to this state as TCP
   Equilibrium; the equilibrium throughput is the maximum achievable
   throughput for the TCP connection(s).

   There are many variables to consider when conducting a TCP
   throughput test, and this methodology focuses on some of the most
   common parameters that should be considered, such as:

   - Path MTU and Maximum Segment Size (MSS)
   - RTT and Bottleneck BW
   - Ideal TCP Window (Bandwidth Delay Product)
   - Single Connection and Multiple Connection testing

   One other important note: it is highly recommended that traditional
   Layer 2/3 tests be conducted to verify the integrity of the network
   before conducting TCP tests.  Examples include RFC 2544 [RFC2544],
   iperf (UDP mode), or manual packet layer test techniques where
   packet throughput, loss, and delay measurements are conducted.

2. Goals of this Methodology

   Before defining the goals of this methodology, it is important to
   clearly define the areas that are not intended to be measured or
   analyzed by such a methodology.

   - The methodology is not intended to predict TCP throughput
     behavior during the transient stages of a TCP connection, such
     as the initial slow start.

   - The methodology is not intended to definitively benchmark TCP
     implementations of one OS against another, although some users
     may find some value in conducting qualitative experiments.

   - The methodology is not intended to provide detailed diagnosis
     of problems within end-points or the network itself as related to
     non-optimal TCP performance, although a results interpretation
     section for each test step may provide insight into potential
     issues within the network.

   In contrast to the above exclusions, the goals of this methodology
   are to define a method to conduct a structured, end-to-end
   assessment of sustained TCP performance within a managed business
   class IP network.  A key goal is to establish a set of "best
   practices" that an engineer should apply when validating the
   ability of a managed network to carry end-user TCP applications.

   Some specific goals are to:

   - Provide a practical test approach that specifies the better
     understood (and end-user configurable) TCP parameters such as
     window size, MSS (Maximum Segment Size), and number of
     connections, and how these affect the outcome of TCP performance
     over a network.

   - Provide specific test conditions (link speed, RTT, window size,
     etc.) and the maximum achievable TCP throughput under TCP
     Equilibrium conditions.  For guideline purposes, provide examples
     of these test conditions and the maximum achievable TCP
     throughput during the equilibrium state.  Section 2.1 provides
     specific details concerning the definition of TCP Equilibrium
     within the context of this draft.
   - Define two (2) basic metrics that can be used to compare the
     performance of TCP connections under various network conditions.

   - In test situations where the recommended procedure does not yield
     the maximum achievable TCP throughput result, this draft provides
     some possible areas within the end host or network that should be
     considered for investigation (although again, this draft is not
     intended to provide a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

   TCP connections have three (3) fundamental congestion window phases
   as documented in RFC 5681 [RFC5681].  These phases are:

   - Slow Start, which occurs at the beginning of a TCP transmission
     or after a retransmission time-out event.

   - Congestion Avoidance, which is the phase during which TCP ramps
     up to establish the maximum attainable throughput on an end-end
     network path.  Retransmissions are a natural by-product of the
     TCP congestion avoidance algorithm as it seeks to achieve maximum
     throughput on the network path.

   - Retransmission phase, which includes Fast Retransmit (Tahoe) and
     Fast Recovery (Reno and New Reno).  When a packet is lost, the
     Congestion Avoidance phase transitions to a Fast Retransmission
     or Recovery phase, depending upon the TCP implementation.

   The following diagram depicts these phases.

      (Diagram: TCP throughput versus time.  Slow Start ramps up to
      the ssthresh point, then Congestion Avoidance climbs to the
      Equilibrium throughput; a Loss Event or Retransmit Time-out
      reduces the rate, after which Slow Start and Congestion
      Avoidance repeat.)

   This TCP methodology provides guidelines to measure the equilibrium
   throughput, which refers to the maximum sustained rate obtained by
   congestion avoidance before packet loss conditions occur (which
   would cause the state change from congestion avoidance to a
   retransmission phase).  All maximum achievable throughputs
   specified in Section 3 are with respect to this equilibrium state.

2.2 Metrics for TCP Throughput Tests

   This draft focuses on a TCP throughput methodology and also
   provides two basic metrics to compare the results of various
   throughput tests.  It is recognized that the complexity and
   unpredictability of TCP makes it impossible to develop a complete
   set of metrics that accounts for the myriad of variables (e.g. RTT
   variation, loss conditions, TCP implementation, etc.).  However,
   these two basic metrics will facilitate TCP throughput comparisons
   under varying network conditions and between network traffic
   management techniques.

   The TCP Efficiency metric is the percentage of bytes that were not
   retransmitted and is defined as:

                Transmitted Bytes - Retransmitted Bytes
                ---------------------------------------  x 100
                           Transmitted Bytes

   This metric provides a comparative measure between various QoS
   mechanisms such as traffic management, congestion avoidance, and
   also various TCP implementations (e.g. Reno, Vegas, etc.).

   As an example, if 100,000 bytes were sent and 2,000 had to be
   retransmitted, the TCP Efficiency would be calculated as:

                       100,000 - 2,000
                       ---------------  x 100  =  98%
                           100,000

   Note that a given byte may be retransmitted more than once, and
   each retransmission is added to the retransmitted bytes count.
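   As an illustration only (not part of the methodology), the following
   Python sketch computes the TCP Efficiency metric as defined above;
   the byte counters are assumed to be reported by the test tool or
   derived from a packet capture.

      # Illustrative sketch (not normative): TCP Efficiency metric.
      # Byte counts are assumed to come from the test tool or from a
      # packet capture of the throughput test.

      def tcp_efficiency(transmitted_bytes, retransmitted_bytes):
          """Percentage of transmitted bytes that were not
          retransmitted.  Bytes retransmitted more than once are
          counted each time in retransmitted_bytes."""
          return ((transmitted_bytes - retransmitted_bytes)
                  / float(transmitted_bytes)) * 100.0

      # Example from the text: 100,000 bytes sent, 2,000 retransmitted.
      print(tcp_efficiency(100000, 2000))   # -> 98.0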
   The second metric is the TCP Transfer Time, which is simply the time
   it takes to transfer a block of data across simultaneous TCP
   connections.  The concept is useful when benchmarking traffic
   management techniques, where multiple connections are generally
   required.

   The TCP Transfer Time can also be used to provide a normalized ratio
   of the actual TCP Transfer Time versus the ideal Transfer Time.
   This ratio is called the TCP Transfer Index and is defined as:

                     Actual TCP Transfer Time
                     -------------------------
                      Ideal TCP Transfer Time

   An example would be the bulk transfer of 100 MB across 5
   simultaneous TCP connections over a 500 Mbit/s Ethernet service
   (each connection uploading 100 MB).  Each connection may achieve
   different throughputs during a test, and the overall throughput
   rate is not always easy to determine (especially as the number of
   connections increases).

   The ideal TCP Transfer Time would be ~8 seconds, but in this
   example, the actual TCP Transfer Time was 12 seconds.  The TCP
   Transfer Index would be 12/8 = 1.5, which indicates that the
   transfer across all connections took 1.5 times longer than the
   ideal.

   Note that both the TCP Efficiency and TCP Transfer Time metrics must
   be measured during each throughput test.  The correlation of TCP
   Transfer Time with TCP Efficiency can help to diagnose whether the
   TCP Transfer Time was negatively impacted by retransmissions (poor
   TCP Efficiency).

3. TCP Throughput Testing Methodology

   As stated in Section 1, it is considered best practice to verify
   the integrity of the network by conducting Layer 2/3 stress tests
   such as RFC 2544 [RFC2544] (or other network stress test methods).
   If the network is not performing properly in terms of packet loss,
   jitter, etc., then the TCP layer testing will not be meaningful,
   since the equilibrium throughput would be very difficult to achieve
   in a "dysfunctional" network.

   The following represents the sequential order of steps to conduct
   the TCP throughput testing methodology:

   1. Identify the Path MTU.  Packetization Layer Path MTU Discovery
      (PLPMTUD) [RFC4821] should be conducted to verify the network
      path MTU.  Conducting PLPMTUD establishes the upper limit for
      the MSS to be used in subsequent steps.

   2. Baseline Round-trip Delay and Bandwidth.  These measurements
      provide estimates of the ideal TCP window size, which will be
      used in subsequent test steps.

   3. TCP Connection Throughput Tests.  With baseline measurements of
      round trip delay and bandwidth, a series of single and multiple
      TCP connection throughput tests can be conducted to baseline the
      network performance expectations.

   4. Traffic Management Tests.  Various traffic management and
      queueing techniques are tested in this step, using multiple TCP
      connections.  Multiple connection testing can verify that the
      network is configured properly for traffic shaping versus
      policing, various queueing implementations, and RED.

   Important to note are some of the key characteristics and
   considerations for the TCP test instrument.  The test host may be a
   standard computer or a dedicated communications test instrument,
   and in either case it must be capable of emulating both a client
   and a server.
   Whether the TCP test host is a standard computer or a dedicated test
   instrument, the following areas should be considered when selecting
   a test host:

   - TCP implementation used by the test host OS, e.g. Linux OS kernel
     using TCP Reno, TCP options supported, etc.  This will obviously
     be more important when using custom test equipment, where the TCP
     implementation may be customized or tuned to run on higher
     performance hardware.

   - Most importantly, the TCP test host must be capable of generating
     and receiving stateful TCP test traffic at the full link speed of
     the network under test.  As a general rule of thumb, testing TCP
     throughput at rates greater than 100 Mbit/sec generally requires
     high performance server hardware or dedicated hardware-based test
     tools.

   - Measuring RTT and TCP Efficiency per connection will generally
     require dedicated hardware-based test tools.  In their absence,
     these measurements may need to be conducted with packet capture
     tools (i.e. conduct the TCP throughput tests and analyze RTT and
     retransmission results from the packet captures).

3.1. Determine Network Path MTU

   TCP implementations should use Path MTU Discovery techniques
   (PMTUD).  PMTUD relies on ICMP 'need to frag' messages to learn the
   path MTU.  When a device has a packet to send which has the Don't
   Fragment (DF) bit in the IP header set and the packet is larger
   than the Maximum Transmission Unit (MTU) of the next hop link, the
   packet is dropped and the device sends an ICMP 'need to frag'
   message back to the host that originated the packet.  The ICMP
   'need to frag' message includes the next hop MTU, which PMTUD uses
   to tune the TCP Maximum Segment Size (MSS).  Unfortunately, because
   many network managers completely disable ICMP, this technique does
   not always prove reliable in real-world situations.

   Packetization Layer Path MTU Discovery (PLPMTUD) [RFC4821] should
   be conducted to verify the network path MTU.  PLPMTUD can be used
   with or without ICMP.  The following sections provide a summary of
   the PLPMTUD approach and an example using the TCP protocol.
   [RFC4821] specifies search_high and search_low parameters for the
   MTU.  As specified in [RFC4821], a value of 1024 is generally a
   safe choice for search_low in modern networks.

   It is important to determine the overhead of the links in the path,
   and then to select a TCP MSS size corresponding to the Layer 3 MTU.
   For example, if the MTU is 1024 bytes and the TCP/IP headers are 40
   bytes, then the MSS would be set to 984 bytes.

   An example scenario is a network where the actual path MTU is 1240
   bytes.  The TCP client probe MUST be capable of setting the MSS for
   the probe packets and could start at MSS = 984 (which corresponds
   to an MTU size of 1024 bytes).

   The TCP client probe would open a TCP connection and advertise the
   MSS as 984.  Note that the client probe MUST generate these packets
   with the DF bit set.  The TCP client probe then sends test traffic
   per a nominal window size (8 KB, etc.).  The window size should be
   kept small to minimize the possibility of congesting the network,
   which could induce congestive loss.  The duration of the test
   should also be short (10-30 seconds), again to minimize congestive
   effects during the test.
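   To make the MSS/MTU arithmetic of this step concrete, the following
   sketch (illustrative only) derives the probe MSS from a candidate
   MTU, assuming the 40 bytes of TCP/IP header overhead used in the
   example above (TCP options, if present, would reduce the MSS
   further).

      # Illustrative sketch (not normative): derive the probe MSS from
      # a candidate MTU, assuming 40 bytes of TCP/IP header overhead
      # (20-byte IP header + 20-byte TCP header, no options).

      TCP_IP_OVERHEAD = 40

      def probe_mss(candidate_mtu, overhead=TCP_IP_OVERHEAD):
          return candidate_mtu - overhead

      print(probe_mss(1024))   # search_low of 1024  -> MSS 984
      print(probe_mss(1500))   # search_high of 1500 -> MSS 1460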
   In the example of a 1240 byte path MTU, probing with an MSS equal to
   984 would yield a successful probe, and the test client packets
   would be successfully transferred to the test server.

   Also note that the test client MUST verify that the advertised MSS
   is indeed negotiated.  Network devices with built-in Layer 4
   capabilities can intercede during the connection establishment
   process and reduce the advertised MSS to avoid fragmentation.  This
   is certainly a desirable feature from a network perspective, but it
   can yield erroneous test results if the client test probe does not
   confirm the negotiated MSS.

   The next test probe would use the search_high value, and this would
   be set to MSS = 1460 to correspond to a 1500 byte MTU.  In this
   example, the test client would retransmit based upon time-outs
   (since no ACKs will be received from the test server).  This test
   probe is marked as a conclusive failure if none of the test packets
   are ACK'ed.  If any of the test packets are ACK'ed, network
   congestion may be the cause and the test probe is not conclusive.
   Re-testing at other times of the day is recommended to further
   isolate the cause.

   The test is repeated until the desired granularity of the MTU is
   discovered.  The method can yield precise results at the expense of
   probing time.  One approach would be to set the next probe size
   halfway between the unsuccessful search_high and the successful
   search_low values, and to continue halving the remaining interval
   while seeking the upper limit.

3.2. Baseline Round-trip Delay and Bandwidth

   Before stateful TCP testing can begin, it is important to baseline
   the round trip delay and bandwidth of the network to be tested.
   These measurements provide estimates of the ideal TCP window size,
   which will be used in subsequent test steps.  These latency and
   bandwidth tests should be run during the time of day for which the
   TCP throughput tests will occur.

   The baseline RTT is used to predict the bandwidth delay product and
   the TCP Transfer Time for the subsequent throughput tests.  Since
   this methodology requires that RTT be measured during the entire
   throughput test, the extent to which the RTT varies during the
   throughput test can be quantified.

3.2.1 Techniques to Measure Round Trip Time

   Following the definitions used in the referenced documents, Round
   Trip Time (RTT) is the time elapsed between the clocking in of the
   first bit of a payload packet and the receipt of the last bit of
   the corresponding acknowledgement.  Round Trip Delay (RTD) is used
   synonymously and equals twice the link latency.

   In any method used to baseline round trip delay between network
   end-points, it is important to realize that network latency is the
   sum of inherent network delay and congestion-induced delay.  The
   RTT should be baselined during "off-peak" hours to obtain a
   reliable figure for network latency (versus additional delay caused
   by congestion).

   During the actual sustained TCP throughput tests, it is critical to
   measure RTT along with the measured TCP throughput.  Congestive
   effects can be isolated if RTT is concurrently measured.

   This is not meant to provide an exhaustive list, but the following
   summarizes some of the more common ways to determine round trip
   time (RTT) through the network.
   The desired resolution of the measurement (i.e. msec versus usec)
   may dictate whether the RTT measurement can be achieved with
   standard tools such as ICMP ping techniques or whether specialized
   test equipment with high precision timers would be required.  The
   objective in this section is to list several techniques in order of
   decreasing accuracy.

   - Use test equipment on each end of the network, "looping" the
     far-end tester so that a packet stream can be measured end-end.
     This test equipment RTT measurement may be compatible with the
     delay measurement protocols specified in [RFC5357].

   - Conduct packet captures of TCP test applications using, for
     example, "iperf" or FTP.  By running multiple experiments, the
     packet captures can be studied to estimate RTT based upon the
     SYN -> SYN-ACK handshakes within the TCP connection set-up.

   - ICMP pings may also be adequate to provide round trip time
     estimations.  Some limitations of ICMP ping are the msec
     resolution and whether the network elements respond to pings (or
     block them).

3.2.2 Techniques to Measure End-end Bandwidth

   There are many well established techniques available to provide
   estimated measures of bandwidth over a network.  This measurement
   should be conducted in both directions of the network, especially
   for access networks, which are inherently asymmetrical.  Some of
   the asymmetric implications to TCP performance are documented in
   RFC 3449 [RFC3449].

   The bandwidth measurement test must be run with stateless IP
   streams (not stateful TCP) in order to determine the available
   bandwidth in each direction.  This test should be performed at
   various intervals throughout a business day (or even across a
   week).  Ideally, the bandwidth test should produce a log of the
   bandwidth achieved across the test interval AND of the round trip
   delay.

   During the actual TCP level performance measurements (Sections 3.3
   and 3.4), the test tool must be able to track the round trip time
   of the TCP connection(s) during the test.  Measuring round trip
   time variation (a.k.a. "jitter") provides insight into the effects
   of congestive delay on the sustained throughput achieved for the
   TCP layer test.

3.3. TCP Throughput Tests

   This draft specifically defines TCP throughput techniques to verify
   sustained TCP performance in a managed business network.  As
   defined in Section 2.1, the equilibrium throughput reflects the
   maximum rate achieved by a TCP connection within the congestion
   avoidance phase on an end-end network path.  This section and the
   following sections define the methods to conduct these sustained
   throughput tests and provide guidelines for the predicted results.

   With the baseline measurements of round trip time and bandwidth
   from Section 3.2, a series of single and multiple TCP connection
   throughput tests can be conducted to baseline network performance
   against expectations.

   It is recommended to run the tests in each direction independently
   first, and then to run both directions simultaneously.  In each
   case, the TCP Efficiency and TCP Transfer Time metrics must be
   measured in each direction.

3.3.1 Calculate Optimum TCP Window Size

   The optimum TCP window size can be calculated from the bandwidth
   delay product (BDP), which is:

      BDP (bits) = RTT (sec) x Bandwidth (bps)

   By dividing the BDP by 8, the "ideal" TCP window size is calculated
   (in bytes).
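   As a simple illustration (not part of the methodology), the sketch
   below computes the BDP and the resulting "ideal" TCP window size
   from a measured RTT and bottleneck bandwidth.

      # Illustrative sketch (not normative): BDP and "ideal" TCP
      # window size from the baseline RTT and bottleneck bandwidth.

      def ideal_window_bytes(rtt_sec, bandwidth_bps):
          bdp_bits = rtt_sec * bandwidth_bps   # BDP (bits) = RTT x BW
          return bdp_bits / 8.0                # window size in bytes

      # T3 (44.21 Mbit/s) with 25 msec RTT, per the example that
      # follows:
      print(ideal_window_bytes(0.025, 44.21e6))   # ~138,000 bytes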
   An example would be a T3 link with 25 msec RTT.  The BDP would
   equal ~1,105,000 bits, and the ideal TCP window would equal
   ~138,000 bytes.

   The following table provides some representative network link
   speeds, latencies, BDPs, and associated "optimum" TCP window sizes.
   Sustained TCP transfers should reach nearly 100% throughput, minus
   the overhead of Layers 1-3 and the divisor of the MSS into the
   window.

   For this single connection baseline test, the MSS size will affect
   the achieved throughput (especially for smaller TCP window sizes).
   Table 3.2 provides the achievable, equilibrium TCP throughput (at
   Layer 4) using a 1460 byte MSS.  Also in this table, a 58 byte
   L1-L4 overhead (including the Ethernet CRC32) is used for
   simplicity.

   Table 3.2: Link Speed, RTT, calculated BDP, and TCP Throughput

   Link                           Ideal TCP        Maximum Achievable
   Speed*   RTT (ms)  BDP (bits)  Window (kbytes)  TCP Throughput (Mbps)
   ---------------------------------------------------------------------
   T1          20         30,720        3.84              1.17
   T1          50         76,800        9.60              1.40
   T1         100        153,600       19.20              1.40
   T3          10        442,100       55.26             42.05
   T3          15        663,150       82.89             42.05
   T3          25      1,105,250      138.16             41.52
   T3(ATM)     10        407,040       50.88             36.50
   T3(ATM)     15        610,560       76.32             36.23
   T3(ATM)     25      1,017,600      127.20             36.27
   100M         1        100,000       12.50             91.98
   100M         2        200,000       25.00             93.44
   100M         5        500,000       62.50             93.44
   1Gig       0.1        100,000       12.50            919.82
   1Gig       0.5        500,000       62.50            934.47
   1Gig         1      1,000,000      125.00            934.47
   10Gig     0.05        500,000       62.50          9,344.67
   10Gig      0.3      3,000,000      375.00          9,344.67

   * Note that the link speed is the bottleneck (minimum) link speed
     through the network, e.g. a WAN with a T1 link.

   Also, the following link speeds (available payload bandwidth) were
   used for the WAN entries:

   - T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
   - T3 = 44.21 Mbits/sec (C-Bit Framing)
   - T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per
     second)

   The calculation method used in this document is a 3-step process:

   1 - Determine the optimal TCP window size based on the optimal
       quantity of "in-flight" octets given by the BDP calculation,
       taking into consideration that the TCP window size has to be an
       exact multiple of the MSS.

   2 - Calculate the achievable Layer 2 throughput by multiplying the
       value determined in step 1 by the (MSS + L2 + L3 + L4
       overheads) / MSS ratio, divided by the RTT.

   3 - Finally, multiply the value calculated in step 2 by the
       MSS / (MSS + L2 + L3 + L4 overheads) ratio.

   This gives the achievable TCP throughput value.  Sometimes, the
   maximum achievable throughput is limited by the maximum achievable
   quantity of Ethernet frames per second on the physical media; in
   that case, this frame rate limit is used in step 2 instead of the
   calculated value.
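   One possible reading of these three steps is expressed in the
   sketch below (illustrative only); it uses the 58-byte overhead
   stated above for Table 3.2.  For cases limited by the physical
   frame rate, Table 3.2 additionally reflects technology-specific
   framing, so those table values may differ slightly from this
   sketch.

      # Illustrative sketch (not normative): the 3-step calculation
      # described above, assuming a 1460 byte MSS and the 58 bytes of
      # L1-L4 overhead used for Table 3.2.

      MSS = 1460
      OVERHEAD = 58                  # L2 + L3 + L4 overhead in bytes

      def achievable_tcp_throughput_bps(rtt_sec, bottleneck_bps):
          # Step 1: TCP window = BDP rounded down to an exact multiple
          # of the MSS.
          bdp_bytes = (rtt_sec * bottleneck_bps) / 8.0
          window_bytes = int(bdp_bytes // MSS) * MSS

          # Step 2: achievable Layer 2 throughput -- the window's
          # worth of full-size frames (MSS + overhead bytes each)
          # clocked out once per RTT, capped at the bottleneck rate.
          frames_per_rtt = window_bytes / MSS
          l2_bps = frames_per_rtt * (MSS + OVERHEAD) * 8 / rtt_sec
          l2_bps = min(l2_bps, bottleneck_bps)

          # Step 3: scale by MSS / (MSS + overhead) to obtain the TCP
          # (Layer 4) throughput.
          return l2_bps * MSS / (MSS + OVERHEAD)

      # Window-limited examples, matching Table 3.2:
      print(achievable_tcp_throughput_bps(0.020, 1.536e6) / 1e6)
      # ~1.17 Mbps (T1, 20 ms)
      print(achievable_tcp_throughput_bps(0.050, 1.536e6) / 1e6)
      # ~1.40 Mbps (T1, 50 ms)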
   The following table compares the achievable TCP throughput on a T3
   link with Windows 2000/XP TCP window sizes of 16KB versus 64KB.

      RTT (ms)    TCP Throughput,      TCP Throughput,
                  16KB Window (Mbps)   64KB Window (Mbps)
      ---------------------------------------------------
        10              14.5                 42.1
        15               9.6                 34.3
        25               5.8                 20.5

   The following table shows the achievable TCP throughput on a 25 ms
   T3 link as the TCP window size is increased (with the RFC 1323 TCP
   window scaling option).

      TCP Window Size (KBytes)    TCP Throughput (Mbps)
      -------------------------------------------------
                16                       5.31
                32                      10.62
                64                      21.23
               128                      42.47

3.3.2 Conducting the TCP Throughput Tests

   There are several TCP tools that are commonly used in the network
   world, and one of the most common is the "iperf" tool.  With this
   tool, hosts are installed at each end of the network segment; one
   as the client and the other as the server.  The TCP window size of
   both the client and the server can be manually set, and the
   achieved throughput is measured, either uni-directionally or
   bi-directionally.  For higher BDP situations in lossy networks
   (long fat networks or satellite links, etc.), TCP options such as
   Selective Acknowledgment should be considered and also become part
   of the window size / throughput characterization.

   Host hardware performance must be well understood before conducting
   the TCP throughput tests and the other tests in the following
   sections.  Dedicated test equipment will generally be required,
   especially for line rates of GigE and 10 GigE.

   The TCP throughput test should be run over a long enough duration
   to properly exercise network buffers and also characterize
   performance during different time periods of the day.  The results
   must be logged at the desired interval, and the test must record
   RTT and TCP retransmissions at each interval.

   This correlation of retransmissions and RTT over the course of the
   test will clearly identify which portions of the transfer reached
   TCP Equilibrium state and to what extent increased RTT (congestive
   effects) may have been the cause of reduced equilibrium
   performance.

   Additionally, the TCP Efficiency and TCP Transfer Time metrics
   should be logged in order to further characterize the window size
   tests.

3.3.3 Single vs. Multiple TCP Connection Testing

   The decision whether to conduct single or multiple TCP connection
   tests depends upon the size of the BDP in relation to the window
   sizes configured in the end-user environment.  For example, if the
   BDP for a long-fat pipe turns out to be 2 MB, then it is probably
   more realistic to test this pipe with multiple connections.
   Assuming typical host computer window settings of 64 KB, using 32
   connections would realistically test this pipe.
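   As a simple illustration (not part of the methodology), the sketch
   below computes the number of connections needed to fill the
   available capacity for a given per-connection window size; the
   values match the long-fat-pipe example above and the table that
   follows.

      # Illustrative sketch (not normative): number of TCP connections
      # needed to fill the available capacity for a given
      # per-connection window size (Section 3.3.3).

      import math

      def connections_to_fill(bdp_bytes, window_bytes):
          return int(math.ceil(bdp_bytes / float(window_bytes)))

      # Long fat pipe example above: 2 MB BDP, 64 KB host windows.
      print(connections_to_fill(2 * 1024 * 1024, 64 * 1024))   # -> 32

      # 500 Mbps x 5 ms example in the table below (BDP ~312 KB):
      print(connections_to_fill(312.5 * 1024, 64 * 1024))      # -> 5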
   The following table is provided to illustrate the relationship
   between the BDP, the window size, and the number of connections
   required to utilize the available capacity.  For this example, the
   network bandwidth is 500 Mbps, the RTT is equal to 5 ms, and the
   BDP equates to 312 KBytes.

      Window     #Connections to Fill Link
      ----------------------------------
       16KB                20
       32KB                10
       64KB                 5
      128KB                 3

   The TCP Transfer Time metric is useful for conducting multiple
   connection tests.  Each connection should be configured to transfer
   a certain payload (e.g. 100 MB), and the TCP Transfer Time provides
   a simple metric to verify the actual versus expected results.

   Note that the TCP Transfer Time is the time for all connections to
   complete the transfer of the configured payload size.  From the
   example table listed above, the 64KB window case is considered.
   Each of the 5 connections would be configured to transfer 100 MB,
   and each TCP connection should obtain a maximum of 100 Mbit/sec.
   So for this example, the 100 MB payload should be transferred
   across the connections in approximately 8 seconds (which would be
   the ideal TCP Transfer Time for these conditions).

   Additionally, the TCP Efficiency metric should be computed for each
   connection tested (as defined in Section 2.2).

3.3.4 Interpretation of the TCP Throughput Results

   At the end of this step, the user will document the theoretical BDP
   and a set of window size experiments with measured TCP throughput
   for each TCP window size setting.  For cases where the sustained
   TCP throughput does not equal the predicted value, some possible
   causes are listed:

   - Network congestion causing packet loss; the TCP Efficiency
     metric is a useful gauge to compare network performance
   - Network congestion not causing packet loss but increasing RTT
   - Intermediate network devices which actively regenerate the TCP
     connection and can alter window size, MSS, etc.
   - Over-utilization of the available link or rate limiting
     (policing); more discussion of traffic management tests follows
     in Section 3.4

3.4. Traffic Management Tests

   In most cases, the network connection between two geographic
   locations (branch offices, etc.) is lower in speed than the network
   connection of the host computers.  An example would be LAN
   connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN
   connectivity may be physically 100 Mbps or logically 100 Mbps (over
   a GigE WAN connection).  In the latter case, rate limiting is used
   to provide the WAN bandwidth per the SLA.

   Traffic management techniques are employed to provide various forms
   of QoS; the more common include:

   - Traffic Shaping
   - Priority Queueing
   - Random Early Discard (RED, etc.)

   Configuring the end-end network with these various traffic
   management mechanisms is a complex undertaking.  For traffic
   shaping and RED techniques, the end goal is to provide better
   performance for bursty traffic such as TCP (RED is specifically
   intended for TCP).

   This section of the methodology provides guidelines to test traffic
   shaping and RED implementations.  As in Section 3.3, host hardware
   performance must be well understood before conducting the traffic
   shaping and RED tests.  Dedicated test equipment will generally be
   required, especially for line rates of GigE and 10 GigE.

3.4.1 Traffic Shaping Tests

   For services where the available bandwidth is rate limited, there
   are two (2) techniques used to implement rate limiting: traffic
   policing and traffic shaping.

   Simply stated, traffic policing marks and/or drops packets which
   exceed the SLA bandwidth (in most cases, excess traffic is
   dropped).
   Traffic shaping employs the use of queues to smooth the bursty
   traffic and then send it out within the SLA bandwidth limit
   (without dropping packets unless the traffic shaping queue
   overflows).

   Traffic shaping is generally configured for TCP data services and
   can provide improved TCP performance, since retransmissions are
   reduced, which in turn optimizes TCP throughput for the given
   available bandwidth.  Throughout this section, the available
   rate-limited bandwidth shall be referred to as the "bottleneck
   bandwidth".

   Proper traffic shaping is more easily detected when conducting a
   multiple TCP connection test.  Proper shaping will provide a fair
   distribution of the available bottleneck bandwidth, while traffic
   policing will not.

   The traffic shaping tests build upon the concepts of multiple
   connection testing as defined in Section 3.3.3.  Calculating the
   BDP for the bottleneck bandwidth is first required, followed by
   selecting the number of connections and the window size per
   connection.

   Similar to the example in Section 3.3, a typical test scenario
   might be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate
   limited logical interface), and 5 msec RTT.  This would require
   five (5) TCP connections with a 64 KB window size to evenly fill
   the bottleneck bandwidth (about 100 Mbps per connection).

   The traffic shaping test should be run over a long enough duration
   to properly exercise network buffers and also characterize
   performance during different time periods of the day.  The
   throughput of each connection must be logged during the entire
   test, along with the TCP Efficiency and TCP Transfer Time metrics.
   Additionally, it is recommended to log RTT and retransmissions per
   connection over the test interval.

3.4.1.1 Interpretation of Traffic Shaping Test Results

   By plotting the throughput achieved by each TCP connection, the
   fair sharing of the bandwidth is generally very obvious when
   traffic shaping is properly configured for the bottleneck
   interface.  For the previous example of 5 connections sharing 500
   Mbps, each connection would consume ~100 Mbps with a smooth
   variation.  If traffic policing were present on the bottleneck
   interface, the bandwidth sharing would not be fair, and the
   resulting throughput plot would reveal "spiky" throughput
   consumption by the competing TCP connections (due to the
   retransmissions).

3.4.2 RED Tests

   Random Early Discard techniques are specifically targeted to
   provide congestion avoidance for TCP traffic.  Before the network
   element queue "fills" and enters the tail drop state, RED drops
   packets at configurable queue depth thresholds.  This action causes
   TCP connections to back off, which helps to prevent tail drop,
   which in turn helps to prevent global TCP synchronization.

   Again, rate-limited interfaces can benefit greatly from RED-based
   techniques.  Without RED, TCP is generally not able to achieve the
   full bandwidth of the bottleneck interface.  With RED enabled, TCP
   congestion avoidance throttles the connections on the higher speed
   interface (i.e. the LAN) and can reach equilibrium with the
   bottleneck bandwidth (achieving closer to full throughput).

   Proper RED configuration is more easily detected when conducting a
   multiple TCP connection test.
   Multiple TCP connections provide the multiple bursty sources that
   emulate the real-world conditions for which RED was intended.

   The RED tests also build upon the concepts of multiple connection
   testing as defined in Section 3.3.3.  Calculating the BDP for the
   bottleneck bandwidth is first required, followed by selecting the
   number of connections and the window size per connection.

   For RED testing, the desired effect is to cause the TCP connections
   to burst beyond the bottleneck bandwidth so that queue drops will
   occur.  Using the same example from Section 3.4.1 (traffic
   shaping), the 500 Mbps bottleneck bandwidth requires 5 TCP
   connections (with a window size of 64 KB) to fill the capacity.
   Some experimentation is required, but it is recommended to start
   with double the number of connections in order to stress the
   network element buffers / queues.  In this example, 10 connections
   would produce TCP bursts of 64 KB for each connection.  If the
   timing of the TCP tester permits, these TCP bursts could stress
   queue sizes in the 512 KB range.  Again, experimentation will be
   required, and the proper number of TCP connections and window size
   will be dictated by the size of the network element queue.

3.4.2.1 Interpretation of RED Results

   The default queuing technique for most network devices is FIFO
   based.  Without RED, the FIFO-based queue will cause excessive loss
   to all of the TCP connections and, in the worst case, global TCP
   synchronization.

   By plotting the aggregate throughput achieved on the bottleneck
   interface, proper RED operation can be determined if the bottleneck
   bandwidth is fully utilized.  For the previous example of 10
   connections (window = 64 KB) sharing 500 Mbps, each connection
   should consume ~50 Mbps.  If RED is not properly enabled on the
   interface, then the TCP connections will retransmit at a higher
   rate, and the net effect is that the bottleneck bandwidth is not
   fully utilized.

   Another means to study non-RED versus RED implementations is to use
   the TCP Transfer Time metric for all of the connections.  In this
   example, a 100 MB payload transfer should ideally take 16 seconds
   across all 10 connections (with RED enabled).  With RED not
   enabled, the throughput across the bottleneck bandwidth would be
   greatly reduced (generally by 20-40%), and the TCP Transfer Time
   would be proportionally longer than the ideal transfer time.

   Additionally, the TCP Efficiency metric is useful, since non-RED
   implementations will exhibit a lower TCP Efficiency than RED
   implementations.

4. Security Considerations

   The security considerations that apply to any active measurement of
   live networks are relevant here as well.  See [RFC4656] and
   [RFC5357].

5. IANA Considerations

   This memo does not require an IANA registration for ports dedicated
   to the TCP testing described in this memo.

6. Acknowledgements

   The author would like to thank Gilles Forget, Loki Jorgenson, and
   Reinhard Schrage for technical review and original contributions to
   this draft-06.

   Also thanks to Matt Mathis, Matt Zekauskas, Al Morton, and Yaakov
   Stein for many good comments and for pointing us to great sources
   of information pertaining to past works in the TCP capacity area.

7. References

7.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC5681]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, June 1999.

   [RFC3449]  Balakrishnan, H., Padmanabhan, V. N., Fairhurst, G., and
              M. Sooriyabandara, "TCP Performance Implications of
              Network Path Asymmetry", RFC 3449, December 2002.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, June 2007.

              Allman, M., "A Bulk Transfer Capacity Methodology for
              Cooperating Hosts", draft-ietf-ippm-btc-cap-00.txt,
              August 2001.

7.2. Informative References

Authors' Addresses

   Barry Constantine
   JDSU, Test and Measurement Division
   One Milestone Center Court
   Germantown, MD 20876-7100
   USA

   Phone: +1 240 404 2227
   barry.constantine@jdsu.com

   Gilles Forget
   Independent Consultant to Bell Canada.
   308, rue de Monaco, St-Eustache
   Qc. CANADA, Postal Code: J7P-4T5

   Phone: (514) 895-8212
   gilles.forget@sympatico.ca

   Loki Jorgenson
   nooCore

   Phone: (604) 908-5833
   ljorgenson@nooCore.com

   Reinhard Schrage
   Schrage Consulting

   Phone: +49 (0) 5137 909540
   reinhard@schrageconsult.com