1 Network Working Group B. Constantine 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: June 7, 2011 Bell Canada (Ext. Consultant) 5 Rudiger Geib 6 Deutsche Telekom 7 Reinhard Schrage 8 Schrage Consulting 10 December 7, 2010
12 Framework for TCP Throughput Testing 13 draft-ietf-ippm-tcp-throughput-tm-09.txt
15 Abstract
17 This framework describes a methodology for measuring end-to-end TCP 18 throughput performance in a managed IP network. The intention is to 19 provide a practical methodology to validate TCP layer performance. 20 The goal is to provide a better indication of the user experience. 21 In this framework, various TCP and IP parameters are identified and 22 should be tested as part of a managed IP network.
24 Requirements Language
26 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 27 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 28 document are to be interpreted as described in RFC 2119 [RFC2119].
30 Status of this Memo
32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79.
35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/.
40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress."
45 This Internet-Draft will expire on June 7, 2011.
47 Copyright Notice
49 Copyright (c) 2010 IETF Trust and the persons identified as the 50 document authors. All rights reserved.
52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document.
Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 65 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 66 1.2 Test Set-up . . . . . . . . . . . . . . . . . . . . . . . 4 67 2. Scope and Goals of this methodology. . . . . . . . . . . . . . 5 68 2.1 TCP Equilibrium. . . . . . . . . . . . . . . . . . . . . . 6 69 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 7 70 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 9 71 3.2. Baseline Round Trip Time and Bandwidth . . . . . . . . . . 10 72 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 10 73 3.2.2 Techniques to Measure end-to-end Bandwidth. . . . . . 11 74 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 12 75 3.3.1 Calculate Ideal TCP Receive Window Size. . . . . . . . 12 76 3.3.2 Metrics for TCP Throughput Tests . . . . . . . . . . . 15 77 3.3.3 Conducting the TCP Throughput Tests. . . . . . . . . . 18 78 3.3.4 Single vs. Multiple TCP Connection Testing . . . . . . 19 79 3.3.5 Interpretation of the TCP Throughput Results . . . . . 20 80 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 20 81 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 21 82 3.4.1.1 Interpretation of Traffic Shaping Test Results. . . 21 83 3.4.2 RED Tests. . . . . . . . . . . . . . . . . . . . . . . 22 84 3.4.2.1 Interpretation of RED Results . . . . . . . . . . . 23 85 4. Security Considerations . . . . . . . . . . . . . . . . . . . 23 86 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 87 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 23 88 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24 89 7.1 Normative References . . . . . . . . . . . . . . . . . . . 24 90 7.2 Informative References . . . . . . . . . . . . . . . . . . 24 92 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 25 94 1. Introduction 96 Network providers are coming to the realization that Layer 2/3 97 testing is not enough to adequately ensure end-user's satisfaction. 98 An SLA (Service Level Agreement) is provided to business customers 99 and is generally based upon Layer 2/3 criteria such as access rate, 100 latency, packet loss and delay variations. On the other hand, 101 measuring TCP throughput provides meaningful results with respect to 102 user experience. Thus, the network provider community desires to 103 measure IP network throughput performance at the TCP layer. 105 Additionally, business enterprise customers seek to conduct 106 repeatable TCP throughput tests between locations. Since these 107 enterprises rely on the networks of the providers, a common test 108 methodology with predefined metrics will benefit both parties. 110 Note that the primary focus of this methodology is managed business 111 class IP networks; i.e. those Ethernet terminated services for which 112 businesses are provided an SLA from the network provider. End-users 113 with "best effort" access between locations can use this methodology, 114 but this framework and its metrics are intended to be used in a 115 predictable managed IP service environment. 117 So the intent behind this document is to define a methodology for 118 testing sustained TCP layer performance. 
In this document, the 119 maximum achievable TCP Throughput is that amount of data per unit 120 time that TCP transports when trying to reach Equilibrium, i.e. 121 after the initial slow start and congestion avoidance phases.
123 TCP uses a congestion window (TCP CWND) to determine how many 124 packets it can send at one time. The network path bandwidth delay 125 product (BDP) determines the ideal TCP CWND. With the help of its slow 126 start and congestion avoidance mechanisms, TCP probes the network 127 path. Up to the bandwidth limit, a larger TCP CWND permits a 128 higher throughput, and up to local host limits, the TCP "Slow Start" and 129 "Congestion Avoidance" algorithms together determine the TCP 130 CWND size. The maximum TCP CWND size is also limited by the buffer 131 space allocated by the kernel for each socket. For each socket, there 132 is a default buffer size that can be changed by the program using a 133 system library call made just before opening the socket. There is also 134 a kernel enforced maximum buffer size. The buffer size can be 135 adjusted at both ends of the socket (send and receive). In order 136 to obtain the maximum throughput, it is critical to use optimal TCP 137 Send and Receive Socket Buffer sizes.
139 There are many variables to consider when conducting a TCP throughput 140 test, but this methodology focuses on: 141 - RTT and Bottleneck BW 142 - Ideal TCP Receive Window (Ideal Receive Socket Buffer) 143 - Ideal Send Socket Buffer 144 - TCP Congestion Window (TCP CWND) 145 - Path MTU and Maximum Segment Size (MSS) 146 - Single Connection and Multiple Connections testing
147 This methodology proposes TCP testing that should be performed in 148 addition to traditional Layer 2/3 type tests. Layer 2/3 tests are 149 required to verify the integrity of the network before conducting TCP 150 tests. Examples include iperf (UDP mode) or manual packet layer test 151 techniques where packet throughput, loss, and delay measurements are 152 conducted. When available, standardized testing similar to RFC 2544 153 [RFC2544] but adapted for use in operational networks may be used. 154 Note: RFC 2544 was never meant to be used outside a lab environment.
156 The following two sections provide a general overview of the test 157 methodology.
159 1.1 Terminology
161 Common terms used in the test methodology are:
163 - TCP Throughput Test Device (TCP TTD), refers to a compliant TCP 164 host that generates traffic and measures metrics as defined in 165 this methodology, i.e. a dedicated communications test instrument. 166 - Customer Provided Equipment (CPE), refers to customer owned 167 equipment (routers, switches, computers, etc.). 168 - Customer Edge (CE), refers to the provider owned demarcation device. 169 - Provider Edge (PE), refers to the provider's distribution equipment. 170 - Bottleneck Bandwidth (BB), the lowest bandwidth along the complete 171 path. Bottleneck Bandwidth and Bandwidth are used synonymously 172 in this document. Most of the time the Bottleneck Bandwidth is 173 in the access portion of the wide area network (CE - PE). 174 - Provider (P), refers to provider core network equipment. 175 - Network Under Test (NUT), refers to the tested IP network path. 176 - Round-Trip Time (RTT), refers to the Layer 4 back-and-forth delay.
178 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 179 | TCP|-| CPE|-| CE |--| PE |-| P |--| P |-| PE |--| CE |-| CPE|-| TCP| 180 | TTD| | | | |BB| | | | | | | |BB| | | | | TTD| 181 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 182 <------------------------ NUT ------------------------> 183 R >-----------------------------------------------------------| 184 T | 185 T <-----------------------------------------------------------| 187 Note that the NUT may consist of a variety of devices including but 188 not limited to, load balancers, proxy servers or WAN acceleration 189 devices. The detailed topology of the NUT should be well understood 190 when conducting the TCP throughput tests, although this methodology 191 makes no attempt to characterize specific network architectures. 193 1.2 Test Set-up 195 This methodology is intended for operational and managed IP networks. 196 A multitude of network architectures and topologies can be tested. 197 The above set-up diagram is very general and it only illustrates the 198 segmentation within end-user and network provider domains. 200 2. Scope and Goals of this Methodology 202 Before defining the goals, it is important to clearly define the 203 areas that are out-of-scope. 205 - This methodology is not intended to predict the TCP throughput 206 during the transient stages of a TCP connection, such as the initial 207 slow start. 209 - This methodology is not intended to definitively benchmark TCP 210 implementations of one OS to another, although some users may find 211 some value in conducting qualitative experiments. 213 - This methodology is not intended to provide detailed diagnosis 214 of problems within end-points or within the network itself as 215 related to non-optimal TCP performance, although a results 216 interpretation section for each test step may provide insight in 217 regards with potential issues. 219 - This methodology does not propose to operate permanently with high 220 measurement loads. TCP performance and optimization within 221 operational networks may be captured and evaluated by using data 222 from the "TCP Extended Statistics MIB" [RFC4898]. 224 - This methodology is not intended to measure TCP throughput as part 225 of an SLA, or to compare the TCP performance between service 226 providers or to compare between implementations of this methodology 227 in dedicated communications test instruments. 229 In contrast to the above exclusions, a primary goal is to define a 230 method to conduct a practical, end-to-end assessment of sustained 231 TCP performance within a managed business class IP network. Another 232 key goal is to establish a set of "best practices" that a non-TCP 233 expert should apply when validating the ability of a managed network 234 to carry end-user TCP applications. 236 Other specific goals are to : 238 - Provide a practical test approach that specifies IP hosts 239 configurable TCP parameters such as TCP Receive Window size, Socket 240 Buffer size, MSS (Maximum Segment Size), number of connections, and 241 how these affect the outcome of TCP performance over a network. 242 See section 3.3.3. 244 - Provide specific test conditions like link speed, RTT, TCP Receive 245 Window size, Socket Buffer size and maximum achievable TCP throughput 246 when trying to reach TCP Equilibrium. For guideline purposes, 247 provide examples of test conditions and their maximum achievable 248 TCP throughput. 
Section 2.1 provides specific details concerning the 249 definition of TCP Equilibrium within this methodology while section 3 250 provides specific test conditions with examples. 252 Note that some TCP/IP stack implementations are using Receive Window 253 Auto-Tuning and cannot be adjusted until this feature is disabled. 255 - Define three (3) basic metrics to compare the performance of TCP 256 connections under various network conditions. See section 3.3.2. 258 - In test situations where the recommended procedure does not yield 259 the maximum achievable TCP throughput results, this methodology 260 provides some possible areas within the end host or the network that 261 should be considered for investigation. Although again, this 262 methodology is not intended to provide a detailed diagnosis on these 263 issues. See section 3.3.5. 265 2.1 TCP Equilibrium 267 TCP connections have three (3) fundamental congestion window phases : 269 1 - The Slow Start phase, which occurs at the beginning of a TCP 270 transmission or after a retransmission time out. 272 2 - The Congestion Avoidance phase, during which TCP ramps up to 273 establish the maximum attainable throughput on an end-to-end network 274 path. Retransmissions are a natural by-product of the TCP congestion 275 avoidance algorithm as it seeks to achieve maximum throughput. 277 3 - The Loss Recovery phase, which could include Fast Retransmit 278 (Tahoe) or Fast Recovery (Reno & New Reno). When packet loss occurs, 279 Congestion Avoidance phase transitions either to Fast Retransmission 280 or Fast Recovery depending upon TCP implementations. If a Time-Out 281 occurs, TCP transitions back to the Slow Start phase. 283 The following diagram depicts these 3 phases. 285 /\ | Trying to reach TCP Equilibrium > > > > > > > > > 286 /\ | 287 /\ |High ssthresh TCP CWND 288 /\ |Loss Event * halving 3-Loss Recovery 289 /\ | * \ upon loss Adjusted 290 /\ | * \ / \ Time-Out ssthresh 291 /\ | * \ / \ +--------+ * 292 TCP | * \/ \ / Multiple| * 293 Through- | * 2-Congestion\ / Loss | * 294 put | * Avoidance \/ Event | * 295 | * Half | * 296 | * TCP CWND | * 1-Slow Start 297 | * 1-Slow Start Min TCP CWND after T-O 298 +----------------------------------------------------------- 299 Time > > > > > > > > > > > > > > > 301 Note : ssthresh = Slow Start threshold. 303 A well tuned and managed IP network with appropriate TCP adjustments 304 in it's IP hosts and applications should perform very close to TCP 305 Equilibrium and to the BB (Bottleneck Bandwidth). 307 This TCP methodology provides guidelines to measure the maximum 308 achievable TCP throughput or maximum TCP sustained rate obtained 309 after TCP CWND has stabilized to an optimal value. All maximum 310 achievable TCP throughputs specified in section 3 are with respect to 311 this condition. 313 It is important to clarify the interaction between the sender's Send 314 Socket Buffer and the receiver's advertised TCP Receive Window. TCP 315 test programs such as iperf, ttcp, etc. allow the sender to control 316 the quantity of TCP Bytes transmitted and unacknowledged (in-flight), 317 commonly referred to as the Send Socket Buffer. This is done 318 independently of the TCP Receive Window size advertised by the 319 receiver. Implications to the capabilities of the Throughput Test 320 Device (TTD) are covered at the end of section 3. 322 3. 
TCP Throughput Testing Methodology
324 As stated earlier in section 1, it is considered best practice to 325 verify the integrity of the network by conducting Layer 2/3 tests such 326 as [RFC2544] or other methods of network stress tests. However, it 327 is important to mention here that RFC 2544 was never meant to be used 328 outside a lab environment.
330 If the network is not performing properly in terms of packet loss, 331 jitter, etc. then the TCP layer testing will not be meaningful. A 332 dysfunctional network will not achieve optimal TCP throughput 333 relative to the available bandwidth.
335 TCP Throughput testing may require cooperation between the end-user 336 customer and the network provider. In a Layer 2/3 VPN architecture, 337 the testing should be conducted either on the CPE or on the CE device 338 and not on the PE (Provider Edge) router.
340 The following represents the sequential order of steps for this 341 testing methodology:
343 1. Identify the Path MTU. Packetization Layer Path MTU Discovery, 344 or PLPMTUD [RFC4821], MUST be conducted to verify the network path 345 MTU. Conducting PLPMTUD establishes the upper limit for the MSS to 346 be used in subsequent steps.
348 2. Baseline Round Trip Time and Bandwidth. This step establishes the 349 inherent, non-congested Round Trip Time (RTT) and the bottleneck 350 bandwidth of the end-to-end network path. These measurements are 351 used to provide estimates of the ideal TCP Receive Window and Send 352 Socket Buffer sizes that SHOULD be used in subsequent test steps. 353 These measurements reference [RFC2681] and [RFC4898] to measure RTD 354 and the associated RTT.
356 3. TCP Connection Throughput Tests. With baseline measurements 357 of Round Trip Time and bottleneck bandwidth, single and multiple TCP 358 connection throughput tests SHOULD be conducted to baseline network 359 performance expectations.
361 4. Traffic Management Tests. Various traffic management and queuing 362 techniques can be tested in this step, using multiple TCP 363 connections. Multiple connection testing should verify that the 364 network is configured properly for traffic shaping versus policing, 365 various queuing implementations and Random Early Discard (RED).
367 Some key characteristics and considerations for the TCP test 368 instrument are important to note. The test host may be a 369 standard computer or a dedicated communications test instrument. 370 In both cases, it must be capable of emulating both a client and a 371 server.
373 The following criteria should be considered when deciding whether 374 the TCP test host can be a standard computer or has to be a dedicated 375 communications test instrument:
377 - The TCP implementation used by the test host, OS version, e.g. a Linux OS 378 kernel using TCP New Reno, TCP options supported, etc. These will 379 obviously be more important when using dedicated communications test 380 instruments where the TCP implementation may be customized or tuned 381 to run in higher performance hardware. When a compliant TCP TTD is 382 used, the TCP implementation MUST be identified in the test results. 383 The compliant TCP TTD should be usable for complete end-to-end 384 testing through network security elements and should also be usable 385 for testing network sections.
387 - More importantly, the TCP test host MUST be capable of generating 388 and receiving stateful TCP test traffic at the full link speed of the 389 network under test.
Stateful TCP test traffic means that the test 390 host MUST fully implement a TCP/IP stack; this is generally a comment 391 aimed at dedicated communications test equipment which sometimes 392 "blasts" packets with TCP headers. As a general rule of thumb, testing 393 TCP throughput at rates greater than 100 Mbit/sec MAY require high 394 performance server hardware or dedicated hardware based test tools.
396 - A compliant TCP Throughput Test Device MUST allow adjusting both 397 Send and Receive Socket Buffer sizes. The Receive Socket Buffer MUST 398 be large enough to accommodate the TCP Receive Window Size. Note that 399 some TCP/IP stack implementations use Receive Window 400 Auto-Tuning, and the window cannot be manually adjusted until this feature is disabled.
402 - Measuring RTT and retransmissions per connection will generally 403 require a dedicated communications test instrument. In the absence of 404 dedicated hardware based test tools, these measurements may need to 405 be conducted with packet capture tools, i.e. conduct TCP throughput 406 tests and analyze RTT and retransmission results in packet captures. 407 Another option may be to use the "TCP Extended Statistics MIB" per 408 [RFC4898].
410 - The [RFC4821] PLPMTUD test SHOULD be conducted with a dedicated 411 tester which exposes the ability to run the PLPMTUD algorithm 412 independently of the OS stack.
414 3.1. Determine Network Path MTU
416 TCP implementations should use Path MTU Discovery techniques (PMTUD). 417 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 418 When a device has a packet to send which has the Don't Fragment (DF) 419 bit in the IP header set and the packet is larger than the Maximum 420 Transmission Unit (MTU) of the next hop, the packet is dropped and 421 the device sends an ICMP 'need to frag' message back to the host that 422 originated the packet. The ICMP 'need to frag' message includes 423 the next hop MTU, which PMTUD uses to tune the TCP Maximum Segment 424 Size (MSS). Unfortunately, because many network managers completely 425 disable ICMP, this technique does not always prove reliable.
427 Packetization Layer Path MTU Discovery, or PLPMTUD [RFC4821], MUST then 428 be conducted to verify the network path MTU. PLPMTUD can be used 429 with or without ICMP. The following sections provide a summary of the 430 PLPMTUD approach and an example using TCP. [RFC4821] specifies a 431 search_high and a search_low parameter for the MTU. As specified in 432 [RFC4821], 1024 Bytes is a safe value for search_low in modern 433 networks.
435 It is important to determine the link overhead along the IP path, 436 and then to select a TCP MSS corresponding to the Layer 3 MTU. 437 For example, if the MTU is 1024 Bytes and the TCP/IP headers are 40 438 Bytes, then the MSS would be set to 984 Bytes.
440 An example scenario is a network where the actual path MTU is 1240 441 Bytes. The TCP client probe MUST be capable of setting the MSS for 442 the probe packets and could start at MSS = 984 (which corresponds 443 to an MTU size of 1024 Bytes).
445 The TCP client probe would open a TCP connection and advertise the 446 MSS as 984. Note that the client probe MUST generate these packets 447 with the DF bit set. The TCP client probe then sends test traffic 448 with a small default Send Socket Buffer size of ~8 KBytes. It should 449 be kept small to minimize the possibility of congesting the network, 450 which may induce packet loss.
The duration of the test should also 451 be short (10-30 seconds), again to minimize congestive effects 452 during the test.
454 In the example of a 1240 Bytes path MTU, probing with an MSS equal to 455 984 would yield a successful probe and the test client packets would 456 be successfully transferred to the test server.
458 Also note that the test client MUST verify that the advertised MSS 459 is indeed negotiated. Network devices with built-in Layer 4 460 capabilities can intercede during the connection establishment and 461 reduce the advertised MSS to avoid fragmentation. This is certainly 462 a desirable feature from a network perspective, but it can yield 463 erroneous test results if the client test probe does not confirm the 464 negotiated MSS.
466 The next test probe would use the search_high value, and this would 467 be set to MSS = 1460 to correspond to a 1500 Bytes MTU. In this 468 example, the test client will retransmit based upon time-outs, since 469 no ACKs will be received from the test server. This test probe is 470 marked as a conclusive failure if none of the test packets are 471 ACK'ed. If any of the test packets are ACK'ed, network congestion 472 may be the cause and the test probe is not conclusive. Re-testing 473 at other times of the day is recommended to further isolate the cause.
475 The test is repeated until the desired granularity of the MTU is 476 discovered. The method can yield precise results at the expense of 477 probing time. One approach may be to set the next probe size halfway 478 between the unsuccessful search_high and the successful search_low 479 values, and to continue halving the interval in the same way when seeking the upper limit.
481 3.2. Baseline Round Trip Time and Bandwidth
483 Before stateful TCP testing can begin, it is important to determine 484 the baseline Round Trip Time (non-congested inherent delay) and 485 bottleneck bandwidth of the end-to-end network to be tested. These 486 measurements are used to provide estimates of the ideal TCP Receive 487 Window and Send Socket Buffer sizes that SHOULD be used in subsequent 488 test steps.
490 3.2.1 Techniques to Measure Round Trip Time
492 Following the definitions used in section 1.1, Round Trip Time (RTT) 493 is the elapsed time between the clocking in of the first bit of a 494 sent payload packet and the receipt of the last bit of the 495 corresponding Acknowledgment. Round Trip Delay (RTD) is used 496 synonymously with twice the Link Latency. RTT measurements SHOULD use 497 techniques defined in [RFC2681] or statistics available from MIBs 498 defined in [RFC4898].
500 The RTT SHOULD be baselined during "off-peak" hours to obtain a 501 reliable figure for inherent network latency versus additional delay 502 caused by network buffering. When sampling values of RTT over a test 503 interval, the minimum value measured SHOULD be used as the baseline 504 RTT since this will most closely estimate the inherent network 505 latency. This inherent RTT is also used to determine the Buffer 506 Delay Percentage metric, which is defined in Section 3.3.2.
507 The following list is not meant to be exhaustive, although it 508 summarizes some of the most common ways to determine round trip time. 509 The desired resolution of the measurement (i.e. msec versus usec) may 510 dictate whether the RTT measurement can be achieved with ICMP pings 511 or by a dedicated communications test instrument with precision 512 timers.
514 The objective in this section is to list several techniques 515 in order of decreasing accuracy.
517 - Use test equipment on each end of the network, "looping" the 518 far-end tester so that a packet stream can be measured back and forth 519 from end-to-end. This RTT measurement may be compatible with delay 520 measurement protocols specified in [RFC5357]. 522 - Conduct packet captures of TCP test sessions using "iperf" or FTP, 523 or other TCP test applications. By running multiple experiments, 524 packet captures can then be analyzed to estimate RTT. It is 525 important to note that results based upon the SYN -> SYN-ACK at the 526 beginning of TCP sessions should be avoided since Firewalls might 527 slow down 3 way handshakes. 529 - ICMP pings may also be adequate to provide round trip time 530 estimates, provided that the packet size is factored into the 531 estimates (i.e. pings with different packet sizes might be required). 532 Some limitations with ICMP Ping may include msec resolution and 533 whether the network elements are responding to pings or not. Also, 534 ICMP is often rate-limited and segregated into different buffer 535 queues and is not as reliable and accurate as in-band measurements. 537 3.2.2 Techniques to Measure end-to-end Bandwidth 539 There are many well established techniques available to provide 540 estimated measures of bandwidth over a network. These measurements 541 SHOULD be conducted in both directions of the network, especially for 542 access networks, which may be asymmetrical. Measurements SHOULD use 543 network capacity techniques defined in [RFC5136]. 545 Before any TCP Throughput test can be done, a bandwidth measurement 546 test MUST be run with stateless IP streams(not stateful TCP) in order 547 to determine the available bandwidths in each direction. This test 548 should obviously be performed at various intervals throughout a 549 business day or even across a week. Ideally, the bandwidth test 550 should produce logged outputs of the achieved bandwidths across the 551 test interval. 553 3.3. TCP Throughput Tests 555 This methodology specifically defines TCP throughput techniques to 556 verify sustained TCP performance in a managed business IP network, as 557 defined in section 2.1. This section and others will define the 558 method to conduct these sustained TCP throughput tests and guidelines 559 for the predicted results. 561 With baseline measurements of round trip time and bandwidth 562 from section 3.2, a series of single and multiple TCP connection 563 throughput tests SHOULD be conducted to baseline network performance 564 against expectations. The number of trials and the type of testing 565 (single versus multiple connections) will vary according to the 566 intention of the test. One example would be a single connection test 567 in which the throughput achieved by large Send Socket Buffer and TCP 568 Receive Window sizes (i.e. 256KB) is to be measured. It would be 569 advisable to test performance at various times of the business day. 571 It is RECOMMENDED to run the tests in each direction independently 572 first, then run both directions simultaneously. In each case, 573 TCP Transfer Time, TCP Efficiency, and Buffer Delay Percentage MUST 574 be measured in each direction. These metrics are defined in 3.3.2. 576 3.3.1 Calculate Ideal TCP Receive Window Size 578 The ideal TCP Receive Window size can be calculated from the 579 bandwidth delay product (BDP), which is: 581 BDP (bits) = RTT (sec) x Bandwidth (bps) 583 Note that the RTT is being used as the "Delay" variable in the 584 BDP calculations. 
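As a non-normative illustration only, the BDP arithmetic above (and its conversion to an ideal TCP Receive Window size, which the next paragraphs walk through) can be sketched as follows; the input values are the T3 example used later in this section:

   # Non-normative sketch of the BDP formula above; values are the
   # T3 example used in this section (44.21 Mbps, 25 msec RTT).

   def bdp_bits(rtt_sec, bandwidth_bps):
       # BDP (bits) = RTT (sec) x Bandwidth (bps)
       return rtt_sec * bandwidth_bps

   def ideal_rwin_kbytes(rtt_sec, bandwidth_bps):
       # Ideal TCP RWIN (KBytes) = BDP / 8 (bits per Byte) / 1000
       return bdp_bits(rtt_sec, bandwidth_bps) / 8 / 1000

   print("BDP  = %.0f bits" % bdp_bits(0.025, 44.21e6))             # ~1,105,250 bits
   print("RWIN = %.2f KBytes" % ideal_rwin_kbytes(0.025, 44.21e6))  # ~138.16 KBytes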
586 Then, by dividing the BDP by 8, we obtain the "ideal" TCP Receive 587 Window size in Bytes. For optimal results, the Send Socket Buffer 588 size must be adjusted to the same value at the opposite end of the 589 network path. 591 Ideal TCP RWIN = BDP / 8 593 An example would be a T3 link with 25 msec RTT. The BDP would equal 594 ~1,105,000 bits and the ideal TCP Receive Window would be ~138 595 KBytes. 597 Note that separate calculations are required on asymetrical paths. 598 An asymetrical path example would be a 90 msec RTT ADSL line with 599 5Mbps downstream and 640Kbps upstream. The downstream BDP would equal 600 ~450,000 bits while the upstream one would be only ~57,600 bits. 602 The following table provides some representative network Link Speeds, 603 RTT, BDP, and their associated Ideal TCP Receive Window sizes. 605 Table 3.3.1: Link Speed, RTT and calculated BDP & TCP Receive Window 607 Link Ideal TCP 608 Speed* RTT BDP Receive Window 609 (Mbps) (ms) (bits) (KBytes) 610 --------------------------------------------------------------------- 611 1.536 20 30,720 3.84 612 1.536 50 76,800 9.60 613 1.536 100 153,600 19.20 614 44.210 10 442,100 55.26 615 44.210 15 663,150 82.89 616 44.210 25 1,105,250 138.16 617 100 1 100,000 12.50 618 100 2 200,000 25.00 619 100 5 500,000 62.50 620 1,000 0.1 100,000 12.50 621 1,000 0.5 500,000 62.50 622 1,000 1 1,000,000 125.00 623 10,000 0.05 500,000 62.50 624 10,000 0.3 3,000,000 375.00 626 * Note that link speed is the bottleneck bandwidth for the NUT 628 The following serial link speeds are used: 629 - T1 = 1.536 Mbits/sec (for a B8ZS line encoding facility) 630 - T3 = 44.21 Mbits/sec (for a C-Bit Framing facility) 632 The above table illustrates the ideal TCP Receive Window size. 633 If a smaller TCP Receive Window is used, then the TCP Throughput 634 is not optimal. To calculate the TCP Throughput, the following 635 formula is used: TCP Throughput = TCP RWIN X 8 / RTT 637 An example could be a 100 Mbps IP path with 5 ms RTT and a TCP 638 Receive Window size of 16KB, then: 640 TCP Throughput = 16 KBytes X 8 bits / 5 ms. 641 TCP Throughput = 128,000 bits / 0.005 sec. 642 TCP Throughput = 25.6 Mbps. 644 Another example for a T3 using the same calculation formula is 645 illustrated on the next page: 646 TCP Throughput = TCP RWIN X 8 / RTT. 647 TCP Throughput = 16 KBytes X 8 bits / 10 ms. 648 TCP Throughput = 128,000 bits / 0.01 sec. 649 TCP Throughput = 12.8 Mbps. 651 When the TCP Receive Window size exceeds the BDP (i.e. T3 link, 652 64 KBytes TCP Receive Window on a 10 ms RTT path), the maximum frames 653 per second limit of 3664 is reached and the calculation formula is: 655 TCP Throughput = Max FPS X MSS X 8. 656 TCP Throughput = 3664 FPS X 1460 Bytes X 8 bits. 657 TCP Throughput = 42.8 Mbps 658 The following diagram compares achievable TCP throughputs on a T3 659 with Send Socket Buffer & TCP Receive Window sizes of 16KB vs. 64KB. 661 45| 662 | _______42.8M 663 40| |64KB | 664 TCP | | | 665 Throughput 35| | | 666 in Mbps | | | +-----+34.1M 667 30| | | |64KB | 668 | | | | | 669 25| | | | | 670 | | | | | 671 20| | | | | _______20.5M 672 | | | | | |64KB | 673 15| | | | | | | 674 |12.8M+-----| | | | | | 675 10| |16KB | | | | | | 676 | | | |8.5M+-----| | | | 677 5| | | | |16KB | |5.1M+-----| | 678 |_____|_____|_____|____|_____|_____|____|16KB |_____|_____ 679 10 15 25 680 RTT in milliseconds 682 The following diagram shows the achievable TCP throughput on a 25ms 683 T3 when Send Socket Buffer & TCP Receive Window sizes are increased. 
685 45| 686 | 687 40| +-----+40.9M 688 TCP | | | 689 Throughput 35| | | 690 in Mbps | | | 691 30| | | 692 | | | 693 25| | | 694 | | | 695 20| +-----+20.5M | | 696 | | | | | 697 15| | | | | 698 | | | | | 699 10| +-----+10.2M | | | | 700 | | | | | | | 701 5| +-----+5.1M | | | | | | 702 |_____|_____|______|_____|______|_____|_______|_____|_____ 703 16 32 64 128* 704 TCP Receive Window size in KBytes 706 * Note that 128KB requires [RFC1323] TCP Window scaling option. 708 Note that some TCP/IP stack implementations are using Receive Window 709 Auto-Tuning and cannot be adjusted until the feature is disabled. 711 3.3.2 Metrics for TCP Throughput Tests 713 This framework focuses on a TCP throughput methodology and also 714 provides several basic metrics to compare results of various 715 throughput tests. It is recognized that the complexity and 716 unpredictability of TCP makes it impossible to develop a complete 717 set of metrics that accounts for the myriad of variables (i.e. RTT 718 variation, loss conditions, TCP implementation, etc.). However, 719 these basic metrics will facilitate TCP throughput comparisons 720 under varying network conditions and between network traffic 721 management techniques. 723 The first metric is the TCP Transfer Time, which is simply the 724 measured time it takes to transfer a block of data across 725 simultaneous TCP connections. This concept is useful when 726 benchmarking traffic management techniques and where multiple 727 TCP connections are required. 729 TCP Transfer time may also be used to provide a normalized ratio of 730 the actual TCP Transfer Time versus the Ideal Transfer Time. This 731 ratio is called the TCP Transfer Index and is defined as: 733 Actual TCP Transfer Time 734 ------------------------- 735 Ideal TCP Transfer Time 737 The Ideal TCP Transfer time is derived from the network path 738 bottleneck bandwidth and various Layer 1/2/3/4 overheads associated 739 with the network path. Additionally, both the TCP Receive Window and 740 the Send Socket Buffer sizes must be tuned to equal the bandwidth 741 delay product (BDP) as described in section 3.3.1. 743 The following table illustrates the Ideal TCP Transfer time of a 744 single TCP connection when its TCP Receive Window and Send Socket 745 Buffer sizes are equal to the BDP. 747 Table 3.3.2: Link Speed, RTT, BDP, TCP Throughput, and 748 Ideal TCP Transfer time for a 100 MB File 750 Link Maximum Ideal TCP 751 Speed BDP Achievable TCP Transfer time 752 (Mbps) RTT (ms) (KBytes) Throughput(Mbps) (seconds) 753 -------------------------------------------------------------------- 754 1.536 50 9.6 1.4 571 755 44.21 25 138.2 42.8 18 756 100 2 25.0 94.9 9 757 1,000 1 125.0 949.2 1 758 10,000 0.05 62.5 9,492 0.1 760 Transfer times are rounded for simplicity. 762 For a 100MB file(100 x 8 = 800 Mbits), the Ideal TCP Transfer Time 763 is derived as follows: 765 800 Mbits 766 Ideal TCP Transfer Time = ----------------------------------- 767 Maximum Achievable TCP Throughput 769 The maximum achievable layer 2 throughput on T1 and T3 Interfaces 770 is based on the maximum frames per second (FPS) permitted by the 771 actual layer 1 speed when the MTU is 1500 Bytes. 
773 The maximum FPS for a T1 is 127 and the calculation formula is: 774 FPS = T1 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 775 FPS = (1.536M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 776 FPS = (1.536M / (1508 Bytes X 8)) 777 FPS = 1.536 Mbps / 12064 bits 778 FPS = 127 780 The maximum FPS for a T3 is 3664 and the calculation formula is: 781 FPS = T3 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 782 FPS = (44.21M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 783 FPS = (44.21M / (1508 Bytes X 8)) 784 FPS = 44.21 Mbps / 12064 bits 785 FPS = 3664 787 The 1508 equates to: 789 MTU + PPP + Flags + CRC16 791 Where MTU is 1500 Bytes, PPP is 4 Bytes, Flags are 2 Bytes and CRC16 792 is 2 Bytes. 794 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 795 simply use: MSS in Bytes X 8 bits X max FPS. 796 For a T3, the maximum TCP Throughput = 1460 Bytes X 8 bits X 3664 FPS 797 Maximum TCP Throughput = 11680 bits X 3664 FPS 798 Maximum TCP Throughput = 42.8 Mbps. 800 The maximum achievable layer 2 throughput on Ethernet Interfaces is 801 based on the maximum frames per second permitted by the IEEE802.3 802 standard when the MTU is 1500 Bytes. 804 The maximum FPS for 100M Ethernet is 8127 and the calculation is: 805 FPS = (100Mbps /(1538 Bytes X 8 bits)) 807 The maximum FPS for GigE is 81274 and the calculation formula is: 808 FPS = (1Gbps /(1538 Bytes X 8 bits)) 810 The maximum FPS for 10GigE is 812743 and the calculation formula is: 811 FPS = (10Gbps /(1538 Bytes X 8 bits)) 812 The 1538 equates to: 814 MTU + Eth + CRC32 + IFG + Preamble + SFD 816 Where MTU is 1500 Bytes, Ethernet is 14 Bytes, CRC32 is 4 Bytes, 817 IFG is 12 Bytes, Preamble is 7 Bytes and SFD is 1 Byte. 819 Note that better results could be obtained with jumbo frames on 820 GigE and 10 GigE. 822 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 823 simply use: MSS in Bytes X 8 bits X max FPS. 824 For a 100M, the maximum TCP Throughput = 1460 B X 8 bits X 8127 FPS 825 Maximum TCP Throughput = 11680 bits X 8127 FPS 826 Maximum TCP Throughput = 94.9 Mbps. 828 To illustrate the TCP Transfer Time Index, an example would be the 829 bulk transfer of 100 MB over 5 simultaneous TCP connections (each 830 connection uploading 100 MB). In this example, the Ethernet service 831 provides a Committed Access Rate (CAR) of 500 Mbit/s. Each 832 connection may achieve different throughputs during a test and the 833 overall throughput rate is not always easy to determine (especially 834 as the number of connections increases). 836 The ideal TCP Transfer Time would be ~8 seconds, but in this example, 837 the actual TCP Transfer Time was 12 seconds. The TCP Transfer Index 838 would then be 12/8 = 1.5, which indicates that the transfer across 839 all connections took 1.5 times longer than the ideal. 841 The second metric is TCP Efficiency, which is the percentage of Bytes 842 that were not retransmitted and is defined as: 844 Transmitted Bytes - Retransmitted Bytes 845 --------------------------------------- x 100 846 Transmitted Bytes 848 Transmitted Bytes are the total number of TCP payload Bytes to be 849 transmitted which includes the original and retransmitted Bytes. This 850 metric provides a comparative measure between various QoS mechanisms 851 like traffic management or congestion avoidance. Various TCP 852 implementations like Reno, Vegas, etc. could also be compared. 
854 As an example, if 100,000 Bytes were sent and 2,000 had to be 855 retransmitted, the TCP Efficiency should be calculated as: 857 102,000 - 2,000 858 ---------------- x 100 = 98.03% 859 102,000 861 Note that the retransmitted Bytes may have occurred more than once, 862 and these multiple retransmissions are added to the Retransmitted 863 Bytes count (and the Transmitted Bytes count). 865 The third metric is the Buffer Delay Percentage, which represents the 866 increase in RTT during a TCP throughput test with respect to 867 inherent or baseline network RTT. The baseline RTT is the round-trip 868 time inherent to the network path under non-congested conditions. 869 (See 3.2.1 for details concerning the baseline RTT measurements). 871 The Buffer Delay Percentage is defined as: 873 Average RTT during Transfer - Baseline RTT 874 ------------------------------------------ x 100 875 Baseline RTT 877 As an example, the baseline RTT for the network path is 25 msec. 878 During the course of a TCP transfer, the average RTT across the 879 entire transfer increased to 32 msec. In this example, the Buffer 880 Delay Percentage would be calculated as: 882 32 - 25 883 ------- x 100 = 28% 884 25 886 Note that the TCP Transfer Time, TCP Efficiency, and Buffer Delay 887 Percentage MUST be measured during each throughput test. Poor TCP 888 Transfer Time Indexes (TCP Transfer Time greater than Ideal TCP 889 Transfer Times) may be diagnosed by correlating with sub-optimal TCP 890 Efficiency and/or Buffer Delay Percentage metrics. 892 3.3.3 Conducting the TCP Throughput Tests 894 Several TCP tools are currently used in the network world and one of 895 the most common is "iperf". With this tool, hosts are installed at 896 each end of the network path; one acts as client and the other as 897 a server. The Send Socket Buffer and the TCP Receive Window sizes 898 of both client and server can be manually set. The achieved 899 throughput can then be measured, either uni-directionally or 900 bi-directionally. For higher BDP situations in lossy networks 901 (long fat networks or satellite links, etc.), TCP options such as 902 Selective Acknowledgment SHOULD be considered and become part of 903 the window size / throughput characterization. 905 Note that some TCP/IP stack implementations are using Receive Window 906 Auto-Tuning and cannot be adjusted until this feature is disabled. 908 Host hardware performance must be well understood before conducting 909 the tests described in the following sections. A dedicated 910 communications test instrument will generally be required, especially 911 for line rates of GigE and 10 GigE. A compliant TCP TTD SHOULD 912 provide a warning message when the expected test throughput will 913 exceed 10% of the network bandwidth capacity. If the throughput test 914 is expected to exceed 10% of the provider bandwidth, then the test 915 should be coordinated with the network provider. This does not 916 include the customer premise bandwidth, the 10% refers directly to 917 the provider's bandwidth (Provider Edge to Provider router). 919 The TCP throughput test should be run over a long enough duration 920 to properly exercise network buffers (greater than 30 seconds) and 921 also characterize performance at different time periods of the day. 923 3.3.4 Single vs. 
Multiple TCP Connection Testing
925 The decision whether to conduct single or multiple TCP connection 926 tests depends upon the size of the BDP in relation to the TCP 927 Receive Window sizes configured in the end-user environment. 928 For example, if the BDP for a long fat network turns out to be 2MB, 929 then it is probably more realistic to test this network path with 930 multiple connections. Assuming typical host computer TCP Receive 931 Window sizes of 64 KB, using 32 TCP connections would realistically 932 test this path.
934 The following table is provided to illustrate the relationship 935 between the TCP Receive Window size and the number of TCP connections 936 required to utilize the available capacity of a given BDP. For this 937 example, the network bandwidth is 500 Mbps and the RTT is 5 ms, so 938 the BDP equates to 312.5 KBytes.
940 TCP Number of TCP Connections 941 Window to fill available bandwidth 942 ------------------------------------- 943 16KB 20 944 32KB 10 945 64KB 5 946 128KB 3
948 Note that some TCP/IP stack implementations use Receive Window 949 Auto-Tuning, and the window cannot be manually adjusted until this feature is disabled.
951 The TCP Transfer Time metric is useful for conducting multiple 952 connection tests. Each connection should be configured to transfer 953 payloads of the same size (i.e. 100 MB), and the TCP Transfer Time 954 should provide a simple metric to verify the actual versus expected 955 results.
957 Note that the TCP Transfer Time is the time for all connections to 958 complete the transfer of the configured payload size. Consider the 959 64KB window case from the previous table. Each of the 5 960 TCP connections would be configured to transfer 100MB, and each one 961 should obtain a maximum of 100 Mb/sec. So for this example, the 962 100MB payload should be transferred across the connections in 963 approximately 8 seconds (which would be the ideal TCP Transfer Time 964 under these conditions).
966 Additionally, the TCP Efficiency metric MUST be computed for each 967 connection tested, as defined in section 3.3.2.
969 3.3.5 Interpretation of the TCP Throughput Results
971 At the end of this step, the user will document the theoretical BDP 972 and a set of Window size experiments with the measured TCP throughput 973 for each TCP Window size. For cases where the sustained TCP throughput 974 does not equal the ideal value, some possible causes are:
976 - Network congestion causing packet loss, which MAY be inferred from 977 a poor TCP Efficiency % (higher TCP Efficiency % = less packet 978 loss). 979 - Network congestion causing an increase in RTT, which MAY be inferred 980 from the Buffer Delay Percentage (i.e., 0% = no increase in RTT 981 over baseline). 982 - Intermediate network devices which actively regenerate the TCP 983 connection and can alter TCP Receive Window size, MSS, etc. 984 - Rate limiting (policing). More details on traffic management 985 tests follow in section 3.4.
987 3.4. Traffic Management Tests
989 In most cases, the network connection between two geographic 990 locations (branch offices, etc.) is of lower bandwidth than the network 991 connections of the host computers. An example would be LAN connectivity of GigE 992 and WAN connectivity of 100 Mbps. The WAN connectivity may be 993 physically 100 Mbps or logically 100 Mbps (over a GigE WAN 994 connection). In the latter case, rate limiting is used to provide the 995 WAN bandwidth per the SLA.
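Before describing the individual traffic management tests, the connection-count arithmetic of section 3.3.4 above, which is reused when sizing the tests in this section, can be illustrated by the following non-normative sketch (it reproduces the table for a 500 Mbps, 5 msec path):

   import math

   # Non-normative sketch: number of TCP connections of a given window
   # size needed to fill the BDP of the path (section 3.3.4 table).

   def connections_to_fill(bw_bps, rtt_sec, window_bytes):
       bdp_bytes = (bw_bps * rtt_sec) / 8
       return math.ceil(bdp_bytes / window_bytes)

   for window_kb in (16, 32, 64, 128):
       n = connections_to_fill(500e6, 0.005, window_kb * 1024)
       print("%3d KB window -> %2d connections" % (window_kb, n))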
997 Traffic management techniques are employed to provide various forms 998 of QoS; the most common include:
1000 - Traffic Shaping 1001 - Priority queuing 1002 - Random Early Discard (RED)
1004 Configuring the end-to-end network with these various traffic 1005 management mechanisms is a complex undertaking. For traffic shaping 1006 and RED techniques, the end goal is to provide better performance to 1007 bursty traffic such as TCP (RED is specifically intended for TCP).
1009 This section of the methodology provides guidelines to test traffic 1010 shaping and RED implementations. As in section 3.3, host hardware 1011 performance must be well understood before conducting the traffic 1012 shaping and RED tests. A dedicated communications test instrument will 1013 generally be REQUIRED for line rates of GigE and 10 GigE. If the 1014 throughput test is expected to exceed 10% of the provider bandwidth, 1015 then the test should be coordinated with the network provider. This 1016 does not include the customer premises bandwidth; the 10% refers to 1017 the provider's bandwidth (Provider Edge to Provider router). Note 1018 that GigE and 10 GigE interfaces might benefit from hold-queue 1019 adjustments in order to prevent the saw-tooth TCP traffic pattern.
1021 3.4.1 Traffic Shaping Tests
1023 For services where the available bandwidth is rate limited, two (2) 1024 techniques can be used: traffic policing or traffic shaping.
1026 Simply stated, traffic policing marks and/or drops packets which 1027 exceed the SLA bandwidth (in most cases, excess traffic is dropped). 1028 Traffic shaping employs queues to smooth the bursty 1029 traffic and then send it out within the SLA bandwidth limit (without 1030 dropping packets unless the traffic shaping queue is exhausted).
1032 Traffic shaping is generally configured for TCP data services and 1033 can provide improved TCP performance since the retransmissions are 1034 reduced, which in turn optimizes TCP throughput for the available 1035 bandwidth. Throughout this section, the rate-limited bandwidth shall 1036 be referred to as the "bottleneck bandwidth".
1038 The ability to detect proper traffic shaping is more easily diagnosed 1039 when conducting a multiple TCP connections test. Proper shaping will 1040 provide a fair distribution of the available bottleneck bandwidth, 1041 while traffic policing will not.
1043 The traffic shaping tests are built upon the concepts of multiple 1044 connections testing as defined in section 3.3.3. Calculating the BDP 1045 for the bottleneck bandwidth is first required before selecting the 1046 number of connections, the Send Socket Buffer and TCP Receive Window 1047 sizes per connection.
1049 Similar to the example in section 3.3, a typical test scenario might 1050 be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited 1051 logical interface), and 5 msec RTT. This would require five (5) TCP 1052 connections of 64 KB Send Socket Buffer and TCP Receive Window sizes 1053 to evenly fill the bottleneck bandwidth (~100 Mbps per connection).
1055 The traffic shaping test should be run over a long enough duration to 1056 properly exercise network buffers (greater than 30 seconds) and also 1057 characterize performance during different time periods of the day. 1058 The throughput of each connection MUST be logged during the entire 1059 test, along with the TCP Transfer Time, TCP Efficiency, and 1060 Buffer Delay Percentage.
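As a non-normative illustration of how the logged values might be reduced to the three metrics of section 3.3.2, the following sketch uses hypothetical per-test values (the same numbers as the worked examples in section 3.3.2); a real test would take these values from the TCP TTD logs:

   # Non-normative sketch; all input values below are hypothetical and
   # would normally come from the TCP TTD logs.

   def tcp_transfer_time_index(actual_sec, ideal_sec):
       return actual_sec / ideal_sec

   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       # Transmitted Bytes include original and retransmitted Bytes.
       return (transmitted_bytes - retransmitted_bytes) * 100.0 / transmitted_bytes

   def buffer_delay_pct(avg_rtt_ms, baseline_rtt_ms):
       return (avg_rtt_ms - baseline_rtt_ms) * 100.0 / baseline_rtt_ms

   print("TCP Transfer Time Index : %.2f" % tcp_transfer_time_index(12.0, 8.0))
   print("TCP Efficiency          : %.1f %%" % tcp_efficiency_pct(102000, 2000))
   print("Buffer Delay Percentage : %.0f %%" % buffer_delay_pct(32.0, 25.0))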
1062 3.4.1.1 Interpretation of Traffic Shaping Test Results 1064 By plotting the throughput achieved by each TCP connection, the fair 1065 sharing of the bandwidth is generally very obvious when traffic 1066 shaping is properly configured for the bottleneck interface. For the 1067 previous example of 5 connections sharing 500 Mbps, each connection 1068 would consume ~100 Mbps with a smooth variation. 1070 If traffic policing was present on the bottleneck interface, the 1071 bandwidth sharing may not be fair and the resulting throughput plot 1072 may reveal "spikey" throughput consumption of the competing TCP 1073 connections (due to the TCP retransmissions). 1075 3.4.2 RED Tests 1077 Random Early Discard techniques are specifically targeted to provide 1078 congestion avoidance for TCP traffic. Before the network element 1079 queue "fills" and enters the tail drop state, RED drops packets at 1080 configurable queue depth thresholds. This action causes TCP 1081 connections to back-off which helps to prevent tail drop, which in 1082 turn helps to prevent global TCP synchronization. 1084 Again, rate limited interfaces may benefit greatly from RED based 1085 techniques. Without RED, TCP may not be able to achieve the full 1086 bottleneck bandwidth. With RED enabled, TCP congestion avoidance 1087 throttles the connections on the higher speed interface (i.e. LAN) 1088 and can help achieve the full bottleneck bandwidth. The burstiness 1089 of TCP traffic is a key factor in the overall effectiveness of RED 1090 techniques; steady state bulk transfer flows will generally not 1091 benefit from RED. With bulk transfer flows, network device queues 1092 gracefully throttle the effective throughput rates due to increased 1093 delays. 1095 The ability to detect proper RED configuration is more easily 1096 diagnosed when conducting a multiple TCP connections test. Multiple 1097 TCP connections provide the bursty sources that emulate the 1098 real-world conditions for which RED was intended. 1100 The RED tests also builds upon the concepts of multiple connections 1101 testing as defined in section 3.3.3. Calculating the BDP for the 1102 bottleneck bandwidth is first required before selecting the number 1103 of connections, the Send Socket Buffer size and the TCP Receive 1104 Window size per connection. 1106 For RED testing, the desired effect is to cause the TCP connections 1107 to burst beyond the bottleneck bandwidth so that queue drops will 1108 occur. Using the same example from section 3.4.1 (traffic shaping), 1109 the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with 1110 window size of 64KB) to fill the capacity. Some experimentation is 1111 required, but it is recommended to start with double the number of 1112 connections to stress the network element buffers / queues (10 1113 connections for this example). 1115 The TCP TTD must be configured to generate these connections as 1116 shorter (bursty) flows versus bulk transfer type flows. These TCP 1117 bursts should stress queue sizes in the 512KB range. Again 1118 experimentation will be required; the proper number of TCP 1119 connections, the Send Socket Buffer and TCP Receive Window sizes will 1120 be dictated by the size of the network element queue. 1122 3.4.2.1 Interpretation of RED Results 1124 The default queuing technique for most network devices is FIFO based. 1125 Without RED, the FIFO based queue may cause excessive loss to all of 1126 the TCP connections and in the worst case global TCP synchronization. 
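The interpretation steps described in the remainder of this sub-section can be roughly automated; the following non-normative sketch checks bottleneck utilization and the ideal TCP Transfer Time for the 10-connection example (the measured per-connection throughputs are hypothetical):

   # Non-normative sketch; measured per-connection throughputs (Mbps)
   # are hypothetical values for the 10-connection RED test example.

   def utilization_pct(per_conn_mbps, bottleneck_mbps):
       return sum(per_conn_mbps) * 100.0 / bottleneck_mbps

   def ideal_transfer_time_sec(payload_mbytes, n_conns, bottleneck_mbps):
       return payload_mbytes * 8.0 * n_conns / bottleneck_mbps

   measured = [42, 55, 38, 61, 44, 52, 47, 39, 58, 41]
   print("Bottleneck utilization  : %.0f %%" % utilization_pct(measured, 500))
   print("Ideal TCP Transfer Time : %.0f seconds" % ideal_transfer_time_sec(100, 10, 500))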
1128 By plotting the aggregate throughput achieved on the bottleneck 1129 interface, proper RED operation may be determined if the bottleneck 1130 bandwidth is fully utilized. For the previous example of 10 1131 connections (window = 64 KB) sharing 500 Mbps, each connection should 1132 consume ~50 Mbps. If RED was not properly enabled on the interface, 1133 then the TCP connections will retransmit at a higher rate and the 1134 net effect is that the bottleneck bandwidth is not fully utilized. 1136 Another means to study non-RED versus RED implementation is to use 1137 the TCP Transfer Time metric for all of the connections. In this 1138 example, a 100 MB payload transfer should take ideally 16 seconds 1139 across all 10 connections (with RED enabled). With RED not enabled, 1140 the throughput across the bottleneck bandwidth may be greatly 1141 reduced (generally 10-20%) and the actual TCP Transfer time may be 1142 proportionally longer then the Ideal TCP Transfer time. 1144 Additionally, non-RED implementations may exhibit a lower TCP 1145 Transfer Efficiency. 1147 4. Security Considerations 1149 The security considerations that apply to any active measurement of 1150 live networks are relevant here as well. See [RFC4656] and 1151 [RFC5357]. 1153 5. IANA Considerations 1155 This document does not REQUIRE an IANA registration for ports 1156 dedicated to the TCP testing described in this document. 1158 6. Acknowledgments 1160 Thanks to Lars Eggert, Al Morton, Matt Mathis, Matt Zekauskas, 1161 Yaakov Stein, and Loki Jorgenson for many good comments and for 1162 pointing us to great sources of information pertaining to past works 1163 in the TCP capacity area. 1165 7. References 1167 7.1 Normative References 1169 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1170 Requirement Levels", BCP 14, RFC 2119, March 1997. 1172 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 1173 Zekauskas, "A One-way Active Measurement Protocol 1174 (OWAMP)", RFC 4656, September 2006. 1176 [RFC2544] Bradner, S., McQuaid, J., "Benchmarking Methodology for 1177 Network Interconnect Devices", RFC 2544, June 1999 1179 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., Babiarz, 1180 J., "A Two-Way Active Measurement Protocol (TWAMP)", 1181 RFC 5357, October 2008 1183 [RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU 1184 Discovery", RFC 4821, June 2007 1186 draft-ietf-ippm-btc-cap-00.txt Allman, M., "A Bulk 1187 Transfer Capacity Methodology for Cooperating Hosts", 1188 August 2001 1190 [RFC2681] Almes G., Kalidindi S., Zekauskas, M., "A Round-trip Delay 1191 Metric for IPPM", RFC 2681, September, 1999 1193 [RFC4898] Mathis, M., Heffner, J., Raghunarayan, R., "TCP Extended 1194 Statistics MIB", May 2007 1196 [RFC5136] Chimento P., Ishac, J., "Defining Network Capacity", 1197 February 2008 1199 [RFC1323] Jacobson, V., Braden, R., Borman D., "TCP Extensions for 1200 High Performance", May 1992 1202 7.2. Informative References 1203 Authors' Addresses 1205 Barry Constantine 1206 JDSU, Test and Measurement Division 1207 One Milesone Center Court 1208 Germantown, MD 20876-7100 1209 USA 1211 Phone: +1 240 404 2227 1212 barry.constantine@jdsu.com 1214 Gilles Forget 1215 Independent Consultant to Bell Canada. 1216 308, rue de Monaco, St-Eustache 1217 Qc. 
CANADA, Postal Code : J7P-4T5 1219 Phone: (514) 895-8212 1220 gilles.forget@sympatico.ca 1222 Rudiger Geib 1223 Heinrich-Hertz-Strasse (Number: 3-7) 1224 Darmstadt, Germany, 64295 1226 Phone: +49 6151 6282747 1227 Ruediger.Geib@telekom.de 1229 Reinhard Schrage 1230 Schrage Consulting 1232 Phone: +49 (0) 5137 909540 1233 reinhard@schrageconsult.com