1 Network Working Group B. Constantine 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: May 14, 2011 Bell Canada (Ext. Consultant) 5 Rudiger Geib 6 Deutsche Telekom 7 Reinhard Schrage 8 Schrage Consulting 10 November 14, 2010 12 Framework for TCP Throughput Testing 13 draft-ietf-ippm-tcp-throughput-tm-08.txt 15 Abstract 17 This framework describes a methodology for measuring end-to-end TCP 18 throughput performance in a managed IP network. The intention is to 19 provide a practical methodology to validate TCP layer performance. 20 The goal is to provide a better indication of the user experience. 21 In this framework, various TCP and IP parameters are identified and 22 should be tested as part of a managed IP network verification. 24 Requirements Language 26 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 27 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 28 document are to be interpreted as described in RFC 2119 [RFC2119]. 30 Status of this Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on May 14, 2011. 47 Copyright Notice 49 Copyright (c) 2010 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document.
Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 65 1.1 Test Set-up and Terminology . . . . . . . . . . . . . . . 4 66 2. Scope and Goals of this methodology. . . . . . . . . . . . . . 5 67 2.1 TCP Equilibrium. . . . . . . . . . . . . . . . . . . . . . 6 68 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 7 69 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 9 70 3.2. Baseline Round Trip Time and Bandwidth . . . . . . . . . . 10 71 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 10 72 3.2.2 Techniques to Measure end-to-end Bandwidth. . . . . . 11 73 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 12 74 3.3.1 Calculate Ideal TCP Receive Window Size. . . . . . . . 12 75 3.3.2 Metrics for TCP Throughput Tests . . . . . . . . . . . 15 76 3.3.3 Conducting the TCP Throughput Tests. . . . . . . . . . 18 77 3.3.4 Single vs. Multiple TCP Connection Testing . . . . . . 19 78 3.3.5 Interpretation of the TCP Throughput Results . . . . . 20 79 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 20 80 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 21 81 3.4.1.1 Interpretation of Traffic Shaping Test Results. . . 21 82 3.4.2 RED Tests. . . . . . . . . . . . . . . . . . . . . . . 22 83 3.4.2.1 Interpretation of RED Results . . . . . . . . . . . 23 84 4. Security Considerations . . . . . . . . . . . . . . . . . . . 23 85 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 86 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 23 87 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24 88 7.1 Normative References . . . . . . . . . . . . . . . . . . . 24 89 7.2 Informative References . . . . . . . . . . . . . . . . . . 24 91 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 25 93 1. Introduction 95 Network providers are coming to the realization that Layer 2/3 96 testing is not enough to adequately ensure end-user's satisfaction. 97 An SLA (Service Level Agreement) is provided to business customers 98 and is generally based upon Layer 2/3 criteria such as access rate, 99 latency, packet loss and delay variations. On the other hand, 100 measuring TCP throughput provides meaningful results with respect to 101 user experience. Thus, the network provider community desires to 102 measure IP network throughput performance at the TCP layer. 104 Additionally, business enterprise customers seek to conduct 105 repeatable TCP throughput tests between locations. Since these 106 enterprises rely on the networks of the providers, a common test 107 methodology with predefined metrics will benefit both parties. 109 Note that the primary focus of this methodology is managed business 110 class IP networks; i.e. those Ethernet terminated services for which 111 businesses are provided an SLA from the network provider. End-users 112 with "best effort" access between locations can use this methodology, 113 but this framework and its metrics are intended to be used in a 114 predictable managed IP service environment. 116 So the intent behind this document is to define a methodology for 117 testing sustained TCP layer performance. 
In this document, the 118 maximum achievable TCP throughput is that amount of data per unit 119 time that TCP transports when trying to reach Equilibrium, i.e. 120 after the initial slow start and congestion avoidance phases. We 121 refer to this as the maximum achievable TCP Throughput for the TCP 122 connection(s). 124 TCP uses a congestion window (TCP CWND) to determine how many 125 packets it can send at one time. A larger TCP CWND permits a higher 126 throughput. TCP "slow start" and "congestion avoidance" algorithms 127 together determine the TCP CWND size. The Maximum TCP CWND size is 128 also limited by the buffer space allocated by the kernel for each 129 socket. For each socket, there is a default buffer size that can be 130 changed by the program using a system library call just before 131 opening the socket. There is also a kernel enforced maximum buffer 132 size. This buffer size can be adjusted at both ends of the socket 133 (send and receive). In order to obtain the maximum throughput, it 134 is critical to use optimal TCP Send and Receive Socket Buffer sizes 135 as well as the optimal TCP Receive Window size. 137 There are many variables to consider when conducting a TCP throughput 138 test and this methodology focuses on the most common: 139 - Path MTU and Maximum Segment Size (MSS) 140 - RTT and Bottleneck BW 141 - Ideal TCP Receive Window (including Ideal Receive Socket Buffer) 142 - Ideal Send Socket Buffer 143 - TCP Congestion Window (TCP CWND) 144 - Single Connection and Multiple Connections testing 145 This methodology proposes TCP testing that should be performed in 146 addition to traditional Layer 2/3 type tests. Layer 2/3 tests are 147 required to verify the integrity of the network before conducting TCP 148 tests. Examples include iperf (UDP mode) or manual packet layer test 149 techniques where packet throughput, loss, and delay measurements are 150 conducted. When available, standardized testing similar to RFC 2544 151 [RFC2544] but adapted for use in operational networks may be used. 152 Note: RFC 2544 was never meant to be used outside a lab environment. 154 1.1 Test Set-up and Terminology 156 This section provides a general overview of the test configuration 157 for this methodology. The test is intended to be conducted on an 158 end-to-end operational and managed IP network. A multitude of 159 network architectures and topologies can be tested. The following 160 set-up diagram is very general and only illustrates the 161 segmentation within end user and network provider domains. 163 Common terms used in the test methodology are: 165 - Bottleneck Bandwidth (BB), lowest bandwidth along the complete 166 path. Bottleneck Bandwidth and Bandwidth are used synonymously 167 in this document. Most of the time the Bottleneck Bandwidth is 168 in the access portion of the wide area network (CE - PE). 169 - Customer Provided Equipment (CPE), refers to customer owned 170 equipment (routers, switches, computers, etc.) 171 - Customer Edge (CE), refers to provider owned demarcation device. 172 - End-user: The business enterprise customer. For the purposes of 173 conducting TCP throughput tests, this may be the IT department. 174 - Network Under Test (NUT), refers to the tested IP network path. 175 - Provider Edge (PE), refers to provider's distribution equipment. 176 - P (Provider), refers to provider core network equipment. 177 - Round-Trip Time (RTT), refers to Layer 4 back and forth delay.
178 - Round-Trip Delay (RTD), refers to Layer 1 back and forth delay. 179 - TCP Throughput Test Device (TCP TTD), refers to compliant TCP 180 host that generates traffic and measures metrics as defined in 181 this methodology. i.e. a dedicated communications test instrument. 183 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 184 | TCP|-| CPE|-| CE |--| PE |-| P |--| P |-| PE |--| CE |-| CPE|-| TCP| 185 | TTD| | | | |BB| | | | | | | |BB| | | | | TTD| 186 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 187 <------------------------ NUT ------------------------> 188 R >-----------------------------------------------------------| 189 T | 190 T <-----------------------------------------------------------| 192 Note that the NUT may consist of a variety of devices including but 193 not limited to, load balancers, proxy servers or WAN acceleration 194 devices. The detailed topology of the NUT should be well understood 195 when conducting the TCP throughput tests, although this methodology 196 makes no attempt to characterize specific network architectures. 198 2. Scope and Goals of this Methodology 200 Before defining the goals, it is important to clearly define the 201 areas that are out-of-scope. 203 - This methodology is not intended to predict the TCP throughput 204 during the transient stages of a TCP connection, such as the initial 205 slow start. 207 - This methodology is not intended to definitively benchmark TCP 208 implementations of one OS to another, although some users may find 209 some value in conducting qualitative experiments. 211 - This methodology is not intended to provide detailed diagnosis 212 of problems within end-points or within the network itself as 213 related to non-optimal TCP performance, although a results 214 interpretation section for each test step may provide insight in 215 regards with potential issues. 217 - This methodology does not propose to operate permanently with high 218 measurement loads. TCP performance and optimization within 219 operational networks may be captured and evaluated by using data 220 from the "TCP Extended Statistics MIB" [RFC4898]. 222 - This methodology is not intended to measure TCP throughput as part 223 of an SLA, or to compare the TCP performance between service 224 providers or to compare between implementations of this methodology 225 in dedicated communications test instruments. 227 In contrast to the above exclusions, a primary goal is to define a 228 method to conduct a practical, end-to-end assessment of sustained 229 TCP performance within a managed business class IP network. Another 230 key goal is to establish a set of "best practices" that a non-TCP 231 expert should apply when validating the ability of a managed network 232 to carry end-user TCP applications. 234 Other specific goals are to : 236 - Provide a practical test approach that specifies IP hosts 237 configurable TCP parameters such as TCP Receive Window size, Socket 238 Buffer size, MSS (Maximum Segment Size), number of connections, and 239 how these affect the outcome of TCP performance over a network. 240 See section 3.3.3. 242 - Provide specific test conditions like link speed, RTT, TCP Receive 243 Window size, Socket Buffer size and maximum achievable TCP throughput 244 when trying to reach TCP Equilibrium. For guideline purposes, 245 provide examples of test conditions and their maximum achievable 246 TCP throughput. 
Section 2.1 provides specific details concerning the 247 definition of TCP Equilibrium within this methodology while section 3 248 provides specific test conditions with examples. 250 - Define three (3) basic metrics to compare the performance of TCP 251 connections under various network conditions. See section 3.3.2. 253 - In test situations where the recommended procedure does not yield 254 the maximum achievable TCP throughput results, this methodology 255 provides some possible areas within the end host or network that 256 should be considered for investigation. Again, this 257 methodology is not intended to provide a detailed diagnosis of these 258 issues. See section 3.3.5. 260 2.1 TCP Equilibrium 262 TCP connections have three (3) fundamental congestion window phases 263 as documented in [RFC5681]. 265 These 3 phases are: 266 1 - The Slow Start phase, which occurs at the beginning of a TCP 267 transmission or after a retransmission time out. 269 2 - The Congestion Avoidance phase, during which TCP ramps up to 270 establish the maximum attainable throughput on an end-to-end network 271 path. Retransmissions are a natural by-product of the TCP congestion 272 avoidance algorithm as it seeks to achieve maximum throughput. 274 3 - The Retransmission Time-out phase, which could include Fast 275 Retransmit (Tahoe) or Fast Recovery (Reno & New Reno). When multiple 276 packet losses occur, the Congestion Avoidance phase transitions to Fast 277 Retransmit or Fast Recovery, depending upon the TCP implementation. 278 If a Time-Out occurs, TCP transitions back to the Slow Start phase. 280 The following diagram depicts these 3 phases. 282 | Trying to reach TCP Equilibrium >>>>>>>>>>>>> 283 /\ | High ssthresh TCP CWND 3 284 /\ | Loss Event * halving Retransmission 285 /\ | * \ upon loss Time-Out Adjusted 286 /\ | * \ /\ _______ ssthresh 287 /\ | * \ / \ /M-Loss | * 288 TCP | * 2 \/ \ / Events |1 * 289 Through- | * Congestion\ / |Slow * 290 put | 1 * Avoidance \/ |Start * 291 | Slow * Half | * 292 | Start * TCP CWND * 293 |___*_______________________Minimum TCP CWND after Time-Out_ 294 Time >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 295 Note: ssthresh = Slow Start threshold. 297 Through the above 3 phases, TCP is trying to reach Equilibrium, but 298 since packet loss is currently its only available feedback indicator, 299 TCP will never reach that goal. However, a well tuned (and managed) 300 IP network with well tuned IP hosts and applications should perform 301 very close to TCP Equilibrium and to the BB (Bottleneck Bandwidth). 303 This TCP methodology provides guidelines to measure the maximum 304 achievable TCP throughput or maximum TCP sustained rate obtained 305 after TCP CWND has stabilized to an optimal value. All maximum 306 achievable TCP throughputs specified in section 3 are with respect to 307 this condition. 309 It is important to clarify the interaction between the sender's Send 310 Socket Buffer and the receiver's advertised TCP Receive Window. TCP 311 test programs such as iperf, ttcp, etc. allow the sender to control 312 the quantity of TCP Bytes transmitted and unacknowledged (in-flight), 313 commonly referred to as the Send Socket Buffer. This is done 314 independently of the TCP Receive Window size advertised by the 315 receiver. Implications for the capabilities of the Throughput Test 316 Device (TTD) are covered at the end of section 3. 318 3.
TCP Throughput Testing Methodology 320 As stated earlier in section 1, it is considered best practice to 321 verify the integrity of the network by conducting Layer2/3 tests such 322 as [RFC2544] or other methods of network stress tests. Although, it 323 is important to mention here that RFC 2544 was never meant to be used 324 outside a lab environment. 326 If the network is not performing properly in terms of packet loss, 327 jitter, etc. then the TCP layer testing will not be meaningful. A 328 dysfunctional network will not reach close enough to TCP Equilibrium 329 to provide optimal TCP throughputs with the available bandwidth. 331 TCP Throughput testing may require cooperation between the end user 332 customer and the network provider. In a Layer 2/3 VPN architecture, 333 the testing should be conducted either on the CPE or on the CE device 334 and not on the PE (Provider Edge) router. 336 The following represents the sequential order of steps for this 337 testing methodology: 339 1. Identify the Path MTU. Packetization Layer Path MTU Discovery 340 or PLPMTUD, [RFC4821], MUST be conducted to verify the network path 341 MTU. Conducting PLPMTUD establishes the upper limit for the MSS to 342 be used in subsequent steps. 344 2. Baseline Round Trip Time and Bandwidth. This step establishes the 345 inherent, non-congested Round Trip Time (RTT) and the bottleneck 346 bandwidth of the end-to-end network path. These measurements are 347 used to provide estimates of the ideal TCP Receive Window and Send 348 Socket Buffer sizes that SHOULD be used in subsequent test steps. 349 These measurements reference [RFC2681] and [RFC4898] to measure RTD 350 and the associated RTT. 352 3. TCP Connection Throughput Tests. With baseline measurements 353 of Round Trip Time and bottleneck bandwidth, single and multiple TCP 354 connection throughput tests SHOULD be conducted to baseline network 355 performance expectations. 357 4. Traffic Management Tests. Various traffic management and queuing 358 techniques can be tested in this step, using multiple TCP 359 connections. Multiple connections testing should verify that the 360 network is configured properly for traffic shaping versus policing, 361 various queuing implementations and RED. 363 Important to note are some of the key characteristics and 364 considerations for the TCP test instrument. The test host may be a 365 standard computer or a dedicated communications test instrument. 366 In both cases, they must be capable of emulating both client and 367 server. 369 The following criteria should be considered when selecting whether 370 the TCP test host can be a standard computer or has to be a dedicated 371 communications test instrument: 373 - TCP implementation used by the test host, OS version, i.e. Linux OS 374 kernel using TCP Reno, TCP options supported, etc. These will 375 obviously be more important when using dedicated communications test 376 instruments where the TCP implementation may be customized or tuned 377 to run in higher performance hardware. When a compliant TCP TTD is 378 used, the TCP implementation MUST be identified in the test results. 379 The compliant TCP TTD should be usable for complete end-to-end 380 testing through network security elements and should also be usable 381 for testing network sections. 383 - More important, the TCP test host MUST be capable to generate 384 and receive stateful TCP test traffic at the full link speed of the 385 network under test. 
Stateful TCP test traffic means that the test 386 host MUST fully implement a TCP stack; this is generally a comment 387 aimed at dedicated communications test equipments which sometimes 388 "blast" packets with TCP headers. As a general rule of thumb, testing 389 TCP throughput at rates greater than 100 Mbit/sec MAY require high 390 performance server hardware or dedicated hardware based test tools. 392 - A compliant TCP Throughput Test Device MUST allow adjusting both 393 Send Socket Buffer and TCP Receive Window sizes. The Receive Socket 394 Buffer MUST be large enough to accommodate the TCP Receive Window. 396 - Measuring RTT and retransmissions per connection will generally 397 require a dedicated communications test instrument. In the absence of 398 dedicated hardware based test tools, these measurements may need to 399 be conducted with packet capture tools, i.e. conduct TCP throughput 400 tests and analyze RTT and retransmission results in packet captures. 401 Another option may be to use "TCP Extended Statistics MIB" per 402 [RFC4898]. 404 - The RFC4821 PLPMTUD test SHOULD be conducted with a dedicated 405 tester which exposes the ability to run the PLPMTUD algorithm 406 independent from the OS stack. 408 3.1. Determine Network Path MTU 410 TCP implementations should use Path MTU Discovery techniques (PMTUD). 411 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 412 When a device has a packet to send which has the Don't Fragment (DF) 413 bit in the IP header set and the packet is larger than the Maximum 414 Transmission Unit (MTU) of the next hop, the packet is dropped and 415 the device sends an ICMP 'need to frag' message back to the host that 416 originated the packet. The ICMP 'need to frag' message includes 417 the next hop MTU which PMTUD uses to tune the TCP Maximum Segment 418 Size (MSS). Unfortunately, because many network managers completely 419 disable ICMP, this technique does not always prove reliable. 421 Packetization Layer Path MTU Discovery or PLPMTUD [RFC4821] MUST then 422 be conducted to verify the network path MTU. PLPMTUD can be used 423 with or without ICMP. The following sections provide a summary of the 424 PLPMTUD approach and an example using TCP. [RFC4821] specifies a 425 search_high and a search_low parameter for the MTU. As specified in 426 [RFC4821], 1024 Bytes is a safe value for search_low in modern 427 networks. 429 It is important to determine the links overhead along the IP path, 430 and then to select a TCP MSS size corresponding to the Layer 3 MTU. 431 For example, if the MTU is 1024 Bytes and the TCP/IP headers are 40 432 Bytes, then the MSS would be set to 984 Bytes. 434 An example scenario is a network where the actual path MTU is 1240 435 Bytes. The TCP client probe MUST be capable of setting the MSS for 436 the probe packets and could start at MSS = 984 (which corresponds 437 to an MTU size of 1024 Bytes). 439 The TCP client probe would open a TCP connection and advertise the 440 MSS as 984. Note that the client probe MUST generate these packets 441 with the DF bit set. The TCP client probe then sends test traffic 442 per a small default Send Socket Buffer size of ~8KBytes. It should 443 be kept small to minimize the possibility of congesting the network, 444 which may induce packet loss. The duration of the test should also 445 be short (10-30 seconds), again to minimize congestive effects 446 during the test. 
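As an informal illustration (not part of the methodology), the MSS/MTU arithmetic used in the probing examples of this section can be expressed as a short Python sketch; the function names are illustrative only, and the sketch assumes IPv4 and TCP headers without options, i.e. 40 Bytes of total TCP/IP overhead, as in the examples of this section:

   # Sketch only: MSS/MTU conversion assumed in this section
   # (20 Bytes IPv4 + 20 Bytes TCP = 40 Bytes of header overhead).
   TCP_IP_OVERHEAD = 40

   def mss_from_mtu(mtu_bytes):
       return mtu_bytes - TCP_IP_OVERHEAD

   def mtu_from_mss(mss_bytes):
       return mss_bytes + TCP_IP_OVERHEAD

   assert mss_from_mtu(1024) == 984     # search_low example
   assert mss_from_mtu(1500) == 1460    # search_high example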
448 In the example of a 1240 Bytes path MTU, probing with an MSS equal to 449 984 would yield a successful probe and the test client packets would 450 be successfully transferred to the test server. 452 Also note that the test client MUST verify that the MSS advertised 453 is indeed negotiated. Network devices with built-in Layer 4 454 capabilities can intercede during the connection establishment and 455 reduce the advertised MSS to avoid fragmentation. This is certainly 456 a desirable feature from a network perspective, but it can yield 457 erroneous test results if the client test probe does not confirm the 458 negotiated MSS. 460 The next test probe would use the search_high value and this would 461 be set to MSS = 1460 to correspond to a 1500 Bytes MTU. In this 462 example, the test client will retransmit based upon time-outs, since 463 no ACKs will be received from the test server. This test probe is 464 marked as a conclusive failure if none of the test packets are 465 ACK'ed. If any of the test packets are ACK'ed, network congestion 466 may be the cause and the test probe is not conclusive. Re-testing 467 at other times of the day is recommended to further isolate the cause. 469 The test is repeated until the desired granularity of the MTU is 470 discovered. The method can yield precise results at the expense of 471 probing time. One approach may be to reduce the probe size to 472 halfway between the unsuccessful search_high and the successful 473 search_low values, and likewise to raise it by half when seeking the upper limit. 475 3.2. Baseline Round Trip Time and Bandwidth 477 Before stateful TCP testing can begin, it is important to determine 478 the baseline Round Trip Time (non-congested inherent delay) and 479 bottleneck bandwidth of the end-to-end network to be tested. These 480 measurements are used to provide estimates of the ideal TCP Receive 481 Window and Send Socket Buffer sizes that SHOULD be used in subsequent 482 test steps. 484 3.2.1 Techniques to Measure Round Trip Time 486 Following the definitions used in section 1.1, Round Trip Time (RTT) 487 is the elapsed time from the clocking in of the first bit of a 488 transmitted payload packet to the receipt of the last bit of the 489 corresponding Acknowledgment. Round Trip Delay (RTD) is used 490 synonymously with twice the Link Latency. RTT measurements SHOULD use 491 techniques defined in [RFC2681] or statistics available from MIBs 492 defined in [RFC4898]. 494 The RTT SHOULD be baselined during "off-peak" hours to obtain a 495 reliable figure for inherent network latency versus additional delay 496 caused by network buffering. When sampling values of RTT over a test 497 interval, the minimum value measured SHOULD be used as the baseline 498 RTT since this will most closely estimate the inherent network 499 latency. This inherent RTT is also used to determine the Buffer 500 Delay Percentage metric, which is defined in Section 3.3.2. 501 The following list is not meant to be exhaustive, although it 502 summarizes some of the most common ways to determine round trip time. 503 The desired resolution of the measurement (e.g. msec versus usec) may 504 dictate whether the RTT measurement can be achieved with ICMP pings 505 or by a dedicated communications test instrument with precision 506 timers. 508 The objective in this section is to list several techniques 509 in order of decreasing accuracy.
511 - Use test equipment on each end of the network, "looping" the 512 far-end tester so that a packet stream can be measured back and forth 513 from end-to-end. This RTT measurement may be compatible with delay 514 measurement protocols specified in [RFC5357]. 516 - Conduct packet captures of TCP test sessions using "iperf" or FTP, 517 or other TCP test applications. By running multiple experiments, 518 packet captures can then be analyzed to estimate RTT based upon the 519 SYN -> SYN-ACK from the 3 way handshake at the beginning of the TCP 520 sessions. Although, note that Firewalls might slow down 3 way 521 handshakes, so it might be useful to compare with measured RTT later 522 on in the same capture. 524 - ICMP Pings may also be adequate to provide round trip time 525 estimations. Some limitations with ICMP Ping may include msec 526 resolution and whether the network elements are responding to pings 527 or not. Also, ICMP is often rate-limited and segregated into 528 different buffer queues, so it is not as reliable and accurate as 529 in-band measurements. 531 3.2.2 Techniques to Measure end-to-end Bandwidth 533 There are many well established techniques available to provide 534 estimated measures of bandwidth over a network. These measurements 535 SHOULD be conducted in both directions of the network, especially for 536 access networks, which may be asymmetrical. Measurements SHOULD use 537 network capacity techniques defined in [RFC5136]. 539 Before any TCP Throughput test can be done, a bandwidth measurement 540 test MUST be run with stateless IP streams(not stateful TCP) in order 541 to determine the available bandwidths in each direction. This test 542 should obviously be performed at various intervals throughout a 543 business day or even across a week. Ideally, the bandwidth test 544 should produce logged outputs of the achieved bandwidths across the 545 test interval. 547 3.3. TCP Throughput Tests 549 This methodology specifically defines TCP throughput techniques to 550 verify sustained TCP performance in a managed business IP network, as 551 defined in section 2.1. This section and others will define the 552 method to conduct these sustained TCP throughput tests and guidelines 553 for the predicted results. 555 With baseline measurements of round trip time and bandwidth 556 from section 3.2, a series of single and multiple TCP connection 557 throughput tests SHOULD be conducted to baseline network performance 558 against expectations. The number of trials and the type of testing 559 (single versus multiple connections) will vary according to the 560 intention of the test. One example would be a single connection test 561 in which the throughput achieved by large Send Socket Buffer and TCP 562 Receive Window sizes (i.e. 256KB) is to be measured. It would be 563 advisable to test performance at various times of the business day. 565 It is RECOMMENDED to run the tests in each direction independently 566 first, then run both directions simultaneously. In each case, 567 TCP Transfer Time, TCP Efficiency, and Buffer Delay Percentage MUST 568 be measured in each direction. These metrics are defined in 3.3.2. 570 3.3.1 Calculate Ideal TCP Receive Window Size 572 The ideal TCP Receive Window size can be calculated from the 573 bandwidth delay product (BDP), which is: 575 BDP (bits) = RTT (sec) x Bandwidth (bps) 577 Note that the RTT is being used as the "Delay" variable in the 578 BDP calculations. 
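As an informal illustration, the BDP calculation can be scripted as follows (a sketch only; the function name is illustrative, with RTT expressed in seconds and Bandwidth in bps, as in the formula above):

   # Sketch only: BDP in bits from baseline RTT and Bottleneck Bandwidth.
   def bdp_bits(rtt_sec, bandwidth_bps):
       return rtt_sec * bandwidth_bps

   # Example: a T3 (44.21 Mbps) with 25 msec RTT.
   print(bdp_bits(0.025, 44.21e6))      # ~1,105,250 bits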
580 Then, by dividing the BDP by 8, we obtain the "ideal" TCP Receive 581 Window size in Bytes. For optimal results, the Send Socket Buffer 582 size must be adjusted to the same value at the opposite end of the 583 network path. 585 Ideal TCP RWIN = BDP / 8 587 An example would be a T3 link with 25 msec RTT. The BDP would equal 588 ~1,105,000 bits and the ideal TCP Receive Window would be ~138 589 KBytes. 591 The following table provides some representative network Link Speeds, 592 RTT, BDP, and their associated Ideal TCP Receive Window sizes. 594 Table 3.3.1: Link Speed, RTT and calculated BDP & TCP Receive Window 596 Link Ideal TCP 597 Speed* RTT BDP Receive Window 598 (Mbps) (ms) (bits) (KBytes) 599 --------------------------------------------------------------------- 600 1.536 20 30,720 3.84 601 1.536 50 76,800 9.60 602 1.536 100 153,600 19.20 603 44.21 10 442,100 55.26 604 44.21 15 663,150 82.89 605 44.21 25 1,105,250 138.16 606 100 1 100,000 12.50 607 100 2 200,000 25.00 608 100 5 500,000 62.50 609 1,000 0.1 100,000 12.50 610 1,000 0.5 500,000 62.50 611 1,000 1 1,000,000 125.00 612 10,000 0.05 500,000 62.50 613 10,000 0.3 3,000,000 375.00 615 * Note that link speed is the bottleneck bandwidth for the NUT 617 The following serial link speeds are used: 618 - T1 = 1.536 Mbits/sec (for a B8ZS line encoding facility) 619 - T3 = 44.21 Mbits/sec (for a C-Bit Framing facility) 621 The above table illustrates the ideal TCP Receive Window size. 622 If a smaller TCP Receive Window is used, then the TCP Throughput 623 is not optimal. To calculate the Ideal TCP Throughput, the following 624 formula is used: TCP Throughput = TCP RWIN X 8 / RTT 626 An example could be a 100 Mbps IP path with 5 ms RTT and a TCP 627 Receive Window size of 16KB, then: 629 TCP Throughput = 16 KBytes X 8 bits / 5 ms. 630 TCP Throughput = 128,000 bits / 0.005 sec. 631 TCP Throughput = 25.6 Mbps. 633 Another example for a T3 using the same calculation formula is 634 illustrated on the next page: 635 TCP Throughput = TCP RWIN X 8 / RTT. 636 TCP Throughput = 16 KBytes X 8 bits / 10 ms. 637 TCP Throughput = 128,000 bits / 0.01 sec. 638 TCP Throughput = 12.8 Mbps. 640 When the TCP Receive Window size exceeds the BDP (i.e. T3 link, 641 64 KBytes TCP Receive Window on a 10 ms RTT path), the maximum frames 642 per second limit of 3664 is reached and the calculation formula is: 644 TCP Throughput = Max FPS X MSS X 8. 645 TCP Throughput = 3664 FPS X 1460 Bytes X 8 bits. 646 TCP Throughput = 42.8 Mbps 647 The following diagram compares achievable TCP throughputs on a T3 648 with Send Socket Buffer & TCP Receive Window sizes of 16KB vs. 64KB. 650 45| 651 | _______42.8M 652 40| |64KB | 653 TCP | | | 654 Throughput 35| | | 655 in Mbps | | | +-----+34.1M 656 30| | | |64KB | 657 | | | | | 658 25| | | | | 659 | | | | | 660 20| | | | | _______20.5M 661 | | | | | |64KB | 662 15| | | | | | | 663 |12.8M+-----| | | | | | 664 10| |16KB | | | | | | 665 | | | |8.5M+-----| | | | 666 5| | | | |16KB | |5.1M+-----| | 667 |_____|_____|_____|____|_____|_____|____|16KB |_____|_____ 668 10 15 25 669 RTT in milliseconds 671 The following diagram shows the achievable TCP throughput on a 25ms 672 T3 when Send Socket Buffer & TCP Receive Window sizes are increased. 
674 45| 675 | 676 40| +-----+40.9M 677 TCP | | | 678 Throughput 35| | | 679 in Mbps | | | 680 30| | | 681 | | | 682 25| | | 683 | | | 684 20| +-----+20.5M | | 685 | | | | | 686 15| | | | | 687 | | | | | 688 10| +-----+10.2M | | | | 689 | | | | | | | 690 5| +-----+5.1M | | | | | | 691 |_____|_____|______|_____|______|_____|_______|_____|_____ 692 16 32 64 128* 693 TCP Receive Window size in KBytes 695 * Note that 128KB requires [RFC1323] TCP Window scaling option. 697 3.3.2 Metrics for TCP Throughput Tests 699 This framework focuses on a TCP throughput methodology and also 700 provides several basic metrics to compare results of various 701 throughput tests. It is recognized that the complexity and 702 unpredictability of TCP makes it impossible to develop a complete 703 set of metrics that accounts for the myriad of variables (i.e. RTT 704 variation, loss conditions, TCP implementation, etc.). However, 705 these basic metrics will facilitate TCP throughput comparisons 706 under varying network conditions and between network traffic 707 management techniques. 709 The first metric is the TCP Transfer Time, which is simply the 710 measured time it takes to transfer a block of data across 711 simultaneous TCP connections. This concept is useful when 712 benchmarking traffic management techniques and where multiple 713 TCP connections are required. 715 TCP Transfer time may also be used to provide a normalized ratio of 716 the actual TCP Transfer Time versus the Ideal Transfer Time. This 717 ratio is called the TCP Transfer Index and is defined as: 719 Actual TCP Transfer Time 720 ------------------------- 721 Ideal TCP Transfer Time 723 The Ideal TCP Transfer time is derived from the network path 724 bottleneck bandwidth and various Layer 1/2/3/4 overheads associated 725 with the network path. Additionally, both the TCP Receive Window and 726 the Send Socket Buffer sizes must be tuned to equal the bandwidth 727 delay product (BDP) as described in section 3.3.1. 729 The following table illustrates the Ideal TCP Transfer time of a 730 single TCP connection when its TCP Receive Window and Send Socket 731 Buffer sizes are equal to the BDP. 733 Table 3.3.2: Link Speed, RTT, BDP, TCP Throughput, and 734 Ideal TCP Transfer time for a 100 MB File 736 Link Maximum Ideal TCP 737 Speed BDP Achievable TCP Transfer time 738 (Mbps) RTT (ms) (KBytes) Throughput(Mbps) (seconds) 739 -------------------------------------------------------------------- 740 1.536 50 9.6 1.4 571 741 44.21 25 138.2 42.8 18 742 100 2 25.0 94.9 9 743 1,000 1 125.0 949.2 1 744 10,000 0.05 62.5 9,492 0.1 746 Transfer times are rounded for simplicity. 748 For a 100MB file(100 x 8 = 800 Mbits), the Ideal TCP Transfer Time 749 is derived as follows: 751 800 Mbits 752 Ideal TCP Transfer Time = ----------------------------------- 753 Maximum Achievable TCP Throughput 755 The maximum achievable layer 2 throughput on T1 and T3 Interfaces 756 is based on the maximum frames per second (FPS) permitted by the 757 actual layer 1 speed when the MTU is 1500 Bytes. 
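The Ideal TCP Transfer Time relation above can also be illustrated with a short, informative Python sketch; the function name and values are illustrative only and are taken from the T3 row of Table 3.3.2. The FPS calculations that determine the maximum achievable TCP throughput itself follow.

   # Sketch only: Ideal TCP Transfer Time for a given file size.
   def ideal_transfer_time_sec(file_mbytes, max_achievable_tput_mbps):
       return (file_mbytes * 8.0) / max_achievable_tput_mbps

   # 100 MB file over a T3 (42.8 Mbps maximum achievable TCP throughput).
   print(ideal_transfer_time_sec(100, 42.8))   # ~18.7 sec, listed as 18 in Table 3.3.2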
759 The maximum FPS for a T1 is 127 and the calculation formula is: 760 FPS = T1 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 761 FPS = (1.536M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 762 FPS = (1.536M / (1508 Bytes X 8)) 763 FPS = 1.536 Mbps / 12064 bits 764 FPS = 127 766 The maximum FPS for a T3 is 3664 and the calculation formula is: 767 FPS = T3 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 768 FPS = (44.21M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 769 FPS = (44.21M / (1508 Bytes X 8)) 770 FPS = 44.21 Mbps / 12064 bits 771 FPS = 3664 773 The 1508 equates to: 775 MTU + PPP + Flags + CRC16 777 Where MTU is 1500 Bytes, PPP is 4 Bytes, Flags are 2 Bytes and CRC16 778 is 2 Bytes. 780 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 781 simply use: MSS in Bytes X 8 bits X max FPS. 782 For a T3, the maximum TCP Throughput = 1460 Bytes X 8 bits X 3664 FPS 783 Maximum TCP Throughput = 11680 bits X 3664 FPS 784 Maximum TCP Throughput = 42.8 Mbps. 786 The maximum achievable layer 2 throughput on Ethernet Interfaces is 787 based on the maximum frames per second permitted by the IEEE802.3 788 standard when the MTU is 1500 Bytes. 790 The maximum FPS for 100M Ethernet is 8127 and the calculation is: 791 FPS = (100Mbps /(1538 Bytes X 8 bits)) 793 The maximum FPS for GigE is 81274 and the calculation formula is: 794 FPS = (1Gbps /(1538 Bytes X 8 bits)) 796 The maximum FPS for 10GigE is 812743 and the calculation formula is: 797 FPS = (10Gbps /(1538 Bytes X 8 bits)) 798 The 1538 equates to: 800 MTU + Eth + CRC32 + IFG + Preamble + SFD 802 Where MTU is 1500 Bytes, Ethernet is 14 Bytes, CRC32 is 4 Bytes, 803 IFG is 12 Bytes, Preamble is 7 Bytes and SFD is 1 Byte. 805 Note that better results could be obtained with jumbo frames on 806 GigE and 10 GigE. 808 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 809 simply use: MSS in Bytes X 8 bits X max FPS. 810 For a 100M, the maximum TCP Throughput = 1460 B X 8 bits X 8127 FPS 811 Maximum TCP Throughput = 11680 bits X 8127 FPS 812 Maximum TCP Throughput = 94.9 Mbps. 814 To illustrate the TCP Transfer Time Index, an example would be the 815 bulk transfer of 100 MB over 5 simultaneous TCP connections (each 816 connection uploading 100 MB). In this example, the Ethernet service 817 provides a Committed Access Rate (CAR) of 500 Mbit/s. Each 818 connection may achieve different throughputs during a test and the 819 overall throughput rate is not always easy to determine (especially 820 as the number of connections increases). 822 The ideal TCP Transfer Time would be ~8 seconds, but in this example, 823 the actual TCP Transfer Time was 12 seconds. The TCP Transfer Index 824 would then be 12/8 = 1.5, which indicates that the transfer across 825 all connections took 1.5 times longer than the ideal. 827 The second metric is TCP Efficiency, which is the percentage of Bytes 828 that were not retransmitted and is defined as: 830 Transmitted Bytes - Retransmitted Bytes 831 --------------------------------------- x 100 832 Transmitted Bytes 834 Transmitted Bytes are the total number of TCP payload Bytes to be 835 transmitted which includes the original and retransmitted Bytes. This 836 metric provides a comparative measure between various QoS mechanisms 837 like traffic management or congestion avoidance. Various TCP 838 implementations like Reno, Vegas, etc. could also be compared. 
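Expressed as a short, informative Python sketch (the function name is illustrative only), the TCP Efficiency calculation is:

   # Sketch only: TCP Efficiency as defined above. transmitted_bytes
   # counts both the original and the retransmitted Bytes.
   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       return 100.0 * (transmitted_bytes - retransmitted_bytes) / transmitted_bytes

   # e.g. tcp_efficiency_pct(102000, 2000) -> ~98.0 (worked example below)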
840 As an example, if 100,000 Bytes were sent and 2,000 had to be 841 retransmitted, the TCP Efficiency should be calculated as: 843 102,000 - 2,000 844 ---------------- x 100 = 98.03% 845 102,000 847 Note that the retransmitted Bytes may have occurred more than once, 848 and these multiple retransmissions are added to the Retransmitted 849 Bytes count (and the Transmitted Bytes count). 851 The third metric is the Buffer Delay Percentage, which represents the 852 increase in RTT during a TCP throughput test with respect to 853 inherent or baseline network RTT. The baseline RTT is the round-trip 854 time inherent to the network path under non-congested conditions. 855 (See 3.2.1 for details concerning the baseline RTT measurements). 857 The Buffer Delay Percentage is defined as: 859 Average RTT during Transfer - Baseline RTT 860 ------------------------------------------ x 100 861 Baseline RTT 863 As an example, the baseline RTT for the network path is 25 msec. 864 During the course of a TCP transfer, the average RTT across the 865 entire transfer increased to 32 msec. In this example, the Buffer 866 Delay Percentage would be calculated as: 868 32 - 25 869 ------- x 100 = 28% 870 25 872 Note that the TCP Transfer Time, TCP Efficiency, and Buffer Delay 873 Percentage MUST be measured during each throughput test. Poor TCP 874 Transfer Time Indexes (TCP Transfer Time greater than Ideal TCP 875 Transfer Times) may be diagnosed by correlating with sub-optimal TCP 876 Efficiency and/or Buffer Delay Percentage metrics. 878 3.3.3 Conducting the TCP Throughput Tests 880 Several TCP tools are currently used in the network world and one of 881 the most common is "iperf". With this tool, hosts are installed at 882 each end of the network path; one acts as client and the other as 883 a server. The Send Socket Buffer and the TCP Receive Window sizes 884 of both client and server can be manually set. The achieved 885 throughput can then be measured, either uni-directionally or 886 bi-directionally. For higher BDP situations in lossy networks 887 (long fat networks or satellite links, etc.), TCP options such as 888 Selective Acknowledgment SHOULD be considered and become part of 889 the window size / throughput characterization. 891 Host hardware performance must be well understood before conducting 892 the tests described in the following sections. A dedicated 893 communications test instrument will generally be required, especially 894 for line rates of GigE and 10 GigE. A compliant TCP TTD SHOULD 895 provide a warning message when the expected test throughput will 896 exceed 10% of the network bandwidth capacity. If the throughput test 897 is expected to exceed 10% of the provider bandwidth, then the test 898 should be coordinated with the network provider. This does not 899 include the customer premise bandwidth, the 10% refers directly to 900 the provider's bandwidth (Provider Edge to Provider router). 902 The TCP throughput test should be run over a long enough duration 903 to properly exercise network buffers (greater than 30 seconds) and 904 also characterize performance at different time periods of the day. 906 3.3.4 Single vs. Multiple TCP Connection Testing 908 The decision whether to conduct single or multiple TCP connection 909 tests depends upon the size of the BDP in relation to the configured 910 TCP Receive Window sizes configured in the end-user environment. 
911 For example, if the BDP for a long fat network turns out to be 2MB, 912 then it is probably more realistic to test this network path with 913 multiple connections. Assuming typical host computer TCP Receive 914 Window Sizes of 64 KB, using 32 TCP connections would realistically 915 test this path. 917 The following table is provided to illustrate the relationship 918 between the TCP Receive Window size and the number of TCP connections 919 required to utilize the available capacity of a given BDP. For this 920 example, the network bandwidth is 500 Mbps and the RTT is 5 ms, then 921 the BDP equates to 312.5 KBytes. 923 TCP Number of TCP Connections 924 Window to fill available bandwidth 925 ------------------------------------- 926 16KB 20 927 32KB 10 928 64KB 5 929 128KB 3 931 The TCP Transfer Time metric is useful for conducting multiple 932 connection tests. Each connection should be configured to transfer 933 payloads of the same size (i.e. 100 MB), and the TCP Transfer time 934 should provide a simple metric to verify the actual versus expected 935 results. 937 Note that the TCP transfer time is the time for all connections to 938 complete the transfer of the configured payload size. From the 939 previous table, the 64KB window is considered. Each of the 5 940 TCP connections would be configured to transfer 100MB, and each one 941 should obtain a maximum of 100 Mb/sec. So for this example, the 942 100MB payload should be transferred across the connections in 943 approximately 8 seconds (which would be the ideal TCP transfer time 944 under these conditions). 946 Additionally, the TCP Efficiency metric MUST be computed for each 947 connection tested as defined in section 3.3.2. 949 3.3.5 Interpretation of the TCP Throughput Results 951 At the end of this step, the user will document the theoretical BDP 952 and a set of Window size experiments with measured TCP throughput for 953 each TCP window size. For cases where the sustained TCP throughput 954 does not equal the ideal value, some possible causes are: 956 - Network congestion causing packet loss which MAY be inferred from 957 a poor TCP Efficiency % (higher TCP Efficiency % = less packet 958 loss) 959 - Network congestion causing an increase in RTT which MAY be inferred 960 from the Buffer Delay Percentage (i.e., 0% = no increase in RTT 961 over baseline) 962 - Intermediate network devices which actively regenerate the TCP 963 connection and can alter TCP Receive Window size, MSS, etc. 964 - Rate limiting (policing). More details on traffic management 965 tests follows in section 3.4 967 3.4. Traffic Management Tests 969 In most cases, the network connection between two geographic 970 locations (branch offices, etc.) is lower than the network connection 971 to host computers. An example would be LAN connectivity of GigE 972 and WAN connectivity of 100 Mbps. The WAN connectivity may be 973 physically 100 Mbps or logically 100 Mbps (over a GigE WAN 974 connection). In the later case, rate limiting is used to provide the 975 WAN bandwidth per the SLA. 977 Traffic management techniques are employed to provide various forms 978 of QoS, the more common include: 980 - Traffic Shaping 981 - Priority queuing 982 - Random Early Discard (RED) 984 Configuring the end-to-end network with these various traffic 985 management mechanisms is a complex under-taking. For traffic shaping 986 and RED techniques, the end goal is to provide better performance to 987 bursty traffic such as TCP,(RED is specifically intended for TCP). 
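Both the traffic shaping and RED tests described in this section build upon multiple-connections testing. As a rough, informative Python sketch (the function name is illustrative only), the number of TCP connections needed to fill a given bottleneck BDP for a given window size, the relationship illustrated by the table in section 3.3.4, can be estimated as:

   import math

   # Sketch only: connections needed to fill the bottleneck BDP
   # for a given TCP Receive Window size (both in KBytes).
   def connections_to_fill(bdp_kbytes, window_kbytes):
       return math.ceil(bdp_kbytes / window_kbytes)

   # 500 Mbps bottleneck with 5 msec RTT -> BDP of 312.5 KBytes.
   for window in (16, 32, 64, 128):
       print(window, connections_to_fill(312.5, window))   # 20, 10, 5, 3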
989 This section of the methodology provides guidelines to test traffic 990 shaping and RED implementations. As in section 3.3, host hardware 991 performance must be well understood before conducting the traffic 992 shaping and RED tests. A dedicated communications test instrument will 993 generally be REQUIRED for line rates of GigE and 10 GigE. If the 994 throughput test is expected to exceed 10% of the provider bandwidth, 995 then the test should be coordinated with the network provider. This 996 does not include the customer premises bandwidth; the 10% refers to 997 the provider's bandwidth (Provider Edge to Provider router). Note 998 that GigE and 10 GigE interfaces might benefit from hold-queue 999 adjustments in order to prevent the saw-tooth TCP traffic pattern. 1001 3.4.1 Traffic Shaping Tests 1003 For services where the available bandwidth is rate limited, two (2) 1004 techniques can be used: traffic policing or traffic shaping. 1006 Simply stated, traffic policing marks and/or drops packets which 1007 exceed the SLA bandwidth (in most cases, excess traffic is dropped). 1008 Traffic shaping employs queues to smooth the bursty 1009 traffic and then sends it out within the SLA bandwidth limit (without 1010 dropping packets unless the traffic shaping queue is exhausted). 1012 Traffic shaping is generally configured for TCP data services and 1013 can provide improved TCP performance since the retransmissions are 1014 reduced, which in turn optimizes TCP throughput for the available 1015 bandwidth. Throughout this section, the rate-limited bandwidth shall 1016 be referred to as the "bottleneck bandwidth". 1018 The ability to detect proper traffic shaping is more easily diagnosed 1019 when conducting a multiple TCP connections test. Proper shaping will 1020 provide a fair distribution of the available bottleneck bandwidth, 1021 while traffic policing will not. 1023 The traffic shaping tests are built upon the concepts of multiple 1024 connections testing as defined in section 3.3.3. Calculating the BDP 1025 for the bottleneck bandwidth is first required before selecting the 1026 number of connections and Send Buffer and TCP Receive Window sizes 1027 per connection. 1029 Similar to the example in section 3.3, a typical test scenario might 1030 be: GigE LAN with a 500Mbps bottleneck bandwidth (rate limited 1031 logical interface), and 5 msec RTT. This would require five (5) TCP 1032 connections of 64 KB Send Socket Buffer and TCP Receive Window sizes 1033 to evenly fill the bottleneck bandwidth (~100 Mbps per connection). 1035 The traffic shaping test should be run over a long enough duration to 1036 properly exercise network buffers (greater than 30 seconds) and also 1037 characterize performance during different time periods of the day. 1038 The throughput of each connection MUST be logged during the entire 1039 test, along with the TCP Transfer Time, TCP Efficiency, and 1040 Buffer Delay Percentage. 1042 3.4.1.1 Interpretation of Traffic Shaping Test Results 1044 By plotting the throughput achieved by each TCP connection, the fair 1045 sharing of the bandwidth is generally very obvious when traffic 1046 shaping is properly configured for the bottleneck interface. For the 1047 previous example of 5 connections sharing 500 Mbps, each connection 1048 would consume ~100 Mbps with a smooth variation.
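Since the throughput of each connection is logged during the test, a small post-processing sketch (informative only; the function name, connection identifiers and throughput samples below are hypothetical) can summarize how evenly the bottleneck bandwidth was shared:

   # Sketch only: per-connection mean throughput from logged samples (Mbps).
   def summarize(samples_per_connection):
       means = {cid: sum(v) / len(v) for cid, v in samples_per_connection.items()}
       return means, sum(means.values())

   # Hypothetical logs for the 5-connection shaping example (~100 Mbps each).
   logs = {1: [99, 101, 100], 2: [98, 102, 100], 3: [100, 100, 100],
           4: [97, 103, 100], 5: [101, 99, 100]}
   print(summarize(logs))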
1050 If traffic policing was present on the bottleneck interface, the 1051 bandwidth sharing may not be fair and the resulting throughput plot 1052 may reveal "spikey" throughput consumption of the competing TCP 1053 connections (due to the TCP retransmissions). 1055 3.4.2 RED Tests 1057 Random Early Discard techniques are specifically targeted to provide 1058 congestion avoidance for TCP traffic. Before the network element 1059 queue "fills" and enters the tail drop state, RED drops packets at 1060 configurable queue depth thresholds. This action causes TCP 1061 connections to back-off which helps to prevent tail drop, which in 1062 turn helps to prevent global TCP synchronization. 1064 Again, rate limited interfaces may benefit greatly from RED based 1065 techniques. Without RED, TCP may not be able to achieve the full 1066 bottleneck bandwidth. With RED enabled, TCP congestion avoidance 1067 throttles the connections on the higher speed interface (i.e. LAN) 1068 and can help achieve the full bottleneck bandwidth. The burstiness 1069 of TCP traffic is a key factor in the overall effectiveness of RED 1070 techniques; steady state bulk transfer flows will generally not 1071 benefit from RED. With bulk transfer flows, network device queues 1072 gracefully throttle the effective throughput rates due to increased 1073 delays. 1075 The ability to detect proper RED configuration is more easily 1076 diagnosed when conducting a multiple TCP connections test. Multiple 1077 TCP connections provide the bursty sources that emulate the 1078 real-world conditions for which RED was intended. 1080 The RED tests also builds upon the concepts of multiple connections 1081 testing as defined in section 3.3.3. Calculating the BDP for the 1082 bottleneck bandwidth is first required before selecting the number 1083 of connections, the Send Socket Buffer size and the TCP Receive 1084 Window size per connection. 1086 For RED testing, the desired effect is to cause the TCP connections 1087 to burst beyond the bottleneck bandwidth so that queue drops will 1088 occur. Using the same example from section 3.4.1 (traffic shaping), 1089 the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with 1090 window size of 64KB) to fill the capacity. Some experimentation is 1091 required, but it is recommended to start with double the number of 1092 connections to stress the network element buffers / queues (10 1093 connections for this example). 1095 The TCP TTD must be configured to generate these connections as 1096 shorter (bursty) flows versus bulk transfer type flows. These TCP 1097 bursts should stress queue sizes in the 512KB range. Again 1098 experimentation will be required; the proper number of TCP 1099 connections, the Send Socket Buffer and TCP Receive Window sizes will 1100 be dictated by the size of the network element queue. 1102 3.4.2.1 Interpretation of RED Results 1104 The default queuing technique for most network devices is FIFO based. 1105 Without RED, the FIFO based queue may cause excessive loss to all of 1106 the TCP connections and in the worst case global TCP synchronization. 1108 By plotting the aggregate throughput achieved on the bottleneck 1109 interface, proper RED operation may be determined if the bottleneck 1110 bandwidth is fully utilized. For the previous example of 10 1111 connections (window = 64 KB) sharing 500 Mbps, each connection should 1112 consume ~50 Mbps. 
If RED was not properly enabled on the interface, 1113 then the TCP connections will retransmit at a higher rate and the 1114 net effect is that the bottleneck bandwidth is not fully utilized. 1116 Another means to study non-RED versus RED implementations is to use 1117 the TCP Transfer Time metric for all of the connections. In this 1118 example, a 100 MB payload transfer should take ideally 16 seconds 1119 across all 10 connections (with RED enabled). With RED not enabled, 1120 the throughput across the bottleneck bandwidth may be greatly 1121 reduced (generally 10-20%) and the actual TCP Transfer time may be 1122 proportionally longer than the Ideal TCP Transfer time. 1124 Additionally, non-RED implementations may exhibit a lower TCP 1125 Efficiency. 1127 4. Security Considerations 1129 The security considerations that apply to any active measurement of 1130 live networks are relevant here as well. See [RFC4656] and 1131 [RFC5357]. 1133 5. IANA Considerations 1135 This document does not REQUIRE an IANA registration for ports 1136 dedicated to the TCP testing described in this document. 1138 6. Acknowledgments 1140 Thanks to Lars Eggert, Al Morton, Matt Mathis, Matt Zekauskas, 1141 Yaakov Stein, and Loki Jorgenson for many good comments and for 1142 pointing us to great sources of information pertaining to past work 1143 in the TCP capacity area. 1145 7. References 1147 7.1 Normative References 1149 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1150 Requirement Levels", BCP 14, RFC 2119, March 1997. 1152 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 1153 Zekauskas, "A One-way Active Measurement Protocol 1154 (OWAMP)", RFC 4656, September 2006. 1156 [RFC5681] Allman, M., Paxson, V., and Blanton, E., "TCP Congestion 1157 Control", RFC 5681, September 2009. 1159 [RFC2544] Bradner, S. and McQuaid, J., "Benchmarking Methodology for 1160 Network Interconnect Devices", RFC 2544, June 1999. 1162 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and Babiarz, 1163 J., "A Two-Way Active Measurement Protocol (TWAMP)", 1164 RFC 5357, October 2008. 1166 [RFC4821] Mathis, M. and Heffner, J., "Packetization Layer Path MTU 1167 Discovery", RFC 4821, June 2007. 1169 Allman, M., "A Bulk 1170 Transfer Capacity Methodology for Cooperating Hosts", 1171 draft-ietf-ippm-btc-cap-00.txt (work in progress), August 2001. 1173 [RFC2681] Almes, G., Kalidindi, S., and Zekauskas, M., "A Round-trip Delay 1174 Metric for IPPM", RFC 2681, September 1999. 1176 [RFC4898] Mathis, M., Heffner, J., and Raghunarayan, R., "TCP Extended 1177 Statistics MIB", RFC 4898, May 2007. 1179 [RFC5136] Chimento, P. and Ishac, J., "Defining Network Capacity", 1180 RFC 5136, February 2008. 1182 [RFC1323] Jacobson, V., Braden, R., and Borman, D., "TCP Extensions for 1183 High Performance", RFC 1323, May 1992. 1185 7.2. Informative References 1186 Authors' Addresses 1188 Barry Constantine 1189 JDSU, Test and Measurement Division 1190 One Milestone Center Court 1191 Germantown, MD 20876-7100 1192 USA 1194 Phone: +1 240 404 2227 1195 barry.constantine@jdsu.com 1197 Gilles Forget 1198 Independent Consultant to Bell Canada. 1199 308, rue de Monaco, St-Eustache 1200 Qc. CANADA, Postal Code : J7P-4T5 1202 Phone: (514) 895-8212 1203 gilles.forget@sympatico.ca 1205 Rudiger Geib 1206 Heinrich-Hertz-Strasse (Number: 3-7) 1207 Darmstadt, Germany, 64295 1209 Phone: +49 6151 6282747 1210 Ruediger.Geib@telekom.de 1212 Reinhard Schrage 1213 Schrage Consulting 1215 Phone: +49 (0) 5137 909540 1216 reinhard@schrageconsult.com