Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: February 27, 2011                 Bell Canada (Ext. Consultant)
                                                            L. Jorgenson
                                                                 nooCore
                                                         Reinhard Schrage
                                                       Schrage Consulting
                                                          August 27, 2010

                   TCP Throughput Testing Methodology
                 draft-ietf-ippm-tcp-throughput-tm-06.txt

Abstract

   This memo describes a methodology for measuring sustained TCP
   throughput performance in an end-to-end managed network environment.
   This memo is intended to provide a practical approach to help users
   validate the TCP layer performance of a managed network, which
   should provide a better indication of end-user application level
   experience.  In the methodology, various TCP and network parameters
   are identified that should be tested as part of the network
   verification at the TCP layer.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 27, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Goals of this Methodology
       2.1   TCP Equilibrium State Throughput
       2.2   Metrics for TCP Throughput Tests
   3.  TCP Throughput Testing Methodology
       3.1   Determine Network Path MTU
       3.2   Baseline Round-trip Delay and Bandwidth
             3.2.1  Techniques to Measure Round Trip Time
             3.2.2  Techniques to Measure End-end Bandwidth
       3.3   TCP Throughput Tests
             3.3.1  Calculate Optimum TCP Window Size
             3.3.2  Conducting the TCP Throughput Tests
             3.3.3  Single vs. Multiple TCP Connection Testing
             3.3.4  Interpretation of the TCP Throughput Results
       3.4   Traffic Management Tests
             3.4.1  Traffic Shaping Tests
                    3.4.1.1  Interpretation of Traffic Shaping Test
                             Results
             3.4.2  RED Tests
                    3.4.2.1  Interpretation of RED Results
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  References
       7.1   Normative References
       7.2   Informative References

   Authors' Addresses

1. Introduction

   Testing an operational network prior to customer activation is
   referred to as "turn-up" testing, and the associated SLA (Service
   Level Agreement) is generally based upon Layer 2/3 packet
   throughput, delay, loss, and jitter.

   Network providers are coming to the realization that both Layer 2/3
   testing and TCP layer testing are required to more adequately ensure
   end-user satisfaction.  Therefore, the network provider community
   desires to measure network throughput performance at the TCP layer.
   Measuring TCP throughput provides a meaningful measure of the
   end-user experience (and can ultimately help reach a level of TCP
   testing interoperability which does not exist today).

   Additionally, end-users (business enterprises) seek to conduct
   repeatable TCP throughput tests between enterprise locations.  Since
   these enterprises rely on the networks of the providers, a common
   test methodology (and common metrics) would be equally beneficial to
   both parties.
   The intent behind this TCP throughput draft is to define a
   methodology for testing sustained TCP layer performance.  In this
   document, sustained TCP throughput is that amount of data per unit
   time that TCP transports during equilibrium (steady state), i.e.
   after the initial slow start phase.  We refer to this state as TCP
   Equilibrium; the equilibrium throughput is the maximum achievable
   throughput for the TCP connection(s).

   There are many variables to consider when conducting a TCP
   throughput test, and this methodology focuses on some of the most
   common parameters that should be considered, such as:

   - Path MTU and Maximum Segment Size (MSS)
   - RTT and Bottleneck BW
   - Ideal TCP Window (Bandwidth Delay Product)
   - Single Connection and Multiple Connection testing

   One other important note: it is highly recommended that traditional
   Layer 2/3 tests be conducted to verify the integrity of the network
   before conducting TCP tests.  Examples include RFC 2544 [RFC2544],
   iperf (UDP mode), or manual packet layer test techniques where
   packet throughput, loss, and delay measurements are conducted.

2. Goals of this Methodology

   Before defining the goals of this methodology, it is important to
   clearly define the areas that are not intended to be measured or
   analyzed by such a methodology.

   - The methodology is not intended to predict TCP throughput
     behavior during the transient stages of a TCP connection, such
     as the initial slow start.

   - The methodology is not intended to definitively benchmark TCP
     implementations of one OS against another, although some users
     may find some value in conducting qualitative experiments.

   - The methodology is not intended to provide detailed diagnosis
     of problems within end-points or the network itself as related to
     non-optimal TCP performance, although a results interpretation
     section for each test step may provide insight into potential
     issues within the network.

   In contrast to the above exclusions, the goals of this methodology
   are to define a method to conduct a structured, end-to-end
   assessment of sustained TCP performance within a managed business
   class IP network.  A key goal is to establish a set of "best
   practices" that an engineer should apply when validating the
   ability of a managed network to carry end-user TCP applications.

   Some specific goals are to:

   - Provide a practical test approach that specifies the better
     understood (and end-user configurable) TCP parameters such as
     window size, MSS (Maximum Segment Size), and number of
     connections, and how these affect the outcome of TCP performance
     over a network.

   - Provide specific test conditions (link speed, RTT, window size,
     etc.) and the maximum achievable TCP throughput under TCP
     Equilibrium conditions.  For guideline purposes, provide examples
     of these test conditions and the maximum achievable TCP
     throughput during the equilibrium state.  Section 2.1 provides
     specific details concerning the definition of TCP Equilibrium
     within the context of this draft.
   - Define two (2) basic metrics that can be used to compare the
     performance of TCP connections under various network conditions.

   - In test situations where the recommended procedure does not yield
     the maximum achievable TCP throughput result, this draft provides
     some possible areas within the end host or network that should be
     considered for investigation (although again, this draft is not
     intended to provide a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

   TCP connections have three (3) fundamental congestion window phases
   as documented in RFC 5681 [RFC5681].  These phases are:

   - Slow Start, which occurs at the beginning of a TCP transmission
     or after a retransmission time-out event.

   - Congestion Avoidance, which is the phase during which TCP ramps
     up to establish the maximum attainable throughput on an end-end
     network path.  Retransmissions are a natural by-product of the
     TCP congestion avoidance algorithm as it seeks to achieve maximum
     throughput on the network path.

   - Retransmission phase, which includes Fast Retransmit (Tahoe) and
     Fast Recovery (Reno and New Reno).  When a packet is lost, the
     Congestion Avoidance phase transitions to a Fast Retransmission
     or Recovery phase, depending upon the TCP implementation.

   The following diagram depicts these phases.

      (Diagram: TCP throughput versus time.  Slow Start ramps up to
      the ssthresh point, then Congestion Avoidance climbs to the
      Equilibrium throughput; a Loss Event or Retransmit Time-out
      reduces the rate, after which Slow Start and Congestion
      Avoidance repeat.)

   This TCP methodology provides guidelines to measure the equilibrium
   throughput, which refers to the maximum sustained rate obtained by
   congestion avoidance before packet loss conditions occur (which
   would cause the state change from congestion avoidance to a
   retransmission phase).  All maximum achievable throughputs
   specified in Section 3 are with respect to this equilibrium state.

2.2 Metrics for TCP Throughput Tests

   This draft focuses on a TCP throughput methodology and also
   provides two basic metrics to compare the results of various
   throughput tests.  It is recognized that the complexity and
   unpredictability of TCP makes it impossible to develop a complete
   set of metrics that accounts for the myriad of variables (e.g. RTT
   variation, loss conditions, TCP implementation, etc.).  However,
   these two basic metrics will facilitate TCP throughput comparisons
   under varying network conditions and between network traffic
   management techniques.

   The TCP Efficiency metric is the percentage of bytes that were not
   retransmitted and is defined as:

                Transmitted Bytes - Retransmitted Bytes
                ---------------------------------------  x 100
                           Transmitted Bytes

   This metric provides a comparative measure between various QoS
   mechanisms such as traffic management, congestion avoidance, and
   also various TCP implementations (e.g. Reno, Vegas, etc.).

   As an example, if 100,000 bytes were sent and 2,000 had to be
   retransmitted, the TCP Efficiency would be calculated as:

                       100,000 - 2,000
                       ---------------  x 100  =  98%
                           100,000

   Note that a given byte may be retransmitted more than once, and
   each retransmission is added to the retransmitted bytes count.
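   As an illustration only (not part of the methodology), the following
   Python sketch computes the TCP Efficiency metric as defined above;
   the byte counters are assumed to be reported by the test tool or
   derived from a packet capture.

      # Illustrative sketch (not normative): TCP Efficiency metric.
      # Byte counts are assumed to come from the test tool or from a
      # packet capture of the throughput test.

      def tcp_efficiency(transmitted_bytes, retransmitted_bytes):
          """Percentage of transmitted bytes that were not
          retransmitted.  Bytes retransmitted more than once are
          counted each time in retransmitted_bytes."""
          return ((transmitted_bytes - retransmitted_bytes)
                  / float(transmitted_bytes)) * 100.0

      # Example from the text: 100,000 bytes sent, 2,000 retransmitted.
      print(tcp_efficiency(100000, 2000))   # -> 98.0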
   The second metric is the TCP Transfer Time, which is simply the time
   it takes to transfer a block of data across simultaneous TCP
   connections.  The concept is useful when benchmarking traffic
   management techniques, where multiple connections are generally
   required.

   The TCP Transfer Time can also be used to provide a normalized ratio
   of the actual TCP Transfer Time versus the ideal Transfer Time.
   This ratio is called the TCP Transfer Index and is defined as:

                     Actual TCP Transfer Time
                     -------------------------
                      Ideal TCP Transfer Time

   An example would be the bulk transfer of 100 MB across 5
   simultaneous TCP connections over a 500 Mbit/s Ethernet service
   (each connection uploading 100 MB).  Each connection may achieve
   different throughputs during a test, and the overall throughput
   rate is not always easy to determine (especially as the number of
   connections increases).

   The ideal TCP Transfer Time would be ~8 seconds, but in this
   example, the actual TCP Transfer Time was 12 seconds.  The TCP
   Transfer Index would be 12/8 = 1.5, which indicates that the
   transfer across all connections took 1.5 times longer than the
   ideal.

   Note that both the TCP Efficiency and TCP Transfer Time metrics must
   be measured during each throughput test.  The correlation of TCP
   Transfer Time with TCP Efficiency can help to diagnose whether the
   TCP Transfer Time was negatively impacted by retransmissions (poor
   TCP Efficiency).

3. TCP Throughput Testing Methodology

   As stated in Section 1, it is considered best practice to verify
   the integrity of the network by conducting Layer 2/3 stress tests
   such as RFC 2544 [RFC2544] (or other network stress test methods).
   If the network is not performing properly in terms of packet loss,
   jitter, etc., then the TCP layer testing will not be meaningful,
   since the equilibrium throughput would be very difficult to achieve
   in a "dysfunctional" network.

   The following represents the sequential order of steps to conduct
   the TCP throughput testing methodology:

   1. Identify the Path MTU.  Packetization Layer Path MTU Discovery
      (PLPMTUD) [RFC4821] should be conducted to verify the network
      path MTU.  Conducting PLPMTUD establishes the upper limit for
      the MSS to be used in subsequent steps.

   2. Baseline Round-trip Delay and Bandwidth.  These measurements
      provide estimates of the ideal TCP window size, which will be
      used in subsequent test steps.

   3. TCP Connection Throughput Tests.  With baseline measurements of
      round trip delay and bandwidth, a series of single and multiple
      TCP connection throughput tests can be conducted to baseline the
      network performance expectations.

   4. Traffic Management Tests.  Various traffic management and
      queueing techniques are tested in this step, using multiple TCP
      connections.  Multiple connection testing can verify that the
      network is configured properly for traffic shaping versus
      policing, various queueing implementations, and RED.

   Important to note are some of the key characteristics and
   considerations for the TCP test instrument.  The test host may be a
   standard computer or a dedicated communications test instrument,
   and in either case it must be capable of emulating both a client
   and a server.
   Whether the TCP test host is a standard computer or a dedicated test
   instrument, the following areas should be considered when selecting
   a test host:

   - TCP implementation used by the test host OS, e.g. Linux OS kernel
     using TCP Reno, TCP options supported, etc.  This will obviously
     be more important when using custom test equipment, where the TCP
     implementation may be customized or tuned to run on higher
     performance hardware.

   - Most importantly, the TCP test host must be capable of generating
     and receiving stateful TCP test traffic at the full link speed of
     the network under test.  As a general rule of thumb, testing TCP
     throughput at rates greater than 100 Mbit/sec generally requires
     high performance server hardware or dedicated hardware-based test
     tools.

   - Measuring RTT and TCP Efficiency per connection will generally
     require dedicated hardware-based test tools.  In their absence,
     these measurements may need to be conducted with packet capture
     tools (i.e. conduct the TCP throughput tests and analyze RTT and
     retransmission results from the packet captures).

3.1. Determine Network Path MTU

   TCP implementations should use Path MTU Discovery techniques
   (PMTUD).  PMTUD relies on ICMP 'need to frag' messages to learn the
   path MTU.  When a device has a packet to send which has the Don't
   Fragment (DF) bit in the IP header set and the packet is larger
   than the Maximum Transmission Unit (MTU) of the next hop link, the
   packet is dropped and the device sends an ICMP 'need to frag'
   message back to the host that originated the packet.  The ICMP
   'need to frag' message includes the next hop MTU, which PMTUD uses
   to tune the TCP Maximum Segment Size (MSS).  Unfortunately, because
   many network managers completely disable ICMP, this technique does
   not always prove reliable in real-world situations.

   Packetization Layer Path MTU Discovery (PLPMTUD) [RFC4821] should
   be conducted to verify the network path MTU.  PLPMTUD can be used
   with or without ICMP.  The following sections provide a summary of
   the PLPMTUD approach and an example using the TCP protocol.
   [RFC4821] specifies search_high and search_low parameters for the
   MTU.  As specified in [RFC4821], a value of 1024 is generally a
   safe choice for search_low in modern networks.

   It is important to determine the overhead of the links in the path,
   and then to select a TCP MSS size corresponding to the Layer 3 MTU.
   For example, if the MTU is 1024 bytes and the TCP/IP headers are 40
   bytes, then the MSS would be set to 984 bytes.

   An example scenario is a network where the actual path MTU is 1240
   bytes.  The TCP client probe MUST be capable of setting the MSS for
   the probe packets and could start at MSS = 984 (which corresponds
   to an MTU size of 1024 bytes).

   The TCP client probe would open a TCP connection and advertise the
   MSS as 984.  Note that the client probe MUST generate these packets
   with the DF bit set.  The TCP client probe then sends test traffic
   per a nominal window size (8 KB, etc.).  The window size should be
   kept small to minimize the possibility of congesting the network,
   which could induce congestive loss.  The duration of the test
   should also be short (10-30 seconds), again to minimize congestive
   effects during the test.
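   To make the MSS/MTU arithmetic of this step concrete, the following
   sketch (illustrative only) derives the probe MSS from a candidate
   MTU, assuming the 40 bytes of TCP/IP header overhead used in the
   example above (TCP options, if present, would reduce the MSS
   further).

      # Illustrative sketch (not normative): derive the probe MSS from
      # a candidate MTU, assuming 40 bytes of TCP/IP header overhead
      # (20-byte IP header + 20-byte TCP header, no options).

      TCP_IP_OVERHEAD = 40

      def probe_mss(candidate_mtu, overhead=TCP_IP_OVERHEAD):
          return candidate_mtu - overhead

      print(probe_mss(1024))   # search_low of 1024  -> MSS 984
      print(probe_mss(1500))   # search_high of 1500 -> MSS 1460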
   In the example of a 1240 byte path MTU, probing with an MSS equal to
   984 would yield a successful probe, and the test client packets
   would be successfully transferred to the test server.

   Also note that the test client MUST verify that the advertised MSS
   is indeed negotiated.  Network devices with built-in Layer 4
   capabilities can intercede during the connection establishment
   process and reduce the advertised MSS to avoid fragmentation.  This
   is certainly a desirable feature from a network perspective, but it
   can yield erroneous test results if the client test probe does not
   confirm the negotiated MSS.

   The next test probe would use the search_high value, and this would
   be set to MSS = 1460 to correspond to a 1500 byte MTU.  In this
   example, the test client would retransmit based upon time-outs
   (since no ACKs will be received from the test server).  This test
   probe is marked as a conclusive failure if none of the test packets
   are ACK'ed.  If any of the test packets are ACK'ed, network
   congestion may be the cause and the test probe is not conclusive.
   Re-testing at other times of the day is recommended to further
   isolate the cause.

   The test is repeated until the desired granularity of the MTU is
   discovered.  The method can yield precise results at the expense of
   probing time.  One approach would be to set the next probe size
   halfway between the unsuccessful search_high and the successful
   search_low values, and to continue halving the remaining interval
   while seeking the upper limit.

3.2. Baseline Round-trip Delay and Bandwidth

   Before stateful TCP testing can begin, it is important to baseline
   the round trip delay and bandwidth of the network to be tested.
   These measurements provide estimates of the ideal TCP window size,
   which will be used in subsequent test steps.  These latency and
   bandwidth tests should be run during the time of day for which the
   TCP throughput tests will occur.

   The baseline RTT is used to predict the bandwidth delay product and
   the TCP Transfer Time for the subsequent throughput tests.  Since
   this methodology requires that RTT be measured during the entire
   throughput test, the extent to which the RTT varies during the
   throughput test can be quantified.

3.2.1 Techniques to Measure Round Trip Time

   Following the definitions used in the referenced documents, Round
   Trip Time (RTT) is the time elapsed between the clocking in of the
   first bit of a payload packet and the receipt of the last bit of
   the corresponding acknowledgement.  Round Trip Delay (RTD) is used
   synonymously and equals twice the link latency.

   In any method used to baseline round trip delay between network
   end-points, it is important to realize that network latency is the
   sum of inherent network delay and congestion-induced delay.  The
   RTT should be baselined during "off-peak" hours to obtain a
   reliable figure for network latency (versus additional delay caused
   by congestion).

   During the actual sustained TCP throughput tests, it is critical to
   measure RTT along with the measured TCP throughput.  Congestive
   effects can be isolated if RTT is concurrently measured.

   This is not meant to provide an exhaustive list, but the following
   summarizes some of the more common ways to determine round trip
   time (RTT) through the network.
   The desired resolution of the measurement (i.e. msec versus usec)
   may dictate whether the RTT measurement can be achieved with
   standard tools such as ICMP ping techniques or whether specialized
   test equipment with high precision timers would be required.  The
   objective in this section is to list several techniques in order of
   decreasing accuracy.

   - Use test equipment on each end of the network, "looping" the
     far-end tester so that a packet stream can be measured end-end.
     This test equipment RTT measurement may be compatible with the
     delay measurement protocols specified in [RFC5357].

   - Conduct packet captures of TCP test applications using, for
     example, "iperf" or FTP.  By running multiple experiments, the
     packet captures can be studied to estimate RTT based upon the
     SYN -> SYN-ACK handshakes within the TCP connection set-up.

   - ICMP pings may also be adequate to provide round trip time
     estimations.  Some limitations of ICMP ping are the msec
     resolution and whether the network elements respond to pings (or
     block them).

3.2.2 Techniques to Measure End-end Bandwidth

   There are many well established techniques available to provide
   estimated measures of bandwidth over a network.  This measurement
   should be conducted in both directions of the network, especially
   for access networks, which are inherently asymmetrical.  Some of
   the asymmetric implications to TCP performance are documented in
   RFC 3449 [RFC3449].

   The bandwidth measurement test must be run with stateless IP
   streams (not stateful TCP) in order to determine the available
   bandwidth in each direction.  This test should be performed at
   various intervals throughout a business day (or even across a
   week).  Ideally, the bandwidth test should produce a log of the
   bandwidth achieved across the test interval AND of the round trip
   delay.

   During the actual TCP level performance measurements (Sections 3.3
   and 3.4), the test tool must be able to track the round trip time
   of the TCP connection(s) during the test.  Measuring round trip
   time variation (a.k.a. "jitter") provides insight into the effects
   of congestive delay on the sustained throughput achieved for the
   TCP layer test.

3.3. TCP Throughput Tests

   This draft specifically defines TCP throughput techniques to verify
   sustained TCP performance in a managed business network.  As
   defined in Section 2.1, the equilibrium throughput reflects the
   maximum rate achieved by a TCP connection within the congestion
   avoidance phase on an end-end network path.  This section and the
   following sections define the methods to conduct these sustained
   throughput tests and provide guidelines for the predicted results.

   With the baseline measurements of round trip time and bandwidth
   from Section 3.2, a series of single and multiple TCP connection
   throughput tests can be conducted to baseline network performance
   against expectations.

   It is recommended to run the tests in each direction independently
   first, and then to run both directions simultaneously.  In each
   case, the TCP Efficiency and TCP Transfer Time metrics must be
   measured in each direction.

3.3.1 Calculate Optimum TCP Window Size

   The optimum TCP window size can be calculated from the bandwidth
   delay product (BDP), which is:

      BDP (bits) = RTT (sec) x Bandwidth (bps)

   By dividing the BDP by 8, the "ideal" TCP window size is calculated
   (in bytes).
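   As a simple illustration (not part of the methodology), the sketch
   below computes the BDP and the resulting "ideal" TCP window size
   from a measured RTT and bottleneck bandwidth.

      # Illustrative sketch (not normative): BDP and "ideal" TCP
      # window size from the baseline RTT and bottleneck bandwidth.

      def ideal_window_bytes(rtt_sec, bandwidth_bps):
          bdp_bits = rtt_sec * bandwidth_bps   # BDP (bits) = RTT x BW
          return bdp_bits / 8.0                # window size in bytes

      # T3 (44.21 Mbit/s) with 25 msec RTT, per the example that
      # follows:
      print(ideal_window_bytes(0.025, 44.21e6))   # ~138,000 bytes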
   An example would be a T3 link with 25 msec RTT.  The BDP would
   equal ~1,105,000 bits, and the ideal TCP window would equal
   ~138,000 bytes.

   The following table provides some representative network link
   speeds, latencies, BDPs, and associated "optimum" TCP window sizes.
   Sustained TCP transfers should reach nearly 100% throughput, minus
   the overhead of Layers 1-3 and the divisor of the MSS into the
   window.

   For this single connection baseline test, the MSS size will affect
   the achieved throughput (especially for smaller TCP window sizes).
   Table 3.2 provides the achievable, equilibrium TCP throughput (at
   Layer 4) using a 1460 byte MSS.  Also in this table, a 58 byte
   L1-L4 overhead (including the Ethernet CRC32) is used for
   simplicity.

   Table 3.2: Link Speed, RTT, calculated BDP, and TCP Throughput

   Link                           Ideal TCP        Maximum Achievable
   Speed*   RTT (ms)  BDP (bits)  Window (kbytes)  TCP Throughput (Mbps)
   ---------------------------------------------------------------------
   T1          20         30,720        3.84              1.17
   T1          50         76,800        9.60              1.40
   T1         100        153,600       19.20              1.40
   T3          10        442,100       55.26             42.05
   T3          15        663,150       82.89             42.05
   T3          25      1,105,250      138.16             41.52
   T3(ATM)     10        407,040       50.88             36.50
   T3(ATM)     15        610,560       76.32             36.23
   T3(ATM)     25      1,017,600      127.20             36.27
   100M         1        100,000       12.50             91.98
   100M         2        200,000       25.00             93.44
   100M         5        500,000       62.50             93.44
   1Gig       0.1        100,000       12.50            919.82
   1Gig       0.5        500,000       62.50            934.47
   1Gig         1      1,000,000      125.00            934.47
   10Gig     0.05        500,000       62.50          9,344.67
   10Gig      0.3      3,000,000      375.00          9,344.67

   * Note that the link speed is the bottleneck (minimum) link speed
     through the network, e.g. a WAN with a T1 link.

   Also, the following link speeds (available payload bandwidth) were
   used for the WAN entries:

   - T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
   - T3 = 44.21 Mbits/sec (C-Bit Framing)
   - T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per
     second)

   The calculation method used in this document is a 3-step process:

   1 - Determine the optimal TCP window size based on the optimal
       quantity of "in-flight" octets given by the BDP calculation,
       taking into consideration that the TCP window size has to be an
       exact multiple of the MSS.

   2 - Calculate the achievable Layer 2 throughput by multiplying the
       value determined in step 1 by the (MSS + L2 + L3 + L4
       overheads) / MSS ratio, divided by the RTT.

   3 - Finally, multiply the value calculated in step 2 by the
       MSS / (MSS + L2 + L3 + L4 overheads) ratio.

   This gives the achievable TCP throughput value.  Sometimes, the
   maximum achievable throughput is limited by the maximum achievable
   quantity of Ethernet frames per second on the physical media; in
   that case, this frame rate limit is used in step 2 instead of the
   calculated value.
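   One possible reading of these three steps is expressed in the
   sketch below (illustrative only); it uses the 58-byte overhead
   stated above for Table 3.2.  For cases limited by the physical
   frame rate, Table 3.2 additionally reflects technology-specific
   framing, so those table values may differ slightly from this
   sketch.

      # Illustrative sketch (not normative): the 3-step calculation
      # described above, assuming a 1460 byte MSS and the 58 bytes of
      # L1-L4 overhead used for Table 3.2.

      MSS = 1460
      OVERHEAD = 58                  # L2 + L3 + L4 overhead in bytes

      def achievable_tcp_throughput_bps(rtt_sec, bottleneck_bps):
          # Step 1: TCP window = BDP rounded down to an exact multiple
          # of the MSS.
          bdp_bytes = (rtt_sec * bottleneck_bps) / 8.0
          window_bytes = int(bdp_bytes // MSS) * MSS

          # Step 2: achievable Layer 2 throughput -- the window's
          # worth of full-size frames (MSS + overhead bytes each)
          # clocked out once per RTT, capped at the bottleneck rate.
          frames_per_rtt = window_bytes / MSS
          l2_bps = frames_per_rtt * (MSS + OVERHEAD) * 8 / rtt_sec
          l2_bps = min(l2_bps, bottleneck_bps)

          # Step 3: scale by MSS / (MSS + overhead) to obtain the TCP
          # (Layer 4) throughput.
          return l2_bps * MSS / (MSS + OVERHEAD)

      # Window-limited examples, matching Table 3.2:
      print(achievable_tcp_throughput_bps(0.020, 1.536e6) / 1e6)
      # ~1.17 Mbps (T1, 20 ms)
      print(achievable_tcp_throughput_bps(0.050, 1.536e6) / 1e6)
      # ~1.40 Mbps (T1, 50 ms)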
   The following table compares the achievable TCP throughput on a T3
   link with Windows 2000/XP TCP window sizes of 16KB versus 64KB.

      RTT (ms)    TCP Throughput,      TCP Throughput,
                  16KB Window (Mbps)   64KB Window (Mbps)
      ---------------------------------------------------
        10              14.5                 42.1
        15               9.6                 34.3
        25               5.8                 20.5

   The following table shows the achievable TCP throughput on a 25 ms
   T3 link as the TCP window size is increased (with the RFC 1323 TCP
   window scaling option).

      TCP Window Size (KBytes)    TCP Throughput (Mbps)
      -------------------------------------------------
                16                       5.31
                32                      10.62
                64                      21.23
               128                      42.47

3.3.2 Conducting the TCP Throughput Tests

   There are several TCP tools that are commonly used in the network
   world, and one of the most common is the "iperf" tool.  With this
   tool, hosts are installed at each end of the network segment; one
   as the client and the other as the server.  The TCP window size of
   both the client and the server can be manually set, and the
   achieved throughput is measured, either uni-directionally or
   bi-directionally.  For higher BDP situations in lossy networks
   (long fat networks or satellite links, etc.), TCP options such as
   Selective Acknowledgment should be considered and also become part
   of the window size / throughput characterization.

   Host hardware performance must be well understood before conducting
   the TCP throughput tests and the other tests in the following
   sections.  Dedicated test equipment will generally be required,
   especially for line rates of GigE and 10 GigE.

   The TCP throughput test should be run over a long enough duration
   to properly exercise network buffers and also characterize
   performance during different time periods of the day.  The results
   must be logged at the desired interval, and the test must record
   RTT and TCP retransmissions at each interval.

   This correlation of retransmissions and RTT over the course of the
   test will clearly identify which portions of the transfer reached
   TCP Equilibrium state and to what extent increased RTT (congestive
   effects) may have been the cause of reduced equilibrium
   performance.

   Additionally, the TCP Efficiency and TCP Transfer Time metrics
   should be logged in order to further characterize the window size
   tests.

3.3.3 Single vs. Multiple TCP Connection Testing

   The decision whether to conduct single or multiple TCP connection
   tests depends upon the size of the BDP in relation to the window
   sizes configured in the end-user environment.  For example, if the
   BDP for a long-fat pipe turns out to be 2 MB, then it is probably
   more realistic to test this pipe with multiple connections.
   Assuming typical host computer window settings of 64 KB, using 32
   connections would realistically test this pipe.
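   As a simple illustration (not part of the methodology), the sketch
   below computes the number of connections needed to fill the
   available capacity for a given per-connection window size; the
   values match the long-fat-pipe example above and the table that
   follows.

      # Illustrative sketch (not normative): number of TCP connections
      # needed to fill the available capacity for a given
      # per-connection window size (Section 3.3.3).

      import math

      def connections_to_fill(bdp_bytes, window_bytes):
          return int(math.ceil(bdp_bytes / float(window_bytes)))

      # Long fat pipe example above: 2 MB BDP, 64 KB host windows.
      print(connections_to_fill(2 * 1024 * 1024, 64 * 1024))   # -> 32

      # 500 Mbps x 5 ms example in the table below (BDP ~312 KB):
      print(connections_to_fill(312.5 * 1024, 64 * 1024))      # -> 5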
   The following table is provided to illustrate the relationship
   between the BDP, the window size, and the number of connections
   required to utilize the available capacity.  For this example, the
   network bandwidth is 500 Mbps, the RTT is equal to 5 ms, and the
   BDP equates to 312 KBytes.

      Window     #Connections to Fill Link
      ----------------------------------
       16KB                20
       32KB                10
       64KB                 5
      128KB                 3

   The TCP Transfer Time metric is useful for conducting multiple
   connection tests.  Each connection should be configured to transfer
   a certain payload (e.g. 100 MB), and the TCP Transfer Time provides
   a simple metric to verify the actual versus expected results.

   Note that the TCP Transfer Time is the time for all connections to
   complete the transfer of the configured payload size.  From the
   example table listed above, the 64KB window case is considered.
   Each of the 5 connections would be configured to transfer 100 MB,
   and each TCP connection should obtain a maximum of 100 Mbit/sec.
   So for this example, the 100 MB payload should be transferred
   across the connections in approximately 8 seconds (which would be
   the ideal TCP Transfer Time for these conditions).

   Additionally, the TCP Efficiency metric should be computed for each
   connection tested (as defined in Section 2.2).

3.3.4 Interpretation of the TCP Throughput Results

   At the end of this step, the user will document the theoretical BDP
   and a set of window size experiments with measured TCP throughput
   for each TCP window size setting.  For cases where the sustained
   TCP throughput does not equal the predicted value, some possible
   causes are listed:

   - Network congestion causing packet loss; the TCP Efficiency
     metric is a useful gauge to compare network performance
   - Network congestion not causing packet loss but increasing RTT
   - Intermediate network devices which actively regenerate the TCP
     connection and can alter window size, MSS, etc.
   - Over-utilization of the available link or rate limiting
     (policing); more discussion of traffic management tests follows
     in Section 3.4

3.4. Traffic Management Tests

   In most cases, the network connection between two geographic
   locations (branch offices, etc.) is lower in speed than the network
   connection of the host computers.  An example would be LAN
   connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN
   connectivity may be physically 100 Mbps or logically 100 Mbps (over
   a GigE WAN connection).  In the latter case, rate limiting is used
   to provide the WAN bandwidth per the SLA.

   Traffic management techniques are employed to provide various forms
   of QoS; the more common include:

   - Traffic Shaping
   - Priority Queueing
   - Random Early Discard (RED, etc.)

   Configuring the end-end network with these various traffic
   management mechanisms is a complex undertaking.  For traffic
   shaping and RED techniques, the end goal is to provide better
   performance for bursty traffic such as TCP (RED is specifically
   intended for TCP).

   This section of the methodology provides guidelines to test traffic
   shaping and RED implementations.  As in Section 3.3, host hardware
   performance must be well understood before conducting the traffic
   shaping and RED tests.  Dedicated test equipment will generally be
   required, especially for line rates of GigE and 10 GigE.

3.4.1 Traffic Shaping Tests

   For services where the available bandwidth is rate limited, there
   are two (2) techniques used to implement rate limiting: traffic
   policing and traffic shaping.

   Simply stated, traffic policing marks and/or drops packets which
   exceed the SLA bandwidth (in most cases, excess traffic is
   dropped).
   Traffic shaping employs the use of queues to smooth the bursty
   traffic and then send it out within the SLA bandwidth limit
   (without dropping packets unless the traffic shaping queue
   overflows).

   Traffic shaping is generally configured for TCP data services and
   can provide improved TCP performance, since retransmissions are
   reduced, which in turn optimizes TCP throughput for the given
   available bandwidth.  Throughout this section, the available
   rate-limited bandwidth shall be referred to as the "bottleneck
   bandwidth".

   Proper traffic shaping is more easily detected when conducting a
   multiple TCP connection test.  Proper shaping will provide a fair
   distribution of the available bottleneck bandwidth, while traffic
   policing will not.

   The traffic shaping tests build upon the concepts of multiple
   connection testing as defined in Section 3.3.3.  Calculating the
   BDP for the bottleneck bandwidth is first required, followed by
   selecting the number of connections and the window size per
   connection.

   Similar to the example in Section 3.3, a typical test scenario
   might be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate
   limited logical interface), and 5 msec RTT.  This would require
   five (5) TCP connections with a 64 KB window size to evenly fill
   the bottleneck bandwidth (about 100 Mbps per connection).

   The traffic shaping test should be run over a long enough duration
   to properly exercise network buffers and also characterize
   performance during different time periods of the day.  The
   throughput of each connection must be logged during the entire
   test, along with the TCP Efficiency and TCP Transfer Time metrics.
   Additionally, it is recommended to log RTT and retransmissions per
   connection over the test interval.

3.4.1.1 Interpretation of Traffic Shaping Test Results

   By plotting the throughput achieved by each TCP connection, the
   fair sharing of the bandwidth is generally very obvious when
   traffic shaping is properly configured for the bottleneck
   interface.  For the previous example of 5 connections sharing 500
   Mbps, each connection would consume ~100 Mbps with a smooth
   variation.  If traffic policing were present on the bottleneck
   interface, the bandwidth sharing would not be fair, and the
   resulting throughput plot would reveal "spiky" throughput
   consumption by the competing TCP connections (due to the
   retransmissions).

3.4.2 RED Tests

   Random Early Discard techniques are specifically targeted to
   provide congestion avoidance for TCP traffic.  Before the network
   element queue "fills" and enters the tail drop state, RED drops
   packets at configurable queue depth thresholds.  This action causes
   TCP connections to back off, which helps to prevent tail drop,
   which in turn helps to prevent global TCP synchronization.

   Again, rate-limited interfaces can benefit greatly from RED-based
   techniques.  Without RED, TCP is generally not able to achieve the
   full bandwidth of the bottleneck interface.  With RED enabled, TCP
   congestion avoidance throttles the connections on the higher speed
   interface (i.e. the LAN) and can reach equilibrium with the
   bottleneck bandwidth (achieving closer to full throughput).

   Proper RED configuration is more easily detected when conducting a
   multiple TCP connection test.
   Multiple TCP connections provide the multiple bursty sources that
   emulate the real-world conditions for which RED was intended.

   The RED tests also build upon the concepts of multiple connection
   testing as defined in Section 3.3.3.  Calculating the BDP for the
   bottleneck bandwidth is first required, followed by selecting the
   number of connections and the window size per connection.

   For RED testing, the desired effect is to cause the TCP connections
   to burst beyond the bottleneck bandwidth so that queue drops will
   occur.  Using the same example from Section 3.4.1 (traffic
   shaping), the 500 Mbps bottleneck bandwidth requires 5 TCP
   connections (with a window size of 64 KB) to fill the capacity.
   Some experimentation is required, but it is recommended to start
   with double the number of connections in order to stress the
   network element buffers / queues.  In this example, 10 connections
   would produce TCP bursts of 64 KB for each connection.  If the
   timing of the TCP tester permits, these TCP bursts could stress
   queue sizes in the 512 KB range.  Again, experimentation will be
   required, and the proper number of TCP connections and window size
   will be dictated by the size of the network element queue.

3.4.2.1 Interpretation of RED Results

   The default queuing technique for most network devices is FIFO
   based.  Without RED, the FIFO-based queue will cause excessive loss
   to all of the TCP connections and, in the worst case, global TCP
   synchronization.

   By plotting the aggregate throughput achieved on the bottleneck
   interface, proper RED operation can be determined if the bottleneck
   bandwidth is fully utilized.  For the previous example of 10
   connections (window = 64 KB) sharing 500 Mbps, each connection
   should consume ~50 Mbps.  If RED is not properly enabled on the
   interface, then the TCP connections will retransmit at a higher
   rate, and the net effect is that the bottleneck bandwidth is not
   fully utilized.

   Another means to study non-RED versus RED implementations is to use
   the TCP Transfer Time metric for all of the connections.  In this
   example, a 100 MB payload transfer should ideally take 16 seconds
   across all 10 connections (with RED enabled).  With RED not
   enabled, the throughput across the bottleneck bandwidth would be
   greatly reduced (generally by 20-40%), and the TCP Transfer Time
   would be proportionally longer than the ideal transfer time.

   Additionally, the TCP Efficiency metric is useful, since non-RED
   implementations will exhibit a lower TCP Efficiency than RED
   implementations.

4. Security Considerations

   The security considerations that apply to any active measurement of
   live networks are relevant here as well.  See [RFC4656] and
   [RFC5357].

5. IANA Considerations

   This memo does not require an IANA registration for ports dedicated
   to the TCP testing described in this memo.

6. Acknowledgements

   The author would like to thank Gilles Forget, Loki Jorgenson, and
   Reinhard Schrage for technical review and original contributions to
   this draft-06.

   Also thanks to Matt Mathis, Matt Zekauskas, Al Morton, and Yaakov
   Stein for many good comments and for pointing us to great sources
   of information pertaining to past works in the TCP capacity area.

7. References

7.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
   [RFC4656]  Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
              Zekauskas, "A One-way Active Measurement Protocol
              (OWAMP)", RFC 4656, September 2006.

   [RFC5681]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, June 1999.

   [RFC3449]  Balakrishnan, H., Padmanabhan, V. N., Fairhurst, G., and
              M. Sooriyabandara, "TCP Performance Implications of
              Network Path Asymmetry", RFC 3449, December 2002.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, June 2007.

              Allman, M., "A Bulk Transfer Capacity Methodology for
              Cooperating Hosts", draft-ietf-ippm-btc-cap-00.txt,
              August 2001.

7.2. Informative References

Authors' Addresses

   Barry Constantine
   JDSU, Test and Measurement Division
   One Milestone Center Court
   Germantown, MD 20876-7100
   USA

   Phone: +1 240 404 2227
   barry.constantine@jdsu.com

   Gilles Forget
   Independent Consultant to Bell Canada.
   308, rue de Monaco, St-Eustache
   Qc. CANADA, Postal Code: J7P-4T5

   Phone: (514) 895-8212
   gilles.forget@sympatico.ca

   Loki Jorgenson
   nooCore

   Phone: (604) 908-5833
   ljorgenson@nooCore.com

   Reinhard Schrage
   Schrage Consulting

   Phone: +49 (0) 5137 909540
   reinhard@schrageconsult.com