Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: November 18, 2010                 Bell Canada (Ext. Consultant)
                                                            L. Jorgenson
                                                       Apparent Networks
                                                        Reinhard Schrage
                                                      Schrage Consulting
                                                            May 18, 2010

                   TCP Throughput Testing Methodology
                draft-ietf-ippm-tcp-throughput-tm-02.txt

Abstract

This memo describes a methodology for measuring sustained TCP
throughput performance in an end-to-end managed network environment.
This memo is intended to provide a practical approach to help users
validate the TCP layer performance of a managed network, which should
provide a better indication of end-user application level experience.
In the methodology, various TCP and network parameters are identified
that should be tested as part of the network verification at the TCP
layer.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as Internet-
Drafts.  Creation date May 18, 2010.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on November 18, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . .  3
   2. Goals of this Methodology. . . . . . . . . . . . . . . . . .  4
      2.1 TCP Equilibrium State Throughput . . . . . . . . . . . .  5
   3. TCP Throughput Testing Methodology . . . . . . . . . . . . .  6
      3.1 Determine Network Path MTU . . . . . . . . . . . . . . .  7
      3.2 Baseline Round-trip Delay and Bandwidth. . . . . . . . .  8
          3.2.1 Techniques to Measure Round Trip Time . . . . . . .  9
          3.2.2 Techniques to Measure End-end Bandwidth . . . . . . 10
      3.3 Single TCP Connection Throughput Tests . . . . . . . . . 10
          3.3.1 Interpretation of the Single Connection TCP
                Throughput Results . . . . . . . . . . . . . . . . 14
      3.4 TCP MSS Throughput Testing . . . . . . . . . . . . . . . 14
          3.4.1 MSS Size Testing Method . . . . . . . . . . . . . . 14
          3.4.2 Interpretation of TCP MSS Throughput Results. . . . 15
      3.5 Multiple TCP Connection Throughput Tests . . . . . . . . 16
          3.5.1 Multiple TCP Connections - below Link Capacity . . 16
          3.5.2 Multiple TCP Connections - over Link Capacity. . . 17
          3.5.3 Interpretation of Multiple TCP Connection Results. 17
   4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18
   5. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19

1. Introduction

Even though RFC2544 was meant to benchmark network equipment and to be
used by network equipment manufacturers (NEMs), network providers have
also used it to benchmark operational networks in order to verify SLAs
(Service Level Agreements) before turning on a service to their
business customers.  Testing an operational network prior to customer
activation is referred to as "turn-up" testing, and the SLA is
generally specified in terms of Layer 2/3 packet throughput, delay,
loss, and jitter.

Network providers are coming to the realization that both RFC2544
testing and TCP layer testing are required to more adequately ensure
end-user satisfaction.  Therefore, the network provider community
desires to measure network throughput performance at the TCP layer.
Measuring TCP throughput provides a meaningful measure with respect to
the end user's application SLA (and can ultimately help reach some
level of TCP testing interoperability, which does not exist today).

As the complexity of the network grows, the various queuing mechanisms
in the network greatly affect TCP layer performance (e.g. improper
default router settings for queuing), and devices such as firewalls,
proxies, and load-balancers can actively alter TCP settings (such as
window size, MSS, etc.) as a TCP session traverses the network.
Network providers (and NEMs) are wrestling with the end-to-end
complexities of the above, and there is a strong interest in the
standardization of a test methodology to validate end-to-end TCP
performance (as this is the precursor to acceptable end-user
application performance).

So the intent behind this TCP throughput work is to define a
methodology for testing sustained TCP layer performance.  In this
document, sustained TCP throughput is the amount of data per unit time
that TCP transports during equilibrium (steady state), i.e. after the
initial slow start phase.  We refer to this state as TCP Equilibrium,
and the equilibrium throughput is the maximum throughput achievable
for the TCP connection(s).

One other important note: the precursor to conducting this TCP test
methodology is to perform "network stress tests" such as RFC2544
Layer 2/3 tests or other conventional tests (OWAMP, etc.).  It is
highly recommended to run traditional Layer 2/3 type tests to verify
the integrity of the network before conducting TCP testing.

2. Goals of this Methodology

Before defining the goals of this methodology, it is important to
clearly define the areas that are not intended to be measured or
analyzed by such a methodology.

- The methodology is not intended to predict TCP throughput behavior
  during the transient stages of a TCP connection, such as initial
  slow start.

- The methodology is not intended to definitively benchmark TCP
  implementations of one OS against another, although some users may
  find some value in conducting qualitative experiments.

- The methodology is not intended to provide detailed diagnosis of
  problems within end-points or the network itself as related to
  non-optimal TCP performance, although a results interpretation
  section for each test step may provide insight into potential
  issues within the network.

In contrast to the above exclusions, the goals of this methodology are
to define a method to conduct a structured, end-to-end assessment of
sustained TCP performance within a managed business class IP network.
A key goal is to establish a set of "best practices" that an engineer
should apply when validating the ability of a managed network to carry
end-user TCP applications.

Some specific goals are to:

- Provide a practical test approach that specifies the more well
  understood (and end-user configurable) TCP parameters such as window
  size, MSS, and number of connections, and how these affect the
  outcome of TCP performance over a network.

- Provide specific test conditions (link speed, RTT, window size,
  etc.) and the maximum achievable TCP throughput under TCP
  Equilibrium conditions.  For guideline purposes, provide examples of
  these test conditions and the maximum achievable TCP throughput
  during the equilibrium state.  Section 2.1 provides specific details
  concerning the definition of TCP Equilibrium within the context of
  this draft.

- In test situations where the recommended procedure does not yield
  the maximum achievable TCP throughput result, provide some possible
  areas within the end host or network that should be considered for
  investigation (although again, this draft is not intended to provide
  a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

TCP connections have three (3) fundamental congestion window phases,
as documented in RFC2581.  These phases are:

- Slow Start, which occurs during the beginning of a TCP transmission
  or after a retransmission time-out event.

- Congestion Avoidance, which is the phase during which TCP ramps up
  to establish the maximum attainable throughput on an end-to-end
  network path.  Retransmissions are a natural by-product of the TCP
  congestion avoidance algorithm as it seeks to achieve maximum
  throughput on the network path.

- Retransmission phase, which includes Fast Retransmit (Tahoe) and
  Fast Recovery (Reno and New Reno).  When a packet is lost, the
  Congestion Avoidance phase transitions to a Fast Retransmit or Fast
  Recovery phase, depending upon the TCP implementation.

The following diagram depicts these phases.

   [Figure: TCP throughput versus time.  Throughput ramps up during
   Slow Start until ssthresh is reached, oscillates around the maximum
   during Congestion Avoidance (Equilibrium), drops at a Loss Event
   with a Retransmit Time-out, and then recovers through Slow Start
   and Congestion Avoidance again.]

This TCP methodology provides guidelines to measure the equilibrium
throughput, which refers to the maximum sustained rate obtained by
congestion avoidance before packet loss conditions occur (which would
cause the state change from congestion avoidance to a retransmission
phase).
All maximum achievable throughputs specified in Section 3 are with
respect to this Equilibrium state.

3. TCP Throughput Testing Methodology

This section summarizes the specific test methodology to achieve the
goals listed in Section 2.

As stated in Section 1, it is considered best practice to verify the
integrity of the network by conducting Layer 2/3 stress tests such as
RFC2544 or other methods of network stress testing.  If the network is
not performing properly in terms of packet loss, jitter, etc., then
the TCP layer testing will not be meaningful, since the equilibrium
throughput would be very difficult to achieve (in a "dysfunctional"
network).

The following provides the sequential order of steps to conduct the
TCP throughput testing methodology:

1. Identify the Path MTU.  Packetization Layer Path MTU Discovery or
   PLPMTUD (RFC4821) should be conducted to verify the minimum network
   path MTU.  Conducting PLPMTUD establishes the upper limit for the
   MSS to be used in subsequent steps.

2. Baseline Round-trip Delay and Bandwidth.  These measurements
   provide estimates of the ideal TCP window size, which will be used
   in subsequent test steps.

3. Single TCP Connection Throughput Tests.  With baseline measurements
   of round trip delay and bandwidth, a series of single connection
   TCP throughput tests can be conducted to baseline the performance
   of the network against expectations.

4. TCP MSS Throughput Testing.  By varying the MSS size of the TCP
   connection, the ability of the network to sustain expected TCP
   throughput can be verified.

5. Multiple TCP Connection Throughput Tests.  Single connection TCP
   testing is a useful first step to measure expected versus actual
   TCP performance.  The multiple connection test more closely
   emulates customer traffic, which comprises many TCP connections
   over a network link.

Important to note are some of the key characteristics and
considerations for the TCP test instrument.  The test host may be a
standard computer or a dedicated communications test instrument, and
these TCP test hosts must be capable of emulating both a client and a
server.  As a general rule of thumb, testing TCP throughput at rates
greater than 250-500 Mbit/sec requires high performance server
hardware or dedicated hardware based test tools.

Whether the TCP test host is a standard computer or a dedicated test
instrument, the following areas should be considered when selecting a
test host:

- TCP implementation used by the test host OS, e.g. a Linux OS kernel
  using TCP Reno, TCP options supported, etc.  This will obviously be
  more important when using custom test equipment where the TCP
  implementation may be customized or tuned to run in higher
  performance hardware.

- Most importantly, the TCP test host must be capable of generating
  and receiving stateful TCP test traffic at the full link speed of
  the network under test.  This is a strict requirement and may call
  for custom test equipment, especially on 1 GigE and 10 GigE
  networks.

3.1. Determine Network Path MTU

TCP implementations should use Path MTU Discovery techniques (PMTUD),
but these techniques do not always prove reliable in real world
situations.  Since PMTUD relies on ICMP messages (to inform the host
that unfragmented transmission cannot occur), it is not always
dependable because many network managers completely disable ICMP.

Increasingly, network providers and enterprises are instituting fixed
MTU sizes on the hosts to eliminate TCP fragmentation issues.

Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should be
conducted to verify the minimum network path MTU.  PLPMTUD can be used
with or without ICMP.  The following provides a summary of the PLPMTUD
approach and an example using the TCP protocol.

RFC4821 specifies a search_high and a search_low parameter for the
MTU.  As specified in RFC4821, a value of 1024 is a generally safe
value to choose for search_low in modern networks.

It is important to determine the overhead of the links in the path,
and then to select a TCP MSS size corresponding to the Layer 3 MTU.
For example, if the MTU is 1024 bytes and the TCP/IP headers are 40
bytes, then the MSS would be set to 984 bytes.

An example scenario is a network where the actual path MTU is 1240
bytes.  The TCP client probe MUST be capable of setting the MSS for
the probe packets and could start at MSS = 984 (which corresponds to
an MTU size of 1024 bytes).

The TCP client probe would open a TCP connection and advertise the MSS
as 984.  Note that the client probe MUST generate these packets with
the DF bit set.  The TCP client probe then sends test traffic per a
nominal window size (8KB, etc.).  The window size should be kept small
to minimize the possibility of congesting the network, which could
induce congestive loss.  The duration of the test should also be short
(10-30 seconds), again to minimize congestive effects during the test.

In the example of a 1240 byte path MTU, probing with an MSS equal to
984 would yield a successful probe, and the test client packets would
be successfully transferred to the test server.

Also note that the test client MUST verify that the advertised MSS is
indeed negotiated.  Network devices with built-in Layer 4 capabilities
can intercede during the connection establishment process and reduce
the advertised MSS to avoid fragmentation.  This is certainly a
desirable feature from a network perspective, but it can yield
erroneous test results if the client test probe does not confirm the
negotiated MSS.

The next test probe would use the search_high value, and this would be
set to MSS = 1460 to correspond to a 1500 byte MTU.  In this example,
the test client would retransmit based upon time-outs (since no ACKs
will be received from the test server).  This test probe is marked as
a conclusive failure if none of the test packets are ACK'ed.  If any
of the test packets are ACK'ed, congestive network loss may be the
cause and the test probe is not conclusive.  Re-testing at other times
of the day is recommended to further isolate the cause.

The test is repeated until the desired granularity of the MTU is
discovered.  The method can yield precise results at the expense of
probing time.  One approach would be to probe halfway between the
unsuccessful search_high and the successful search_low values, and to
continue halving the remaining interval when seeking the upper limit,
as illustrated in the sketch below.
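
The following is a minimal sketch of that halving search, assuming the
test tool supplies a probe_mss() routine that opens a TCP connection
clamped to the given MSS, sends DF-marked test traffic for a short
interval, and reports whether the segments were ACK'ed.  The function
and parameter names are illustrative only and are not defined by
RFC4821.

   def find_path_mss(probe_mss, search_low=984, search_high=1460):
       """Return the largest MSS in [search_low, search_high] that
       probes successfully."""
       if not probe_mss(search_low):
           raise RuntimeError("search_low failed; choose a smaller MSS")
       if probe_mss(search_high):
           return search_high
       good, bad = search_low, search_high
       while bad - good > 1:
           mid = (good + bad) // 2
           if probe_mss(mid):
               good = mid     # mid fits the path; raise the floor
           else:
               bad = mid      # mid was dropped; lower the ceiling
       return good

   # Dry-run stand-in for the 1240 byte path MTU example (MTU = MSS + 40):
   # find_path_mss(lambda mss: mss + 40 <= 1240)  ->  1200

In the 1240 byte path MTU example, the search converges on an MSS of
1200, i.e. a 1240 byte path MTU.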

3.2. Baseline Round-trip Delay and Bandwidth

Before stateful TCP testing can begin, it is important to baseline the
round trip delay and bandwidth of the network to be tested.  These
measurements provide estimates of the ideal TCP window size, which
will be used in subsequent test steps.  These latency and bandwidth
tests should be run over a long enough period of time to adequately
characterize the performance of the network.

One example would be to take samples during various times of the work
day.  The goal would be to determine a representative minimum,
average, and maximum RTD and bandwidth for the network under test.
Topology changes (e.g. routing reconvergence across BGP4 boundaries)
should be avoided during this baselining period.

In some cases, baselining bandwidth may not be required, since a
network provider's end-to-end topology may be well enough defined.

3.2.1 Techniques to Measure Round Trip Time

We follow the definitions used in the references; hence Round Trip
Time (RTT) is the time elapsed between the clocking in of the first
bit of a payload packet and the receipt of the last bit of the
corresponding acknowledgement.  Round Trip Delay (RTD) is used
synonymously with twice the link latency.

In any method used to baseline round trip delay between network
end-points, it is important to realize that network latency is the sum
of inherent network delay and congestion.  The RTT should be baselined
during "off-peak" hours to obtain a reliable figure for network
latency (versus additional delay caused by congestion).

During the actual sustained TCP throughput tests, it is critical to
measure RTT along with the measured TCP throughput.  Congestive
effects can be isolated if RTT is concurrently measured.

The following is not meant to be an exhaustive list, but it summarizes
some of the more common ways to determine round trip time (RTT)
through the network.  The desired resolution of the measurement (i.e.
msec versus usec) may dictate whether the RTT measurement can be
achieved with standard tools such as ICMP ping techniques or whether
specialized test equipment with high precision timers would be
required.  The techniques are listed in order of decreasing accuracy.

- Use test equipment on each end of the network, "looping" the far-end
  tester so that a packet stream can be measured end-to-end.  This
  test equipment RTT measurement may be compatible with the delay
  measurement protocols specified in RFC5357.

- Conduct packet captures of TCP test applications (for example
  "iperf" or FTP sessions).  By running multiple experiments, the
  packet captures can be studied to estimate RTT based upon the
  SYN -> SYN-ACK handshakes within the TCP connection set-up.

- ICMP pings may also be adequate to provide round trip time
  estimations.  Some limitations of ICMP ping are the msec resolution
  and whether the network elements respond to pings (or block them).
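
As a lightweight complement to the packet-capture approach above, the
time for a TCP three-way handshake to complete can itself approximate
one RTT, since a blocking connect() returns once the SYN-ACK arrives.
The sketch below is illustrative only (host and port are placeholders)
and carries the same resolution caveats as ICMP ping.

   import socket
   import time

   def handshake_rtt_ms(host, port=5001, samples=5):
       """Estimate RTT (ms) by timing several TCP handshakes."""
       results = []
       for _ in range(samples):
           start = time.perf_counter()
           with socket.create_connection((host, port), timeout=2.0):
               pass                     # only the handshake is timed
           results.append((time.perf_counter() - start) * 1000.0)
       return min(results), sum(results) / len(results)

   # Example against a hypothetical far-end test server:
   # print(handshake_rtt_ms("192.0.2.10"))

The minimum of several samples is usually the better estimate of
inherent path delay, with the average reflecting congestive variation.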

3.2.2 Techniques to Measure End-end Bandwidth

There are many well established techniques available to provide
estimated measures of bandwidth over a network.  This measurement
should be conducted in both directions of the network, especially for
access networks, which are inherently asymmetrical.  Some of the
asymmetric implications to TCP performance are documented in RFC3449,
and the results of this work will be further studied to determine
relevance to this draft.

The bandwidth measurement test must be run with stateless IP streams
(not stateful TCP) in order to determine the available bandwidth in
each direction.  This test should be performed at various intervals
throughout a business day (or even across a week).  Ideally, the
bandwidth test should produce a log of the bandwidth achieved across
the test interval AND the round trip delay.

Also, during the actual TCP level performance measurements (Sections
3.3 - 3.5), the test tool must be able to track the round trip time of
the TCP connection(s) during the test.  Measuring round trip time
variation (aka "jitter") provides insight into the effects of
congestive delay on the sustained throughput achieved for the TCP
layer test.

3.3. Single TCP Connection Throughput Tests

This draft specifically defines TCP throughput techniques to verify
sustained TCP performance in a managed business network.  As defined
in Section 2.1, the equilibrium throughput reflects the maximum rate
achieved by a TCP connection within the congestion avoidance phase on
an end-to-end network path.  This section and the following sections
define the method to conduct these sustained throughput tests and
provide guidelines for the expected results.

With baseline measurements of round trip time and bandwidth from
Section 3.2, a series of single connection TCP throughput tests can be
conducted to baseline the performance of the network against
expectations.  The optimum TCP window size can be calculated from the
bandwidth delay product (BDP), which is:

   BDP = RTT x Bandwidth

By dividing the BDP (in bits) by 8, the "ideal" TCP window size in
bytes is calculated.  An example would be a T3 link with 25 msec RTT.
The BDP would equal ~1,105,000 bits and the ideal TCP window would
equal ~138,000 bytes.
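
The arithmetic is summarized in the short sketch below; it simply
restates the BDP formula above, with bandwidth taken as the bottleneck
(payload) rate in bits per second.  The function name is illustrative.

   def ideal_window_bytes(bandwidth_bps, rtt_s):
       """BDP = RTT x Bandwidth (bits); divide by 8 for bytes."""
       return (bandwidth_bps * rtt_s) / 8.0

   # T3 payload rate of 44.21 Mbit/s at 25 msec RTT:
   # ideal_window_bytes(44.21e6, 0.025)  ->  ~138,156 bytes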

The following table provides some representative network link speeds,
latency, BDP, and associated "optimum" TCP window size.  Sustained TCP
transfers should reach nearly 100% throughput, minus the overhead of
Layers 1-3 and the effect of rounding the window down to a whole
number of MSS-sized segments.

For this single connection baseline test, the MSS size will affect the
achieved throughput (especially for smaller TCP window sizes).  Table
3.2 provides the achievable, equilibrium TCP throughput (at Layer 4)
using a 1000 byte MSS.  Also in this table, an L1-L4 overhead of 58
bytes (including the Ethernet CRC32) is used for simplicity.

Table 3.2: Link Speed, RTT and calculated BDP, TCP Throughput

   Link                             Ideal TCP        Maximum Achievable
   Speed*  RTT (ms)   BDP (bits)    Window (kbytes)  TCP Throughput (Mbps)
   ----------------------------------------------------------------------
   T1         20          30,720        3.84              1.20
   T1         50          76,800        9.60              1.44
   T1        100         153,600       19.20              1.44
   T3         10         442,100       55.26             41.60
   T3         15         663,150       82.89             41.13
   T3         25       1,105,250      138.16             41.92
   T3(ATM)    10         407,040       50.88             32.44
   T3(ATM)    15         610,560       76.32             32.44
   T3(ATM)    25       1,017,600      127.20             32.44
   100M        1         100,000       12.50             90.699
   100M        2         200,000       25.00             92.815
   100M        5         500,000       62.50             90.699
   1Gig        0.1       100,000       12.50            906.991
   1Gig        0.5       500,000       62.50            906.991
   1Gig        1       1,000,000      125.00            906.991
   10Gig       0.05      500,000       62.50          9,069.912
   10Gig       0.3     3,000,000      375.00          9,069.912

   * Note that the link speed is the minimum link speed throughout the
     network path; i.e. a WAN with a T1 access link, etc.

Also, the following link speeds (available payload bandwidth) were
used for the WAN entries:

- T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
- T3 = 44.21 Mbits/sec (C-Bit Framing)
- T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per
  second)

The calculation method used in this document is a 3 step process:

1 - We determine what should be the optimal TCP Window size value,
    based on the optimal quantity of "in-flight" octets discovered by
    the BDP calculation.  We take into consideration that the TCP
    Window size has to be an exact multiple of the MSS.

2 - Then we calculate the achievable Layer 2 throughput by multiplying
    the value determined in step 1 by the (MSS + L2 + L3 + L4
    Overheads) / MSS ratio and dividing by the RTT.

3 - Finally, we multiply the calculated value of step 2 by the MSS
    versus (MSS + L2 + L3 + L4 Overheads) ratio.

This gives us the achievable TCP Throughput value.  Sometimes, the
maximum achievable throughput is limited by the maximum achievable
quantity of Ethernet Frames per second on the physical media.  In that
case, this value is used in step 2 instead of the calculated one.  (A
small sketch of this calculation follows at the end of this section.)

There are several TCP tools that are commonly used in the network
provider world, and one of the most common is the "iperf" tool.  With
this tool, hosts are installed at each end of the network segment; one
as client and the other as server.  The TCP Window size of both the
client and the server can be manually set, and the achieved throughput
is measured, either uni-directionally or bi-directionally.  For higher
BDP situations in lossy networks (long fat networks, satellite links,
etc.), TCP options such as Selective Acknowledgment should be
considered and also become part of the window size / throughput
characterization.

The following shows the achievable TCP throughput on a T3 with the
default Windows2000/XP TCP Window size of 17520 Bytes.

   [Bar chart: TCP Throughput in Mbps versus RTT in milliseconds;
   approximately 14.48 Mbps at 10 ms, 9.65 Mbps at 15 ms, and
   5.79 Mbps at 25 ms.]

The following shows the achievable TCP throughput on a 25 ms T3 when
the TCP Window size is increased and with the RFC1323 TCP Window
scaling option.

   [Bar chart: TCP Throughput in Mbps versus TCP Window size in
   KBytes; approximately 5.31 Mbps at 16 KB, 10.62 Mbps at 32 KB,
   21.23 Mbps at 64 KB, and 42.47 Mbps at 128 KB.]

The single connection TCP throughput test must be run over a long
duration, and results must be logged at the desired interval.  The
test must record RTT and TCP retransmissions at each interval.

This correlation of retransmissions and RTT over the course of the
test will clearly identify which portions of the transfer reached TCP
Equilibrium state and to what extent increased RTT (congestive
effects) may have been the cause of reduced equilibrium performance.

Host hardware performance must be well understood before conducting
this TCP single connection test and the other tests in this section.
Dedicated test equipment may be required, especially for line rates of
GigE and 10 GigE.
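
The following is a minimal sketch of the 3 step calculation referenced
earlier in this section, using the same assumptions as Table 3.2
(1000 byte MSS, 58 bytes of L1-L4 overhead, bandwidth expressed as the
available payload rate).  The function and parameter names are
illustrative only, and different overhead assumptions will shift the
results slightly.

   def achievable_tcp_throughput_bps(link_bps, rtt_s, mss=1000,
                                     overhead=58):
       # Step 1: ideal window from the BDP, rounded down to an exact
       # multiple of the MSS.
       bdp_bytes = (link_bps * rtt_s) / 8.0
       window_bytes = int(bdp_bytes // mss) * mss
       # Step 2: achievable Layer 2 throughput for that window (frames
       # per RTT times full frame size), capped at the link rate when
       # the media frame rate is the limit.
       frame_bytes = mss + overhead
       l2_bps = (window_bytes / mss) * frame_bytes * 8.0 / rtt_s
       l2_bps = min(l2_bps, link_bps)
       # Step 3: scale by MSS / (MSS + overheads) to obtain the TCP
       # (Layer 4) throughput.
       return l2_bps * mss / frame_bytes

   # T1 (1.536 Mbit/s payload) at 20 msec RTT -> ~1.20 Mbit/s, in line
   # with Table 3.2; other rows may differ slightly depending on the
   # exact overhead treatment.
   # print(achievable_tcp_throughput_bps(1.536e6, 0.020) / 1e6)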

3.3.1 Interpretation of the Single Connection TCP Throughput Results

At the end of this step, the user will document the theoretical BDP
and a set of window size experiments with measured TCP throughput for
each TCP window size setting.  For cases where the sustained TCP
throughput does not equal the predicted value, some possible causes
are listed:

- Network congestion causing packet loss

- Network congestion not causing packet loss, but effectively
  increasing the size of the required TCP window during the transfer

- Intermediate network devices which actively regenerate the TCP
  connection and can alter window size, MSS, etc.

3.4. TCP MSS Throughput Testing

This test setup should be conducted as a single TCP connection test.
By varying the MSS size of the TCP connection, the ability of the
network to sustain expected TCP throughput can be verified.  This is
similar to the frame and packet size techniques within RFC2544, which
aim to determine the ability of the routing/switching devices to
handle loads in terms of packets/frames per second at various frame
and packet sizes.  This test can also further characterize the
performance of a network in the presence of active TCP elements
(proxies, etc.), devices that fragment IP packets, and the actual end
hosts themselves (servers, etc.).

3.4.1 MSS Size Testing Method

The single connection testing listed in Section 3.3 should be
repeated, using the appropriate window size and collecting throughput
measurements for various MSS sizes.

The following are the typical sizes of MSS settings for various link
speeds:

- 256 bytes for very low speed links such as 9.6Kbps (per RFC1144).
- 536 bytes for low speed links (per RFC879).
- 966 bytes for SLIP high speed (per RFC1055).
- 1380 bytes for IPSec VPN Tunnel testing.
- 1452 bytes for PPPoE connectivity (per RFC2516).
- 1460 bytes for Ethernet and Fast Ethernet (per RFC895).
- 8960 byte jumbo frames for GigE.

Using the optimum window size determined by conducting steps 3.2 and
3.3, a variety of MSS sizes should be tested according to the link
speed under test.  Using Fast Ethernet with 5 msec RTT as an example,
the optimum TCP window size would be 62.5 kbytes and the recommended
MSS for Fast Ethernet is 1460 bytes.

   Link              Achievable TCP Throughput (Mbps) for
   Speed  RTT(ms) MSS=1000 MSS=1260 MSS=1300 MSS=1380 MSS=1420 MSS=1460
   ----------------------------------------------------------------------
   T1        20  |   1.20    1.008    1.040    1.104    1.136    1.168
   T1        50  |   1.44    1.411    1.456    1.335    1.363    1.402
   T1       100  |   1.44    1.512    1.456    1.435    1.477    1.402
   T3        10  |  41.60   42.336   42.640   41.952   40.032   42.048
   T3        15  |  42.13   42.336   42.293   42.688   42.411   42.048
   T3        25  |  41.92   42.336   42.432   42.394   42.714   42.515
   T3(ATM)   10  |  32.44   33.815   34.477   35.482   36.022   36.495
   T3(ATM)   15  |  32.44   34.120   34.477   35.820   36.022   36.127
   T3(ATM)   25  |  32.44   34.363   34.860   35.684   36.022   36.274
   100M       1  |  90.699  89.093   91.970   86.866   89.424   91.982
   100M       2  |  92.815  93.226   93.275   88.505   90.973   93.442
   100M       5  |  90.699  92.481   92.697   88.245   90.844   93.442

For GigE and 10GigE, Jumbo frames (9000 bytes) are becoming more
common.  The following table adds jumbo frames to the possible MSS
values.

   Link              Achievable TCP Throughput (Mbps) for
   Speed  RTT(ms) MSS=1260 MSS=1300 MSS=1380 MSS=1420 MSS=1460 MSS=8960
   ----------------------------------------------------------------------
   1Gig    0.1  |  924.812  926.966  882.495  894.240  919.819  713.786
   1Gig    0.5  |  924.812  926.966  930.922  932.743  934.467  856.543
   1Gig    1.0  |  924.812  926.966  930.922  932.743  934.467  927.922
   10Gig   0.05 | 9248.125 9269.655 9309.218 9839.790 9344.671 8565.435
   10Gig   0.3  | 9248.125 9269.655 9309.218 9839.790 9344.671 9755.079

Each row in the table is a separate test that should be conducted over
a predetermined test interval, with the throughput, retransmissions,
and RTT logged during the entire test interval.
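
One way a test client might pin the MSS for each row of such a sweep
is shown below, on platforms that expose the TCP_MAXSEG socket option
(commonly Linux).  This is an illustrative sketch rather than part of
the methodology; the server address, port, and sweep values are
placeholders, and reading the option back after connecting covers the
case where an intermediate device reduces the negotiated MSS.

   import socket

   def open_test_connection(server, port, mss):
       s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       # Clamp the MSS before connecting so it is reflected in the SYN.
       s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
       s.connect((server, port))
       # Effective value after the handshake; may be lower than requested.
       effective = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
       return s, effective

   # for mss in (536, 966, 1380, 1452, 1460):
   #     conn, effective_mss = open_test_connection("192.0.2.10", 5001, mss)
   #     ...send a window's worth of data, record throughput, close...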

3.4.2 Interpretation of TCP MSS Throughput Results

For cases where the measured TCP throughput does not equal the
predicted throughput for a given MSS, some possible causes are listed:

- TBD

3.5. Multiple TCP Connection Throughput Tests

After baselining the network under test with a single TCP connection
(Section 3.3), the nominal capacity of the network has been
determined.  The capacity measured in Section 3.3 may be a capacity
range, and it is reasonable that some level of tuning may have been
required (i.e. router shaping techniques employed, intermediary
proxy-like devices tuned, etc.).

Single connection TCP testing is a useful first step to measure
expected versus actual TCP performance and as a means to diagnose /
tune issues in the network and active elements.  However, the ultimate
goal of this methodology is to more closely emulate customer traffic,
which comprises many TCP connections over a network link.  This
methodology also seeks to provide the framework for testing stateful
TCP connections in concurrence with stateless traffic streams, which
is described in Section 3.5.

3.5.1 Multiple TCP Connections - below Link Capacity

First, the ability of the network to carry multiple TCP connections to
full network capacity should be tested.  Prioritization and QoS
settings are not considered during this step, since the network
capacity is not to be exceeded by the test traffic (Section 3.5.2
covers the over capacity test case).

For this multiple connection TCP throughput test, the number of
connections will more than likely be limited by the test tool (host
vs. dedicated test equipment).  As an example, for a GigE link with
1 msec RTT, the optimum TCP window would equal ~128 KBytes.  So under
this condition, 8 concurrent connections with a window size equal to
16KB would fill the GigE link.  For 10G, 80 connections would be
required to accomplish the same.

Just as in Section 3.3, the end host or test tool cannot be the
processing bottleneck, or the throughput measurements will not be
valid.  The test tool must be benchmarked in ideal lab conditions to
verify its ability to transfer stateful TCP traffic at the given
network line rate.

This test step should be conducted over a reasonable test duration,
and results should be logged per interval, such as throughput per
connection, RTT, and retransmissions.

Since the network is not to be driven into over capacity (by nature of
the BDP allocated evenly to each connection), this test verifies the
ability of the network to carry multiple TCP connections up to the
link speed of the network.
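
The connection-count arithmetic used in the example above is simply
the BDP divided by the per-connection window.  The sketch below is
illustrative; real tests must also respect the connection limits of
the test tool.

   import math

   def connections_to_fill_link(link_bps, rtt_s, window_bytes):
       bdp_bytes = link_bps * rtt_s / 8.0
       return math.ceil(bdp_bytes / window_bytes)

   # GigE at 1 msec RTT with 16 KByte windows -> 8 connections
   # print(connections_to_fill_link(1e9, 0.001, 16 * 1024))
   # 10GigE at 1 msec RTT with 16 KByte windows -> 77-80 connections,
   # depending on how the window and BDP are rounded.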

3.5.2 Multiple TCP Connections - over Link Capacity

In this step, the network bandwidth is intentionally exceeded with
multiple TCP connections to test expected prioritization and queuing
within the network.

All conditions related to the Section 3.3 set-up apply, especially the
ability of the test hosts to transfer stateful TCP traffic at network
line rates.

Using the same example from Section 3.3, a GigE link with 1 msec RTT
would require a window size of 128 KB to fill the link (with one TCP
connection).  Assuming a 16KB window, 8 concurrent connections would
fill the GigE link capacity, and values higher than 8 would
over-subscribe the network capacity.  The user would select values to
over-subscribe the network (i.e. possibly 10, 15, 20, etc.) to conduct
experiments to verify proper prioritization and queuing within the
network.

3.5.3 Interpretation of Multiple TCP Connection Test Results

Without any prioritization in the network, the over-subscribed test
results could assist in queuing studies.  With proper queuing, the
bandwidth should be shared in a reasonable manner.  The author
understands that the term "reasonable" is too open-ended, and future
draft versions of this memo will attempt to quantify this sharing in
more tangible terms.  It is known that if a network element is not
configured for proper queuing (i.e. simple FIFO), then an
over-subscribed TCP connection test will generally show a very uneven
distribution of bandwidth.

With prioritization in the network, different TCP connections can be
assigned various QoS settings via various mechanisms (i.e. per VLAN,
DSCP, etc.), and the higher priority connections must be verified to
achieve the expected throughput.
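
Although this memo does not yet quantify "reasonable" sharing, one
possible (non-normative) way to summarize how evenly an
over-subscribed test distributed bandwidth is Jain's fairness index,
sketched below: a value of 1.0 means all connections received an equal
share, while values approaching 1/n indicate that one connection
dominated.  The sample figures are invented for illustration.

   def jain_fairness(throughputs):
       n = len(throughputs)
       total = sum(throughputs)
       return (total * total) / (n * sum(t * t for t in throughputs))

   # per_connection_mbps = [92.0, 95.5, 90.1, 3.2]   # hypothetical run
   # jain_fairness(per_connection_mbps)  ->  ~0.77, noticeably uneven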

4. Acknowledgements

The author would like to thank Gilles Forget, Loki Jorgenson, and
Reinhard Schrage for technical review and contributions to this
draft-00 memo.

Also thanks to Matt Mathis and Matt Zekauskas for many good comments
through email exchange and for pointing me to great sources of
information pertaining to past works in the TCP capacity area.

5. References

[RFC2581]  Allman, M., Paxson, V., Stevens, W., "TCP Congestion
           Control", RFC 2581, May 1999.

[RFC3148]  Mathis, M., Allman, M., "A Framework for Defining
           Empirical Bulk Transfer Capacity Metrics", RFC 3148,
           July 2001.

[RFC2544]  Bradner, S., McQuaid, J., "Benchmarking Methodology for
           Network Interconnect Devices", RFC 2544, May 1999.

[RFC3449]  Balakrishnan, H., Padmanabhan, V. N., Fairhurst, G.,
           Sooriyabandara, M., "TCP Performance Implications of
           Network Path Asymmetry", RFC 3449, December 2002.

[RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K.,
           Babiarz, J., "A Two-Way Active Measurement Protocol
           (TWAMP)", RFC 5357, October 2008.

[RFC4821]  Mathis, M., Heffner, J., "Packetization Layer Path MTU
           Discovery", RFC 4821, May 2007.

draft-ietf-ippm-btc-cap-00.txt  Allman, M., "A Bulk Transfer Capacity
           Methodology for Cooperating Hosts", August 2001.

[MSMO]     Mathis, M., Semke, J., Mahdavi, J., Ott, T., "The
           Macroscopic Behavior of the TCP Congestion Avoidance
           Algorithm", SIGCOMM Computer Communication Review,
           Volume 27, Issue 3, July 1997.

[Stevens Vol1]  TCP/IP Illustrated, Volume 1, The Protocols,
           Addison-Wesley.

Authors' Addresses

Barry Constantine
JDSU, Test and Measurement Division
One Milestone Center Court
Germantown, MD 20876-7100
USA

Phone: +1 240 404 2227
Email: barry.constantine@jdsu.com

Gilles Forget
Independent Consultant to Bell Canada.
308, rue de Monaco, St-Eustache
Qc. CANADA, Postal Code: J7P-4T5

Phone: (514) 895-8212
gilles.forget@sympatico.ca

Loki Jorgenson
Apparent Networks

Phone: (604) 433-2333 ext 105
ljorgenson@apparentnetworks.com

Reinhard Schrage
Schrage Consulting

Phone: +49 (0) 5137 909540
reinhard@schrageconsult.com