Network Working Group                                                 B.
Constantine 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: January 9, 2011 Bell Canada (Ext. Consultant) 5 L. Jorgenson 6 nooCore 7 Reinhard Schrage 8 Schrage Consulting 9 July 9, 2010 11 TCP Throughput Testing Methodology 12 draft-ietf-ippm-tcp-throughput-tm-04.txt 14 Abstract 16 This memo describes a methodology for measuring sustained TCP 17 throughput performance in an end-to-end managed network environment. 18 This memo is intended to provide a practical approach to help users 19 validate the TCP layer performance of a managed network, which should 20 provide a better indication of end-user application level experience. 21 In the methodology, various TCP and network parameters are identified 22 that should be tested as part of the network verification at the TCP 23 layer. 25 Status of this Memo 27 This Internet-Draft is submitted to IETF in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF), its areas, and its working groups. Note that 32 other groups may also distribute working documents as Internet- 33 Drafts. Creation date July 9, 2010. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 The list of current Internet-Drafts can be accessed at 41 http://www.ietf.org/ietf/1id-abstracts.txt. 43 The list of Internet-Draft Shadow Directories can be accessed at 44 http://www.ietf.org/shadow.html. 46 This Internet-Draft will expire on January 9, 2011. 48 Copyright Notice 50 Copyright (c) 2010 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 66 2. Goals of this Methodology. . . . . . . . . . . . . . . . . . . 4 67 2.1 TCP Equilibrium State Throughput . . . . . . . . . . . . . 5 68 2.2 Metrics for TCP Throughput Tests . . . . . . . . . . . . . 6 69 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 6 70 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 8 71 3.2. Baseline Round-trip Delay and Bandwidth. . . . . . . . . . 9 72 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 9 73 3.2.2 Techniques to Measure End-end Bandwidth . . . . . . . 10 74 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 10 75 3.3.1 Calculate Optimum TCP Window Size. . . . . . . . . . . 11 76 3.3.2 Conducting the TCP Throughput Tests. . . . . . . . . . 14 77 3.3.3 Single vs. Multiple TCP Connection Testing . . . . . . 14 78 3.3.4 Interpretation of the TCP Throughput Results . . . . . 15 79 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 15 80 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 
                                                                      16
       3.4.1.1 Interpretation of Traffic Shaping Test Results . . .  17
     3.4.2 RED Tests . . . . . . . . . . . . . . . . . . . . . . . . 17
       3.4.2.1 Interpretation of RED Results  . . . . . . . . . . .  18
   4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
   5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 20

1. Introduction

   Even though RFC2544 was meant to benchmark network equipment and to
   be used by network equipment manufacturers (NEMs), network providers
   have also used it to benchmark operational networks in order to
   verify SLAs (Service Level Agreements) before turning up a service
   for their business customers.  Testing an operational network prior
   to customer activation is referred to as "turn-up" testing, and the
   SLA is generally specified in terms of Layer 2/3 packet throughput,
   delay, loss and jitter.

   Network providers are coming to the realization that both Layer 2/3
   testing and TCP layer testing are required to more adequately ensure
   end-user satisfaction.  Therefore, the network provider community
   desires to measure network throughput performance at the TCP layer.
   Measuring TCP throughput provides a meaningful measure with respect
   to the end user's application SLA (and ultimately helps to reach
   some level of TCP testing interoperability, which does not exist
   today).

   Additionally, end-users (business enterprises) seek to conduct
   repeatable TCP throughput tests between enterprise locations.  Since
   these enterprises rely on the networks of the providers, a common
   test methodology (and metrics) would be equally beneficial to both
   parties.

   So the intent behind this draft TCP throughput work is to define a
   methodology for testing sustained TCP layer performance.  In this
   document, sustained TCP throughput is that amount of data per unit
   time that TCP transports during equilibrium (steady state), i.e.
   after the initial slow start phase.  We refer to this state as TCP
   Equilibrium, and the equilibrium throughput is the maximum
   achievable for the TCP connection(s).

   One other important note: the precursor to conducting this TCP test
   methodology is to perform "network stress tests" such as RFC2544
   Layer 2/3 tests or other conventional tests.  Examples include OWAMP
   or manual packet layer test techniques where packet throughput,
   loss, and delay measurements are conducted.  It is highly
   recommended to run traditional Layer 2/3 type tests to verify the
   integrity of the network before conducting TCP tests.

2. Goals of this Methodology

   Before defining the goals of this methodology, it is important to
   clearly define the areas that are not intended to be measured or
   analyzed by such a methodology.

   - The methodology is not intended to predict TCP throughput
     behavior during the transient stages of a TCP connection, such
     as initial slow start.
   - The methodology is not intended to definitively benchmark TCP
     implementations of one OS against another, although some users
     may find some value in conducting qualitative experiments.

   - The methodology is not intended to provide detailed diagnosis of
     problems within end-points or the network itself as related to
     non-optimal TCP performance, although a results interpretation
     section for each test step may provide insight into potential
     issues within the network.

   In contrast to the above exclusions, the goals of this methodology
   are to define a method to conduct a structured, end-to-end
   assessment of sustained TCP performance within a managed business
   class IP network.  A key goal is to establish a set of "best
   practices" that an engineer should apply when validating the
   ability of a managed network to carry end-user TCP applications.

   Some specific goals are to:

   - Provide a practical test approach that specifies the more well
     understood (and end-user configurable) TCP parameters such as
     Window size, MSS (Maximum Segment Size), # connections, and how
     these affect the outcome of TCP performance over a network.

   - Provide specific test conditions (link speed, RTT, window size,
     etc.) and maximum achievable TCP throughput under TCP Equilibrium
     conditions.  For guideline purposes, provide examples of these
     test conditions and the maximum achievable TCP throughput during
     the equilibrium state.  Section 2.1 provides specific details
     concerning the definition of TCP Equilibrium within the context
     of this draft.

   - Define two (2) basic metrics that can be used to compare the
     performance of TCP connections under various network conditions.

   - In test situations where the recommended procedure does not yield
     the maximum achievable TCP throughput result, this draft provides
     some possible areas within the end host or network that should be
     considered for investigation (although again, this draft is not
     intended to provide a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

   TCP connections have three (3) fundamental congestion window phases
   as documented in RFC2581.  These phases are:

   - Slow Start, which occurs during the beginning of a TCP
     transmission or after a retransmission time out event.

   - Congestion avoidance, which is the phase during which TCP ramps up
     to establish the maximum attainable throughput on an end-to-end
     network path.  Retransmissions are a natural by-product of the TCP
     congestion avoidance algorithm as it seeks to achieve maximum
     throughput on the network path.

   - Retransmission phase, which includes Fast Retransmit (Tahoe) and
     Fast Recovery (Reno and New Reno).  When a packet is lost, the
     congestion avoidance phase transitions to a Fast Retransmission or
     Fast Recovery phase, dependent upon the TCP implementation.

   The following diagram depicts these phases.

            |            ssthresh
   TCP      |           |
   Through- |           | Equilibrium
   put      |           |\ /\/\/\/\/\  Retransmit          /\/\ ...
            |           | \/         | Time-out           /
            |           |            |       _______    _/
            | Slow     _/            |      |  Slow   _/
            | Start  _/  Congestion  |      | Start _/    Congestion
            |      _/    Avoidance  Loss    |    _/       Avoidance
            |    _/                 Event   |  _/
            |  _/                           | _/
            |/______________________________|/_____________________
                                  Time

   This TCP methodology provides guidelines to measure the equilibrium
   throughput, which refers to the maximum sustained rate obtained by
   congestion avoidance before packet loss conditions occur (which
   would cause the state change from congestion avoidance to a
   retransmission phase).  All maximum achievable throughputs specified
   in Section 3 are with respect to this Equilibrium state.

2.2 Metrics for TCP Throughput Tests

   This draft focuses on a TCP throughput methodology and also provides
   two basic metrics to compare the results of various throughput
   tests.  It is recognized that the complexity and unpredictability of
   TCP make it impossible to develop a complete set of metrics that
   accounts for the myriad of variables (e.g. RTT variation, loss
   conditions, TCP implementation, etc.).  However, these two basic
   metrics facilitate TCP throughput comparisons under varying network
   conditions and between network traffic management techniques.

   The TCP Efficiency metric is the percentage of bytes that were not
   retransmitted and is defined as:

      Transmitted Bytes - Retransmitted Bytes
      ---------------------------------------  x 100
                 Transmitted Bytes

   This metric provides a comparative measure between various QoS
   mechanisms such as traffic management and congestion avoidance, and
   also between various TCP implementations (e.g. Reno, Vegas, etc.).

   As an example, if 1000 TCP segments were sent and 20 had to be
   retransmitted, the TCP Efficiency would be calculated as:

              1000 - 20
              ---------  x 100 = 98%
                1000

   The second metric is the TCP Transfer Time, which is simply the time
   it takes to transfer a block of data across simultaneous TCP
   connections.  The concept is useful when benchmarking traffic
   management techniques, where multiple connections are generally
   required.  An example would be the bulk transfer of 10 MB over 8
   separate TCP connections (each connection uploading 10 MB).  Each
   connection may achieve a different throughput during a test, and the
   overall throughput rate is not always easy to determine (especially
   as the number of connections increases).  But by defining the TCP
   Transfer Time as the time for all 8 connections to complete their
   10 MB transfers, a single transfer time metric provides a useful
   means to compare various traffic management techniques (e.g. FIFO
   queuing, WFQ, WRED, etc.).
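   As an informal illustration (not part of the methodology itself),
   the following Python sketch shows how these two metrics could be
   computed from counters that a TCP test tool is assumed to export;
   the counter values and per-connection completion times used here
   are hypothetical placeholders.

   <CODE BEGINS>
   # Illustrative sketch: the two metrics defined in Section 2.2,
   # computed from counters assumed to be exported by the test tool.

   def tcp_efficiency(transmitted, retransmitted):
       """TCP Efficiency (%) =
          (Transmitted - Retransmitted) / Transmitted x 100"""
       return (transmitted - retransmitted) / float(transmitted) * 100.0

   def tcp_transfer_time(completion_times_sec):
       """TCP Transfer Time = time for ALL connections to complete
       the transfer of their configured block of data."""
       return max(completion_times_sec)

   if __name__ == "__main__":
       # Example from the text: 1000 segments sent, 20 retransmitted
       print("TCP Efficiency: %.0f%%" % tcp_efficiency(1000, 20))

       # 8 connections, each uploading 10 MB; hypothetical per-
       # connection completion times (seconds) from the tool's log.
       times = [7.9, 8.4, 8.1, 9.2, 8.0, 8.7, 8.3, 8.5]
       print("TCP Transfer Time: %.1f s" % tcp_transfer_time(times))
   <CODE ENDS>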
3. TCP Throughput Testing Methodology

   This section summarizes the specific test methodology to achieve the
   goals listed in Section 2.

   As stated in Section 1, it is considered best practice to verify the
   integrity of the network by conducting Layer 2/3 stress tests such
   as RFC2544 (or other methods of network stress testing).  If the
   network is not performing properly in terms of packet loss, jitter,
   etc., then the TCP layer testing will not be meaningful, since the
   equilibrium throughput would be very difficult to achieve (in a
   "dysfunctional" network).

   The following represents the sequential order of steps to conduct
   the TCP throughput testing methodology:

   1. Identify the Path MTU.
Packetization Layer Path MTU Discovery 278 or PLPMTUD (RFC4821) should be conducted to verify the minimum network 279 path MTU. Conducting PLPMTUD establishes the upper limit for the MSS 280 to be used in subsequent steps. 282 2. Baseline Round-trip Delay and Bandwidth. These measurements provide 283 estimates of the ideal TCP window size, which will be used in 284 subsequent test steps. 286 3. TCP Connection Throughput Tests. With baseline measurements 287 of round trip delay and bandwidth, a series of single and multiple TCP 288 connection throughput tests can be conducted to baseline the network 289 performance expectations. 291 4. Traffic Management Tests. Various traffic management and queuing 292 techniques are tested in this step, using multiple TCP connections. 293 Multiple connection testing can verify that the network is configured 294 properly for traffic shaping versus policing, various queuing 295 implementations, and RED. 297 Important to note are some of the key characteristics and 298 considerations for the TCP test instrument. The test host may be a 299 standard computer or dedicated communications test instrument 300 and these TCP test hosts be capable of emulating both a client and a 301 server. 303 Whether the TCP test host is a standard computer or dedicated test 304 instrument, the following areas should be considered when selecting 305 a test host: 307 - TCP implementation used by the test host OS, i.e. Linux OS kernel 308 using TCP Reno, TCP options supported, etc. This will obviously be 309 more important when using custom test equipment where the TCP 310 implementation may be customized or tuned to run in higher 311 performance hardware 312 - Most importantly, the TCP test host must be capable of generating 313 and receiving stateful TCP test traffic at the full link speed of the 314 network under test. As a general rule of thumb, testing TCP throughput 315 at rates greater than 100 Mbit/sec generally requires high 316 performance server hardware or dedicated hardware based test tools. 318 3.1. Determine Network Path MTU 320 TCP implementations should use Path MTU Discovery techniques (PMTUD). 321 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 322 When a device has a packet to send which has the Don't Fragment (DF) 323 bit in the IP header set and the packet is larger than the Maximum 324 Transmission Unit (MTU) of the next hop link, the packet is dropped 325 and the device sends an ICMP 'need to frag' message back to the host 326 that originated the packet. The ICMP 'need to frag' message includes 327 the next hop MTU which PMTUD uses to tune the TCP Maximum Segment 328 Size (MSS). Unfortunately, because many network managers completely 329 disable ICMP, this technique does not always prove reliable in real 330 world situations. 332 Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should 333 be conducted to verify the minimum network path MTU. PLPMTUD can 334 be used with or without ICMP. The following sections provide a 335 summary of the PLPMTUD approach and an example using the TCP 336 protocol. RFC4821 specifies a search_high and search_low parameter 337 for the MTU. As specified in RFC4821, a value of 1024 is a generally 338 safe value to choose for search_low in modern networks. 340 It is important to determine the overhead of the links in the path, 341 and then to select a TCP MSS size corresponding to the Layer 3 MTU. 
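   As a small illustration of this selection (assuming plain IPv4 and
   TCP headers with no options, i.e. 40 bytes of Layer 3/4 overhead),
   the following Python sketch derives the MSS from a discovered path
   MTU; the numeric example in the next paragraph can be reproduced
   with it.

   <CODE BEGINS>
   # Illustrative sketch: derive the TCP MSS from a discovered path
   # MTU, assuming 20 bytes of IPv4 header and 20 bytes of TCP header
   # (no IP or TCP options).

   IP_HEADER_BYTES = 20
   TCP_HEADER_BYTES = 20

   def mss_from_mtu(path_mtu_bytes):
       return path_mtu_bytes - IP_HEADER_BYTES - TCP_HEADER_BYTES

   if __name__ == "__main__":
       for mtu in (1024, 1240, 1500):
           print("MTU %4d -> MSS %4d" % (mtu, mss_from_mtu(mtu)))
   <CODE ENDS>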
342 For example, if the MTU is 1024 bytes and the TCP/IP headers are 40 343 bytes, then the MSS would be set to 984 bytes. 345 An example scenario is a network where the actual path MTU is 1240 346 bytes. The TCP client probe MUST be capable of setting the MSS for 347 the probe packets and could start at MSS = 984 (which corresponds 348 to an MTU size of 1024 bytes). 350 The TCP client probe would open a TCP connection and advertise the 351 MSS as 984. Note that the client probe MUST generate these packets 352 with the DF bit set. The TCP client probe then sends test traffic 353 per a nominal window size (8KB, etc.). The window size should be 354 kept small to minimize the possibility of congesting the network, 355 which could induce congestive loss. The duration of the test should 356 also be short (10-30 seconds), again to minimize congestive effects 357 during the test. 359 In the example of a 1240 byte path MTU, probing with an MSS equal to 360 984 would yield a successful probe and the test client packets would 361 be successfully transferred to the test server. 363 Also note that the test client MUST verify that the MSS advertised 364 is indeed negotiated. Network devices with built-in Layer 4 365 capabilities can intercede during the connection establishment 366 process and reduce the advertised MSS to avoid fragmentation. This 367 is certainly a desirable feature from a network perspective, but 368 can yield erroneous test results if the client test probe does not 369 confirm the negotiated MSS. 371 The next test probe would use the search_high value and this would 372 be set to MSS = 1460 to correspond to a 1500 byte MTU. In this 373 example, the test client would retransmit based upon time-outs (since 374 no ACKs will be received from the test server). This test probe is 375 marked as a conclusive failure if none of the test packets are 376 ACK'ed. If any of the test packets are ACK'ed, congestive network 377 may be the cause and the test probe is not conclusive. Re-testing 378 at other times of the day is recommended to further isolate. 380 The test is repeated until the desired granularity of the MTU is 381 discovered. The method can yield precise results at the expense of 382 probing time. One approach would be to reduce the probe size to 383 half between the unsuccessful search_high and successful search_low 384 value, and increase by increments of 1/2 when seeking the upper 385 limit. 387 3.2. Baseline Round-trip Delay and Bandwidth 389 Before stateful TCP testing can begin, it is important to baseline 390 the round trip delay and bandwidth of the network to be tested. 391 These measurements provide estimates of the ideal TCP window size, 392 which will be used in subsequent test steps. These latency and 393 bandwidth tests should be run over a long enough period of time to 394 characterize the performance of the network over the course of a 395 meaningful time period. 397 One example would be to take samples during various times of the work 398 day. The goal would be to determine a representative minimum, average, 399 and maximum RTD and bandwidth for the network under test. Topology 400 changes are to be avoided during this time of initial convergence 401 (e.g. in crossing BGP4 boundaries). 403 In some cases, baselining bandwidth may not be required, since a 404 network provider's end-to-end topology may be well enough defined. 
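   As a simple illustration of this baselining step, the sketch below
   periodically estimates the RTT by timing TCP connection
   establishment (one of the techniques discussed in Section 3.2.1) and
   summarizes the samples as minimum / average / maximum.  It is a
   sketch only: the target host, port, sample count, and interval are
   hypothetical, and a production test would use the more accurate
   techniques listed in the next section.

   <CODE BEGINS>
   # Illustrative sketch: baseline RTT by timing the TCP three-way
   # handshake to a target host, then summarize min/avg/max.
   # The host, port, sample count, and interval are placeholders.

   import socket
   import time

   def connect_rtt_ms(host, port, timeout=2.0):
       """Rough RTT estimate: time to complete a TCP connect()."""
       start = time.time()
       with socket.create_connection((host, port), timeout=timeout):
           pass
       return (time.time() - start) * 1000.0

   def baseline_rtt(host, port, samples=10, interval_sec=60):
       rtts = []
       for _ in range(samples):
           try:
               rtts.append(connect_rtt_ms(host, port))
           except OSError:
               pass                     # skip failed samples
           time.sleep(interval_sec)
       if not rtts:
           raise RuntimeError("no successful RTT samples")
       return min(rtts), sum(rtts) / len(rtts), max(rtts)

   if __name__ == "__main__":
       lo, avg, hi = baseline_rtt("test-server.example.net", 5001,
                                  samples=5, interval_sec=10)
       print("RTT (ms): min=%.1f avg=%.1f max=%.1f" % (lo, avg, hi))
   <CODE ENDS>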
406 3.2.1 Techniques to Measure Round Trip Time 408 Following the definitions used in the references of the appendix; 409 Round Trip Time (RTT) is the time elapsed between the clocking in of 410 the first bit of a payload packet to the receipt of the last bit of the 411 corresponding acknowledgement. Round Trip Delay (RTD) is used 412 synonymously to twice the Link Latency. 414 In any method used to baseline round trip delay between network 415 end-points, it is important to realize that network latency is the 416 sum of inherent network delay and congestion. The RTT should be 417 baselined during "off-peak" hours to obtain a reliable figure for 418 network latency (versus additional delay caused by congestion). 420 During the actual sustained TCP throughput tests, it is critical 421 to measure RTT along with measured TCP throughput. Congestive 422 effects can be isolated if RTT is concurrently measured. 424 This is not meant to provide an exhaustive list, but summarizes some 425 of the more common ways to determine round trip time (RTT) through 426 the network. The desired resolution of the measurement (i.e. msec 427 versus usec) may dictate whether the RTT measurement can be achieved 428 with standard tools such as ICMP ping techniques or whether 429 specialized test equipment would be required with high precision 430 timers. The objective in this section is to list several techniques 431 in order of decreasing accuracy. 433 - Use test equipment on each end of the network, "looping" the 434 far-end tester so that a packet stream can be measured end-end. This 435 test equipment RTT measurement may be compatible with delay 436 measurement protocols specified in RFC5357. 438 - Conduct packet captures of TCP test applications using for example 439 "iperf" or FTP, etc. By running multiple experiments, the packet 440 captures can be studied to estimate RTT based upon the SYN -> SYN-ACK 441 handshakes within the TCP connection set-up. 443 - ICMP Pings may also be adequate to provide round trip time 444 estimations. Some limitations of ICMP Ping are the msec resolution 445 and whether the network elements respond to pings (or block them). 447 3.2.2 Techniques to Measure End-end Bandwidth 449 There are many well established techniques available to provide 450 estimated measures of bandwidth over a network. This measurement 451 should be conducted in both directions of the network, especially for 452 access networks which are inherently asymmetrical. Some of the 453 asymmetric implications to TCP performance are documented in RFC-3449 454 and the results of this work will be further studied to determine 455 relevance to this draft. 457 The bandwidth measurement test must be run with stateless IP streams 458 (not stateful TCP) in order to determine the available bandwidth in 459 each direction. And this test should obviously be performed at 460 various intervals throughout a business day (or even across a week). 461 Ideally, the bandwidth test should produce a log output of the 462 bandwidth achieved across the test interval AND the round trip delay. 464 And during the actual TCP level performance measurements (Sections 465 3.3 - 3.5), the test tool must be able to track round trip time 466 of the TCP connection(s) during the test. Measuring round trip time 467 variation (aka "jitter") provides insight into effects of congestive 468 delay on the sustained throughput achieved for the TCP layer test. 470 3.3. 
TCP Throughput Tests

   This draft specifically defines TCP throughput techniques to verify
   sustained TCP performance in a managed business network.  As defined
   in Section 2.1, the equilibrium throughput reflects the maximum rate
   achieved by a TCP connection within the congestion avoidance phase
   on an end-to-end network path.  This section and others will define
   the method to conduct these sustained throughput tests and provide
   guidelines for the predicted results.

   With baseline measurements of round trip time and bandwidth from
   Section 3.2, a series of single and multiple TCP connection
   throughput tests can be conducted to baseline network performance
   against expectations.

3.3.1 Calculate Optimum TCP Window Size

   The optimum TCP window size can be calculated from the bandwidth
   delay product (BDP), which is:

      BDP (bits) = RTT (sec) x Bandwidth (bps)

   By dividing the BDP by 8, the "ideal" TCP window size is calculated.
   An example would be a T3 link with 25 msec RTT.  The BDP would equal
   ~1,105,000 bits and the ideal TCP window would equal ~138,000 bytes.

   The following table provides some representative network link
   speeds, latency, BDP, and the associated "optimum" TCP window size.
   Sustained TCP transfers should reach nearly 100% throughput, minus
   the overhead of Layers 1-3 and the divisor of the MSS into the
   window.

   For this single connection baseline test, the MSS size will affect
   the achieved throughput (especially for smaller TCP window sizes).
   Table 3.2 provides the achievable, equilibrium TCP throughput (at
   Layer 4) using a 1460 byte MSS.  Also in this table, the case of 58
   bytes of L1-L4 overhead including the Ethernet CRC32 is used for
   simplicity.

   Table 3.2: Link Speed, RTT and calculated BDP, TCP Throughput

   Link                               Ideal TCP       Maximum Achievable
   Speed*   RTT (ms)   BDP (bits)  Window (kbytes)  TCP Throughput (Mbps)
   ----------------------------------------------------------------------
   T1          20          30,720        3.84              1.17
   T1          50          76,800        9.60              1.40
   T1         100         153,600       19.20              1.40
   T3          10         442,100       55.26             42.05
   T3          15         663,150       82.89             42.05
   T3          25       1,105,250      138.16             41.52
   T3(ATM)     10         407,040       50.88             36.50
   T3(ATM)     15         610,560       76.32             36.23
   T3(ATM)     25       1,017,600      127.20             36.27
   100M         1         100,000       12.50             91.98
   100M         2         200,000       25.00             93.44
   100M         5         500,000       62.50             93.44
   1Gig         0.1       100,000       12.50            919.82
   1Gig         0.5       500,000       62.50            934.47
   1Gig         1       1,000,000      125.00            934.47
   10Gig        0.05      500,000       62.50          9,344.67
   10Gig        0.3     3,000,000      375.00          9,344.67

   * Note that the link speed is the bottleneck (minimum) link speed
     within the network path; e.g. a WAN with a T1 link, etc.

   Also, the following link speeds (available payload bandwidth) were
   used for the WAN entries:

   - T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
   - T3 = 44.21 Mbits/sec (C-Bit Framing)
   - T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per
     second)

   The calculation method used in this document is a 3 step process:

   1 - We determine what the optimal TCP Window size value should be,
       based on the optimal quantity of "in-flight" octets discovered
       by the BDP calculation.  We take into consideration that the
       TCP Window size has to be an exact multiple of the MSS.
   2 - Then we calculate the achievable Layer 2 throughput by
       multiplying the number of MSS-sized segments determined in step
       1 by (MSS + L2 + L3 + L4 overheads), divided by the RTT.
551 3 - Finally, we multiply the calculated value of step 2 by the MSS 552 versus (MSS + L2 + L3 + L4 Overheads) ratio. 554 This gives us the achievable TCP Throughput value. Sometimes, the 555 maximum achievable throughput is limited by the maximum achievable 556 quantity of Ethernet Frames per second on the physical media. Then 557 this value is used in step 2 instead of the calculated one. 559 The following diagram compares achievable TCP throughputs on a T3 link 560 with Windows 2000/XP TCP window sizes of 16KB versus 64KB. 562 45| 563 | _____42.1M 564 40| |64K| 565 TCP | | | 566 Throughput 35| | | _____34.3M 567 in Mbps | | | |64K| 568 30| | | | | 569 | | | | | 570 25| | | | | 571 | | | | | 572 20| | | | | _____20.5M 573 | | | | | |64K| 574 15| 14.5M____| | | | | | 575 | |16K| | | | | | 576 10| | | | 9.6M+---+ | | | 577 | | | | |16K| | 5.8M____+ | 578 5| | | | | | | |16K| | 579 |______+___+___+_______+___+___+_______+__ +___+_______ 580 10 15 25 581 RTT in milliseconds 583 The following diagram shows the achievable TCP throughput on a 25ms T3 584 when the TCP Window size is increased and with the RFC1323 TCP Window 585 scaling option. 587 45| 588 | +-----+42.47M 589 40| | | 590 TCP | | | 591 Throughput 35| | | 592 in Mbps | | | 593 30| | | 594 | | | 595 25| | | 596 | ______ 21.23M | | 597 20| | | | | 598 | | | | | 599 15| | | | | 600 | | | | | 601 10| +----+10.62M | | | | 602 | _______5.31M | | | | | | 603 5| | | | | | | | | 604 |__+_____+______+____+___________+____+________+_____+___ 605 16 32 64 128 606 TCP Window size in KBytes 608 3.3.2 Conducting the TCP Throughput Tests 610 There are several TCP tools that are commonly used in the network 611 world and one of the most common is the "iperf" tool. With this tool, 612 hosts are installed at each end of the network segment; one as client 613 and the other as server. The TCP Window size of both the client and 614 the server can be maunally set and the achieved throughput is measured, 615 either uni-directionally or bi-directionally. For higher BDP 616 situations in lossy networks (long fat networks or satellite links, 617 etc.), TCP options such as Selective Acknowledgment should be 618 considered and also become part of the window size / throughput 619 characterization. 621 Host hardware performance must be well understood before conducting 622 the TCP throughput tests and other tests in the following sections. 623 Dedicated test equipment will generally be required, especially for 624 line rates of GigE and 10 GigE. 626 The TCP throughput test should be run over a a long enough duration 627 to properly exercise network buffers and also characterize performance 628 during different time periods of the day. The results must be logged 629 at the desired interval and the test must record RTT and TCP 630 retransmissions at each interval. 632 This correlation of retransmissions and RTT over the course of the 633 test will clearly identify which portions of the transfer reached 634 TCP Equilbrium state and to what effect increased RTT (congestive 635 effects) may have been the cause of reduced equilibrium performance. 637 Additionally, the TCP Efficiency and TCP Transfer time metrics should 638 be logged in order to further characterize the window size tests. 640 3.3.3 Single vs. Multiple TCP Connection Testing 642 The decision whether to conduct single or multiple TCP connection 643 tests depends upon the size of the BDP in relation to the window sizes 644 configured in the end-user environment. 
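   The arithmetic behind this decision (and behind the window size
   calculation of Section 3.3.1) can be restated in a few lines.  The
   Python sketch below is illustrative only: it simply applies the BDP
   formula and then rounds the connection count up for a given host
   window size.

   <CODE BEGINS>
   # Illustrative sketch: BDP, "ideal" TCP window, and the number of
   # parallel connections needed when hosts are limited to a smaller
   # window (Sections 3.3.1 and 3.3.3).

   import math

   def bdp_bits(bandwidth_bps, rtt_sec):
       return bandwidth_bps * rtt_sec

   def ideal_window_bytes(bandwidth_bps, rtt_sec):
       return bdp_bits(bandwidth_bps, rtt_sec) / 8.0

   def connections_to_fill(bandwidth_bps, rtt_sec, host_window_bytes):
       return math.ceil(ideal_window_bytes(bandwidth_bps, rtt_sec)
                        / host_window_bytes)

   if __name__ == "__main__":
       # Example from Section 3.3.3: 500 Mbps and 5 ms RTT
       bw, rtt = 500e6, 0.005
       print("BDP          : %.0f bits" % bdp_bits(bw, rtt))
       print("Ideal window : %.1f KBytes"
             % (ideal_window_bytes(bw, rtt) / 1000))
       for window_kb in (16, 32, 64, 128):
           n = connections_to_fill(bw, rtt, window_kb * 1024)
           print("%4d KB window -> %2d connections" % (window_kb, n))
   <CODE ENDS>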
   For example, if the BDP for a long-fat pipe turns out to be 2MB,
   then it is probably more realistic to test this pipe with multiple
   connections.  Assuming typical host computer window settings of
   64 KB, using 32 connections would realistically test this pipe.

   The following table is provided to illustrate the relationship
   between the BDP, window size, and the number of connections required
   to utilize the available capacity.  For this example, the network
   bandwidth is 500 Mbps, the RTT is equal to 5 ms, and the BDP equates
   to 312 KBytes.

                  #Connections
      Window      to Fill Link
      ----------------------------
      16KB             20
      32KB             10
      64KB              5
      128KB             3

   The TCP Transfer Time metric is useful for conducting multiple
   connection tests.  Each connection should be configured to transfer
   a certain payload (e.g. 100 MB), and the TCP Transfer Time provides
   a simple metric to verify the actual versus expected results.

   Note that the TCP Transfer Time is the time for all connections to
   complete the transfer of the configured payload size.  From the
   example table listed above, the 64KB window case is considered.
   Each of the 5 connections would be configured to transfer 100MB,
   and each TCP connection should achieve a maximum of 100 Mbit/sec.
   So for this example, the 100MB payload should be transferred across
   the connections in approximately 8 seconds (which would be the ideal
   TCP Transfer Time for these conditions).

   Additionally, the TCP Efficiency metric should be computed for each
   connection tested (defined in Section 2.2).
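   To make the "actual versus expected" comparison concrete, the sketch
   below computes the ideal TCP Transfer Time for a multiple connection
   test (the 5 x 100 MB example above works out to roughly 8 seconds)
   and compares it against measured completion times.  It is
   illustrative only; the measured values shown are placeholders that
   would come from the test tool's logs.

   <CODE BEGINS>
   # Illustrative sketch: ideal vs. measured TCP Transfer Time for a
   # multiple connection test (Section 3.3.3).

   def ideal_transfer_time_sec(payload_bytes, bottleneck_bps,
                               num_connections):
       """Time for every connection to move its payload, assuming the
       bottleneck bandwidth is shared evenly (ignores slow start and
       L1-L4 overhead)."""
       per_connection_bps = bottleneck_bps / num_connections
       return payload_bytes * 8 / per_connection_bps

   if __name__ == "__main__":
       # 5 connections, 100 MB each, 500 Mbps bottleneck -> ~8 seconds
       ideal = ideal_transfer_time_sec(100e6, 500e6, 5)
       print("Ideal TCP Transfer Time   : %.1f s" % ideal)

       # Placeholder measured completion times, one per connection
       measured = [8.6, 8.9, 9.4, 8.7, 9.1]
       actual = max(measured)
       print("Measured TCP Transfer Time: %.1f s (ideal/actual = %.0f%%)"
             % (actual, 100.0 * ideal / actual))
   <CODE ENDS>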
3.3.4 Interpretation of the TCP Throughput Results

   At the end of this step, the user will document the theoretical BDP
   and a set of Window size experiments with measured TCP throughput
   for each TCP window size setting.  For cases where the sustained TCP
   throughput does not equal the predicted value, some possible causes
   are listed:

   - Network congestion causing packet loss; the TCP Efficiency metric
     is a useful gauge to compare network performance.
   - Network congestion not causing packet loss but increasing RTT.
   - Intermediate network devices which actively regenerate the TCP
     connection and can alter window size, MSS, etc.
   - Over utilization of the available link, or rate limiting
     (policing).  More discussion of traffic management tests follows
     in Section 3.4.

3.4. Traffic Management Tests

   In most cases, the network connection between two geographic
   locations (branch offices, etc.) is lower in capacity than the
   network connection of the host computers.  An example would be LAN
   connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN
   connectivity may be physically 100 Mbps or logically 100 Mbps (over
   a GigE WAN connection).  In the latter case, rate limiting is used
   to provide the WAN bandwidth per the SLA.

   Traffic management techniques are employed to provide various forms
   of QoS, the more common of which include:

   - Traffic Shaping
   - Priority Queuing
   - Random Early Discard (RED, etc.)

   Configuring the end-to-end network with these various traffic
   management mechanisms is a complex undertaking.  For traffic shaping
   and RED techniques, the end goal is to provide better performance
   for bursty traffic such as TCP (RED is specifically intended for
   TCP).

   This section of the methodology provides guidelines to test traffic
   shaping and RED implementations.  As in Section 3.3, host hardware
   performance must be well understood before conducting the traffic
   shaping and RED tests.  Dedicated test equipment will generally be
   required, especially for line rates of GigE and 10 GigE.

3.4.1 Traffic Shaping Tests

   For services where the available bandwidth is rate limited, there
   are two (2) techniques used to implement rate limiting: traffic
   policing and traffic shaping.

   Simply stated, traffic policing marks and/or drops packets which
   exceed the SLA bandwidth (in most cases, excess traffic is dropped).
   Traffic shaping employs the use of queues to smooth the bursty
   traffic and then send it out within the SLA bandwidth limit (without
   dropping packets unless the traffic shaping queue is exceeded).

   Traffic shaping is generally configured for TCP data services and
   can provide improved TCP performance since retransmissions are
   reduced, which in turn optimizes TCP throughput for the given
   available bandwidth.  Throughout this section, the available
   rate-limited bandwidth shall be referred to as the "bottleneck
   bandwidth".

   The ability to detect proper traffic shaping is more easily
   diagnosed when conducting a multiple TCP connection test.  Proper
   shaping will provide a fair distribution of the available bottleneck
   bandwidth, while traffic policing will not.

   The traffic shaping tests build upon the concepts of multiple
   connection testing as defined in Section 3.3.3.  Calculating the BDP
   for the bottleneck bandwidth is first required, followed by
   selecting the number of connections and the window size per
   connection.

   Similar to the example in Section 3.3, a typical test scenario might
   be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited
   logical interface), and 5 msec RTT.  This would require five (5) TCP
   connections with a 64 KB window size each to evenly fill the
   bottleneck bandwidth (about 100 Mbps per connection).

   The traffic shaping test should be run over a long enough duration
   to properly exercise network buffers and also to characterize
   performance during different time periods of the day.  The
   throughput of each connection must be logged during the entire test,
   along with the TCP Efficiency and TCP Transfer Time metrics.
   Additionally, it is recommended to log RTT and retransmissions per
   connection over the test interval.

3.4.1.1 Interpretation of Traffic Shaping Test Results

   By plotting the throughput achieved by each TCP connection, the fair
   sharing of the bandwidth is generally very obvious when traffic
   shaping is properly configured for the bottleneck interface.  For
   the previous example of 5 connections sharing 500 Mbps, each
   connection would consume ~100 Mbps with a smooth variation.  If
   traffic policing was present on the bottleneck interface, the
   bandwidth sharing would not be fair and the resulting throughput
   plot would reveal "spikey" throughput consumption by the competing
   TCP connections (due to the retransmissions).

3.4.2 RED Tests

   Random Early Discard techniques are specifically targeted to provide
   congestion avoidance for TCP traffic.  Before the network element
   queue "fills" and enters the tail drop state, RED drops packets at
   configurable queue depth thresholds.  This action causes TCP
   connections to back off, which helps to prevent tail drop, which in
   turn helps to prevent global TCP synchronization.
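   For readers unfamiliar with the mechanism, the sketch below shows
   the basic drop decision of the classic RED algorithm (Floyd and
   Jacobson): an exponentially weighted moving average of the queue
   depth is compared against a minimum and a maximum threshold, and
   packets are dropped with increasing probability between the two.  It
   is a simplified illustration; the parameter values are arbitrary
   examples, not recommendations of this document, and real
   implementations add further refinements.

   <CODE BEGINS>
   # Illustrative sketch: simplified RED drop decision.  Thresholds,
   # max_p, and the EWMA weight are arbitrary example values.

   import random

   class SimpleRed:
       def __init__(self, min_th=50, max_th=150, max_p=0.10,
                    weight=0.002):
           self.min_th = min_th    # avg depth where early drops begin
           self.max_th = max_th    # avg depth where drop prob = max_p
           self.max_p = max_p      # maximum early-drop probability
           self.weight = weight    # EWMA weight for average queue depth
           self.avg = 0.0

       def should_drop(self, queue_depth):
           # Exponentially weighted moving average of the queue depth
           self.avg += self.weight * (queue_depth - self.avg)
           if self.avg < self.min_th:
               return False        # below min threshold: never drop
           if self.avg >= self.max_th:
               return True         # above max threshold: always drop
           # Between thresholds: probability rises linearly to max_p
           p = (self.max_p * (self.avg - self.min_th)
                / (self.max_th - self.min_th))
           return random.random() < p

   if __name__ == "__main__":
       red = SimpleRed()
       # Feed a steadily growing queue depth and count early drops
       drops = sum(red.should_drop(depth) for depth in range(400))
       print("Early drops during ramp-up example:", drops)
   <CODE ENDS>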
   Again, rate limited interfaces can benefit greatly from RED based
   techniques.  Without RED, TCP is generally not able to achieve the
   full bandwidth of the bottleneck interface.  With RED enabled, TCP
   congestion avoidance throttles the connections on the higher speed
   interface (i.e. the LAN) and can reach equilibrium with the
   bottleneck bandwidth (achieving closer to full throughput).

   The ability to detect proper RED configuration is more easily
   diagnosed when conducting a multiple TCP connection test.  Multiple
   TCP connections provide the multiple bursty sources that emulate the
   real-world conditions for which RED was intended.

   The RED tests also build upon the concepts of multiple connection
   testing as defined in Section 3.3.3.  Calculating the BDP for the
   bottleneck bandwidth is first required, followed by selecting the
   number of connections and the window size per connection.

   For RED testing, the desired effect is to cause the TCP connections
   to burst beyond the bottleneck bandwidth so that queue drops will
   occur.  Using the same example from Section 3.4.1 (traffic shaping),
   the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with a
   window size of 64 KB) to fill the capacity.  Some experimentation is
   required, but it is recommended to start with double the number of
   connections in order to stress the network element buffers / queues.
   In this example, 10 connections would produce TCP bursts of 64 KB
   from each connection.  If the timing of the TCP tester permits,
   these TCP bursts could stress queue sizes in the 512 KB range.
   Again, experimentation will be required, and the proper number of
   TCP connections / window size will be dictated by the size of the
   network element queue.

3.4.2.1 Interpretation of RED Results

   The default queuing technique for most network devices is FIFO
   based.  Without RED, the FIFO based queue will cause excessive loss
   to all of the TCP connections and, in the worst case, global TCP
   synchronization.

   By plotting the aggregate throughput achieved on the bottleneck
   interface, proper RED operation can be determined if the bottleneck
   bandwidth is fully utilized.  For the previous example of 10
   connections (window = 64 KB) sharing 500 Mbps, each connection
   should consume ~50 Mbps.  If RED was not properly enabled on the
   interface, then the TCP connections will retransmit at a higher rate
   and the net effect is that the bottleneck bandwidth is not fully
   utilized.

   Another means to study non-RED versus RED implementations is to use
   the TCP Transfer Time metric for all of the connections.  In this
   example, a 100 MB payload transfer should ideally take 16 seconds
   across all 10 connections (with RED enabled).  With RED not enabled,
   the throughput across the bottleneck bandwidth would be greatly
   reduced (generally 20-40%) and the TCP Transfer Time would be
   proportionally longer than the ideal transfer time.

   Additionally, the TCP Efficiency metric is useful, since non-RED
   implementations will exhibit a lower TCP Efficiency than RED
   implementations.
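   The comparison described above can be reduced to a few lines of
   arithmetic.  The sketch below uses the 10 connection example
   (expected ~50 Mbps per connection) and computes the aggregate
   utilization of the bottleneck from per-connection throughput
   measurements; the measured values are hypothetical placeholders,
   with the second set illustrating the under-utilization that may be
   seen without RED.

   <CODE BEGINS>
   # Illustrative sketch: bottleneck utilization from per-connection
   # throughput measurements (Section 3.4.2.1).  Measured values are
   # hypothetical placeholders.

   BOTTLENECK_BPS = 500e6
   CONNECTIONS = 10

   def aggregate_utilization_pct(per_connection_mbps):
       aggregate_bps = sum(per_connection_mbps) * 1e6
       return 100.0 * aggregate_bps / BOTTLENECK_BPS

   if __name__ == "__main__":
       expected = BOTTLENECK_BPS / CONNECTIONS / 1e6
       print("Expected per connection : ~%.0f Mbps" % expected)

       red_like    = [48, 51, 49, 50, 52, 47, 50, 49, 51, 48]
       no_red_like = [31, 28, 35, 25, 30, 33, 27, 29, 34, 26]
       print("RED-like run            : %.0f%% of bottleneck"
             % aggregate_utilization_pct(red_like))
       print("Non-RED-like run        : %.0f%% of bottleneck"
             % aggregate_utilization_pct(no_red_like))
   <CODE ENDS>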
4. Acknowledgements

   The authors would like to thank Gilles Forget, Loki Jorgenson, and
   Reinhard Schrage for technical review and contributions to this
   memo.

   Also thanks to Matt Mathis and Matt Zekauskas for many good comments
   through email exchange and for pointing us to great sources of
   information pertaining to past works in the TCP capacity area.

5. References

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining
              Empirical Bulk Transfer Capacity Metrics", RFC 3148,
              July 2001.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544, March 1999.

   [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC3449]  Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M.
              Sooriyabandara, "TCP Performance Implications of Network
              Path Asymmetry", RFC 3449, December 2002.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [draft-ietf-ippm-btc-cap]
              Allman, M., "A Bulk Transfer Capacity Methodology for
              Cooperating Hosts", draft-ietf-ippm-btc-cap-00.txt (work
              in progress), August 2001.

   [MSMO]     Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The
              Macroscopic Behavior of the TCP Congestion Avoidance
              Algorithm", SIGCOMM Computer Communication Review,
              Volume 27, Issue 3, July 1997.

   [Stevens Vol1]
              Stevens, W., "TCP/IP Illustrated, Volume 1: The
              Protocols", Addison-Wesley.

Authors' Addresses

   Barry Constantine
   JDSU, Test and Measurement Division
   One Milestone Center Court
   Germantown, MD 20876-7100
   USA

   Phone: +1 240 404 2227
   barry.constantine@jdsu.com

   Gilles Forget
   Independent Consultant to Bell Canada.
   308, rue de Monaco, St-Eustache
   Qc. CANADA, Postal Code: J7P-4T5

   Phone: (514) 895-8212
   gilles.forget@sympatico.ca

   Loki Jorgenson
   nooCore

   Phone: (604) 908-5833
   ljorgenson@nooCore.com

   Reinhard Schrage
   Schrage Consulting

   Phone: +49 (0) 5137 909540
   reinhard@schrageconsult.com