Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: February 12, 2011                 Bell Canada (Ext. Consultant)
                                                            L. Jorgenson
                                                                 nooCore
                                                        Reinhard Schrage
                                                      Schrage Consulting
                                                         August 12, 2010

                   TCP Throughput Testing Methodology
                draft-ietf-ippm-tcp-throughput-tm-05.txt

Abstract

This memo describes a methodology for measuring sustained TCP throughput performance in an end-to-end managed network environment.  This memo is intended to provide a practical approach to help users validate the TCP layer performance of a managed network, which should provide a better indication of end-user application level experience.  In the methodology, various TCP and network parameters are identified that should be tested as part of the network verification at the TCP layer.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.  Creation date August 12, 2010.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on February 12, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Table of Contents

   1. Introduction
   2. Goals of this Methodology
      2.1 TCP Equilibrium State Throughput
      2.2 Metrics for TCP Throughput Tests
   3. TCP Throughput Testing Methodology
      3.1 Determine Network Path MTU
      3.2 Baseline Round-trip Delay and Bandwidth
         3.2.1 Techniques to Measure Round Trip Time
         3.2.2 Techniques to Measure End-end Bandwidth
      3.3 TCP Throughput Tests
         3.3.1 Calculate Optimum TCP Window Size
         3.3.2 Conducting the TCP Throughput Tests
         3.3.3 Single vs. Multiple TCP Connection Testing
         3.3.4 Interpretation of the TCP Throughput Results
      3.4 Traffic Management Tests
         3.4.1 Traffic Shaping Tests
            3.4.1.1 Interpretation of Traffic Shaping Test Results
         3.4.2 RED Tests
            3.4.2.1 Interpretation of RED Results
   4. Acknowledgements
   5. References
   Authors' Addresses

1. Introduction

Testing an operational network prior to customer activation is referred to as "turn-up" testing, and the associated SLA is generally specified in terms of Layer 2/3 packet throughput, delay, loss, and jitter.

Network providers are coming to the realization that Layer 2/3 testing and TCP layer testing are both required to more adequately ensure end-user satisfaction.  Therefore, the network provider community desires to measure network throughput performance at the TCP layer.  Measuring TCP throughput provides a meaningful measure with respect to the end user's application SLA (and would ultimately help reach a level of TCP testing interoperability which does not exist today).

Additionally, end-users (business enterprises) seek to conduct repeatable TCP throughput tests between enterprise locations.  Since these enterprises rely on the networks of the providers, a common test methodology (and common metrics) would be equally beneficial to both parties.

The intent behind this TCP throughput draft is to define a methodology for testing sustained TCP layer performance.  In this document, sustained TCP throughput is the amount of data per unit time that TCP transports during equilibrium (steady state), i.e. after the initial slow start phase.  We refer to this state as TCP Equilibrium; the equilibrium throughput is the maximum achievable for the TCP connection(s).

There are many variables to consider when conducting a TCP throughput test, and this methodology focuses on the most common parameters that should be considered, such as:

- Path MTU and Maximum Segment Size (MSS)
- RTT and Bottleneck BW
- Ideal TCP Window (Bandwidth Delay Product)
- Single Connection and Multiple Connection testing

One other important note: it is highly recommended that traditional Layer 2/3 tests be conducted to verify the integrity of the network before conducting TCP tests.  Examples include RFC2544, iperf (UDP mode), or manual packet layer test techniques where packet throughput, loss, and delay measurements are conducted.

2. Goals of this Methodology

Before defining the goals of this methodology, it is important to clearly define the areas that are not intended to be measured or analyzed by such a methodology.

- The methodology is not intended to predict TCP throughput behavior during the transient stages of a TCP connection, such as initial slow start.

- The methodology is not intended to definitively benchmark TCP implementations of one OS against another, although some users may find some value in conducting qualitative experiments.

- The methodology is not intended to provide detailed diagnosis of problems within end-points or the network itself as related to non-optimal TCP performance, although a results interpretation section for each test step may provide insight into potential issues within the network.

In contrast to the above exclusions, the goal of this methodology is to define a method to conduct a structured, end-to-end assessment of sustained TCP performance within a managed business class IP network.
A key goal is to establish a set of "best practices" that an engineer should apply when validating the ability of a managed network to carry end-user TCP applications.

Some specific goals are to:

- Provide a practical test approach that specifies the better understood (and end-user configurable) TCP parameters such as window size, MSS (Maximum Segment Size), and number of connections, and how these affect the outcome of TCP performance over a network.

- Provide specific test conditions (link speed, RTT, window size, etc.) and the maximum achievable TCP throughput under TCP Equilibrium conditions.  For guideline purposes, provide examples of these test conditions and the maximum achievable TCP throughput during the equilibrium state.  Section 2.1 provides specific details concerning the definition of TCP Equilibrium within the context of this draft.

- Define two (2) basic metrics that can be used to compare the performance of TCP connections under various network conditions.

- In test situations where the recommended procedure does not yield the maximum achievable TCP throughput result, provide some possible areas within the end host or network that should be considered for investigation (although again, this draft is not intended to provide a detailed diagnosis of these issues).

2.1 TCP Equilibrium State Throughput

TCP connections have three (3) fundamental congestion window phases, as documented in RFC2581.  These phases are:

- Slow Start, which occurs at the beginning of a TCP transmission or after a retransmission time-out event.

- Congestion Avoidance, which is the phase during which TCP ramps up to establish the maximum attainable throughput on an end-to-end network path.  Retransmissions are a natural by-product of the TCP congestion avoidance algorithm as it seeks to achieve maximum throughput on the network path.

- Retransmission phase, which includes Fast Retransmit (Tahoe) and Fast Recovery (Reno and New Reno).  When a packet is lost, the Congestion Avoidance phase transitions to a Fast Retransmission or Fast Recovery phase, dependent upon the TCP implementation.

The following diagram depicts these phases.

   [Diagram: TCP Throughput versus Time.  During Slow Start, throughput ramps up toward ssthresh; Congestion Avoidance then climbs to the Equilibrium (maximum sustained) rate; a Loss Event triggers the Retransmit Time-out phase, after which Slow Start and Congestion Avoidance repeat.]

This TCP methodology provides guidelines to measure the equilibrium throughput, which refers to the maximum sustained rate obtained by congestion avoidance before packet loss conditions occur (which would cause the state change from congestion avoidance to a retransmission phase).  All maximum achievable throughputs specified in Section 3 are with respect to this Equilibrium state.

2.2 Metrics for TCP Throughput Tests

This draft focuses on a TCP throughput methodology and also provides two basic metrics to compare the results of various throughput tests.  It is recognized that the complexity and unpredictability of TCP makes it impossible to develop a complete set of metrics that account for the myriad of variables (e.g., RTT variation, loss conditions, TCP implementation, etc.).
However, these two basic metrics facilitate TCP throughput comparisons under varying network conditions and between network traffic management techniques.

The TCP Efficiency metric is the percentage of bytes that were not retransmitted and is defined as:

   Transmitted Bytes - Retransmitted Bytes
   ---------------------------------------  x 100
              Transmitted Bytes

This metric provides a comparative measure between various QoS mechanisms such as traffic management and congestion avoidance, and also between various TCP implementations (e.g., Reno, Vegas, etc.).

As an example, if 100,000 bytes were sent and 2,000 had to be retransmitted, the TCP Efficiency would be calculated as:

   100,000 - 2,000
   ---------------  x 100 = 98%
       100,000

Note that a given byte may be retransmitted more than once, and each retransmission is added to the retransmitted byte count.

The second metric is the TCP Transfer Time, which is simply the time it takes to transfer a block of data across simultaneous TCP connections.  This concept is useful when benchmarking traffic management techniques, where multiple connections are generally required.

The TCP Transfer Time can also be used to provide a normalized ratio of the actual TCP Transfer Time versus the ideal Transfer Time.  This ratio is called the TCP Transfer Index and is defined as:

   Actual TCP Transfer Time
   -------------------------
   Ideal TCP Transfer Time

An example would be the bulk transfer of 100 MB across each of 5 simultaneous TCP connections over a 500 Mbit/s Ethernet service (each connection uploading 100 MB).  Each connection may achieve a different throughput during a test, and the overall throughput rate is not always easy to determine (especially as the number of connections increases).

The ideal TCP Transfer Time would be ~8 seconds.  If in this example the actual TCP Transfer Time was 12 seconds, the TCP Transfer Index would be 12/8 = 1.5, which indicates that the transfer across all connections took 1.5 times longer than the ideal.

Note that both the TCP Efficiency and TCP Transfer Time metrics must be measured during each throughput test.  The correlation of TCP Transfer Time with TCP Efficiency can help to diagnose whether the TCP Transfer Time was negatively impacted by retransmissions (poor TCP Efficiency).
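As an illustration only (not part of the methodology itself), the two metrics above could be computed from measured byte counts and transfer times along the following lines; this Python sketch uses hypothetical function names and example values:

   def tcp_efficiency(transmitted_bytes, retransmitted_bytes):
       """TCP Efficiency (%): percentage of bytes that were not retransmitted."""
       return (transmitted_bytes - retransmitted_bytes) / transmitted_bytes * 100.0

   def tcp_transfer_index(actual_transfer_time_s, ideal_transfer_time_s):
       """TCP Transfer Index: ratio of actual to ideal TCP Transfer Time."""
       return actual_transfer_time_s / ideal_transfer_time_s

   # Examples from this section:
   print(tcp_efficiency(100000, 2000))       # 98.0 (%)
   print(tcp_transfer_index(12.0, 8.0))      # 1.5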
3. TCP Throughput Testing Methodology

As stated in Section 1, it is considered best practice to verify the integrity of the network by conducting Layer 2/3 stress tests such as RFC2544 (or other methods of network stress testing).  If the network is not performing properly in terms of packet loss, jitter, etc., then the TCP layer testing will not be meaningful, since the equilibrium throughput would be very difficult to achieve in a "dysfunctional" network.

The following represents the sequential order of steps to conduct the TCP throughput testing methodology:

1. Identify the Path MTU.  Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should be conducted to verify the minimum network path MTU.  Conducting PLPMTUD establishes the upper limit for the MSS to be used in subsequent steps.

2. Baseline Round-trip Delay and Bandwidth.  These measurements provide estimates of the ideal TCP window size, which will be used in subsequent test steps.

3. TCP Connection Throughput Tests.  With baseline measurements of round trip delay and bandwidth, a series of single and multiple TCP connection throughput tests can be conducted to baseline the network performance expectations.

4. Traffic Management Tests.  Various traffic management and queuing techniques are tested in this step, using multiple TCP connections.  Multiple connection testing can verify that the network is configured properly for traffic shaping versus policing, various queuing implementations, and RED.

Some key characteristics and considerations for the TCP test instrument are important to note.  The test host may be a standard computer or a dedicated communications test instrument, and these TCP test hosts must be capable of emulating both a client and a server.

Whether the TCP test host is a standard computer or a dedicated test instrument, the following areas should be considered when selecting a test host:

- The TCP implementation used by the test host OS, e.g., Linux OS kernel using TCP Reno, TCP options supported, etc.  This will obviously be more important when using custom test equipment where the TCP implementation may be customized or tuned to run on higher performance hardware.

- Most importantly, the TCP test host must be capable of generating and receiving stateful TCP test traffic at the full link speed of the network under test.  As a general rule of thumb, testing TCP throughput at rates greater than 100 Mbit/sec generally requires high performance server hardware or dedicated hardware based test tools.

- Measuring RTT and TCP Efficiency per connection will generally require dedicated hardware based test tools.  In the absence of dedicated hardware based test tools, these measurements may need to be conducted with packet capture tools (conduct TCP throughput tests and analyze RTT and retransmission results from the packet captures).

3.1. Determine Network Path MTU

TCP implementations should use Path MTU Discovery techniques (PMTUD).  PMTUD relies on ICMP 'need to frag' messages to learn the path MTU.  When a device has a packet to send which has the Don't Fragment (DF) bit in the IP header set and the packet is larger than the Maximum Transmission Unit (MTU) of the next hop link, the packet is dropped and the device sends an ICMP 'need to frag' message back to the host that originated the packet.  The ICMP 'need to frag' message includes the next hop MTU, which PMTUD uses to tune the TCP Maximum Segment Size (MSS).  Unfortunately, because many network managers completely disable ICMP, this technique does not always prove reliable in real world situations.

Packetization Layer Path MTU Discovery or PLPMTUD (RFC4821) should therefore be conducted to verify the minimum network path MTU.  PLPMTUD can be used with or without ICMP.  The following provides a summary of the PLPMTUD approach and an example using the TCP protocol.  RFC4821 specifies a search_high and a search_low parameter for the MTU.  As specified in RFC4821, a value of 1024 is a generally safe value to choose for search_low in modern networks.

It is important to determine the overhead of the links in the path, and then to select a TCP MSS size corresponding to the Layer 3 MTU.  For example, if the MTU is 1024 bytes and the TCP/IP headers are 40 bytes, then the MSS would be set to 984 bytes.
An example scenario is a network where the actual path MTU is 1240 bytes.  The TCP client probe MUST be capable of setting the MSS for the probe packets and could start at MSS = 984 (which corresponds to an MTU size of 1024 bytes).

The TCP client probe would open a TCP connection and advertise the MSS as 984.  Note that the client probe MUST generate these packets with the DF bit set.  The TCP client probe then sends test traffic using a nominal window size (8 KB, etc.).  The window size should be kept small to minimize the possibility of congesting the network, which could induce congestive loss.  The duration of the test should also be short (10-30 seconds), again to minimize congestive effects during the test.

In the example of a 1240 byte path MTU, probing with an MSS equal to 984 would yield a successful probe, and the test client packets would be successfully transferred to the test server.

Also note that the test client MUST verify that the MSS advertised is indeed negotiated.  Network devices with built-in Layer 4 capabilities can intercede during the connection establishment process and reduce the advertised MSS to avoid fragmentation.  This is certainly a desirable feature from a network perspective, but it can yield erroneous test results if the client test probe does not confirm the negotiated MSS.

The next test probe would use the search_high value, and this would be set to MSS = 1460 to correspond to a 1500 byte MTU.  In this example, the test client would retransmit based upon time-outs (since no ACKs will be received from the test server).  This test probe is marked as a conclusive failure if none of the test packets are ACK'ed.  If any of the test packets are ACK'ed, congestive network loss may be the cause and the test probe is not conclusive.  Re-testing at other times of the day is recommended to further isolate the cause.

The test is repeated until the desired granularity of the MTU is discovered.  The method can yield precise results at the expense of probing time.  One approach would be a binary search: probe with an MSS halfway between the last unsuccessful (search_high) and last successful (search_low) values, and halve the remaining interval at each step until the desired granularity is reached.  A rough sketch of this search procedure is shown below.
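The following is a minimal, illustrative sketch of the search logic described above.  It assumes a hypothetical helper, probe_succeeds(mss), that opens a TCP connection advertising the given MSS, sends a small window of traffic with the DF bit set, and reports whether the segments were ACK'ed; it is not a complete PLPMTUD implementation (see RFC4821 for the full algorithm).

   TCP_IP_HEADER_BYTES = 40   # IPv4 + TCP headers without options

   def mss_for_mtu(mtu):
       """Derive the TCP MSS corresponding to a given Layer 3 MTU."""
       return mtu - TCP_IP_HEADER_BYTES

   def find_path_mss(probe_succeeds, search_low_mtu=1024, search_high_mtu=1500,
                     granularity=8):
       """Binary search for the largest MSS the path carries without
       fragmentation.  probe_succeeds(mss) is a hypothetical helper that
       returns True if probe segments of that size are ACK'ed."""
       low = mss_for_mtu(search_low_mtu)     # assumed to succeed (e.g. 984)
       high = mss_for_mtu(search_high_mtu)   # first candidate to test (e.g. 1460)
       if probe_succeeds(high):
           return high
       while high - low > granularity:
           mid = (low + high) // 2
           if probe_succeeds(mid):
               low = mid                     # mid works; search upward
           else:
               high = mid                    # mid fails; search downward
       return low

   # For the 1240-byte path MTU example above, the search converges on an
   # MSS of approximately 1200 bytes (1240 - 40).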
3.2. Baseline Round-trip Delay and Bandwidth

Before stateful TCP testing can begin, it is important to baseline the round trip delay and bandwidth of the network to be tested.  These measurements provide estimates of the ideal TCP window size, which will be used in subsequent test steps.  These latency and bandwidth tests should be run during the time of day for which the TCP throughput tests will occur.

The baseline RTT is used to predict the bandwidth delay product and the TCP Transfer Time for the subsequent throughput tests.  Since this methodology requires that RTT be measured during the entire throughput test, the extent to which the RTT varied during the throughput test can be quantified.

3.2.1 Techniques to Measure Round Trip Time

Following the definitions used in the referenced documents, Round Trip Time (RTT) is the time elapsed between the clocking in of the first bit of a payload packet and the receipt of the last bit of the corresponding acknowledgement.  Round Trip Delay (RTD) is used synonymously and is equal to twice the one-way link latency.

In any method used to baseline round trip delay between network end-points, it is important to realize that network latency is the sum of inherent network delay and congestion.  The RTT should be baselined during "off-peak" hours to obtain a reliable figure for network latency (versus additional delay caused by congestion).

During the actual sustained TCP throughput tests, it is critical to measure RTT along with the measured TCP throughput.  Congestive effects can be isolated if RTT is concurrently measured.

The following is not meant to be an exhaustive list, but it summarizes some of the more common ways to determine round trip time (RTT) through the network.  The desired resolution of the measurement (i.e., msec versus usec) may dictate whether the RTT measurement can be achieved with standard tools such as ICMP ping techniques or whether specialized test equipment with high precision timers would be required.  The objective in this section is to list several techniques in order of decreasing accuracy.

- Use test equipment on each end of the network, "looping" the far-end tester so that a packet stream can be measured end-to-end.  This test equipment RTT measurement may be compatible with the delay measurement protocols specified in RFC5357.

- Conduct packet captures of TCP test applications (for example "iperf" or FTP).  By running multiple experiments, the packet captures can be studied to estimate RTT based upon the SYN -> SYN-ACK handshakes within the TCP connection set-up.

- ICMP Pings may also be adequate to provide round trip time estimations.  Some limitations of ICMP Ping are the msec resolution and whether the network elements respond to pings (or block them).

3.2.2 Techniques to Measure End-end Bandwidth

There are many well established techniques available to provide estimated measures of bandwidth over a network.  This measurement should be conducted in both directions of the network, especially for access networks, which are inherently asymmetrical.  Some of the asymmetric implications to TCP performance are documented in RFC3449, and the results of this work will be further studied to determine relevance to this draft.

The bandwidth measurement test must be run with stateless IP streams (not stateful TCP) in order to determine the available bandwidth in each direction.  This test should be performed at various intervals throughout a business day (or even across a week).  Ideally, the bandwidth test should produce a log output of the bandwidth achieved across the test interval AND the round trip delay.

During the actual TCP level performance measurements (Sections 3.3 and 3.4), the test tool must be able to track the round trip time of the TCP connection(s) during the test.  Measuring round trip time variation (aka "jitter") provides insight into the effects of congestive delay on the sustained throughput achieved for the TCP layer test.
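As a rough illustration of the RTT baselining techniques in Section 3.2.1, the time for a TCP three-way handshake to complete (the socket connect time) approximates one SYN -> SYN-ACK round trip.  This is only a sketch with millisecond-level accuracy and does not replace test equipment or packet-capture based measurements; the address below is a documentation placeholder.

   import socket
   import time

   def handshake_rtt(host, port=80, samples=5):
       """Rough RTT estimate (seconds): time for TCP connect() to complete,
       i.e. approximately one SYN -> SYN-ACK round trip."""
       results = []
       for _ in range(samples):
           start = time.perf_counter()
           with socket.create_connection((host, port), timeout=5):
               results.append(time.perf_counter() - start)
       return min(results)   # the minimum approximates the uncongested RTT

   # Example (placeholder address):
   # print("baseline RTT ~ %.1f ms" % (handshake_rtt("192.0.2.10") * 1000))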
3.3. TCP Throughput Tests

This draft specifically defines TCP throughput techniques to verify sustained TCP performance in a managed business network.  As defined in Section 2.1, the equilibrium throughput reflects the maximum rate achieved by a TCP connection within the congestion avoidance phase on an end-to-end network path.  This section and the following sections define the method to conduct these sustained throughput tests and provide guidelines for the predicted results.

With the baseline measurements of round trip time and bandwidth from Section 3.2, a series of single and multiple TCP connection throughput tests can be conducted to baseline network performance against expectations.

It is recommended to run the tests in each direction independently first, and then to run them in both directions simultaneously.  In each case, the TCP Efficiency and TCP Transfer Time metrics must be measured in each direction.

3.3.1 Calculate Optimum TCP Window Size

The optimum TCP window size can be calculated from the bandwidth delay product (BDP), which is:

   BDP (bits) = RTT (sec) x Bandwidth (bps)

By dividing the BDP by 8, the "ideal" TCP window size is calculated.  An example would be a T3 link with 25 msec RTT.  The BDP would equal ~1,105,000 bits and the ideal TCP window would equal ~138,000 bytes.

The following table provides some representative network link speeds, latency, BDP, and the associated "optimum" TCP window size.  Sustained TCP transfers should reach nearly 100% throughput, minus the overhead of Layers 1-3 and the effect of the window not being an exact multiple of the MSS.

For this single connection baseline test, the MSS size will affect the achieved throughput (especially for smaller TCP window sizes).  Table 3.2 provides the achievable, equilibrium TCP throughput (at Layer 4) using a 1460 byte MSS.  Also in this table, the case of 58 bytes of L1-L4 overhead including the Ethernet CRC32 is used for simplicity.

Table 3.2: Link Speed, RTT, calculated BDP, and TCP Throughput

   Link                             Ideal TCP        Maximum Achievable
   Speed*   RTT (ms)   BDP (bits)   Window (kbytes)  TCP Throughput (Mbps)
   -----------------------------------------------------------------------
   T1          20          30,720         3.84              1.17
   T1          50          76,800         9.60              1.40
   T1         100         153,600        19.20              1.40
   T3          10         442,100        55.26             42.05
   T3          15         663,150        82.89             42.05
   T3          25       1,105,250       138.16             41.52
   T3(ATM)     10         407,040        50.88             36.50
   T3(ATM)     15         610,560        76.32             36.23
   T3(ATM)     25       1,017,600       127.20             36.27
   100M         1         100,000        12.50             91.98
   100M         2         200,000        25.00             93.44
   100M         5         500,000        62.50             93.44
   1Gig         0.1       100,000        12.50            919.82
   1Gig         0.5       500,000        62.50            934.47
   1Gig         1       1,000,000       125.00            934.47
   10Gig        0.05      500,000        62.50          9,344.67
   10Gig        0.3     3,000,000       375.00          9,344.67

   * Note that the link speed is the minimum link speed throughout the network path (i.e., a WAN with a T1 link as the bottleneck, etc.).

Also, the following link speeds (available payload bandwidth) were used for the WAN entries:

- T1 = 1.536 Mbits/sec (B8ZS line encoding facility)
- T3 = 44.21 Mbits/sec (C-Bit Framing)
- T3(ATM) = 36.86 Mbits/sec (C-Bit Framing & PLCP, 96000 Cells per second)

The calculation method used in this document is a 3 step process:

1 - Determine the optimal TCP window size based on the optimal quantity of "in-flight" octets given by the BDP calculation, taking into consideration that the TCP window size has to be an exact multiple of the MSS.

2 - Calculate the achievable Layer 2 throughput by multiplying the number of in-flight MSS-sized segments from step 1 by (MSS + L2 + L3 + L4 overheads), and dividing by the RTT.

3 - Finally, multiply the value of step 2 by the ratio of the MSS to (MSS + L2 + L3 + L4 overheads).

This gives the achievable TCP throughput value.  Sometimes, the maximum achievable throughput is limited by the maximum achievable quantity of Ethernet frames per second on the physical media; in that case, this frame-rate-limited value is used in step 2 instead of the calculated one.  An illustrative sketch of this calculation is shown below.
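The following sketch illustrates the three-step calculation; it is not a normative formula, and because the L1-L4 overhead accounting can differ, the results may deviate slightly from Table 3.2.

   def ideal_window_bytes(bandwidth_bps, rtt_s, mss=1460):
       """Step 1: ideal window = BDP rounded down to a whole number of MSS."""
       bdp_bits = bandwidth_bps * rtt_s
       segments = int(bdp_bits / 8 // mss)
       return segments * mss

   def achievable_tcp_throughput_bps(bandwidth_bps, rtt_s, mss=1460, overhead=58):
       """Steps 2 and 3: Layer 4 throughput from the window, capped by the
       frame rate the link can carry (58 bytes of L1-L4 overhead assumed)."""
       window = ideal_window_bytes(bandwidth_bps, rtt_s, mss)
       window_limited = window * 8 / rtt_s                   # bps at Layer 4
       frame_limited = bandwidth_bps * mss / (mss + overhead)
       return min(window_limited, frame_limited)

   # T3 (44.21 Mbps) with 25 ms RTT, as in the example above:
   print(ideal_window_bytes(44.21e6, 0.025))         # ~137,240 bytes (~138 KB)
   print(achievable_tcp_throughput_bps(44.21e6, 0.025) / 1e6)
   # roughly 42 Mbps (compare 41.52 in Table 3.2, which uses a slightly
   # different overhead accounting)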
The following table compares the achievable TCP throughput on a T3 link for Windows 2000/XP TCP window sizes of 16 KB versus 64 KB:

   RTT (ms)   16 KB Window   64 KB Window
   ----------------------------------------
      10        14.5 Mbps      42.1 Mbps
      15         9.6 Mbps      34.3 Mbps
      25         5.8 Mbps      20.5 Mbps

The following table shows the achievable TCP throughput on a 25 ms T3 link as the TCP window size is increased, using the RFC1323 TCP window scaling option:

   TCP Window Size   Achievable TCP Throughput
   -------------------------------------------
        16 KB               5.31 Mbps
        32 KB              10.62 Mbps
        64 KB              21.23 Mbps
       128 KB              42.47 Mbps

3.3.2 Conducting the TCP Throughput Tests

There are several TCP test tools in common use, and one of the most common is the "iperf" tool.  With this tool, hosts are installed at each end of the network segment; one as the client and the other as the server.  The TCP window size of both the client and the server can be manually set, and the achieved throughput is measured either uni-directionally or bi-directionally.  For higher BDP situations in lossy networks (long fat networks, satellite links, etc.), TCP options such as Selective Acknowledgment should be considered and also become part of the window size / throughput characterization.

Host hardware performance must be well understood before conducting the TCP throughput tests and the other tests in the following sections.  Dedicated test equipment will generally be required, especially for line rates of GigE and 10 GigE.

The TCP throughput test should be run over a long enough duration to properly exercise network buffers and also to characterize performance during different time periods of the day.  The results must be logged at the desired interval, and the test must record RTT and TCP retransmissions at each interval.

This correlation of retransmissions and RTT over the course of the test will clearly identify which portions of the transfer reached the TCP Equilibrium state, and to what extent increased RTT (congestive effects) may have caused reduced equilibrium performance.

Additionally, the TCP Efficiency and TCP Transfer Time metrics should be logged in order to further characterize the window size tests.

3.3.3 Single vs. Multiple TCP Connection Testing

The decision whether to conduct single or multiple TCP connection tests depends upon the size of the BDP in relation to the window sizes configured in the end-user environment.  For example, if the BDP for a long fat pipe turns out to be 2 MB, then it is probably more realistic to test this pipe with multiple connections.  Assuming typical host computer window settings of 64 KB, using 32 connections would realistically test this pipe (see the sketch below).
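As a simple illustration, the number of parallel connections needed to fill a given BDP with a fixed per-connection window could be estimated as follows; this is a sketch only, and a real test plan must also account for MSS rounding and host limitations.

   import math

   def connections_to_fill_link(bandwidth_bps, rtt_s, window_bytes):
       """Approximate number of TCP connections needed so that the sum of
       the per-connection windows covers the bandwidth delay product."""
       bdp_bytes = bandwidth_bps * rtt_s / 8
       return math.ceil(bdp_bytes / window_bytes)

   # 500 Mbps path with 5 ms RTT (BDP ~312 KB), as in the table below:
   for window_kb in (16, 32, 64, 128):
       print(window_kb, connections_to_fill_link(500e6, 0.005, window_kb * 1024))
   # -> 16: 20, 32: 10, 64: 5, 128: 3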
The following table illustrates the relationship of the BDP, window size, and the number of connections required to utilize the available capacity.  For this example, the network bandwidth is 500 Mbps, the RTT is 5 ms, and the BDP equates to 312 KBytes.

   Window     #Connections to Fill Link
   ------------------------------------
    16 KB                20
    32 KB                10
    64 KB                 5
   128 KB                 3

The TCP Transfer Time metric is useful for conducting multiple connection tests.  Each connection should be configured to transfer a certain payload (e.g., 100 MB), and the TCP Transfer Time provides a simple metric to verify the actual versus expected results.

Note that the TCP Transfer Time is the time for all connections to complete the transfer of the configured payload size.  From the example table listed above, consider the 64 KB window case.  Each of the 5 connections would be configured to transfer 100 MB, and each TCP connection should achieve a maximum of 100 Mb/sec.  So for this example, the 100 MB payload should be transferred across the connections in approximately 8 seconds (which would be the ideal TCP Transfer Time for these conditions).

Additionally, the TCP Efficiency metric (defined in Section 2.2) should be computed for each connection tested.

3.3.4 Interpretation of the TCP Throughput Results

At the end of this step, the user will document the theoretical BDP and a set of window size experiments with the measured TCP throughput for each TCP window size setting.  For cases where the sustained TCP throughput does not equal the predicted value, some possible causes are listed:

- Network congestion causing packet loss; the TCP Efficiency metric is a useful gauge to compare network performance.

- Network congestion not causing packet loss, but increasing RTT.

- Intermediate network devices which actively regenerate the TCP connection and can alter the window size, MSS, etc.

- Over-utilization of the available link, or rate limiting (policing).  More discussion of traffic management tests follows in Section 3.4.

3.4. Traffic Management Tests

In most cases, the network connection between two geographic locations (branch offices, etc.) has lower bandwidth than the network connections of the host computers.  An example would be LAN connectivity of GigE and WAN connectivity of 100 Mbps.  The WAN connectivity may be physically 100 Mbps or logically 100 Mbps (over a GigE WAN connection).  In the latter case, rate limiting is used to provide the WAN bandwidth per the SLA.

Traffic management techniques are employed to provide various forms of QoS; the more common techniques include:

- Traffic Shaping
- Priority Queuing
- Random Early Discard (RED, etc.)

Configuring the end-to-end network with these various traffic management mechanisms is a complex undertaking.  For traffic shaping and RED techniques, the end goal is to provide better performance for bursty traffic such as TCP (RED is specifically intended for TCP).

This section of the methodology provides guidelines to test traffic shaping and RED implementations.  As in Section 3.3, host hardware performance must be well understood before conducting the traffic shaping and RED tests.  Dedicated test equipment will generally be required, especially for line rates of GigE and 10 GigE.
3.4.1 Traffic Shaping Tests

For services where the available bandwidth is rate limited, there are two (2) techniques used to implement rate limiting: traffic policing and traffic shaping.

Simply stated, traffic policing marks and/or drops packets which exceed the SLA bandwidth (in most cases, excess traffic is dropped).  Traffic shaping employs the use of queues to smooth the bursty traffic and then send it out within the SLA bandwidth limit (without dropping packets unless the traffic shaping queue is exceeded).

Traffic shaping is generally configured for TCP data services and can provide improved TCP performance since retransmissions are reduced, which in turn optimizes TCP throughput for the given available bandwidth.  Throughout this section, the available rate-limited bandwidth shall be referred to as the "bottleneck bandwidth".

Proper traffic shaping is more easily detected when conducting a multiple TCP connection test.  Proper shaping will provide a fair distribution of the available bottleneck bandwidth, while traffic policing will not.

The traffic shaping tests build upon the concepts of multiple connection testing as defined in Section 3.3.3.  Calculating the BDP for the bottleneck bandwidth is first required, followed by selecting the number of connections and the window size per connection.

Similar to the example in Section 3.3, a typical test scenario might be: a GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited logical interface) and 5 msec RTT.  This would require five (5) TCP connections with a 64 KB window size to evenly fill the bottleneck bandwidth (about 100 Mbps per connection).

The traffic shaping tests should be run over a long enough duration to properly exercise network buffers and also to characterize performance during different time periods of the day.  The throughput of each connection must be logged during the entire test, along with the TCP Efficiency and TCP Transfer Time metrics.  Additionally, it is recommended to log RTT and retransmissions per connection over the test interval.

3.4.1.1 Interpretation of Traffic Shaping Test Results

By plotting the throughput achieved by each TCP connection, the fair sharing of the bandwidth is generally very obvious when traffic shaping is properly configured for the bottleneck interface.  For the previous example of 5 connections sharing 500 Mbps, each connection would consume ~100 Mbps with smooth variation.  If traffic policing was present on the bottleneck interface, the bandwidth sharing would not be fair, and the resulting throughput plot would reveal "spikey" throughput consumption by the competing TCP connections (due to the retransmissions).
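To make the "fair sharing" check concrete, one illustrative approach (not prescribed by this methodology) is to compute Jain's fairness index over the per-connection throughput averages; values near 1.0 indicate the even sharing expected from a properly shaped bottleneck, while markedly lower values suggest policing-induced unfairness.  The throughput values below are hypothetical.

   def jain_fairness_index(throughputs):
       """Jain's fairness index: 1.0 = perfectly even sharing, approaching
       1/n as the sharing becomes completely uneven."""
       n = len(throughputs)
       return sum(throughputs) ** 2 / (n * sum(x * x for x in throughputs))

   # Hypothetical per-connection averages (Mbps) for 5 connections on a
   # 500 Mbps shaped vs. policed bottleneck:
   shaped  = [101, 99, 100, 98, 102]
   policed = [160, 45, 120, 30, 95]
   print(round(jain_fairness_index(shaped), 3))    # ~1.0
   print(round(jain_fairness_index(policed), 3))   # noticeably lower (~0.78)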
3.4.2 RED Tests

Random Early Discard (RED) techniques are specifically targeted to provide congestion avoidance for TCP traffic.  Before the network element queue "fills" and enters the tail drop state, RED drops packets at configurable queue depth thresholds.  This action causes TCP connections to back off, which helps to prevent tail drop and, in turn, helps to prevent global TCP synchronization.

Again, rate limited interfaces can benefit greatly from RED based techniques.  Without RED, TCP is generally not able to achieve the full bandwidth of the bottleneck interface.  With RED enabled, TCP congestion avoidance throttles the connections on the higher speed interface (i.e., the LAN) and can reach equilibrium with the bottleneck bandwidth (achieving closer to full throughput).

Proper RED configuration is more easily detected when conducting a multiple TCP connection test.  Multiple TCP connections provide the multiple bursty sources that emulate the real-world conditions for which RED was intended.

The RED tests also build upon the concepts of multiple connection testing as defined in Section 3.3.3.  Calculating the BDP for the bottleneck bandwidth is first required, followed by selecting the number of connections and the window size per connection.

For RED testing, the desired effect is to cause the TCP connections to burst beyond the bottleneck bandwidth so that queue drops will occur.  Using the same example from Section 3.4.1 (traffic shaping), the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with a window size of 64 KB) to fill the capacity.  Some experimentation is required, but it is recommended to start with double the number of connections in order to stress the network element buffers / queues.  In this example, 10 connections would produce TCP bursts of 64 KB for each connection.  If the timing of the TCP tester permits, these TCP bursts could stress queue sizes in the 512 KB range.  Again, experimentation will be required, and the proper number of TCP connections / window size will be dictated by the size of the network element queue.

3.4.2.1 Interpretation of RED Results

The default queuing technique for most network devices is FIFO based.  Without RED, the FIFO based queue will cause excessive loss to all of the TCP connections and, in the worst case, global TCP synchronization.

By plotting the aggregate throughput achieved on the bottleneck interface, proper RED operation can be determined if the bottleneck bandwidth is fully utilized.  For the previous example of 10 connections (window = 64 KB) sharing 500 Mbps, each connection should consume ~50 Mbps.  If RED was not properly enabled on the interface, then the TCP connections will retransmit at a higher rate, and the net effect is that the bottleneck bandwidth is not fully utilized.

Another means to study a non-RED versus RED implementation is to use the TCP Transfer Time metric for all of the connections.  In this example, a 100 MB payload transfer per connection should ideally take 16 seconds across all 10 connections (with RED enabled).  With RED not enabled, the throughput across the bottleneck bandwidth would be greatly reduced (generally by 20-40%), and the TCP Transfer Time would be proportionally longer than the ideal transfer time.

Additionally, the TCP Efficiency metric is useful, since non-RED implementations will exhibit a lower TCP Efficiency than RED implementations.
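For reference, the ideal TCP Transfer Time used in the comparison above follows directly from the payload size, the number of connections, and the bottleneck bandwidth; a minimal sketch (illustrative only, ignoring protocol overhead and slow start):

   def ideal_transfer_time_s(payload_bytes_per_conn, connections, bottleneck_bps):
       """Ideal TCP Transfer Time: total payload divided by the bottleneck
       bandwidth."""
       total_bits = payload_bytes_per_conn * connections * 8
       return total_bits / bottleneck_bps

   # 10 connections x 100 MB over a 500 Mbps bottleneck:
   print(ideal_transfer_time_s(100e6, 10, 500e6))   # ~16 seconds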
4. Acknowledgements

The author would like to thank Gilles Forget, Loki Jorgenson, and Reinhard Schrage for technical review and original contributions to draft-03 of this document.

Also, thanks to Matt Mathis and Matt Zekauskas for many good comments through email exchange and for pointing us to great sources of information pertaining to past works in the TCP capacity area.

5. References

[RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining Empirical Bulk Transfer Capacity Metrics", RFC 3148, July 2001.

[RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, March 1999.

[RFC3449]  Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. Sooriyabandara, "TCP Performance Implications of Network Path Asymmetry", RFC 3449, December 2002.

[RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, October 2008.

[RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

[draft-ietf-ippm-btc-cap-00]  Allman, M., "A Bulk Transfer Capacity Methodology for Cooperating Hosts", Work in Progress, August 2001.

[MSMO]     Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", SIGCOMM Computer Communication Review, Volume 27, Issue 3, July 1997.

[Stevens Vol1]  Stevens, W. R., "TCP/IP Illustrated, Volume 1: The Protocols", Addison-Wesley.

Authors' Addresses

Barry Constantine
JDSU, Test and Measurement Division
One Milestone Center Court
Germantown, MD 20876-7100
USA

Phone: +1 240 404 2227
barry.constantine@jdsu.com

Gilles Forget
Independent Consultant to Bell Canada
308, rue de Monaco, St-Eustache
Qc. CANADA, Postal Code: J7P-4T5

Phone: (514) 895-8212
gilles.forget@sympatico.ca

Loki Jorgenson
nooCore

Phone: (604) 908-5833
ljorgenson@nooCore.com

Reinhard Schrage
Schrage Consulting

Phone: +49 (0) 5137 909540
reinhard@schrageconsult.com