1 Network Working Group B. Constantine 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: July 2, 2011 Bell Canada (Ext. Consultant) 5 Rudiger Geib 6 Deutsche Telekom 7 Reinhard Schrage 8 Schrage Consulting 10 January 2, 2011 12 Framework for TCP Throughput Testing 13 draft-ietf-ippm-tcp-throughput-tm-10.txt 15 Abstract 17 This framework describes a methodology for measuring end-to-end TCP 18 throughput performance in a managed IP network. The intention is to 19 provide a practical methodology to validate TCP layer performance. 20 The goal is to provide a better indication of the user experience. 21 In this framework, various TCP and IP parameters are identified and 22 should be tested as part of a managed IP network. 24 Requirements Language 26 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 27 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 28 document are to be interpreted as described in RFC 2119 [RFC2119]. 30 Status of this Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on July 2, 2011. 47 Copyright Notice 49 Copyright (c) 2011 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document.
Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 65 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 66 1.2 Test Set-up . . . . . . . . . . . . . . . . . . . . . . . 5 67 2. Scope and Goals of this methodology. . . . . . . . . . . . . . 5 68 2.1 TCP Equilibrium. . . . . . . . . . . . . . . . . . . . . . 6 69 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 7 70 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 9 71 3.2. Baseline Round Trip Time and Bandwidth . . . . . . . . . . 10 72 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 11 73 3.2.2 Techniques to Measure end-to-end Bandwidth. . . . . . 12 74 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 12 75 3.3.1 Calculate Ideal maximum TCP RWIN Size. . . . . . . . . 12 76 3.3.2 Metrics for TCP Throughput Tests . . . . . . . . . . . 15 77 3.3.3 Conducting the TCP Throughput Tests. . . . . . . . . . 19 78 3.3.4 Single vs. Multiple TCP Connection Testing . . . . . . 19 79 3.3.5 Interpretation of the TCP Throughput Results . . . . . 20 80 3.3.6 High Performance Network Options . . . . . . . . . . . 20 81 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 22 82 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 23 83 3.4.1.1 Interpretation of Traffic Shaping Test Results. . . 23 84 3.4.2 RED Tests. . . . . . . . . . . . . . . . . . . . . . . 24 85 3.4.2.1 Interpretation of RED Results . . . . . . . . . . . 25 86 4. Security Considerations . . . . . . . . . . . . . . . . . . . 25 87 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 88 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 26 89 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 26 90 7.1 Normative References . . . . . . . . . . . . . . . . . . . 26 91 7.2 Informative References . . . . . . . . . . . . . . . . . . 26 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 95 1. Introduction 97 Network providers are coming to the realization that Layer 2/3 98 testing is not enough to adequately ensure end-user's satisfaction. 99 An SLA (Service Level Agreement) is provided to business customers 100 and is generally based upon Layer 2/3 criteria such as access rate, 101 latency, packet loss and delay variations. On the other hand, 102 measuring TCP throughput provides meaningful results with respect to 103 user experience. Thus, the network provider community desires to 104 measure IP network throughput performance at the TCP layer. 106 Additionally, business enterprise customers seek to conduct 107 repeatable TCP throughput tests between locations. Since these 108 enterprises rely on the networks of the providers, a common test 109 methodology with predefined metrics will benefit both parties. 111 Note that the primary focus of this methodology is managed business 112 class IP networks; i.e. those Ethernet terminated services for which 113 businesses are provided an SLA from the network provider. End-users 114 with "best effort" access between locations can use this methodology, 115 but this framework and its metrics are intended to be used in a 116 predictable managed IP service environment. 
118 The intent of this document is to define a methodology for 119 testing sustained TCP layer performance. In this document, the 120 maximum achievable TCP Throughput is that amount of data per unit 121 time that TCP transports when trying to reach Equilibrium, i.e. 122 after the initial slow start and congestion avoidance phases. 124 TCP is connection oriented and, at the transmitting side of the 125 connection, it uses a congestion window (TCP CWND) to determine how 126 many packets it can send at one time. The network path bandwidth 127 delay product (BDP) determines the ideal TCP CWND. With the help of 128 slow start and congestion avoidance mechanisms, TCP probes the IP 129 network path. Up to the bandwidth limit, a larger TCP CWND permits 130 a higher throughput, and within local host limits, the TCP "Slow Start" 131 and "Congestion Avoidance" algorithms together determine the TCP 132 CWND size. This TCP CWND will vary during the session, but the 133 maximum TCP CWND size is limited by the buffer space allocated by 134 the kernel for each socket. 136 At the receiving end of the connection, TCP uses a receive window 137 (TCP RWIN) to inform the transmitting end of how many Bytes it is 138 capable of receiving between acknowledgements (TCP ACK). This TCP 139 RWIN will also vary during the session, and the maximum TCP RWIN Size 140 is likewise limited by the buffer space allocated by the kernel for 141 each socket. 143 At both ends of the TCP connection and for each socket, there are 144 default buffer sizes that programs can change using system 145 library calls made just before opening the socket. There are also 146 kernel enforced maximum buffer sizes. These buffer sizes can be 147 adjusted at both ends (transmitting and receiving). In order to 148 obtain the maximum throughput, it is critical to use optimal TCP 149 Send and Receive Socket Buffer sizes. 151 Note that some TCP/IP stack implementations use Receive Window 152 Auto-Tuning, and their buffers cannot be adjusted manually unless this feature is disabled. 154 There are many variables to consider when conducting a TCP throughput 155 test, but this methodology focuses on: 156 - RTT and Bottleneck BW 157 - Ideal Send Socket Buffer (Ideal maximum TCP CWND) 158 - Ideal Receive Socket Buffer (Ideal maximum TCP RWIN) 159 - Path MTU and Maximum Segment Size (MSS) 160 - Single Connection and Multiple Connections testing 162 This methodology proposes TCP testing that should be performed in 163 addition to traditional Layer 2/3 type tests. Layer 2/3 tests are 164 required to verify the integrity of the network before conducting TCP 165 tests. Examples include iperf (UDP mode) or manual packet layer test 166 techniques where packet throughput, loss, and delay measurements are 167 conducted. When available, standardized testing similar to RFC 2544 168 [RFC2544] but adapted for use in operational networks may be used. 169 Note: RFC 2544 was never meant to be used outside a lab environment. 171 The following two sections provide a general overview of the test 172 methodology. 174 1.1 Terminology 176 Common terminologies used in the test methodology are: 178 - TCP Throughput Test Device (TCP TTD), refers to a compliant TCP 179 host that generates traffic and measures metrics as defined in 180 this methodology, e.g. a dedicated communications test instrument. 181 - Customer Provided Equipment (CPE), refers to customer owned 182 equipment (routers, switches, computers, etc.) 183 - Customer Edge (CE), refers to the provider owned demarcation device.
184 - Provider Edge (PE), refers to provider's distribution equipment. 185 - Bottleneck Bandwidth (BB), lowest bandwidth along the complete 186 path. Bottleneck Bandwidth and Bandwidth are used synonymously 187 in this document. Most of the time the Bottleneck Bandwidth is 188 in the access portion of the wide area network (CE - PE). 189 - Provider (P), refers to provider core network equipment. 190 - Network Under Test (NUT), refers to the tested IP network path. 191 - Round-Trip Time (RTT), refers to Layer 4 back and forth delay. 193 Figure 1.1 Devices, Links and Paths 195 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 196 | TCP|-| CPE|-| CE |--| PE |-| P |--| P |-| PE |--| CE |-| CPE|-| TCP| 197 | TTD| | | | |BB| | | | | | | |BB| | | | | TTD| 198 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 199 <------------------------ NUT -------------------------> 200 R >-----------------------------------------------------------| 201 T | 202 T <-----------------------------------------------------------| 204 Note that the NUT may consist of a variety of devices including but 205 not limited to, load balancers, proxy servers or WAN acceleration 206 devices. The detailed topology of the NUT should be well understood 207 when conducting the TCP throughput tests, although this methodology 208 makes no attempt to characterize specific network architectures. 210 1.2 Test Set-up 212 This methodology is intended for operational and managed IP networks. 213 A multitude of network architectures and topologies can be tested. 214 The above set-up diagram is very general and it only illustrates the 215 segmentation within end-user and network provider domains. 217 2. Scope and Goals of this Methodology 219 Before defining the goals, it is important to clearly define the 220 areas that are out-of-scope. 222 - This methodology is not intended to predict the TCP throughput 223 during the transient stages of a TCP connection, such as the initial 224 slow start. 226 - This methodology is not intended to definitively benchmark TCP 227 implementations of one OS to another, although some users may find 228 some value in conducting qualitative experiments. 230 - This methodology is not intended to provide detailed diagnosis 231 of problems within end-points or within the network itself as 232 related to non-optimal TCP performance, although a results 233 interpretation section for each test step may provide insight in 234 regards with potential issues. 236 - This methodology does not propose to operate permanently with high 237 measurement loads. TCP performance and optimization within 238 operational networks may be captured and evaluated by using data 239 from the "TCP Extended Statistics MIB" [RFC4898]. 241 - This methodology is not intended to measure TCP throughput as part 242 of an SLA, or to compare the TCP performance between service 243 providers or to compare between implementations of this methodology 244 in dedicated communications test instruments. 246 In contrast to the above exclusions, a primary goal is to define a 247 method to conduct a practical, end-to-end assessment of sustained 248 TCP performance within a managed business class IP network. Another 249 key goal is to establish a set of "best practices" that a non-TCP 250 expert should apply when validating the ability of a managed network 251 to carry end-user TCP applications. 
253 Specific goals are to: 255 - Provide a practical test approach that specifies tunable parameters 256 such as MSS (Maximum Segment Size) and Socket Buffer sizes and how 257 these affect the outcome of TCP performance over an IP network. 258 See section 3.3.3. 260 - Provide specific test conditions like link speed, RTT, MSS, Socket 261 Buffer sizes and maximum achievable TCP throughput when trying to 262 reach TCP Equilibrium. For guideline purposes, provide examples of 263 test conditions and their maximum achievable TCP throughput. 264 Section 2.1 provides specific details concerning the definition of 265 TCP Equilibrium within this methodology while section 3 provides 266 specific test conditions with examples. 268 - Define three (3) basic metrics to compare the performance of TCP 269 connections under various network conditions. See section 3.3.2. 271 - In test situations where the recommended procedure does not yield 272 the maximum achievable TCP throughput results, this methodology 273 provides some possible areas within the end host or the network that 274 should be considered for investigation. Again, this 275 methodology is not intended to provide a detailed diagnosis of these 276 issues. See section 3.3.5. 278 2.1 TCP Equilibrium 280 TCP connections have three (3) fundamental congestion window phases: 282 1 - The Slow Start phase, which occurs at the beginning of a TCP 283 transmission or after a retransmission time out. 285 2 - The Congestion Avoidance phase, during which TCP ramps up to 286 establish the maximum attainable throughput on an end-to-end network 287 path. Retransmissions are a natural by-product of the TCP congestion 288 avoidance algorithm as it seeks to achieve maximum throughput. 290 3 - The Loss Recovery phase, which could include Fast Retransmit 291 (Tahoe) or Fast Recovery (Reno & New Reno). When packet loss occurs, 292 the Congestion Avoidance phase transitions either to Fast Retransmit 293 or Fast Recovery depending upon the TCP implementation. If a Time-Out 294 occurs, TCP transitions back to the Slow Start phase. 296 The following diagram depicts these 3 phases. 298 Figure 2.1 TCP CWND Phases 300 /\ | Trying to reach TCP Equilibrium > > > > > > > > > 301 /\ | 302 /\ |High ssthresh TCP CWND 303 /\ |Loss Event * halving 3-Loss Recovery 304 /\ | * \ upon loss Adjusted 305 /\ | * \ / \ Time-Out ssthresh 306 /\ | * \ / \ +--------+ * 307 TCP | * \/ \ / Multiple| * 308 Through- | * 2-Congestion\ / Loss | * 309 put | * Avoidance \/ Event | * 310 | * Half | * 311 | * TCP CWND | * 1-Slow Start 312 | * 1-Slow Start Min TCP CWND after T-O 313 +----------------------------------------------------------- 314 Time > > > > > > > > > > > > > > > 316 Note: ssthresh = Slow Start threshold. 318 A well tuned and managed IP network with appropriate TCP adjustments 319 in its IP hosts and applications should perform very close to TCP 320 Equilibrium and to the BB (Bottleneck Bandwidth). 322 This TCP methodology provides guidelines to measure the maximum 323 achievable TCP throughput or maximum TCP sustained rate obtained 324 after TCP CWND has stabilized to an optimal value. All maximum 325 achievable TCP throughputs specified in section 3 are with respect to 326 this condition. 328 It is important to clarify the interaction between the sender's Send 329 Socket Buffer and the receiver's advertised TCP RWIN Size. TCP test 330 programs such as iperf, ttcp, etc.
allow the sender to control 331 the quantity of TCP Bytes transmitted and unacknowledged (in-flight), 332 commonly referred to as the Send Socket Buffer. This is done 333 independently of the TCP RWIN Size advertised by the 334 receiver. Implications to the capabilities of the Throughput Test 335 Device (TTD) are covered at the end of section 3. 337 3. TCP Throughput Testing Methodology 339 As stated earlier in section 1, it is considered best practice to 340 verify the integrity of the network by conducting Layer 2/3 tests such 341 as [RFC2544] or other network stress test methods. However, it 342 is important to mention here that RFC 2544 was never meant to be used 343 outside a lab environment. 345 If the network is not performing properly in terms of packet loss, 346 jitter, etc., then the TCP layer testing will not be meaningful. A 347 dysfunctional network will not achieve optimal TCP throughput relative 348 to the available bandwidth. 350 TCP Throughput testing may require cooperation between the end-user 351 customer and the network provider. In a Layer 2/3 VPN architecture, 352 the testing should be conducted either on the CPE or on the CE device 353 and not on the PE (Provider Edge) router. 355 The following represents the sequential order of steps for this 356 testing methodology: 358 1. Identify the Path MTU. Packetization Layer Path MTU Discovery 359 or PLPMTUD, [RFC4821], MUST be conducted to verify the network path 360 MTU. Conducting PLPMTUD establishes the upper limit for the MSS to 361 be used in subsequent steps. 363 2. Baseline Round Trip Time and Bandwidth. This step establishes the 364 inherent, non-congested Round Trip Time (RTT) and the bottleneck 365 bandwidth of the end-to-end network path. These measurements are 366 used to provide estimates of the ideal maximum TCP RWIN and Send 367 Socket Buffer Sizes that SHOULD be used in subsequent test steps. 368 These measurements reference [RFC2681] and [RFC4898] to measure RTD 369 and the associated RTT. 371 3. TCP Connection Throughput Tests. With baseline measurements 372 of Round Trip Time and bottleneck bandwidth, single and multiple TCP 373 connection throughput tests SHOULD be conducted to baseline network 374 performance expectations. 376 4. Traffic Management Tests. Various traffic management and queuing 377 techniques can be tested in this step, using multiple TCP 378 connections. Multiple connection testing should verify that the 379 network is configured properly for traffic shaping versus policing, 380 various queuing implementations and Random Early Discard (RED). 382 Important to note are some of the key characteristics and 383 considerations for the TCP test instrument. The test host may be a 384 standard computer or a dedicated communications test instrument. 385 In either case, it must be capable of emulating both a client and 386 a server. 388 The following criteria should be considered when selecting whether 389 the TCP test host can be a standard computer or has to be a dedicated 390 communications test instrument: 392 - TCP implementation used by the test host, OS version, e.g. a Linux OS 393 kernel using TCP New Reno, TCP options supported, etc. These will 394 obviously be more important when using dedicated communications test 395 instruments where the TCP implementation may be customized or tuned 396 to run on higher performance hardware. When a compliant TCP TTD is 397 used, the TCP implementation MUST be identified in the test results.
398 The compliant TCP TTD should be usable for complete end-to-end 399 testing through network security elements and should also be usable 400 for testing network sections. 402 - More importantly, the TCP test host MUST be capable of generating 403 and receiving stateful TCP test traffic at the full link speed of the 404 network under test. Stateful TCP test traffic means that the test 405 host MUST fully implement a TCP/IP stack; this is generally a comment 406 aimed at dedicated communications test equipment which sometimes 407 "blasts" packets with TCP headers. As a general rule of thumb, testing 408 TCP throughput at rates greater than 100 Mbit/sec MAY require high 409 performance server hardware or dedicated hardware based test tools. 411 - A compliant TCP Throughput Test Device MUST allow adjusting both 412 Send and Receive Socket Buffer sizes. The Send Socket Buffer MUST be 413 large enough to accommodate the maximum TCP CWND Size. The Receive 414 Socket Buffer MUST be large enough to accommodate the maximum TCP 415 RWIN Size. 417 - Measuring RTT and retransmissions per connection will generally 418 require a dedicated communications test instrument. In the absence of 419 dedicated hardware based test tools, these measurements may need to 420 be conducted with packet capture tools, i.e. conduct TCP throughput 421 tests and analyze RTT and retransmission results in packet captures. 422 Another option may be to use the "TCP Extended Statistics MIB" per 423 [RFC4898]. 425 - The [RFC4821] PLPMTUD test SHOULD be conducted with a dedicated 426 tester which exposes the ability to run the PLPMTUD algorithm 427 independently of the OS stack. 429 3.1. Determine Network Path MTU 431 TCP implementations should use Path MTU Discovery techniques (PMTUD). 432 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 433 When a device has a packet to send which has the Don't Fragment (DF) 434 bit in the IP header set and the packet is larger than the Maximum 435 Transmission Unit (MTU) of the next hop, the packet is dropped and 436 the device sends an ICMP 'need to frag' message back to the host that 437 originated the packet. The ICMP 'need to frag' message includes 438 the next hop MTU which PMTUD uses to tune the TCP Maximum Segment 439 Size (MSS). Unfortunately, because many network managers completely 440 disable ICMP, this technique does not always prove reliable. 442 Packetization Layer Path MTU Discovery or PLPMTUD [RFC4821] MUST then 443 be conducted to verify the network path MTU. PLPMTUD can be used 444 with or without ICMP. The following sections provide a summary of the 445 PLPMTUD approach and an example using TCP. [RFC4821] specifies a 446 search_high and a search_low parameter for the MTU. As specified in 447 [RFC4821], 1024 Bytes is a safe value for search_low in modern 448 networks. 450 It is important to determine the link overheads along the IP path, 451 and then to select a TCP MSS corresponding to the Layer 3 MTU. 452 For example, if the MTU is 1024 Bytes and the TCP/IP headers are 40 453 Bytes, then the MSS would be set to 984 Bytes. 455 An example scenario is a network where the actual path MTU is 1240 456 Bytes. The TCP client probe MUST be capable of setting the MSS for 457 the probe packets and could start at MSS = 984 (which corresponds 458 to an MTU size of 1024 Bytes). 460 The TCP client probe would open a TCP connection and advertise the 461 MSS as 984. Note that the client probe MUST generate these packets 462 with the DF bit set.
The TCP client probe then sends test traffic 463 per a small default Send Socket Buffer size of ~8KBytes. It should 464 be kept small to minimize the possibility of congesting the network, 465 which may induce packet loss. The duration of the test should also 466 be short (10-30 seconds), again to minimize congestive effects 467 during the test. 469 In the example of a 1240 Bytes path MTU, probing with an MSS equal to 470 984 would yield a successful probe and the test client packets would 471 be successfully transferred to the test server. 473 Also note that the test client MUST verify that the MSS advertised 474 is indeed negotiated. Network devices with built-in Layer 4 475 capabilities can intercede during the connection establishment and 476 reduce the advertised MSS to avoid fragmentation. This is certainly 477 a desirable feature from a network perspective, but it can yield 478 erroneous test results if the client test probe does not confirm the 479 negotiated MSS. 481 The next test probe would use the search_high value and this would 482 be set to MSS = 1460 to correspond to a 1500 Bytes MTU. In this 483 example, the test client will retransmit based upon time-outs, since 484 no ACKs will be received from the test server. This test probe is 485 marked as a conclusive failure if none of the test packets are 486 ACK'ed. If any of the test packets are ACK'ed, congestive network 487 may be the cause and the test probe is not conclusive. Re-testing 488 at other times of the day is recommended to further isolate. 490 The test is repeated until the desired granularity of the MTU is 491 discovered. The method can yield precise results at the expense of 492 probing time. One approach may be to reduce the probe size to 493 half between the unsuccessful search_high and successful search_low 494 value and raise it by half also when seeking the upper limit. 496 3.2. Baseline Round Trip Time and Bandwidth 498 Before stateful TCP testing can begin, it is important to determine 499 the baseline Round Trip Time (non-congested inherent delay) and 500 bottleneck bandwidth of the end-to-end network to be tested. These 501 measurements are used to provide estimates of the ideal maximum TCP 502 RWIN and Send Socket Buffer Sizes that SHOULD be used in 503 subsequent test steps. 505 3.2.1 Techniques to Measure Round Trip Time 507 Following the definitions used in section 1.1, Round Trip Time (RTT) 508 is the elapsed time between the clocking in of the first bit of a 509 payload sent packet to the receipt of the last bit of the 510 corresponding Acknowledgment. Round Trip Delay (RTD) is used 511 synonymously to twice the Link Latency. RTT measurements SHOULD use 512 techniques defined in [RFC2681] or statistics available from MIBs 513 defined in [RFC4898]. 515 The RTT SHOULD be baselined during "off-peak" hours to obtain a 516 reliable figure for inherent network latency versus additional delay 517 caused by network buffering. When sampling values of RTT over a test 518 interval, the minimum value measured SHOULD be used as the baseline 519 RTT since this will most closely estimate the inherent network 520 latency. This inherent RTT is also used to determine the Buffer 521 Delay Percentage metric which is defined in Section 3.3.2 523 The following list is not meant to be exhaustive, although it 524 summarizes some of the most common ways to determine round trip time. 525 The desired resolution of the measurement (i.e. 
msec versus usec) may 526 dictate whether the RTT measurement can be achieved with ICMP pings 527 or by a dedicated communications test instrument with precision 528 timers. 530 The objective in this section is to list several techniques 531 in order of decreasing accuracy. 533 - Use test equipment on each end of the network, "looping" the 534 far-end tester so that a packet stream can be measured back and forth 535 from end-to-end. This RTT measurement may be compatible with delay 536 measurement protocols specified in [RFC5357]. 538 - Conduct packet captures of TCP test sessions using "iperf" or FTP, 539 or other TCP test applications. By running multiple experiments, 540 packet captures can then be analyzed to estimate RTT. It is 541 important to note that results based upon the SYN -> SYN-ACK at the 542 beginning of TCP sessions should be avoided since Firewalls might 543 slow down 3 way handshakes. 545 - ICMP pings may also be adequate to provide round trip time 546 estimates, provided that the packet size is factored into the 547 estimates (i.e. pings with different packet sizes might be required). 548 Some limitations with ICMP Ping may include msec resolution and 549 whether the network elements are responding to pings or not. Also, 550 ICMP is often rate-limited and segregated into different buffer 551 queues and is not as reliable and accurate as in-band measurements. 553 3.2.2 Techniques to Measure end-to-end Bandwidth 555 Before any TCP Throughput test can be done, bandwidth measurement 556 tests MUST be run with stateless IP streams (i.e. not stateful TCP) 557 in order to determine the available bandwidths. These measurements 558 SHOULD be conducted in both directions of the network, especially for 559 access networks, which may be asymmetrical. These tests should 560 obviously be performed at various intervals throughout a business day 561 or even across a week. Ideally, the bandwidth tests should produce 562 logged outputs of the achieved bandwidths across the tests durations. 564 There are many well established techniques available to provide 565 estimated measures of bandwidth over a network. It is a common 566 practice for network providers to conduct Layer2/3 bandwidth capacity 567 tests using [RFC2544], although it is understood that RFC 2544 was 568 never meant to be used outside a lab environment. Ideally, these 569 bandwidth measurements SHOULD use network capacity techniques as 570 defined in [RFC5136]. 572 The bandwidth results should be at least 90% of the business customer 573 SLA or to the IP-type-P Available Path Capacity defined in RFC5136. 575 3.3. TCP Throughput Tests 577 This methodology specifically defines TCP throughput techniques to 578 verify sustained TCP performance in a managed business IP network, as 579 defined in section 2.1. This section and others will define the 580 method to conduct these sustained TCP throughput tests and guidelines 581 for the predicted results. 583 With baseline measurements of round trip time and bandwidth 584 from section 3.2, a series of single and multiple TCP connection 585 throughput tests SHOULD be conducted to baseline network performance 586 against expectations. The number of trials and the type of testing 587 (single versus multiple connections) will vary according to the 588 intention of the test. One example would be a single connection test 589 in which the throughput achieved by large Send and Receive Socket 590 Buffers sizes (i.e. 256KB) is to be measured. 
It would be advisable 591 to test performance at various times of the business day. 593 It is RECOMMENDED to run the tests in each direction independently 594 first, then run both directions simultaneously. In each case, 595 TCP Transfer Time, TCP Efficiency, and Buffer Delay Percentage MUST 596 be measured in each direction. These metrics are defined in section 3.3.2. 598 3.3.1 Calculate Ideal maximum TCP RWIN Size 600 The ideal maximum TCP RWIN Size can be calculated from the 601 bandwidth delay product (BDP), which is: 603 BDP (bits) = RTT (sec) x Bandwidth (bps) 605 Note that the RTT is being used as the "Delay" variable in the 606 BDP calculations. 608 Then, by dividing the BDP by 8, we obtain the "ideal" maximum TCP 609 RWIN Size in Bytes. For optimal results, the Send Socket 610 Buffer size must be adjusted to the same value at the opposite end 611 of the network path. 613 Ideal maximum TCP RWIN = BDP / 8 615 An example would be a T3 link with 25 msec RTT. The BDP would equal 616 1,105,250 bits and the ideal maximum TCP RWIN would be ~138 KBytes. 618 Note that separate calculations are required on asymmetrical paths. 619 An asymmetrical path example would be a 90 msec RTT ADSL line with 620 5 Mbps downstream and 640 Kbps upstream. The downstream BDP would equal 621 ~450,000 bits while the upstream one would be only ~57,600 bits. 623 The following table provides some representative network Link Speeds, 624 RTT, BDP, and associated Ideal maximum TCP RWIN Sizes. 626 Table 3.3.1: Link Speed, RTT, calculated BDP & max TCP RWIN 628 Link Ideal max 629 Speed* RTT BDP TCP RWIN 630 (Mbps) (ms) (bits) (KBytes) 631 --------------------------------------------------------------------- 632 1.536 20 30,720 3.84 633 1.536 50 76,800 9.60 634 1.536 100 153,600 19.20 635 44.210 10 442,100 55.26 636 44.210 15 663,150 82.89 637 44.210 25 1,105,250 138.16 638 100 1 100,000 12.50 639 100 2 200,000 25.00 640 100 5 500,000 62.50 641 1,000 0.1 100,000 12.50 642 1,000 0.5 500,000 62.50 643 1,000 1 1,000,000 125.00 644 10,000 0.05 500,000 62.50 645 10,000 0.3 3,000,000 375.00 647 * Note that link speed is the bottleneck bandwidth (BB) for the NUT 649 The following serial link speeds are used: 650 - T1 = 1.536 Mbits/sec (for a B8ZS line encoding facility) 651 - T3 = 44.21 Mbits/sec (for a C-Bit Framing facility) 653 The above table illustrates the ideal maximum TCP RWIN. 654 If a smaller TCP RWIN Size is used, then the TCP Throughput 655 is not optimal. To calculate the TCP Throughput, the following 656 formula is used: TCP Throughput = max TCP RWIN X 8 / RTT 657 An example would be a 100 Mbps IP path with a 5 ms RTT and a maximum 658 TCP RWIN Size of 16 KB; then: 660 TCP Throughput = 16 KBytes X 8 bits / 5 ms. 661 TCP Throughput = 128,000 bits / 0.005 sec. 662 TCP Throughput = 25.6 Mbps. 664 Another example for a T3 using the same calculation formula is 665 illustrated below: 666 TCP Throughput = max TCP RWIN X 8 / RTT. 667 TCP Throughput = 16 KBytes X 8 bits / 10 ms. 668 TCP Throughput = 128,000 bits / 0.01 sec. 669 TCP Throughput = 12.8 Mbps. 671 When the maximum TCP RWIN Size exceeds the BDP (e.g. a T3 link with a 672 64 KByte max TCP RWIN on a 10 ms RTT path), the maximum 673 frames per second limit of 3664 is reached and the formula is: 675 TCP Throughput = Max FPS X MSS X 8. 676 TCP Throughput = 3664 FPS X 1460 Bytes X 8 bits. 677 TCP Throughput = 42.8 Mbps. 679 The following diagram compares achievable TCP throughputs on a T3 680 with Send Socket Buffer & max TCP RWIN Sizes of 16KB vs. 64KB.
682 Figure 3.3.1a TCP Throughputs on a T3 at different RTTs 684 45| 685 | _______42.8M 686 40| |64KB | 687 TCP | | | 688 Throughput 35| | | 689 in Mbps | | | +-----+34.1M 690 30| | | |64KB | 691 | | | | | 692 25| | | | | 693 | | | | | 694 20| | | | | _______20.5M 695 | | | | | |64KB | 696 15| | | | | | | 697 |12.8M+-----| | | | | | 698 10| |16KB | | | | | | 699 | | | |8.5M+-----| | | | 700 5| | | | |16KB | |5.1M+-----| | 701 |_____|_____|_____|____|_____|_____|____|16KB |_____|_____ 702 10 15 25 703 RTT in milliseconds 705 The following diagram shows the achievable TCP throughput on a 25ms 706 T3 when Send Socket Buffer & maximum TCP RWIN Sizes are increased. 708 Figure 3.3.1b TCP Throughputs on a T3 with different TCP RWIN 710 45| 711 | 712 40| +-----+40.9M 713 TCP | | | 714 Throughput 35| | | 715 in Mbps | | | 716 30| | | 717 | | | 718 25| | | 719 | | | 720 20| +-----+20.5M | | 721 | | | | | 722 15| | | | | 723 | | | | | 724 10| +-----+10.2M | | | | 725 | | | | | | | 726 5| +-----+5.1M | | | | | | 727 |_____|_____|______|_____|______|_____|_______|_____|_____ 728 16 32 64 128* 729 maximum TCP RWIN Size in KBytes 731 * Note that 128KB requires [RFC1323] TCP Window scaling option. 733 3.3.2 Metrics for TCP Throughput Tests 735 This framework focuses on a TCP throughput methodology and also 736 provides several basic metrics to compare results of various 737 throughput tests. It is recognized that the complexity and 738 unpredictability of TCP makes it impossible to develop a complete 739 set of metrics that accounts for the myriad of variables (i.e. RTT 740 variation, loss conditions, TCP implementation, etc.). However, 741 these basic metrics will facilitate TCP throughput comparisons 742 under varying network conditions and between network traffic 743 management techniques. 745 The first metric is the TCP Transfer Time, which is simply the 746 measured time it takes to transfer a block of data across 747 simultaneous TCP connections. This concept is useful when 748 benchmarking traffic management techniques and where multiple 749 TCP connections are required. 751 TCP Transfer time may also be used to provide a normalized ratio of 752 the actual TCP Transfer Time versus the Ideal Transfer Time. This 753 ratio is called the TCP Transfer Index and is defined as: 755 Actual TCP Transfer Time 756 ------------------------- 757 Ideal TCP Transfer Time 759 The Ideal TCP Transfer time is derived from the network path 760 bottleneck bandwidth and various Layer 1/2/3/4 overheads associated 761 with the network path. Additionally, both the maximum TCP RWIN and 762 the Send Socket Buffer Sizes must be tuned to equal the bandwidth 763 delay product (BDP) as described in section 3.3.1. 765 The following table illustrates the Ideal TCP Transfer time of a 766 single TCP connection when its maximum TCP RWIN and Send Socket 767 Buffer Sizes are equal to the BDP. 769 Table 3.3.2: Link Speed, RTT, BDP, TCP Throughput, and 770 Ideal TCP Transfer time for a 100 MB File 772 Link Maximum Ideal TCP 773 Speed BDP Achievable TCP Transfer time 774 (Mbps) RTT (ms) (KBytes) Throughput(Mbps) (seconds) 775 -------------------------------------------------------------------- 776 1.536 50 9.6 1.4 571 777 44.21 25 138.2 42.8 18 778 100 2 25.0 94.9 9 779 1,000 1 125.0 949.2 1 780 10,000 0.05 62.5 9,492 0.1 782 Transfer times are rounded for simplicity. 
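The calculations of section 3.3.1 can be scripted directly. The following Python sketch is non-normative and provided for illustration only; the function names are our own, and KBytes are treated as 1000 Bytes to match the tables above.

   # Non-normative sketch of the section 3.3.1 calculations.
   def bdp_bits(bw_mbps, rtt_ms):
       # BDP (bits) = RTT (sec) x Bandwidth (bps)
       return (rtt_ms / 1000.0) * (bw_mbps * 1000000)

   def ideal_max_rwin_kbytes(bw_mbps, rtt_ms):
       # Ideal maximum TCP RWIN = BDP / 8 (expressed here in KBytes of 1000 Bytes)
       return bdp_bits(bw_mbps, rtt_ms) / 8 / 1000.0

   def rwin_limited_throughput_mbps(rwin_kbytes, rtt_ms):
       # TCP Throughput = max TCP RWIN x 8 / RTT
       return (rwin_kbytes * 1000 * 8) / (rtt_ms / 1000.0) / 1000000

   # T3 (44.21 Mbps) at 25 ms RTT -> BDP = 1,105,250 bits, ideal RWIN ~138 KBytes
   print(bdp_bits(44.21, 25), ideal_max_rwin_kbytes(44.21, 25))
   # 16 KByte RWIN on a 5 ms RTT path -> 25.6 Mbps, as in section 3.3.1
   print(rwin_limited_throughput_mbps(16, 5))

Note that such a RWIN-limited result only applies while the maximum TCP RWIN is smaller than the BDP; otherwise the frames-per-second limit described below governs the result.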
784 For a 100 MB file (100 MB x 8 = 800 Mbits), the Ideal TCP Transfer Time 785 is derived as follows: 787 800 Mbits 788 Ideal TCP Transfer Time = ----------------------------------- 789 Maximum Achievable TCP Throughput 791 The maximum achievable layer 2 throughput on T1 and T3 Interfaces 792 is based on the maximum frames per second (FPS) permitted by the 793 actual layer 1 speed when the MTU is 1500 Bytes. 795 The maximum FPS for a T1 is 127 and the calculation formula is: 796 FPS = T1 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 797 FPS = 1.536M / ((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8) 798 FPS = 1.536M / (1508 Bytes X 8) 799 FPS = 1.536 Mbps / 12064 bits 800 FPS = 127 802 The maximum FPS for a T3 is 3664 and the calculation formula is: 803 FPS = T3 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 804 FPS = 44.21M / ((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8) 805 FPS = 44.21M / (1508 Bytes X 8) 806 FPS = 44.21 Mbps / 12064 bits 807 FPS = 3664 808 The 1508 equates to: 810 MTU + PPP + Flags + CRC16 812 Where MTU is 1500 Bytes, PPP is 4 Bytes, Flags are 2 Bytes and CRC16 813 is 2 Bytes. 815 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 816 simply use: MSS in Bytes X 8 bits X max FPS. 817 For a T3, the maximum TCP Throughput = 1460 Bytes X 8 bits X 3664 FPS 818 Maximum TCP Throughput = 11680 bits X 3664 FPS 819 Maximum TCP Throughput = 42.8 Mbps. 821 The maximum achievable layer 2 throughput on Ethernet Interfaces is 822 based on the maximum frames per second permitted by the IEEE 802.3 823 standard when the MTU is 1500 Bytes. 825 The maximum FPS for 100M Ethernet is 8127 and the calculation is: 826 FPS = 100 Mbps / (1538 Bytes X 8 bits) 828 The maximum FPS for GigE is 81274 and the calculation formula is: 829 FPS = 1 Gbps / (1538 Bytes X 8 bits) 831 The maximum FPS for 10GigE is 812743 and the calculation formula is: 832 FPS = 10 Gbps / (1538 Bytes X 8 bits) 834 The 1538 equates to: 836 MTU + Eth + CRC32 + IFG + Preamble + SFD 838 Where MTU is 1500 Bytes, Ethernet is 14 Bytes, CRC32 is 4 Bytes, 839 IFG is 12 Bytes, Preamble is 7 Bytes and SFD is 1 Byte. 841 Note that better results could be obtained with jumbo frames on 842 GigE and 10 GigE. 844 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 845 simply use: MSS in Bytes X 8 bits X max FPS. 846 For 100M Ethernet, the maximum TCP Throughput = 1460 Bytes X 8 bits X 8127 FPS 847 Maximum TCP Throughput = 11680 bits X 8127 FPS 848 Maximum TCP Throughput = 94.9 Mbps. 850 To illustrate the TCP Transfer Index, an example would be the 851 bulk transfer of 100 MB over 5 simultaneous TCP connections (each 852 connection uploading 100 MB). In this example, the Ethernet service 853 provides a Committed Access Rate (CAR) of 500 Mbit/s. Each 854 connection may achieve different throughputs during a test and the 855 overall throughput rate is not always easy to determine (especially 856 as the number of connections increases). 858 The ideal TCP Transfer Time would be ~8 seconds, but in this example, 859 the actual TCP Transfer Time was 12 seconds. The TCP Transfer Index 860 would then be 12/8 = 1.5, which indicates that the transfer across 861 all connections took 1.5 times longer than the ideal.
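As an illustration of the above derivations, the following non-normative Python sketch reproduces the maximum FPS, Maximum Achievable TCP Throughput, Ideal TCP Transfer Time and TCP Transfer Index figures quoted in this section. The function names and the 10^6 Byte MB convention are assumptions of the sketch, not requirements of this methodology.

   # Non-normative sketch of the section 3.3.2 calculations.
   def max_fps(link_mbps, l2_overhead_bytes, mtu=1500):
       # Maximum frames per second for the given layer 1/2 overhead.
       return int(link_mbps * 1000000 // ((mtu + l2_overhead_bytes) * 8))

   def max_tcp_throughput_mbps(fps, mss=1460):
       # Maximum Achievable TCP Throughput = MSS x 8 bits x max FPS
       return fps * mss * 8 / 1000000

   def ideal_transfer_time_sec(payload_mbytes, throughput_mbps):
       # Ideal TCP Transfer Time = payload in Mbits / Maximum Achievable TCP Throughput
       return payload_mbytes * 8 / throughput_mbps

   t3_fps  = max_fps(44.21, 8)    # PPP + Flags + CRC16 = 8 Bytes  -> 3664 FPS
   eth_fps = max_fps(100, 38)     # Eth + CRC32 + IFG + Preamble + SFD = 38 Bytes -> 8127 FPS
   print(max_tcp_throughput_mbps(t3_fps), max_tcp_throughput_mbps(eth_fps))  # ~42.8 and ~94.9 Mbps
   # 5 x 100 MB over a 500 Mbit/s CAR: ideal ~8 seconds (mirroring the
   # example's approximation); a measured 12 seconds gives an index of 1.5.
   ideal = ideal_transfer_time_sec(5 * 100, 500)
   print(ideal, 12 / ideal)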
863 The second metric is TCP Efficiency, which is the percentage of Bytes 864 that were not retransmitted and is defined as: 866 Transmitted Bytes - Retransmitted Bytes 867 --------------------------------------- x 100 868 Transmitted Bytes 870 Transmitted Bytes are the total number of TCP payload Bytes to be 871 transmitted which includes the original and retransmitted Bytes. This 872 metric provides a comparative measure between various QoS mechanisms 873 like traffic management or congestion avoidance. Various TCP 874 implementations like Reno, Vegas, etc. could also be compared. 876 As an example, if 100,000 Bytes were sent and 2,000 had to be 877 retransmitted, the TCP Efficiency should be calculated as: 879 102,000 - 2,000 880 ---------------- x 100 = 98.03% 881 102,000 883 Note that the retransmitted Bytes may have occurred more than once, 884 and these multiple retransmissions are added to the Retransmitted 885 Bytes count (and the Transmitted Bytes count). 887 The third metric is the Buffer Delay Percentage, which represents the 888 increase in RTT during a TCP throughput test with respect to 889 inherent or baseline network RTT. The baseline RTT is the round-trip 890 time inherent to the network path under non-congested conditions. 891 (See 3.2.1 for details concerning the baseline RTT measurements). 893 The Buffer Delay Percentage is defined as: 895 Average RTT during Transfer - Baseline RTT 896 ------------------------------------------ x 100 897 Baseline RTT 899 As an example, the baseline RTT for the network path is 25 msec. 900 During the course of a TCP transfer, the average RTT across the 901 entire transfer increased to 32 msec. In this example, the Buffer 902 Delay Percentage would be calculated as: 904 32 - 25 905 ------- x 100 = 28% 906 25 908 Note that the TCP Transfer Time, TCP Efficiency, and Buffer Delay 909 Percentage MUST be measured during each throughput test. Poor TCP 910 Transfer Time Indexes (TCP Transfer Time greater than Ideal TCP 911 Transfer Times) may be diagnosed by correlating with sub-optimal TCP 912 Efficiency and/or Buffer Delay Percentage metrics. 914 3.3.3 Conducting the TCP Throughput Tests 916 Several TCP tools are currently used in the network world and one of 917 the most common is "iperf". With this tool, hosts are installed at 918 each end of the network path; one acts as client and the other as 919 a server. The Send Socket Buffer and the maximum TCP RWIN Sizes 920 of both client and server can be manually set. The achieved 921 throughput can then be measured, either uni-directionally or 922 bi-directionally. For higher BDP situations in lossy networks 923 (long fat networks or satellite links, etc.), TCP options such as 924 Selective Acknowledgment SHOULD be considered and become part of 925 the window size / throughput characterization. 927 Host hardware performance must be well understood before conducting 928 the tests described in the following sections. A dedicated 929 communications test instrument will generally be required, especially 930 for line rates of GigE and 10 GigE. A compliant TCP TTD SHOULD 931 provide a warning message when the expected test throughput will 932 exceed 10% of the network bandwidth capacity. If the throughput test 933 is expected to exceed 10% of the provider bandwidth, then the test 934 should be coordinated with the network provider. This does not 935 include the customer premise bandwidth, the 10% refers directly to 936 the provider's bandwidth (Provider Edge to Provider router). 
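The TCP Efficiency and Buffer Delay Percentage definitions of section 3.3.2 translate directly into code. The following Python sketch is non-normative and provided for illustration only (the function names are our own); it reproduces the 98.03% and 28% examples given above.

   # Non-normative sketch of the TCP Efficiency and Buffer Delay Percentage metrics.
   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       # Transmitted Bytes include both the original and the retransmitted Bytes.
       return (transmitted_bytes - retransmitted_bytes) / transmitted_bytes * 100

   def buffer_delay_pct(avg_rtt_ms, baseline_rtt_ms):
       # Increase in RTT during the transfer relative to the baseline (inherent) RTT.
       return (avg_rtt_ms - baseline_rtt_ms) / baseline_rtt_ms * 100

   # 100,000 original Bytes plus 2,000 retransmitted Bytes -> 98.039..., quoted
   # as 98.03% in the example above.
   print(tcp_efficiency_pct(102000, 2000))
   # Average RTT of 32 ms against a 25 ms baseline -> 28.0%
   print(buffer_delay_pct(32, 25))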
938 The TCP throughput test should be run over a long enough duration 939 to properly exercise network buffers (greater than 30 seconds) and 940 also characterize performance at different time periods of the day. 942 3.3.4 Single vs. Multiple TCP Connection Testing 944 The decision whether to conduct single or multiple TCP connection 945 tests depends upon the size of the BDP in relation to the maximum 946 TCP RWIN configured in the end-user environment. For example, if 947 the BDP for a long fat network turns out to be 2MB, then it is 948 probably more realistic to test this network path with multiple 949 connections. Assuming typical host computer maximum TCP RWIN Sizes 950 of 64 KB, using 32 TCP connections would realistically test this 951 path. 953 The following table is provided to illustrate the relationship 954 between the maximum TCP RWIN and the number of TCP connections 955 required to utilize the available capacity of a given BDP. For this 956 example, the network bandwidth is 500 Mbps and the RTT is 5 ms, then 957 the BDP equates to 312.5 KBytes. 959 Table 3.3.4 Number of TCP connections versus maximum TCP RWIN 961 Maximum Number of TCP Connections 962 TCP RWIN to fill available bandwidth 963 ------------------------------------- 964 16KB 20 965 32KB 10 966 64KB 5 967 128KB 3 969 The TCP Transfer Time metric is useful for conducting multiple 970 connection tests. Each connection should be configured to transfer 971 payloads of the same size (i.e. 100 MB), and the TCP Transfer time 972 should provide a simple metric to verify the actual versus expected 973 results. 975 Note that the TCP transfer time is the time for all connections to 976 complete the transfer of the configured payload size. From the 977 previous table, the 64KB window is considered. Each of the 5 978 TCP connections would be configured to transfer 100MB, and each one 979 should obtain a maximum of 100 Mb/sec. So for this example, the 980 100MB payload should be transferred across the connections in 981 approximately 8 seconds (which would be the ideal TCP transfer time 982 under these conditions). 984 Additionally, the TCP Efficiency metric MUST be computed for each 985 connection tested as defined in section 3.3.2. 987 3.3.5 Interpretation of the TCP Throughput Results 989 At the end of this step, the user will document the theoretical BDP 990 and a set of Window size experiments with measured TCP throughput for 991 each TCP window size. For cases where the sustained TCP throughput 992 does not equal the ideal value, some possible causes are: 994 - Network congestion causing packet loss which MAY be inferred from 995 a poor TCP Efficiency % (higher TCP Efficiency % = less packet 996 loss) 997 - Network congestion causing an increase in RTT which MAY be inferred 998 from the Buffer Delay Percentage (i.e., 0% = no increase in RTT 999 over baseline) 1000 - Intermediate network devices which actively regenerate the TCP 1001 connection and can alter TCP RWIN Size, MSS, etc. 1002 - Rate limiting (policing). More details on traffic management 1003 tests follows in section 3.4 1005 3.3.6 High Performance Network Options 1007 For cases where the network outperforms the client/server IP hosts 1008 some possible causes are: 1010 - Maximum TCP Buffer space. All operating systems have a global 1011 mechanism to limit the quantity of system memory to be used by TCP 1012 connections. 
On some systems, each connection is subject to a memory 1013 limit that is applied to the total memory used for input data, output 1014 data and controls. On other systems, there are separate limits for 1015 input and output buffer spaces per connection. Client/server IP 1016 hosts might be configured with Maximum Buffer Space limits that are 1017 far too small for high performance networks. 1019 - Socket Buffer Sizes. Most operating systems support separate per 1020 connection send and receive buffer limits that can be adjusted as 1021 long as they stay within the maximum memory limits. These socket 1022 buffers must be large enough to hold a full BDP of TCP segments plus 1023 some overhead. There are several methods that can be used to adjust 1024 socket buffer sizes, but TCP Auto-Tuning automatically adjusts these 1025 as needed to optimally balance TCP performance and memory usage. 1026 It is important to note that Auto-Tuning is enabled by default in 1027 Linux since kernel release 2.6.6 and in UNIX since FreeBSD 7.0. 1028 It is also enabled by default in Windows since Vista and in Mac OS X 1029 since version 10.5 (Leopard). Over buffering can cause some 1030 applications to behave poorly, typically causing sluggish interactive 1031 response and risking running the system out of memory. Large default 1032 socket buffers have to be considered carefully on multi-user systems. 1034 - TCP Window Scale Option, RFC1323. This option enables TCP to 1035 support large BDP paths. It provides a scale factor which is 1036 required for TCP to support window sizes larger than 64KB. Most 1037 systems automatically request WSCALE under some conditions, such as 1038 when the receive socket buffer is larger than 64KB or when the other 1039 end of the TCP connection requests it first. WSCALE can only be 1040 negotiated during the 3-way handshake. If either end fails to 1041 request WSCALE or requests an insufficient value, it cannot be 1042 renegotiated. Different systems use different algorithms to select 1043 WSCALE, but they are all constrained by the maximum permitted buffer 1044 size, the current receiver buffer size for this connection, or a 1045 global system setting. Note that under these constraints, a client 1046 application wishing to send data at high rates may need to set its 1047 own receive buffer to something larger than 64 KBytes before it 1048 opens the connection to ensure that the server properly negotiates 1049 WSCALE. A system administrator might have to explicitly enable 1050 RFC1323 extensions. Otherwise, the client/server IP host would not 1051 support TCP window sizes (BDP) larger than 64KB. Most of the time, 1052 performance gains will be obtained by enabling this option in Long 1053 Fat Networks (i.e. networks with a large BDP; see Figure 3.3.1b). 1055 - TCP Timestamps Option, RFC1323. This feature provides better 1056 measurements of the Round Trip Time and protects TCP from data 1057 corruption that might occur if packets are delivered so late that the 1058 sequence numbers wrap before they are delivered. Wrapped sequence 1059 numbers do not pose a serious risk below 100 Mbps, but the risk 1060 increases at higher data rates. Most of the time, performance gains 1061 will be obtained by enabling this option in Gigabit bandwidth 1062 networks. 1064 - TCP Selective Acknowledgments Option (SACK), RFC2018. This allows 1065 a TCP receiver to inform the sender about exactly which data segment 1066 is missing and needs to be retransmitted.
Without SACK, TCP has to 1067 estimate which data segment is missing, which works just fine if all 1068 losses are isolated (i.e. only one loss in any given round trip). 1069 Without SACK, TCP takes a very long time to recover after multiple 1070 and consecutive losses. SACK is now supported by most operating 1071 systems, but it may have to be explicitly enabled by the system 1072 administrator. In most situations, enabling TCP SACK will improve 1073 throughput performance, but it is important to note that it might 1074 need to be disabled in network architectures where TCP randomization 1075 is done by network security appliances. 1077 - Path MTU. The client/server IP host system must use the largest 1078 possible MTU for the path. This may require enabling Path MTU 1079 Discovery (RFC1191 & RFC4821). Since RFC1191 is flawed, it is 1080 sometimes not enabled by default and may need to be explicitly 1081 enabled by the system administrator. RFC4821 describes a new, more 1082 robust algorithm for MTU discovery and ICMP black hole recovery. 1084 - TOE (TCP Offload Engine). Some recent Network Interface Cards (NIC) 1085 are equipped with drivers that can do part or all of the TCP/IP 1086 protocol processing. TOE implementations require additional work 1087 (i.e. hardware-specific socket manipulation) to set up and tear down 1088 connections. For connection intensive protocols such as HTTP, TOE 1089 might need to be disabled to increase performance. Because TOE NIC 1090 configuration parameters are vendor specific and not necessarily 1091 RFC-compliant, they are poorly integrated with UNIX and Linux. 1092 Occasionally, TOE might need to be disabled in a server because its 1093 NIC does not have enough memory resources to buffer thousands of 1094 connections. 1096 Note that both ends of a TCP connection must be properly tuned. 1098 3.4. Traffic Management Tests 1100 In most cases, the network connection between two geographic 1101 locations (branch offices, etc.) has a lower bandwidth than the network 1102 connections of the host computers. An example would be LAN connectivity of GigE 1103 and WAN connectivity of 100 Mbps. The WAN connectivity may be 1104 physically 100 Mbps or logically 100 Mbps (over a GigE WAN 1105 connection). In the latter case, rate limiting is used to provide the 1106 WAN bandwidth per the SLA. 1108 Traffic management techniques are employed to provide various forms 1109 of QoS; the more common ones include: 1111 - Traffic Shaping 1112 - Priority queuing 1113 - Random Early Discard (RED) 1114 Configuring the end-to-end network with these various traffic 1115 management mechanisms is a complex undertaking. For traffic shaping 1116 and RED techniques, the end goal is to provide better performance for 1117 bursty traffic such as TCP (RED is specifically intended for TCP). 1119 This section of the methodology provides guidelines to test traffic 1120 shaping and RED implementations. As in section 3.3, host hardware 1121 performance must be well understood before conducting the traffic 1122 shaping and RED tests. A dedicated communications test instrument will 1123 generally be REQUIRED for line rates of GigE and 10 GigE. If the 1124 throughput test is expected to exceed 10% of the provider bandwidth, 1125 then the test should be coordinated with the network provider. This 1126 does not include the customer premises bandwidth; the 10% refers to 1127 the provider's bandwidth (Provider Edge to Provider router).
Note 1128 that GigE and 10 GigE interfaces might benefit from hold-queue 1129 adjustments in order to prevent the saw-tooth TCP traffic pattern. 1131 3.4.1 Traffic Shaping Tests 1133 For services where the available bandwidth is rate limited, two (2) 1134 techniques can be used: traffic policing or traffic shaping. 1136 Simply stated, traffic policing marks and/or drops packets which 1137 exceed the SLA bandwidth (in most cases, excess traffic is dropped). 1138 Traffic shaping employs the use of queues to smooth the bursty 1139 traffic and then sends it out within the SLA bandwidth limit (without 1140 dropping packets unless the traffic shaping queue is exhausted). 1142 Traffic shaping is generally configured for TCP data services and 1143 can provide improved TCP performance since the retransmissions are 1144 reduced, which in turn optimizes TCP throughput for the available 1145 bandwidth. Throughout this section, the rate-limited bandwidth shall 1146 be referred to as the "bottleneck bandwidth". 1148 Proper traffic shaping is more easily diagnosed 1149 when conducting a multiple TCP connection test. Proper shaping will 1150 provide a fair distribution of the available bottleneck bandwidth, 1151 while traffic policing will not. 1153 The traffic shaping tests are built upon the concepts of multiple 1154 connection testing as defined in section 3.3.4. Calculating the BDP 1155 for the bottleneck bandwidth is first required before selecting the 1156 number of connections, the Send Socket Buffer and maximum TCP RWIN 1157 Sizes per connection. 1159 Similar to the example in section 3.3, a typical test scenario might 1160 be: a GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited 1161 logical interface) and 5 msec RTT. This would require five (5) TCP 1162 connections of 64 KB Send Socket Buffer and maximum TCP RWIN Sizes 1163 to evenly fill the bottleneck bandwidth (~100 Mbps per connection). 1165 The traffic shaping test should be run over a long enough duration to 1166 properly exercise network buffers (greater than 30 seconds) and also 1167 characterize performance during different time periods of the day. 1168 The throughput of each connection MUST be logged during the entire 1169 test, along with the TCP Transfer Time, TCP Efficiency, and 1170 Buffer Delay Percentage. 1172 3.4.1.1 Interpretation of Traffic Shaping Test Results 1174 By plotting the throughput achieved by each TCP connection, we should 1175 see fair sharing of the bandwidth when traffic shaping is properly 1176 configured for the bottleneck interface. For the previous example of 1177 5 connections sharing 500 Mbps, each connection would consume 1178 ~100 Mbps with smooth variations. 1180 When traffic shaping is not configured properly or if traffic 1181 policing is present on the bottleneck interface, the bandwidth 1182 sharing may not be fair. The resulting throughput plot may reveal 1183 "spiky" throughput consumption of the competing TCP connections (due 1184 to the high rate of TCP retransmissions). 1186 3.4.2 RED Tests 1188 Random Early Discard techniques are specifically targeted to provide 1189 congestion avoidance for TCP traffic. Before the network element 1190 queue "fills" and enters the tail drop state, RED drops packets at 1191 configurable queue depth thresholds. This action causes TCP 1192 connections to back off, which helps to prevent tail drop, which in 1193 turn helps to prevent global TCP synchronization.
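Both the traffic shaping tests above and the RED tests in this section start by sizing the number of connections against the BDP of the bottleneck. The following Python sketch is non-normative and provided for illustration only; it reproduces the connection counts of Table 3.3.4 and the traffic shaping example of section 3.4.1 (the function names and the use of math.ceil are assumptions of the sketch).

   # Non-normative sketch: connections required to fill the bottleneck BDP.
   import math

   def bdp_kbytes(bw_mbps, rtt_ms):
       # BDP of the bottleneck, expressed in KBytes (1 KByte = 1000 Bytes here).
       return bw_mbps * 1000000 * (rtt_ms / 1000.0) / 8 / 1000.0

   def connections_needed(bw_mbps, rtt_ms, max_rwin_kbytes):
       # Number of TCP connections needed to utilize the available capacity.
       return math.ceil(bdp_kbytes(bw_mbps, rtt_ms) / max_rwin_kbytes)

   # 500 Mbps bottleneck and 5 ms RTT -> BDP of 312.5 KBytes (Table 3.3.4)
   for rwin_kb in (16, 32, 64, 128):
       print(rwin_kb, connections_needed(500, 5, rwin_kb))   # 20, 10, 5, 3
   # With 64 KB windows, 5 connections at ~100 Mbps each fill the shaped
   # 500 Mbps bottleneck; for RED testing, roughly double that number (10)
   # is used to burst beyond the bottleneck, as described below.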
Again, rate-limited interfaces may benefit greatly from RED-based
techniques. Without RED, TCP may not be able to achieve the full
bottleneck bandwidth. With RED enabled, TCP congestion avoidance
throttles the connections on the higher speed interface (i.e. LAN)
and can help achieve the full bottleneck bandwidth. The burstiness
of TCP traffic is a key factor in the overall effectiveness of RED
techniques; steady state bulk transfer flows will generally not
benefit from RED. With bulk transfer flows, network device queues
gracefully throttle the effective throughput rates due to increased
delays.

Proper RED configuration is more easily diagnosed when conducting a
multiple TCP connection test. Multiple TCP connections provide the
bursty sources that emulate the real-world conditions for which RED
was intended.

The RED tests also build upon the concepts of multiple connection
testing as defined in section 3.3.4. Calculating the BDP for the
bottleneck bandwidth is first required before selecting the number
of connections, the Send Socket Buffer size and the maximum TCP RWIN
Size per connection.

For RED testing, the desired effect is to cause the TCP connections
to burst beyond the bottleneck bandwidth so that queue drops will
occur. Using the same example from section 3.4.1 (traffic shaping),
the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with a
window size of 64 KB) to fill the capacity. Some experimentation is
required, but it is recommended to start with double the number of
connections in order to stress the network element buffers/queues
(10 connections for this example).

The TCP TTD must be configured to generate these connections as
shorter (bursty) flows versus bulk transfer type flows. These TCP
bursts should stress queue sizes in the 512 KB range. Again,
experimentation will be required; the proper number of TCP
connections and the Send Socket Buffer and maximum TCP RWIN Sizes
will be dictated by the size of the network element queue.

3.4.2.1 Interpretation of RED Results

The default queuing technique for most network devices is FIFO
based. Without RED, the FIFO based queue may cause excessive loss to
all of the TCP connections and, in the worst case, global TCP
synchronization.

By plotting the aggregate throughput achieved on the bottleneck
interface, proper RED operation may be determined if the bottleneck
bandwidth is fully utilized. For the previous example of 10
connections (window = 64 KB) sharing 500 Mbps, each connection
should consume ~50 Mbps. If RED is not properly enabled on the
interface, then the TCP connections will retransmit at a higher rate
and the net effect is that the bottleneck bandwidth is not fully
utilized.

Another means to study non-RED versus RED implementations is to use
the TCP Transfer Time metric for all of the connections. In this
example, a 100 MB payload transfer on each of the 10 connections
should ideally take 16 seconds (1,000 MB in aggregate across the
500 Mbps bottleneck bandwidth) with RED enabled. With RED not
enabled, the throughput across the bottleneck bandwidth may be
greatly reduced (generally by 10-20%) and the actual TCP Transfer
Time may be proportionally longer than the Ideal TCP Transfer Time.

Additionally, non-RED implementations may exhibit a lower TCP
Efficiency.
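The 16-second figure quoted above can be reproduced with a short
calculation. The sketch below is illustrative only and ignores
L2/L3/L4 protocol overhead, which is why it matches the approximate
ideal value used in this example rather than a precise maximum
achievable throughput.

   # Ideal TCP Transfer Time for the 10-connection RED example.
   payload_bytes_per_conn = 100 * 10**6     # 100 MB per connection
   connections = 10
   bottleneck_bps = 500 * 10**6             # 500 Mbps bottleneck

   total_bits = payload_bytes_per_conn * 8 * connections
   ideal_transfer_time = total_bits / float(bottleneck_bps)

   # A measured (actual) TCP Transfer Time can then be compared
   # against this ideal value; a ratio well above 1 indicates
   # degradation such as the non-RED behavior described above.
   print("Ideal TCP Transfer Time: %.1f s" % ideal_transfer_time)  # 16.0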
4. Security Considerations

The security considerations that apply to any active measurement of
live networks are relevant here as well. See [RFC4656] and
[RFC5357].

5. IANA Considerations

This document does not REQUIRE an IANA registration for ports
dedicated to the TCP testing described in this document.

6. Acknowledgments

Thanks to Lars Eggert, Al Morton, Matt Mathis, Matt Zekauskas,
Yaakov Stein, and Loki Jorgenson for many good comments and for
pointing us to great sources of information pertaining to past works
in the TCP capacity area.

7. References

7.1 Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
          Zekauskas, "A One-way Active Measurement Protocol (OWAMP)",
          RFC 4656, September 2006.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
          Network Interconnect Devices", RFC 2544, June 1999.

[RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
          Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
          RFC 5357, October 2008.

[RFC1191] Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191,
          November 1990.

[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
          Discovery", RFC 4821, June 2007.

[draft-ietf-ippm-btc-cap]
          Allman, M., "A Bulk Transfer Capacity Methodology for
          Cooperating Hosts", draft-ietf-ippm-btc-cap-00.txt (work in
          progress), August 2001.

[RFC2681] Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip
          Delay Metric for IPPM", RFC 2681, September 1999.

[RFC4898] Mathis, M., Heffner, J., and R. Raghunarayan, "TCP Extended
          Statistics MIB", RFC 4898, May 2007.

[RFC5136] Chimento, P. and J. Ishac, "Defining Network Capacity",
          RFC 5136, February 2008.

[RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
          for High Performance", RFC 1323, May 1992.

7.2 Informative References

Authors' Addresses

Barry Constantine
JDSU, Test and Measurement Division
One Milestone Center Court
Germantown, MD 20876-7100
USA

Phone: +1 240 404 2227
barry.constantine@jdsu.com

Gilles Forget
Independent Consultant to Bell Canada.
308, rue de Monaco, St-Eustache
Qc. CANADA, Postal Code: J7P-4T5

Phone: (514) 895-8212
gilles.forget@sympatico.ca

Rudiger Geib
Heinrich-Hertz-Strasse 3-7
Darmstadt, Germany, 64295

Phone: +49 6151 6282747
Ruediger.Geib@telekom.de

Reinhard Schrage
Schrage Consulting

Phone: +49 (0) 5137 909540
reinhard@schrageconsult.com