1 Network Working Group B. Constantine 2 Internet-Draft JDSU 3 Intended status: Informational G. Forget 4 Expires: June 7, 2011 Bell Canada (Ext. Consultant) 5 Rudiger Geib 6 Deutsche Telekom 7 Reinhard Schrage 8 Schrage Consulting 10 December 7, 2010
12 Framework for TCP Throughput Testing 13 draft-ietf-ippm-tcp-throughput-tm-09.txt
15 Abstract
17 This framework describes a methodology for measuring end-to-end TCP 18 throughput performance in a managed IP network. The intention is to 19 provide a practical methodology to validate TCP layer performance. 20 The goal is to provide a better indication of the user experience. 21 In this framework, various TCP and IP parameters are identified and 22 should be tested as part of a managed IP network.
24 Requirements Language
26 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 27 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 28 document are to be interpreted as described in RFC 2119 [RFC2119].
30 Status of this Memo
32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79.
35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/.
40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress."
45 This Internet-Draft will expire on June 7, 2011.
47 Copyright Notice
49 Copyright (c) 2010 IETF Trust and the persons identified as the 50 document authors. All rights reserved.
52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document.
Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 65 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 66 1.2 Test Set-up . . . . . . . . . . . . . . . . . . . . . . . 4 67 2. Scope and Goals of this methodology. . . . . . . . . . . . . . 5 68 2.1 TCP Equilibrium. . . . . . . . . . . . . . . . . . . . . . 6 69 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 7 70 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 9 71 3.2. Baseline Round Trip Time and Bandwidth . . . . . . . . . . 10 72 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 10 73 3.2.2 Techniques to Measure end-to-end Bandwidth. . . . . . 11 74 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 12 75 3.3.1 Calculate Ideal TCP Receive Window Size. . . . . . . . 12 76 3.3.2 Metrics for TCP Throughput Tests . . . . . . . . . . . 15 77 3.3.3 Conducting the TCP Throughput Tests. . . . . . . . . . 18 78 3.3.4 Single vs. Multiple TCP Connection Testing . . . . . . 19 79 3.3.5 Interpretation of the TCP Throughput Results . . . . . 20 80 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 20 81 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 21 82 3.4.1.1 Interpretation of Traffic Shaping Test Results. . . 21 83 3.4.2 RED Tests. . . . . . . . . . . . . . . . . . . . . . . 22 84 3.4.2.1 Interpretation of RED Results . . . . . . . . . . . 23 85 4. Security Considerations . . . . . . . . . . . . . . . . . . . 23 86 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 87 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 23 88 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24 89 7.1 Normative References . . . . . . . . . . . . . . . . . . . 24 90 7.2 Informative References . . . . . . . . . . . . . . . . . . 24 92 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 25 94 1. Introduction 96 Network providers are coming to the realization that Layer 2/3 97 testing is not enough to adequately ensure end-user's satisfaction. 98 An SLA (Service Level Agreement) is provided to business customers 99 and is generally based upon Layer 2/3 criteria such as access rate, 100 latency, packet loss and delay variations. On the other hand, 101 measuring TCP throughput provides meaningful results with respect to 102 user experience. Thus, the network provider community desires to 103 measure IP network throughput performance at the TCP layer. 105 Additionally, business enterprise customers seek to conduct 106 repeatable TCP throughput tests between locations. Since these 107 enterprises rely on the networks of the providers, a common test 108 methodology with predefined metrics will benefit both parties. 110 Note that the primary focus of this methodology is managed business 111 class IP networks; i.e. those Ethernet terminated services for which 112 businesses are provided an SLA from the network provider. End-users 113 with "best effort" access between locations can use this methodology, 114 but this framework and its metrics are intended to be used in a 115 predictable managed IP service environment. 117 So the intent behind this document is to define a methodology for 118 testing sustained TCP layer performance. 
In this document, the 119 maximum achievable TCP Throughput is that amount of data per unit 120 time that TCP transports when trying to reach Equilibrium, i.e. 121 after the initial slow start and congestion avoidance phases.
123 TCP uses a congestion window (TCP CWND) to determine how many 124 packets it can send at one time. The network path bandwidth delay 125 product (BDP) determines the ideal TCP CWND. With the help of its slow 126 start and congestion avoidance mechanisms, TCP probes the network 127 path. Up to the bandwidth limit, a larger TCP CWND permits a 128 higher throughput, and up to local host limits, the TCP "Slow Start" and 129 "Congestion Avoidance" algorithms together determine the TCP 130 CWND size. The maximum TCP CWND size is also limited by the buffer 131 space allocated by the kernel for each socket. For each socket, there 132 is a default buffer size that can be changed by the program using a 133 system library call made just before opening the socket. There is also 134 a kernel enforced maximum buffer size. The buffer size can be 135 adjusted at both ends of the socket (send and receive). In order 136 to obtain the maximum throughput, it is critical to use optimal TCP 137 Send and Receive Socket Buffer sizes.
139 There are many variables to consider when conducting a TCP throughput 140 test, but this methodology focuses on: 141 - RTT and Bottleneck BW 142 - Ideal TCP Receive Window (Ideal Receive Socket Buffer) 143 - Ideal Send Socket Buffer 144 - TCP Congestion Window (TCP CWND) 145 - Path MTU and Maximum Segment Size (MSS) 146 - Single Connection and Multiple Connections testing
147 This methodology proposes TCP testing that should be performed in 148 addition to traditional Layer 2/3 type tests. Layer 2/3 tests are 149 required to verify the integrity of the network before conducting TCP 150 tests. Examples include iperf (UDP mode) or manual packet layer test 151 techniques where packet throughput, loss, and delay measurements are 152 conducted. When available, standardized testing similar to RFC 2544 153 [RFC2544] but adapted for use in operational networks may be used. 154 Note: RFC 2544 was never meant to be used outside a lab environment.
156 The following two sections provide a general overview of the test 157 methodology.
159 1.1 Terminology
161 Common terms used in the test methodology are:
163 - TCP Throughput Test Device (TCP TTD), refers to a compliant TCP 164 host that generates traffic and measures metrics as defined in 165 this methodology, i.e. a dedicated communications test instrument. 166 - Customer Provided Equipment (CPE), refers to customer owned 167 equipment (routers, switches, computers, etc.). 168 - Customer Edge (CE), refers to the provider owned demarcation device. 169 - Provider Edge (PE), refers to the provider's distribution equipment. 170 - Bottleneck Bandwidth (BB), the lowest bandwidth along the complete 171 path. Bottleneck Bandwidth and Bandwidth are used synonymously 172 in this document. Most of the time the Bottleneck Bandwidth is 173 in the access portion of the wide area network (CE - PE). 174 - Provider (P), refers to provider core network equipment. 175 - Network Under Test (NUT), refers to the tested IP network path. 176 - Round-Trip Time (RTT), refers to the Layer 4 back-and-forth delay.
178 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 179 | TCP|-| CPE|-| CE |--| PE |-| P |--| P |-| PE |--| CE |-| CPE|-| TCP| 180 | TTD| | | | |BB| | | | | | | |BB| | | | | TTD| 181 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 182 <------------------------ NUT ------------------------> 183 R >-----------------------------------------------------------| 184 T | 185 T <-----------------------------------------------------------| 187 Note that the NUT may consist of a variety of devices including but 188 not limited to, load balancers, proxy servers or WAN acceleration 189 devices. The detailed topology of the NUT should be well understood 190 when conducting the TCP throughput tests, although this methodology 191 makes no attempt to characterize specific network architectures. 193 1.2 Test Set-up 195 This methodology is intended for operational and managed IP networks. 196 A multitude of network architectures and topologies can be tested. 197 The above set-up diagram is very general and it only illustrates the 198 segmentation within end-user and network provider domains. 200 2. Scope and Goals of this Methodology 202 Before defining the goals, it is important to clearly define the 203 areas that are out-of-scope. 205 - This methodology is not intended to predict the TCP throughput 206 during the transient stages of a TCP connection, such as the initial 207 slow start. 209 - This methodology is not intended to definitively benchmark TCP 210 implementations of one OS to another, although some users may find 211 some value in conducting qualitative experiments. 213 - This methodology is not intended to provide detailed diagnosis 214 of problems within end-points or within the network itself as 215 related to non-optimal TCP performance, although a results 216 interpretation section for each test step may provide insight in 217 regards with potential issues. 219 - This methodology does not propose to operate permanently with high 220 measurement loads. TCP performance and optimization within 221 operational networks may be captured and evaluated by using data 222 from the "TCP Extended Statistics MIB" [RFC4898]. 224 - This methodology is not intended to measure TCP throughput as part 225 of an SLA, or to compare the TCP performance between service 226 providers or to compare between implementations of this methodology 227 in dedicated communications test instruments. 229 In contrast to the above exclusions, a primary goal is to define a 230 method to conduct a practical, end-to-end assessment of sustained 231 TCP performance within a managed business class IP network. Another 232 key goal is to establish a set of "best practices" that a non-TCP 233 expert should apply when validating the ability of a managed network 234 to carry end-user TCP applications. 236 Other specific goals are to : 238 - Provide a practical test approach that specifies IP hosts 239 configurable TCP parameters such as TCP Receive Window size, Socket 240 Buffer size, MSS (Maximum Segment Size), number of connections, and 241 how these affect the outcome of TCP performance over a network. 242 See section 3.3.3. 244 - Provide specific test conditions like link speed, RTT, TCP Receive 245 Window size, Socket Buffer size and maximum achievable TCP throughput 246 when trying to reach TCP Equilibrium. For guideline purposes, 247 provide examples of test conditions and their maximum achievable 248 TCP throughput. 
Section 2.1 provides specific details concerning the 249 definition of TCP Equilibrium within this methodology while section 3 250 provides specific test conditions with examples. 252 Note that some TCP/IP stack implementations are using Receive Window 253 Auto-Tuning and cannot be adjusted until this feature is disabled. 255 - Define three (3) basic metrics to compare the performance of TCP 256 connections under various network conditions. See section 3.3.2. 258 - In test situations where the recommended procedure does not yield 259 the maximum achievable TCP throughput results, this methodology 260 provides some possible areas within the end host or the network that 261 should be considered for investigation. Although again, this 262 methodology is not intended to provide a detailed diagnosis on these 263 issues. See section 3.3.5. 265 2.1 TCP Equilibrium 267 TCP connections have three (3) fundamental congestion window phases : 269 1 - The Slow Start phase, which occurs at the beginning of a TCP 270 transmission or after a retransmission time out. 272 2 - The Congestion Avoidance phase, during which TCP ramps up to 273 establish the maximum attainable throughput on an end-to-end network 274 path. Retransmissions are a natural by-product of the TCP congestion 275 avoidance algorithm as it seeks to achieve maximum throughput. 277 3 - The Loss Recovery phase, which could include Fast Retransmit 278 (Tahoe) or Fast Recovery (Reno & New Reno). When packet loss occurs, 279 Congestion Avoidance phase transitions either to Fast Retransmission 280 or Fast Recovery depending upon TCP implementations. If a Time-Out 281 occurs, TCP transitions back to the Slow Start phase. 283 The following diagram depicts these 3 phases. 285 /\ | Trying to reach TCP Equilibrium > > > > > > > > > 286 /\ | 287 /\ |High ssthresh TCP CWND 288 /\ |Loss Event * halving 3-Loss Recovery 289 /\ | * \ upon loss Adjusted 290 /\ | * \ / \ Time-Out ssthresh 291 /\ | * \ / \ +--------+ * 292 TCP | * \/ \ / Multiple| * 293 Through- | * 2-Congestion\ / Loss | * 294 put | * Avoidance \/ Event | * 295 | * Half | * 296 | * TCP CWND | * 1-Slow Start 297 | * 1-Slow Start Min TCP CWND after T-O 298 +----------------------------------------------------------- 299 Time > > > > > > > > > > > > > > > 301 Note : ssthresh = Slow Start threshold. 303 A well tuned and managed IP network with appropriate TCP adjustments 304 in it's IP hosts and applications should perform very close to TCP 305 Equilibrium and to the BB (Bottleneck Bandwidth). 307 This TCP methodology provides guidelines to measure the maximum 308 achievable TCP throughput or maximum TCP sustained rate obtained 309 after TCP CWND has stabilized to an optimal value. All maximum 310 achievable TCP throughputs specified in section 3 are with respect to 311 this condition. 313 It is important to clarify the interaction between the sender's Send 314 Socket Buffer and the receiver's advertised TCP Receive Window. TCP 315 test programs such as iperf, ttcp, etc. allow the sender to control 316 the quantity of TCP Bytes transmitted and unacknowledged (in-flight), 317 commonly referred to as the Send Socket Buffer. This is done 318 independently of the TCP Receive Window size advertised by the 319 receiver. Implications to the capabilities of the Throughput Test 320 Device (TTD) are covered at the end of section 3. 322 3. 
TCP Throughput Testing Methodology
324 As stated earlier in section 1, it is considered best practice to 325 verify the integrity of the network by conducting Layer 2/3 tests such 326 as [RFC2544] or other methods of network stress tests. However, it 327 is important to mention here that RFC 2544 was never meant to be used 328 outside a lab environment.
330 If the network is not performing properly in terms of packet loss, 331 jitter, etc. then the TCP layer testing will not be meaningful. A 332 dysfunctional network will not achieve optimal TCP throughput 333 relative to the available bandwidth.
335 TCP Throughput testing may require cooperation between the end-user 336 customer and the network provider. In a Layer 2/3 VPN architecture, 337 the testing should be conducted either on the CPE or on the CE device 338 and not on the PE (Provider Edge) router.
340 The following represents the sequential order of steps for this 341 testing methodology:
343 1. Identify the Path MTU. Packetization Layer Path MTU Discovery, 344 or PLPMTUD [RFC4821], MUST be conducted to verify the network path 345 MTU. Conducting PLPMTUD establishes the upper limit for the MSS to 346 be used in subsequent steps.
348 2. Baseline Round Trip Time and Bandwidth. This step establishes the 349 inherent, non-congested Round Trip Time (RTT) and the bottleneck 350 bandwidth of the end-to-end network path. These measurements are 351 used to provide estimates of the ideal TCP Receive Window and Send 352 Socket Buffer sizes that SHOULD be used in subsequent test steps. 353 These measurements reference [RFC2681] and [RFC4898] to measure RTD 354 and the associated RTT.
356 3. TCP Connection Throughput Tests. With baseline measurements 357 of Round Trip Time and bottleneck bandwidth, single and multiple TCP 358 connection throughput tests SHOULD be conducted to baseline network 359 performance expectations.
361 4. Traffic Management Tests. Various traffic management and queuing 362 techniques can be tested in this step, using multiple TCP 363 connections. Multiple connection testing should verify that the 364 network is configured properly for traffic shaping versus policing, 365 various queuing implementations and Random Early Discard (RED).
367 Some key characteristics and considerations for the TCP test 368 instrument are important to note. The test host may be a 369 standard computer or a dedicated communications test instrument. 370 In both cases, it must be capable of emulating both a client and a 371 server.
373 The following criteria should be considered when deciding whether 374 the TCP test host can be a standard computer or has to be a dedicated 375 communications test instrument:
377 - The TCP implementation used by the test host, OS version, e.g. a Linux OS 378 kernel using TCP New Reno, TCP options supported, etc. These will 379 obviously be more important when using dedicated communications test 380 instruments where the TCP implementation may be customized or tuned 381 to run in higher performance hardware. When a compliant TCP TTD is 382 used, the TCP implementation MUST be identified in the test results. 383 The compliant TCP TTD should be usable for complete end-to-end 384 testing through network security elements and should also be usable 385 for testing network sections.
387 - More importantly, the TCP test host MUST be capable of generating 388 and receiving stateful TCP test traffic at the full link speed of the 389 network under test.
Stateful TCP test traffic means that the test 390 host MUST fully implement a TCP/IP stack; this is generally a comment 391 aimed at dedicated communications test equipment which sometimes 392 "blasts" packets with TCP headers. As a general rule of thumb, testing 393 TCP throughput at rates greater than 100 Mbit/sec MAY require high 394 performance server hardware or dedicated hardware based test tools.
396 - A compliant TCP Throughput Test Device MUST allow adjusting both 397 Send and Receive Socket Buffer sizes. The Receive Socket Buffer MUST 398 be large enough to accommodate the TCP Receive Window Size. Note that 399 some TCP/IP stack implementations use Receive Window 400 Auto-Tuning, and the window cannot be manually adjusted until this feature is disabled.
402 - Measuring RTT and retransmissions per connection will generally 403 require a dedicated communications test instrument. In the absence of 404 dedicated hardware based test tools, these measurements may need to 405 be conducted with packet capture tools, i.e. conduct TCP throughput 406 tests and analyze RTT and retransmission results in packet captures. 407 Another option may be to use the "TCP Extended Statistics MIB" per 408 [RFC4898].
410 - The [RFC4821] PLPMTUD test SHOULD be conducted with a dedicated 411 tester which exposes the ability to run the PLPMTUD algorithm 412 independently of the OS stack.
414 3.1. Determine Network Path MTU
416 TCP implementations should use Path MTU Discovery techniques (PMTUD). 417 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 418 When a device has a packet to send which has the Don't Fragment (DF) 419 bit in the IP header set and the packet is larger than the Maximum 420 Transmission Unit (MTU) of the next hop, the packet is dropped and 421 the device sends an ICMP 'need to frag' message back to the host that 422 originated the packet. The ICMP 'need to frag' message includes 423 the next hop MTU, which PMTUD uses to tune the TCP Maximum Segment 424 Size (MSS). Unfortunately, because many network managers completely 425 disable ICMP, this technique does not always prove reliable.
427 Packetization Layer Path MTU Discovery, or PLPMTUD [RFC4821], MUST then 428 be conducted to verify the network path MTU. PLPMTUD can be used 429 with or without ICMP. The following sections provide a summary of the 430 PLPMTUD approach and an example using TCP. [RFC4821] specifies a 431 search_high and a search_low parameter for the MTU. As specified in 432 [RFC4821], 1024 Bytes is a safe value for search_low in modern 433 networks.
435 It is important to determine the link overhead along the IP path, 436 and then to select a TCP MSS corresponding to the Layer 3 MTU. 437 For example, if the MTU is 1024 Bytes and the TCP/IP headers are 40 438 Bytes, then the MSS would be set to 984 Bytes.
440 An example scenario is a network where the actual path MTU is 1240 441 Bytes. The TCP client probe MUST be capable of setting the MSS for 442 the probe packets and could start at MSS = 984 (which corresponds 443 to an MTU size of 1024 Bytes).
445 The TCP client probe would open a TCP connection and advertise the 446 MSS as 984. Note that the client probe MUST generate these packets 447 with the DF bit set. The TCP client probe then sends test traffic 448 with a small default Send Socket Buffer size of ~8 KBytes. It should 449 be kept small to minimize the possibility of congesting the network, 450 which may induce packet loss.
The duration of the test should also 451 be short (10-30 seconds), again to minimize congestive effects 452 during the test.
454 In the example of a 1240 Bytes path MTU, probing with an MSS equal to 455 984 would yield a successful probe and the test client packets would 456 be successfully transferred to the test server.
458 Also note that the test client MUST verify that the advertised MSS 459 is indeed negotiated. Network devices with built-in Layer 4 460 capabilities can intercede during the connection establishment and 461 reduce the advertised MSS to avoid fragmentation. This is certainly 462 a desirable feature from a network perspective, but it can yield 463 erroneous test results if the client test probe does not confirm the 464 negotiated MSS.
466 The next test probe would use the search_high value, and this would 467 be set to MSS = 1460 to correspond to a 1500 Bytes MTU. In this 468 example, the test client will retransmit based upon time-outs, since 469 no ACKs will be received from the test server. This test probe is 470 marked as a conclusive failure if none of the test packets are 471 ACK'ed. If any of the test packets are ACK'ed, network congestion 472 may be the cause and the test probe is not conclusive. Re-testing 473 at other times of the day is recommended to further isolate the cause.
475 The test is repeated until the desired granularity of the MTU is 476 discovered. The method can yield precise results at the expense of 477 probing time. One approach may be to set the next probe size halfway 478 between the unsuccessful search_high and the successful search_low 479 values, and to continue halving the interval in the same way when seeking the upper limit.
481 3.2. Baseline Round Trip Time and Bandwidth
483 Before stateful TCP testing can begin, it is important to determine 484 the baseline Round Trip Time (non-congested inherent delay) and 485 bottleneck bandwidth of the end-to-end network to be tested. These 486 measurements are used to provide estimates of the ideal TCP Receive 487 Window and Send Socket Buffer sizes that SHOULD be used in subsequent 488 test steps.
490 3.2.1 Techniques to Measure Round Trip Time
492 Following the definitions used in section 1.1, Round Trip Time (RTT) 493 is the elapsed time between the clocking in of the first bit of a 494 sent payload packet and the receipt of the last bit of the 495 corresponding Acknowledgment. Round Trip Delay (RTD) is used 496 synonymously with twice the Link Latency. RTT measurements SHOULD use 497 techniques defined in [RFC2681] or statistics available from MIBs 498 defined in [RFC4898].
500 The RTT SHOULD be baselined during "off-peak" hours to obtain a 501 reliable figure for inherent network latency versus additional delay 502 caused by network buffering. When sampling values of RTT over a test 503 interval, the minimum value measured SHOULD be used as the baseline 504 RTT since this will most closely estimate the inherent network 505 latency. This inherent RTT is also used to determine the Buffer 506 Delay Percentage metric, which is defined in Section 3.3.2.
507 The following list is not meant to be exhaustive, although it 508 summarizes some of the most common ways to determine round trip time. 509 The desired resolution of the measurement (i.e. msec versus usec) may 510 dictate whether the RTT measurement can be achieved with ICMP pings 511 or by a dedicated communications test instrument with precision 512 timers.
514 The objective in this section is to list several techniques 515 in order of decreasing accuracy.
517 - Use test equipment on each end of the network, "looping" the 518 far-end tester so that a packet stream can be measured back and forth 519 from end-to-end. This RTT measurement may be compatible with delay 520 measurement protocols specified in [RFC5357]. 522 - Conduct packet captures of TCP test sessions using "iperf" or FTP, 523 or other TCP test applications. By running multiple experiments, 524 packet captures can then be analyzed to estimate RTT. It is 525 important to note that results based upon the SYN -> SYN-ACK at the 526 beginning of TCP sessions should be avoided since Firewalls might 527 slow down 3 way handshakes. 529 - ICMP pings may also be adequate to provide round trip time 530 estimates, provided that the packet size is factored into the 531 estimates (i.e. pings with different packet sizes might be required). 532 Some limitations with ICMP Ping may include msec resolution and 533 whether the network elements are responding to pings or not. Also, 534 ICMP is often rate-limited and segregated into different buffer 535 queues and is not as reliable and accurate as in-band measurements. 537 3.2.2 Techniques to Measure end-to-end Bandwidth 539 There are many well established techniques available to provide 540 estimated measures of bandwidth over a network. These measurements 541 SHOULD be conducted in both directions of the network, especially for 542 access networks, which may be asymmetrical. Measurements SHOULD use 543 network capacity techniques defined in [RFC5136]. 545 Before any TCP Throughput test can be done, a bandwidth measurement 546 test MUST be run with stateless IP streams(not stateful TCP) in order 547 to determine the available bandwidths in each direction. This test 548 should obviously be performed at various intervals throughout a 549 business day or even across a week. Ideally, the bandwidth test 550 should produce logged outputs of the achieved bandwidths across the 551 test interval. 553 3.3. TCP Throughput Tests 555 This methodology specifically defines TCP throughput techniques to 556 verify sustained TCP performance in a managed business IP network, as 557 defined in section 2.1. This section and others will define the 558 method to conduct these sustained TCP throughput tests and guidelines 559 for the predicted results. 561 With baseline measurements of round trip time and bandwidth 562 from section 3.2, a series of single and multiple TCP connection 563 throughput tests SHOULD be conducted to baseline network performance 564 against expectations. The number of trials and the type of testing 565 (single versus multiple connections) will vary according to the 566 intention of the test. One example would be a single connection test 567 in which the throughput achieved by large Send Socket Buffer and TCP 568 Receive Window sizes (i.e. 256KB) is to be measured. It would be 569 advisable to test performance at various times of the business day. 571 It is RECOMMENDED to run the tests in each direction independently 572 first, then run both directions simultaneously. In each case, 573 TCP Transfer Time, TCP Efficiency, and Buffer Delay Percentage MUST 574 be measured in each direction. These metrics are defined in 3.3.2. 576 3.3.1 Calculate Ideal TCP Receive Window Size 578 The ideal TCP Receive Window size can be calculated from the 579 bandwidth delay product (BDP), which is: 581 BDP (bits) = RTT (sec) x Bandwidth (bps) 583 Note that the RTT is being used as the "Delay" variable in the 584 BDP calculations. 
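As a non-normative illustration only, the BDP arithmetic above (and its conversion to an ideal TCP Receive Window size, which the next paragraphs walk through) can be sketched as follows; the input values are the T3 example used later in this section:

   # Non-normative sketch of the BDP formula above; values are the
   # T3 example used in this section (44.21 Mbps, 25 msec RTT).

   def bdp_bits(rtt_sec, bandwidth_bps):
       # BDP (bits) = RTT (sec) x Bandwidth (bps)
       return rtt_sec * bandwidth_bps

   def ideal_rwin_kbytes(rtt_sec, bandwidth_bps):
       # Ideal TCP RWIN (KBytes) = BDP / 8 (bits per Byte) / 1000
       return bdp_bits(rtt_sec, bandwidth_bps) / 8 / 1000

   print("BDP  = %.0f bits" % bdp_bits(0.025, 44.21e6))             # ~1,105,250 bits
   print("RWIN = %.2f KBytes" % ideal_rwin_kbytes(0.025, 44.21e6))  # ~138.16 KBytes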
586 Then, by dividing the BDP by 8, we obtain the "ideal" TCP Receive 587 Window size in Bytes. For optimal results, the Send Socket Buffer 588 size must be adjusted to the same value at the opposite end of the 589 network path. 591 Ideal TCP RWIN = BDP / 8 593 An example would be a T3 link with 25 msec RTT. The BDP would equal 594 ~1,105,000 bits and the ideal TCP Receive Window would be ~138 595 KBytes. 597 Note that separate calculations are required on asymetrical paths. 598 An asymetrical path example would be a 90 msec RTT ADSL line with 599 5Mbps downstream and 640Kbps upstream. The downstream BDP would equal 600 ~450,000 bits while the upstream one would be only ~57,600 bits. 602 The following table provides some representative network Link Speeds, 603 RTT, BDP, and their associated Ideal TCP Receive Window sizes. 605 Table 3.3.1: Link Speed, RTT and calculated BDP & TCP Receive Window 607 Link Ideal TCP 608 Speed* RTT BDP Receive Window 609 (Mbps) (ms) (bits) (KBytes) 610 --------------------------------------------------------------------- 611 1.536 20 30,720 3.84 612 1.536 50 76,800 9.60 613 1.536 100 153,600 19.20 614 44.210 10 442,100 55.26 615 44.210 15 663,150 82.89 616 44.210 25 1,105,250 138.16 617 100 1 100,000 12.50 618 100 2 200,000 25.00 619 100 5 500,000 62.50 620 1,000 0.1 100,000 12.50 621 1,000 0.5 500,000 62.50 622 1,000 1 1,000,000 125.00 623 10,000 0.05 500,000 62.50 624 10,000 0.3 3,000,000 375.00 626 * Note that link speed is the bottleneck bandwidth for the NUT 628 The following serial link speeds are used: 629 - T1 = 1.536 Mbits/sec (for a B8ZS line encoding facility) 630 - T3 = 44.21 Mbits/sec (for a C-Bit Framing facility) 632 The above table illustrates the ideal TCP Receive Window size. 633 If a smaller TCP Receive Window is used, then the TCP Throughput 634 is not optimal. To calculate the TCP Throughput, the following 635 formula is used: TCP Throughput = TCP RWIN X 8 / RTT 637 An example could be a 100 Mbps IP path with 5 ms RTT and a TCP 638 Receive Window size of 16KB, then: 640 TCP Throughput = 16 KBytes X 8 bits / 5 ms. 641 TCP Throughput = 128,000 bits / 0.005 sec. 642 TCP Throughput = 25.6 Mbps. 644 Another example for a T3 using the same calculation formula is 645 illustrated on the next page: 646 TCP Throughput = TCP RWIN X 8 / RTT. 647 TCP Throughput = 16 KBytes X 8 bits / 10 ms. 648 TCP Throughput = 128,000 bits / 0.01 sec. 649 TCP Throughput = 12.8 Mbps. 651 When the TCP Receive Window size exceeds the BDP (i.e. T3 link, 652 64 KBytes TCP Receive Window on a 10 ms RTT path), the maximum frames 653 per second limit of 3664 is reached and the calculation formula is: 655 TCP Throughput = Max FPS X MSS X 8. 656 TCP Throughput = 3664 FPS X 1460 Bytes X 8 bits. 657 TCP Throughput = 42.8 Mbps 658 The following diagram compares achievable TCP throughputs on a T3 659 with Send Socket Buffer & TCP Receive Window sizes of 16KB vs. 64KB. 661 45| 662 | _______42.8M 663 40| |64KB | 664 TCP | | | 665 Throughput 35| | | 666 in Mbps | | | +-----+34.1M 667 30| | | |64KB | 668 | | | | | 669 25| | | | | 670 | | | | | 671 20| | | | | _______20.5M 672 | | | | | |64KB | 673 15| | | | | | | 674 |12.8M+-----| | | | | | 675 10| |16KB | | | | | | 676 | | | |8.5M+-----| | | | 677 5| | | | |16KB | |5.1M+-----| | 678 |_____|_____|_____|____|_____|_____|____|16KB |_____|_____ 679 10 15 25 680 RTT in milliseconds 682 The following diagram shows the achievable TCP throughput on a 25ms 683 T3 when Send Socket Buffer & TCP Receive Window sizes are increased. 
685 45| 686 | 687 40| +-----+40.9M 688 TCP | | | 689 Throughput 35| | | 690 in Mbps | | | 691 30| | | 692 | | | 693 25| | | 694 | | | 695 20| +-----+20.5M | | 696 | | | | | 697 15| | | | | 698 | | | | | 699 10| +-----+10.2M | | | | 700 | | | | | | | 701 5| +-----+5.1M | | | | | | 702 |_____|_____|______|_____|______|_____|_______|_____|_____ 703 16 32 64 128* 704 TCP Receive Window size in KBytes 706 * Note that 128KB requires [RFC1323] TCP Window scaling option. 708 Note that some TCP/IP stack implementations are using Receive Window 709 Auto-Tuning and cannot be adjusted until the feature is disabled. 711 3.3.2 Metrics for TCP Throughput Tests 713 This framework focuses on a TCP throughput methodology and also 714 provides several basic metrics to compare results of various 715 throughput tests. It is recognized that the complexity and 716 unpredictability of TCP makes it impossible to develop a complete 717 set of metrics that accounts for the myriad of variables (i.e. RTT 718 variation, loss conditions, TCP implementation, etc.). However, 719 these basic metrics will facilitate TCP throughput comparisons 720 under varying network conditions and between network traffic 721 management techniques. 723 The first metric is the TCP Transfer Time, which is simply the 724 measured time it takes to transfer a block of data across 725 simultaneous TCP connections. This concept is useful when 726 benchmarking traffic management techniques and where multiple 727 TCP connections are required. 729 TCP Transfer time may also be used to provide a normalized ratio of 730 the actual TCP Transfer Time versus the Ideal Transfer Time. This 731 ratio is called the TCP Transfer Index and is defined as: 733 Actual TCP Transfer Time 734 ------------------------- 735 Ideal TCP Transfer Time 737 The Ideal TCP Transfer time is derived from the network path 738 bottleneck bandwidth and various Layer 1/2/3/4 overheads associated 739 with the network path. Additionally, both the TCP Receive Window and 740 the Send Socket Buffer sizes must be tuned to equal the bandwidth 741 delay product (BDP) as described in section 3.3.1. 743 The following table illustrates the Ideal TCP Transfer time of a 744 single TCP connection when its TCP Receive Window and Send Socket 745 Buffer sizes are equal to the BDP. 747 Table 3.3.2: Link Speed, RTT, BDP, TCP Throughput, and 748 Ideal TCP Transfer time for a 100 MB File 750 Link Maximum Ideal TCP 751 Speed BDP Achievable TCP Transfer time 752 (Mbps) RTT (ms) (KBytes) Throughput(Mbps) (seconds) 753 -------------------------------------------------------------------- 754 1.536 50 9.6 1.4 571 755 44.21 25 138.2 42.8 18 756 100 2 25.0 94.9 9 757 1,000 1 125.0 949.2 1 758 10,000 0.05 62.5 9,492 0.1 760 Transfer times are rounded for simplicity. 762 For a 100MB file(100 x 8 = 800 Mbits), the Ideal TCP Transfer Time 763 is derived as follows: 765 800 Mbits 766 Ideal TCP Transfer Time = ----------------------------------- 767 Maximum Achievable TCP Throughput 769 The maximum achievable layer 2 throughput on T1 and T3 Interfaces 770 is based on the maximum frames per second (FPS) permitted by the 771 actual layer 1 speed when the MTU is 1500 Bytes. 
773 The maximum FPS for a T1 is 127 and the calculation formula is: 774 FPS = T1 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 775 FPS = (1.536M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 776 FPS = (1.536M / (1508 Bytes X 8)) 777 FPS = 1.536 Mbps / 12064 bits 778 FPS = 127 780 The maximum FPS for a T3 is 3664 and the calculation formula is: 781 FPS = T3 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 782 FPS = (44.21M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 783 FPS = (44.21M / (1508 Bytes X 8)) 784 FPS = 44.21 Mbps / 12064 bits 785 FPS = 3664 787 The 1508 equates to: 789 MTU + PPP + Flags + CRC16 791 Where MTU is 1500 Bytes, PPP is 4 Bytes, Flags are 2 Bytes and CRC16 792 is 2 Bytes. 794 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 795 simply use: MSS in Bytes X 8 bits X max FPS. 796 For a T3, the maximum TCP Throughput = 1460 Bytes X 8 bits X 3664 FPS 797 Maximum TCP Throughput = 11680 bits X 3664 FPS 798 Maximum TCP Throughput = 42.8 Mbps. 800 The maximum achievable layer 2 throughput on Ethernet Interfaces is 801 based on the maximum frames per second permitted by the IEEE802.3 802 standard when the MTU is 1500 Bytes. 804 The maximum FPS for 100M Ethernet is 8127 and the calculation is: 805 FPS = (100Mbps /(1538 Bytes X 8 bits)) 807 The maximum FPS for GigE is 81274 and the calculation formula is: 808 FPS = (1Gbps /(1538 Bytes X 8 bits)) 810 The maximum FPS for 10GigE is 812743 and the calculation formula is: 811 FPS = (10Gbps /(1538 Bytes X 8 bits)) 812 The 1538 equates to: 814 MTU + Eth + CRC32 + IFG + Preamble + SFD 816 Where MTU is 1500 Bytes, Ethernet is 14 Bytes, CRC32 is 4 Bytes, 817 IFG is 12 Bytes, Preamble is 7 Bytes and SFD is 1 Byte. 819 Note that better results could be obtained with jumbo frames on 820 GigE and 10 GigE. 822 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 823 simply use: MSS in Bytes X 8 bits X max FPS. 824 For a 100M, the maximum TCP Throughput = 1460 B X 8 bits X 8127 FPS 825 Maximum TCP Throughput = 11680 bits X 8127 FPS 826 Maximum TCP Throughput = 94.9 Mbps. 828 To illustrate the TCP Transfer Time Index, an example would be the 829 bulk transfer of 100 MB over 5 simultaneous TCP connections (each 830 connection uploading 100 MB). In this example, the Ethernet service 831 provides a Committed Access Rate (CAR) of 500 Mbit/s. Each 832 connection may achieve different throughputs during a test and the 833 overall throughput rate is not always easy to determine (especially 834 as the number of connections increases). 836 The ideal TCP Transfer Time would be ~8 seconds, but in this example, 837 the actual TCP Transfer Time was 12 seconds. The TCP Transfer Index 838 would then be 12/8 = 1.5, which indicates that the transfer across 839 all connections took 1.5 times longer than the ideal. 841 The second metric is TCP Efficiency, which is the percentage of Bytes 842 that were not retransmitted and is defined as: 844 Transmitted Bytes - Retransmitted Bytes 845 --------------------------------------- x 100 846 Transmitted Bytes 848 Transmitted Bytes are the total number of TCP payload Bytes to be 849 transmitted which includes the original and retransmitted Bytes. This 850 metric provides a comparative measure between various QoS mechanisms 851 like traffic management or congestion avoidance. Various TCP 852 implementations like Reno, Vegas, etc. could also be compared. 
854 As an example, if 100,000 Bytes were sent and 2,000 had to be 855 retransmitted, the TCP Efficiency should be calculated as: 857 102,000 - 2,000 858 ---------------- x 100 = 98.03% 859 102,000 861 Note that the retransmitted Bytes may have occurred more than once, 862 and these multiple retransmissions are added to the Retransmitted 863 Bytes count (and the Transmitted Bytes count). 865 The third metric is the Buffer Delay Percentage, which represents the 866 increase in RTT during a TCP throughput test with respect to 867 inherent or baseline network RTT. The baseline RTT is the round-trip 868 time inherent to the network path under non-congested conditions. 869 (See 3.2.1 for details concerning the baseline RTT measurements). 871 The Buffer Delay Percentage is defined as: 873 Average RTT during Transfer - Baseline RTT 874 ------------------------------------------ x 100 875 Baseline RTT 877 As an example, the baseline RTT for the network path is 25 msec. 878 During the course of a TCP transfer, the average RTT across the 879 entire transfer increased to 32 msec. In this example, the Buffer 880 Delay Percentage would be calculated as: 882 32 - 25 883 ------- x 100 = 28% 884 25 886 Note that the TCP Transfer Time, TCP Efficiency, and Buffer Delay 887 Percentage MUST be measured during each throughput test. Poor TCP 888 Transfer Time Indexes (TCP Transfer Time greater than Ideal TCP 889 Transfer Times) may be diagnosed by correlating with sub-optimal TCP 890 Efficiency and/or Buffer Delay Percentage metrics. 892 3.3.3 Conducting the TCP Throughput Tests 894 Several TCP tools are currently used in the network world and one of 895 the most common is "iperf". With this tool, hosts are installed at 896 each end of the network path; one acts as client and the other as 897 a server. The Send Socket Buffer and the TCP Receive Window sizes 898 of both client and server can be manually set. The achieved 899 throughput can then be measured, either uni-directionally or 900 bi-directionally. For higher BDP situations in lossy networks 901 (long fat networks or satellite links, etc.), TCP options such as 902 Selective Acknowledgment SHOULD be considered and become part of 903 the window size / throughput characterization. 905 Note that some TCP/IP stack implementations are using Receive Window 906 Auto-Tuning and cannot be adjusted until this feature is disabled. 908 Host hardware performance must be well understood before conducting 909 the tests described in the following sections. A dedicated 910 communications test instrument will generally be required, especially 911 for line rates of GigE and 10 GigE. A compliant TCP TTD SHOULD 912 provide a warning message when the expected test throughput will 913 exceed 10% of the network bandwidth capacity. If the throughput test 914 is expected to exceed 10% of the provider bandwidth, then the test 915 should be coordinated with the network provider. This does not 916 include the customer premise bandwidth, the 10% refers directly to 917 the provider's bandwidth (Provider Edge to Provider router). 919 The TCP throughput test should be run over a long enough duration 920 to properly exercise network buffers (greater than 30 seconds) and 921 also characterize performance at different time periods of the day. 923 3.3.4 Single vs. 
Multiple TCP Connection Testing
925 The decision whether to conduct single or multiple TCP connection 926 tests depends upon the size of the BDP in relation to the TCP 927 Receive Window sizes configured in the end-user environment. 928 For example, if the BDP for a long fat network turns out to be 2MB, 929 then it is probably more realistic to test this network path with 930 multiple connections. Assuming typical host computer TCP Receive 931 Window sizes of 64 KB, using 32 TCP connections would realistically 932 test this path.
934 The following table is provided to illustrate the relationship 935 between the TCP Receive Window size and the number of TCP connections 936 required to utilize the available capacity of a given BDP. For this 937 example, the network bandwidth is 500 Mbps and the RTT is 5 ms, so 938 the BDP equates to 312.5 KBytes.
940 TCP Number of TCP Connections 941 Window to fill available bandwidth 942 ------------------------------------- 943 16KB 20 944 32KB 10 945 64KB 5 946 128KB 3
948 Note that some TCP/IP stack implementations use Receive Window 949 Auto-Tuning, and the window cannot be manually adjusted until this feature is disabled.
951 The TCP Transfer Time metric is useful for conducting multiple 952 connection tests. Each connection should be configured to transfer 953 payloads of the same size (i.e. 100 MB), and the TCP Transfer Time 954 should provide a simple metric to verify the actual versus expected 955 results.
957 Note that the TCP Transfer Time is the time for all connections to 958 complete the transfer of the configured payload size. Consider the 959 64KB window case from the previous table. Each of the 5 960 TCP connections would be configured to transfer 100MB, and each one 961 should obtain a maximum of 100 Mb/sec. So for this example, the 962 100MB payload should be transferred across the connections in 963 approximately 8 seconds (which would be the ideal TCP Transfer Time 964 under these conditions).
966 Additionally, the TCP Efficiency metric MUST be computed for each 967 connection tested, as defined in section 3.3.2.
969 3.3.5 Interpretation of the TCP Throughput Results
971 At the end of this step, the user will document the theoretical BDP 972 and a set of Window size experiments with the measured TCP throughput 973 for each TCP Window size. For cases where the sustained TCP throughput 974 does not equal the ideal value, some possible causes are:
976 - Network congestion causing packet loss, which MAY be inferred from 977 a poor TCP Efficiency % (higher TCP Efficiency % = less packet 978 loss). 979 - Network congestion causing an increase in RTT, which MAY be inferred 980 from the Buffer Delay Percentage (i.e., 0% = no increase in RTT 981 over baseline). 982 - Intermediate network devices which actively regenerate the TCP 983 connection and can alter TCP Receive Window size, MSS, etc. 984 - Rate limiting (policing). More details on traffic management 985 tests follow in section 3.4.
987 3.4. Traffic Management Tests
989 In most cases, the network connection between two geographic 990 locations (branch offices, etc.) is of lower bandwidth than the network 991 connections of the host computers. An example would be LAN connectivity of GigE 992 and WAN connectivity of 100 Mbps. The WAN connectivity may be 993 physically 100 Mbps or logically 100 Mbps (over a GigE WAN 994 connection). In the latter case, rate limiting is used to provide the 995 WAN bandwidth per the SLA.
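Before describing the individual traffic management tests, the connection-count arithmetic of section 3.3.4 above, which is reused when sizing the tests in this section, can be illustrated by the following non-normative sketch (it reproduces the table for a 500 Mbps, 5 msec path):

   import math

   # Non-normative sketch: number of TCP connections of a given window
   # size needed to fill the BDP of the path (section 3.3.4 table).

   def connections_to_fill(bw_bps, rtt_sec, window_bytes):
       bdp_bytes = (bw_bps * rtt_sec) / 8
       return math.ceil(bdp_bytes / window_bytes)

   for window_kb in (16, 32, 64, 128):
       n = connections_to_fill(500e6, 0.005, window_kb * 1024)
       print("%3d KB window -> %2d connections" % (window_kb, n))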
997 Traffic management techniques are employed to provide various forms 998 of QoS; the most common include:
1000 - Traffic Shaping 1001 - Priority queuing 1002 - Random Early Discard (RED)
1004 Configuring the end-to-end network with these various traffic 1005 management mechanisms is a complex undertaking. For traffic shaping 1006 and RED techniques, the end goal is to provide better performance to 1007 bursty traffic such as TCP (RED is specifically intended for TCP).
1009 This section of the methodology provides guidelines to test traffic 1010 shaping and RED implementations. As in section 3.3, host hardware 1011 performance must be well understood before conducting the traffic 1012 shaping and RED tests. A dedicated communications test instrument will 1013 generally be REQUIRED for line rates of GigE and 10 GigE. If the 1014 throughput test is expected to exceed 10% of the provider bandwidth, 1015 then the test should be coordinated with the network provider. This 1016 does not include the customer premises bandwidth; the 10% refers to 1017 the provider's bandwidth (Provider Edge to Provider router). Note 1018 that GigE and 10 GigE interfaces might benefit from hold-queue 1019 adjustments in order to prevent the saw-tooth TCP traffic pattern.
1021 3.4.1 Traffic Shaping Tests
1023 For services where the available bandwidth is rate limited, two (2) 1024 techniques can be used: traffic policing or traffic shaping.
1026 Simply stated, traffic policing marks and/or drops packets which 1027 exceed the SLA bandwidth (in most cases, excess traffic is dropped). 1028 Traffic shaping employs queues to smooth the bursty 1029 traffic and then send it out within the SLA bandwidth limit (without 1030 dropping packets unless the traffic shaping queue is exhausted).
1032 Traffic shaping is generally configured for TCP data services and 1033 can provide improved TCP performance since the retransmissions are 1034 reduced, which in turn optimizes TCP throughput for the available 1035 bandwidth. Throughout this section, the rate-limited bandwidth shall 1036 be referred to as the "bottleneck bandwidth".
1038 The ability to detect proper traffic shaping is more easily diagnosed 1039 when conducting a multiple TCP connections test. Proper shaping will 1040 provide a fair distribution of the available bottleneck bandwidth, 1041 while traffic policing will not.
1043 The traffic shaping tests are built upon the concepts of multiple 1044 connections testing as defined in section 3.3.3. Calculating the BDP 1045 for the bottleneck bandwidth is first required before selecting the 1046 number of connections, the Send Socket Buffer and TCP Receive Window 1047 sizes per connection.
1049 Similar to the example in section 3.3, a typical test scenario might 1050 be: GigE LAN with a 500 Mbps bottleneck bandwidth (rate limited 1051 logical interface), and 5 msec RTT. This would require five (5) TCP 1052 connections of 64 KB Send Socket Buffer and TCP Receive Window sizes 1053 to evenly fill the bottleneck bandwidth (~100 Mbps per connection).
1055 The traffic shaping test should be run over a long enough duration to 1056 properly exercise network buffers (greater than 30 seconds) and also 1057 characterize performance during different time periods of the day. 1058 The throughput of each connection MUST be logged during the entire 1059 test, along with the TCP Transfer Time, TCP Efficiency, and 1060 Buffer Delay Percentage.
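As a non-normative illustration of how the logged values might be reduced to the three metrics of section 3.3.2, the following sketch uses hypothetical per-test values (the same numbers as the worked examples in section 3.3.2); a real test would take these values from the TCP TTD logs:

   # Non-normative sketch; all input values below are hypothetical and
   # would normally come from the TCP TTD logs.

   def tcp_transfer_time_index(actual_sec, ideal_sec):
       return actual_sec / ideal_sec

   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       # Transmitted Bytes include original and retransmitted Bytes.
       return (transmitted_bytes - retransmitted_bytes) * 100.0 / transmitted_bytes

   def buffer_delay_pct(avg_rtt_ms, baseline_rtt_ms):
       return (avg_rtt_ms - baseline_rtt_ms) * 100.0 / baseline_rtt_ms

   print("TCP Transfer Time Index : %.2f" % tcp_transfer_time_index(12.0, 8.0))
   print("TCP Efficiency          : %.1f %%" % tcp_efficiency_pct(102000, 2000))
   print("Buffer Delay Percentage : %.0f %%" % buffer_delay_pct(32.0, 25.0))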
1062 3.4.1.1 Interpretation of Traffic Shaping Test Results 1064 By plotting the throughput achieved by each TCP connection, the fair 1065 sharing of the bandwidth is generally very obvious when traffic 1066 shaping is properly configured for the bottleneck interface. For the 1067 previous example of 5 connections sharing 500 Mbps, each connection 1068 would consume ~100 Mbps with a smooth variation. 1070 If traffic policing was present on the bottleneck interface, the 1071 bandwidth sharing may not be fair and the resulting throughput plot 1072 may reveal "spikey" throughput consumption of the competing TCP 1073 connections (due to the TCP retransmissions). 1075 3.4.2 RED Tests 1077 Random Early Discard techniques are specifically targeted to provide 1078 congestion avoidance for TCP traffic. Before the network element 1079 queue "fills" and enters the tail drop state, RED drops packets at 1080 configurable queue depth thresholds. This action causes TCP 1081 connections to back-off which helps to prevent tail drop, which in 1082 turn helps to prevent global TCP synchronization. 1084 Again, rate limited interfaces may benefit greatly from RED based 1085 techniques. Without RED, TCP may not be able to achieve the full 1086 bottleneck bandwidth. With RED enabled, TCP congestion avoidance 1087 throttles the connections on the higher speed interface (i.e. LAN) 1088 and can help achieve the full bottleneck bandwidth. The burstiness 1089 of TCP traffic is a key factor in the overall effectiveness of RED 1090 techniques; steady state bulk transfer flows will generally not 1091 benefit from RED. With bulk transfer flows, network device queues 1092 gracefully throttle the effective throughput rates due to increased 1093 delays. 1095 The ability to detect proper RED configuration is more easily 1096 diagnosed when conducting a multiple TCP connections test. Multiple 1097 TCP connections provide the bursty sources that emulate the 1098 real-world conditions for which RED was intended. 1100 The RED tests also builds upon the concepts of multiple connections 1101 testing as defined in section 3.3.3. Calculating the BDP for the 1102 bottleneck bandwidth is first required before selecting the number 1103 of connections, the Send Socket Buffer size and the TCP Receive 1104 Window size per connection. 1106 For RED testing, the desired effect is to cause the TCP connections 1107 to burst beyond the bottleneck bandwidth so that queue drops will 1108 occur. Using the same example from section 3.4.1 (traffic shaping), 1109 the 500 Mbps bottleneck bandwidth requires 5 TCP connections (with 1110 window size of 64KB) to fill the capacity. Some experimentation is 1111 required, but it is recommended to start with double the number of 1112 connections to stress the network element buffers / queues (10 1113 connections for this example). 1115 The TCP TTD must be configured to generate these connections as 1116 shorter (bursty) flows versus bulk transfer type flows. These TCP 1117 bursts should stress queue sizes in the 512KB range. Again 1118 experimentation will be required; the proper number of TCP 1119 connections, the Send Socket Buffer and TCP Receive Window sizes will 1120 be dictated by the size of the network element queue. 1122 3.4.2.1 Interpretation of RED Results 1124 The default queuing technique for most network devices is FIFO based. 1125 Without RED, the FIFO based queue may cause excessive loss to all of 1126 the TCP connections and in the worst case global TCP synchronization. 
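The interpretation steps described in the remainder of this sub-section can be roughly automated; the following non-normative sketch checks bottleneck utilization and the ideal TCP Transfer Time for the 10-connection example (the measured per-connection throughputs are hypothetical):

   # Non-normative sketch; measured per-connection throughputs (Mbps)
   # are hypothetical values for the 10-connection RED test example.

   def utilization_pct(per_conn_mbps, bottleneck_mbps):
       return sum(per_conn_mbps) * 100.0 / bottleneck_mbps

   def ideal_transfer_time_sec(payload_mbytes, n_conns, bottleneck_mbps):
       return payload_mbytes * 8.0 * n_conns / bottleneck_mbps

   measured = [42, 55, 38, 61, 44, 52, 47, 39, 58, 41]
   print("Bottleneck utilization  : %.0f %%" % utilization_pct(measured, 500))
   print("Ideal TCP Transfer Time : %.0f seconds" % ideal_transfer_time_sec(100, 10, 500))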
1128 By plotting the aggregate throughput achieved on the bottleneck 1129 interface, proper RED operation may be determined if the bottleneck 1130 bandwidth is fully utilized. For the previous example of 10 1131 connections (window = 64 KB) sharing 500 Mbps, each connection should 1132 consume ~50 Mbps. If RED was not properly enabled on the interface, 1133 then the TCP connections will retransmit at a higher rate and the 1134 net effect is that the bottleneck bandwidth is not fully utilized. 1136 Another means to study non-RED versus RED implementation is to use 1137 the TCP Transfer Time metric for all of the connections. In this 1138 example, a 100 MB payload transfer should take ideally 16 seconds 1139 across all 10 connections (with RED enabled). With RED not enabled, 1140 the throughput across the bottleneck bandwidth may be greatly 1141 reduced (generally 10-20%) and the actual TCP Transfer time may be 1142 proportionally longer then the Ideal TCP Transfer time. 1144 Additionally, non-RED implementations may exhibit a lower TCP 1145 Transfer Efficiency. 1147 4. Security Considerations 1149 The security considerations that apply to any active measurement of 1150 live networks are relevant here as well. See [RFC4656] and 1151 [RFC5357]. 1153 5. IANA Considerations 1155 This document does not REQUIRE an IANA registration for ports 1156 dedicated to the TCP testing described in this document. 1158 6. Acknowledgments 1160 Thanks to Lars Eggert, Al Morton, Matt Mathis, Matt Zekauskas, 1161 Yaakov Stein, and Loki Jorgenson for many good comments and for 1162 pointing us to great sources of information pertaining to past works 1163 in the TCP capacity area. 1165 7. References 1167 7.1 Normative References 1169 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1170 Requirement Levels", BCP 14, RFC 2119, March 1997. 1172 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 1173 Zekauskas, "A One-way Active Measurement Protocol 1174 (OWAMP)", RFC 4656, September 2006. 1176 [RFC2544] Bradner, S., McQuaid, J., "Benchmarking Methodology for 1177 Network Interconnect Devices", RFC 2544, June 1999 1179 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., Babiarz, 1180 J., "A Two-Way Active Measurement Protocol (TWAMP)", 1181 RFC 5357, October 2008 1183 [RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU 1184 Discovery", RFC 4821, June 2007 1186 draft-ietf-ippm-btc-cap-00.txt Allman, M., "A Bulk 1187 Transfer Capacity Methodology for Cooperating Hosts", 1188 August 2001 1190 [RFC2681] Almes G., Kalidindi S., Zekauskas, M., "A Round-trip Delay 1191 Metric for IPPM", RFC 2681, September, 1999 1193 [RFC4898] Mathis, M., Heffner, J., Raghunarayan, R., "TCP Extended 1194 Statistics MIB", May 2007 1196 [RFC5136] Chimento P., Ishac, J., "Defining Network Capacity", 1197 February 2008 1199 [RFC1323] Jacobson, V., Braden, R., Borman D., "TCP Extensions for 1200 High Performance", May 1992 1202 7.2. Informative References 1203 Authors' Addresses 1205 Barry Constantine 1206 JDSU, Test and Measurement Division 1207 One Milesone Center Court 1208 Germantown, MD 20876-7100 1209 USA 1211 Phone: +1 240 404 2227 1212 barry.constantine@jdsu.com 1214 Gilles Forget 1215 Independent Consultant to Bell Canada. 1216 308, rue de Monaco, St-Eustache 1217 Qc. 
CANADA, Postal Code : J7P-4T5 1219 Phone: (514) 895-8212 1220 gilles.forget@sympatico.ca 1222 Rudiger Geib 1223 Heinrich-Hertz-Strasse (Number: 3-7) 1224 Darmstadt, Germany, 64295 1226 Phone: +49 6151 6282747 1227 Ruediger.Geib@telekom.de 1229 Reinhard Schrage 1230 Schrage Consulting 1232 Phone: +49 (0) 5137 909540 1233 reinhard@schrageconsult.com