Network Working Group                                     B. Constantine
Internet-Draft                                                      JDSU
Intended status: Informational                                 G. Forget
Expires: July 31, 2011                     Bell Canada (Ext. Consultant)
                                                            Rudiger Geib
                                                        Deutsche Telekom
                                                        Reinhard Schrage
                                                      Schrage Consulting

                                                        January 31, 2011

                  Framework for TCP Throughput Testing
                  draft-ietf-ippm-tcp-throughput-tm-11.txt

Abstract

This framework describes a practical methodology for measuring end-
to-end TCP throughput in a managed IP network. The goal is to provide
a better indication of the user experience. In this framework, TCP
and IP parameters are specified and should be configured as
recommended.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 31, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Code Components extracted from this document must 57 include Simplified BSD License text as described in Section 4.e of 58 the Trust Legal Provisions and are provided without warranty as 59 described in the Simplified BSD License. 61 Table of Contents 63 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 64 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 65 1.2 Test Set-up . . . . . . . . . . . . . . . . . . . . . . . 5 66 2. Scope and Goals of this methodology. . . . . . . . . . . . . . 5 67 2.1 TCP Equilibrium. . . . . . . . . . . . . . . . . . . . . . 6 68 3. TCP Throughput Testing Methodology . . . . . . . . . . . . . . 7 69 3.1 Determine Network Path MTU . . . . . . . . . . . . . . . . 9 70 3.2. Baseline Round Trip Time and Bandwidth . . . . . . . . . . 10 71 3.2.1 Techniques to Measure Round Trip Time . . . . . . . . 11 72 3.2.2 Techniques to Measure end-to-end Bandwidth. . . . . . 12 73 3.3. TCP Throughput Tests . . . . . . . . . . . . . . . . . . . 12 74 3.3.1 Calculate minimum required TCP RWND Size. . . . . . . 12 75 3.3.2 Metrics for TCP Throughput Tests . . . . . . . . . . . 15 76 3.3.3 Conducting the TCP Throughput Tests. . . . . . . . . . 19 77 3.3.4 Single vs. Multiple TCP Connection Testing . . . . . . 19 78 3.3.5 Interpretation of the TCP Throughput Results . . . . . 20 79 3.3.6 High Performance Network Options . . . . . . . . . . . 20 80 3.4. Traffic Management Tests . . . . . . . . . . . . . . . . . 22 81 3.4.1 Traffic Shaping Tests. . . . . . . . . . . . . . . . . 23 82 3.4.1.1 Interpretation of Traffic Shaping Test Results. . . 23 83 3.4.2 AQM Tests. . . . . . . . . . . . . . . . . . . . . . . 24 84 3.4.2.1 Interpretation of AQM Results . . . . . . . . . . . 25 85 4. Security Considerations . . . . . . . . . . . . . . . . . . . 26 86 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 87 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 26 88 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 26 89 7.1 Normative References . . . . . . . . . . . . . . . . . . . 26 90 7.2 Informative References . . . . . . . . . . . . . . . . . . 27 92 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 94 1. Introduction 96 The SLA (Service Level Agreement) provided to business class 97 customers is generally based upon Layer 2/3 criteria such as : 98 Guaranteed bandwidth, maximum network latency, maximum packet loss 99 percentage and maximum delay variation (i.e. maximum jitter). 100 Network providers are coming to the realization that Layer 2/3 101 testing is not enough to adequately ensure end-user's satisfaction. 102 In addition to Layer 2/3 performance, measuring TCP throughput 103 provides more meaningful results with respect to user experience. 105 Additionally, business class customers seek to conduct repeatable TCP 106 throughput tests between locations. Since these organizations rely on 107 the networks of the providers, a common test methodology with 108 predefined metrics would benefit both parties. 110 Note that the primary focus of this methodology is managed business 111 class IP networks; i.e. those Ethernet terminated services for which 112 organizations are provided an SLA from the network provider. Because 113 of the SLA, the expectation is that the TCP Throughput should achieve 114 the guaranteed bandwidth. End-users with "best effort" access could 115 use this methodology, but this framework and its metrics are intended 116 to be used in a predictable managed IP network. 
No end-to-end 117 performance can be guaranteed when only the access portion is being 118 provisioned to a specific bandwidth capacity. 120 The intent behind this document is to define a methodology for 121 testing sustained TCP layer performance. In this document, the 122 achievable TCP Throughput is that amount of data per unit time that 123 TCP transports when in the TCP Equilibrium state. (See section 2.1 124 for TCP Equilibrium definition). Throughout this document, maximum 125 achievable throughput refers to the theoretical achievable throughput 126 when TCP is in the Equilibrium state. 128 TCP is connection oriented and at the transmitting side it uses a 129 congestion window, (TCP CWND). At the receiving end, TCP uses a 130 receive window, (TCP RWND) to inform the transmitting end on how 131 many Bytes it is capable to accept at a given time. 133 Derived from Round Trip Time (RTT) and network path bandwidth, the 134 bandwidth delay product (BDP) determines the Send and Received Socket 135 buffers sizes required to achieve the maximum TCP throughput. Then, 136 with the help of slow start and congestion avoidance algorithms, a 137 TCP CWND is calculated based on the IP network path loss rate. 138 Finally, the minimum value between the calculated TCP CWND and the 139 TCP RWND advertised by the opposite end will determine how many Bytes 140 can actually be sent by the transmitting side at a given time. 142 Both TCP Window sizes (RWND and CWND) may vary during any given TCP 143 session, although up to bandwidth limits, larger RWND and larger CWND 144 will achieve higher throughputs by permitting more in-flight Bytes. 146 At both ends of the TCP connection and for each socket, there are 147 default buffer sizes. There are also kernel enforced maximum buffer 148 sizes. These buffer sizes can be adjusted at both ends (transmitting 149 and receiving). Some TCP/IP stack implementations use Receive Window 150 Auto-Tuning, although in order to obtain the maximum throughput it is 151 critical to use large enough TCP Send and Receive Socket Buffer 152 sizes. In fact, they should be equal to or greater than BDP. 154 Many variables are involved in TCP throughput performance, but this 155 methodology focuses on: 156 - BB (Bottleneck Bandwidth) 157 - RTT (Round Trip Time) 158 - Send and Receive Socket Buffers 159 - Minimum TCP RWND 160 - Path MTU (Maximum Transmission Unit) 161 - Path MSS (Maximum Segment Size) 163 This methodology proposes TCP testing that should be performed in 164 addition to traditional Layer 2/3 type tests. In fact, Layer 2/3 165 tests are required to verify the integrity of the network before 166 conducting TCP tests. Examples include iperf (UDP mode) and manual 167 packet layer test techniques where packet throughput, loss, and delay 168 measurements are conducted. When available, standardized testing 169 similar to [RFC2544] but adapted for use in operational networks may 170 be used. 172 Note: RFC 2544 was never meant to be used outside a lab environment. 174 Sections 2 and 3 of this document provide a general overview of the 175 proposed methodology. 177 1.1 Terminology 179 The common definitions used in this methodology are: 181 - TCP Throughput Test Device (TCP TTD), refers to compliant TCP 182 host that generates traffic and measures metrics as defined in 183 this methodology. i.e. a dedicated communications test instrument. 184 - Customer Provided Equipment (CPE), refers to customer owned 185 equipment (routers, switches, computers, etc.) 
186 - Customer Edge (CE), refers to provider owned demarcation device. 187 - Provider Edge (PE), refers to provider's distribution equipment. 188 - Bottleneck Bandwidth (BB), lowest bandwidth along the complete 189 path. Bottleneck Bandwidth and Bandwidth are used synonymously 190 in this document. Most of the time the Bottleneck Bandwidth is 191 in the access portion of the wide area network (CE - PE). 192 - Provider (P), refers to provider core network equipment. 193 - Network Under Test (NUT), refers to the tested IP network path. 194 - Round Trip Time (RTT), refers to Layer 4 back and forth delay. 196 Figure 1.1 Devices, Links and Paths 198 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 199 | TCP|-| CPE|-| CE |--| PE |-| P |--| P |-| PE |--| CE |-| CPE|-| TCP| 200 | TTD| | | | |BB| | | | | | | |BB| | | | | TTD| 201 +----+ +----+ +----+ +----+ +---+ +---+ +----+ +----+ +----+ +----+ 202 <------------------------ NUT -------------------------> 203 R >-----------------------------------------------------------| 204 T | 205 T <-----------------------------------------------------------| 207 Note that the NUT may be built with of a variety of devices including 208 but not limited to, load balancers, proxy servers or WAN acceleration 209 appliances. The detailed topology of the NUT should be well known 210 when conducting the TCP throughput tests, although this methodology 211 makes no attempt to characterize specific network architectures. 213 1.2 Test Set-up 215 This methodology is intended for operational and managed IP networks. 216 A multitude of network architectures and topologies can be tested. 217 The above diagram is very general and is only there to illustrate 218 typical segmentation within end-user and network provider domains. 220 2. Scope and Goals of this Methodology 222 Before defining the goals, it is important to clearly define the 223 areas that are out-of-scope. 225 - This methodology is not intended to predict the TCP throughput 226 during the transient stages of a TCP connection, such as during the 227 initial slow start phase. 229 - This methodology is not intended to definitively benchmark TCP 230 implementations of one OS to another, although some users may find 231 value in conducting qualitative experiments. 233 - This methodology is not intended to provide detailed diagnosis 234 of problems within end-points or within the network itself as 235 related to non-optimal TCP performance, although a results 236 interpretation section for each test step may provide insights to 237 potential issues. 239 - This methodology does not propose to operate permanently with high 240 measurement loads. TCP performance and optimization within 241 operational networks may be captured and evaluated by using data 242 from the "TCP Extended Statistics MIB" [RFC4898]. 244 - This methodology is not intended to measure TCP throughput as part 245 of an SLA, or to compare the TCP performance between service 246 providers or to compare between implementations of this methodology 247 in dedicated communications test instruments. 249 In contrast to the above exclusions, the primary goal is to define a 250 method to conduct a practical end-to-end assessment of sustained 251 TCP performance within a managed business class IP network. Another 252 key goal is to establish a set of "best practices" that a non-TCP 253 expert should apply when validating the ability of a managed IP 254 network to carry end-user TCP applications. 
256 Specific goals are to : 258 - Provide a practical test approach that specifies tunable parameters 259 (such as MSS (Maximum Segment Size) and Socket Buffer sizes) and how 260 these affect the outcome of TCP performances over an IP network. 261 See section 3.3.3. 263 - Provide specific test conditions like link speed, RTT, MSS, Socket 264 Buffer sizes and achievable TCP throughput when TCP is in the 265 Equilibrium state. For guideline purposes, provide examples of 266 test conditions and their maximum achievable TCP throughput. 267 Section 2.1 provides specific details concerning the definition of 268 TCP Equilibrium within this methodology while section 3 provides 269 specific test conditions with examples. 271 - Define three (3) basic metrics to compare the performance of TCP 272 connections under various network conditions. See section 3.3.2. 274 - In test situations where the recommended procedure does not yield 275 the maximum achievable TCP throughput, this methodology provides 276 some possible areas within the end host or the network that should 277 be considered for investigation. Although again, this methodology 278 is not intended to provide detailed diagnosis on these issues. 279 See section 3.3.5. 281 2.1 TCP Equilibrium 283 TCP connections have three (3) fundamental congestion window phases: 285 1 - The Slow Start phase, which occurs at the beginning of a TCP 286 transmission or after a retransmission time out. 288 2 - The Congestion Avoidance phase, during which TCP ramps up to 289 establish the maximum achievable throughput. It is important to note 290 that retransmissions are a natural by-product of the TCP congestion 291 avoidance algorithm as it seeks to achieve maximum throughput. 293 3 - The Loss Recovery phase, which could include Fast Retransmit 294 (Tahoe) or Fast Recovery (Reno & New Reno). When packet loss occurs, 295 Congestion Avoidance phase transitions either to Fast Retransmission 296 or Fast Recovery depending upon the TCP implementation. If a Time-Out 297 occurs, TCP transitions back to the Slow Start phase. 299 The following diagram depicts these 3 phases. 301 Figure 2.1 TCP CWND Phases 303 /\ | TCP 304 /\ | Equilibrium 305 /\ |High ssthresh TCP CWND 306 /\ |Loss Event * halving 3-Loss Recovery 307 /\ | * \ upon loss Adjusted 308 /\ | * \ / \ Time-Out ssthresh 309 /\ | * \ / \ +--------+ * 310 /\ | * \/ \ / Multiple| * 311 /\ | * 2-Congestion\ / Loss | * 312 /\ | * Avoidance \/ Event | * 313 TCP | * Half | * 314 Through- | * TCP CWND | * 1-Slow Start 315 put | * 1-Slow Start Min TCP CWND after T-O 316 +----------------------------------------------------------- 317 Time > > > > > > > > > > > > > > > > > > > > > > > > > > > 319 Note : ssthresh = Slow Start threshold. 321 A well tuned and managed IP network with appropriate TCP adjustments 322 in the IP hosts and applications should perform very close to the 323 BB (Bottleneck Bandwidth) when TCP is in the Equilibrium state. 325 This TCP methodology provides guidelines to measure the maximum 326 achievable TCP throughput when TCP is in the Equilibrium state. 327 All maximum achievable TCP throughputs specified in section 3 are 328 with respect to this condition. 330 It is important to clarify the interaction between the sender's Send 331 Socket Buffer and the receiver's advertised TCP RWND Size. TCP test 332 programs such as iperf, ttcp, etc. allows the sender to control the 333 quantity of TCP Bytes transmitted and unacknowledged (in-flight), 334 commonly referred to as the Send Socket Buffer. 
This is done 335 independently of the TCP RWND Size advertised by the receiver. 336 Implications to the capabilities of the Throughput Test Device (TTD) 337 are covered at the end of section 3. 339 3. TCP Throughput Testing Methodology 341 As stated earlier in section 1, it is considered best practice to 342 verify the integrity of the network by conducting Layer 2/3 tests 343 such as [RFC2544] or other methods of network stress tests. 344 Although, it is important to mention here that RFC 2544 was never 345 meant to be used outside a lab environment. 347 If the network is not performing properly in terms of packet loss, 348 jitter, etc. then the TCP layer testing will not be meaningful. A 349 dysfunctional network will not achieve optimal TCP throughputs in 350 regards with the available bandwidth. 352 TCP Throughput testing may require cooperation between the end-user 353 customer and the network provider. As an example, in an MPLS (Multi- 354 Protocol Label Switching) network architecture, the testing should be 355 conducted either on the CPE or on the CE device and not on the PE 356 (Provider Edge) router. 358 The following represents the sequential order of steps for this 359 testing methodology: 361 1. Identify the Path MTU. Packetization Layer Path MTU Discovery 362 or PLPMTUD, [RFC4821], MUST be conducted to verify the network path 363 MTU. Conducting PLPMTUD establishes the upper limit for the MSS to 364 be used in subsequent steps. 366 2. Baseline Round Trip Time and Bandwidth. This step establishes the 367 inherent, non-congested Round Trip Time (RTT) and the Bottleneck 368 Bandwidth of the end-to-end network path. These measurements are 369 used to provide estimates of the TCP RWND and Send Socket Buffer 370 Sizes that SHOULD be used during subsequent test steps. These 371 measurements refers to [RFC2681] and [RFC4898] in order to measure 372 RTD and associated RTT. 374 3. TCP Connection Throughput Tests. With baseline measurements 375 of Round Trip Time and Bottleneck Bandwidth, single and multiple TCP 376 connection throughput tests SHOULD be conducted to baseline network 377 performances. 379 4. Traffic Management Tests. Various traffic management and queuing 380 techniques can be tested in this step, using multiple TCP 381 connections. Multiple connections testing should verify that the 382 network is configured properly for traffic shaping versus policing 383 and that Active Queue Management implementations are used. 385 Important to note are some of the key characteristics and 386 considerations for the TCP test instrument. The test host may be a 387 standard computer or a dedicated communications test instrument. 388 In both cases, it must be capable of emulating both a client and a 389 server. 391 The following criteria should be considered when selecting whether 392 the TCP test host can be a standard computer or has to be a dedicated 393 communications test instrument: 395 - TCP implementation used by the test host, OS version, i.e. LINUX OS 396 kernel using TCP New Reno, TCP options supported, etc. These will 397 obviously be more important when using dedicated communications test 398 instruments where the TCP implementation may be customized or tuned 399 to run in higher performance hardware. When a compliant TCP TTD is 400 used, the TCP implementation MUST be identified in the test results. 401 The compliant TCP TTD should be usable for complete end-to-end 402 testing through network security elements and should also be usable 403 for testing network sections. 
405 - More important, the TCP test host MUST be capable to generate 406 and receive stateful TCP test traffic at the full link speed of the 407 network under test. Stateful TCP test traffic means that the test 408 host MUST fully implement a TCP/IP stack; this is generally a comment 409 aimed at dedicated communications test equipments which sometimes 410 "blast" packets with TCP headers. As a general rule of thumb, testing 411 TCP throughput at rates greater than 100 Mbit/sec MAY require high 412 performance server hardware or dedicated hardware based test tools. 414 - A compliant TCP Throughput Test Device MUST allow adjusting both 415 Send and Receive Socket Buffer sizes. The Socket Buffers MUST be 416 large enough to fill the BDP. 418 - Measuring RTT and retransmissions per connection will generally 419 require a dedicated communications test instrument. In the absence of 420 dedicated hardware based test tools, these measurements may need to 421 be conducted with packet capture tools, i.e. conduct TCP throughput 422 tests and analyze RTT and retransmissions in packet captures. 423 Another option may be to use "TCP Extended Statistics MIB" per 424 [RFC4898]. 426 - The RFC4821 PLPMTUD test SHOULD be conducted with a dedicated 427 tester which exposes the ability to run the PLPMTUD algorithm 428 independently from the OS stack. 430 3.1. Determine Network Path MTU 432 TCP implementations should use Path MTU Discovery techniques (PMTUD). 433 PMTUD relies on ICMP 'need to frag' messages to learn the path MTU. 434 When a device has a packet to send which has the Don't Fragment (DF) 435 bit in the IP header set and the packet is larger than the Maximum 436 Transmission Unit (MTU) of the next hop, the packet is dropped and 437 the device sends an ICMP 'need to frag' message back to the host that 438 originated the packet. The ICMP 'need to frag' message includes 439 the next hop MTU which PMTUD uses to tune the TCP Maximum Segment 440 Size (MSS). Unfortunately, because many network managers completely 441 disable ICMP, this technique does not always prove reliable. 443 Packetization Layer Path MTU Discovery or PLPMTUD [RFC4821] MUST then 444 be conducted to verify the network path MTU. PLPMTUD can be used 445 with or without ICMP. The following sections provide a summary of the 446 PLPMTUD approach and an example using TCP. [RFC4821] specifies a 447 search_high and a search_low parameter for the MTU. As specified in 448 [RFC4821], 1024 Bytes is a safe value for search_low in modern 449 networks. 451 It is important to determine the links overhead along the IP path, 452 and then to select a TCP MSS size corresponding to the Layer 3 MTU. 453 For example, if the MTU is 1024 Bytes and the TCP/IP headers are 40 454 Bytes, (20 for IP + 20 for TCP) then the MSS would be 984 Bytes. 456 An example scenario is a network where the actual path MTU is 1240 457 Bytes. The TCP client probe MUST be capable of setting the MSS for 458 the probe packets and could start at MSS = 984 (which corresponds 459 to an MTU size of 1024 Bytes). 461 The TCP client probe would open a TCP connection and advertise the 462 MSS as 984. Note that the client probe MUST generate these packets 463 with the DF bit set. The TCP client probe then sends test traffic 464 per a small default Send Socket Buffer size of ~8KBytes. It should 465 be kept small to minimize the possibility of congesting the network, 466 which may induce packet loss. 
The duration of the test should also be short (10-30 seconds), again
to minimize congestive effects during the test.

In the example of a 1240 Bytes path MTU, probing with an MSS equal to
984 would yield a successful probe and the test client packets would
be successfully transferred to the test server.

Also note that the test client MUST verify that the advertised MSS is
indeed negotiated. Network devices with built-in Layer 4 capabilities
can intercede during the connection establishment and reduce the
advertised MSS to avoid fragmentation. This is certainly a desirable
feature from a network perspective, but it can yield erroneous test
results if the client test probe does not confirm the negotiated MSS.

The next test probe would use the search_high value and it would be
set to an MSS of 1460 in order to produce a 1500 Bytes MTU. In this
example, the test client will retransmit based upon time-outs, since
no ACKs will be received from the test server. This test probe is
marked as a conclusive failure if none of the test packets are
ACK'ed. If any of the test packets are ACK'ed, network congestion may
be the cause and the test probe is not conclusive. Re-testing at
another time is recommended to further isolate the cause.

The test is repeated until the desired granularity of the MTU is
discovered. The method can yield precise results at the expense of
probing time. One approach may be to set the next probe size halfway
between the unsuccessful search_high and the successful search_low
values, and to keep halving the remaining interval when seeking the
upper limit.

3.2. Baseline Round Trip Time and Bandwidth

Before stateful TCP testing can begin, it is important to determine
the baseline Round Trip Time (i.e. non-congested inherent delay) and
Bottleneck Bandwidth of the end-to-end network to be tested. These
measurements are used to calculate the BDP and to provide estimates
of the TCP RWND and Send Socket Buffer Sizes that SHOULD be used in
subsequent test steps.

3.2.1 Techniques to Measure Round Trip Time

Following the definitions used in section 1.1, Round Trip Time (RTT)
is the elapsed time between the clocking in of the first bit of a
payload sent packet and the receipt of the last bit of the
corresponding Acknowledgment. Round Trip Delay (RTD) is used
synonymously with twice the Link Latency. RTT measurements SHOULD use
techniques defined in [RFC2681] or statistics available from MIBs
defined in [RFC4898].

The RTT SHOULD be baselined during off-peak hours in order to obtain
a reliable figure of the inherent network latency. Otherwise,
additional delay caused by network buffering can occur. Also, when
sampling RTT values over a given test interval, the minimum measured
value SHOULD be used as the baseline RTT. This will most closely
estimate the real inherent RTT. This value is also used to determine
the Buffer Delay Percentage metric defined in Section 3.3.2.

The following list is not meant to be exhaustive, although it
summarizes some of the most common ways to determine Round Trip Time.
The desired measurement precision (i.e. msec versus usec) may dictate
whether the RTT measurement can be achieved with ICMP pings or by a
dedicated communications test instrument with precision timers. The
objective in this section is to list several techniques in order of
decreasing accuracy.
533 - Use test equipment on each end of the network, "looping" the 534 far-end tester so that a packet stream can be measured back and forth 535 from end-to-end. This RTT measurement may be compatible with delay 536 measurement protocols specified in [RFC5357]. 538 - Conduct packet captures of TCP test sessions using "iperf" or FTP, 539 or other TCP test applications. By running multiple experiments, 540 packet captures can then be analyzed to estimate RTT. It is 541 important to note that results based upon the SYN -> SYN-ACK at the 542 beginning of TCP sessions should be avoided since Firewalls might 543 slow down 3 way handshakes. Also, at the senders side, Ostermann's 544 LINUX TCPTRACE utility with -l -r arguments can be used to extract 545 the RTT results directly from the packet captures. 547 - ICMP pings may also be adequate to provide Round Trip Time 548 estimates, provided that the packet size is factored into the 549 estimates (i.e. pings with different packet sizes might be required). 550 Some limitations with ICMP Ping may include msec resolution and 551 whether the network elements are responding to pings or not. Also, 552 ICMP is often rate-limited or segregated into different buffer 553 queues. ICMP might not work if QoS (Quality of Service) 554 reclassification is done at any hop. ICMP is not as reliable and 555 accurate as in-band measurements. 557 3.2.2 Techniques to Measure end-to-end Bandwidth 559 Before any TCP Throughput test can be conducted, bandwidth 560 measurement tests MUST be run with stateless IP streams (i.e. not 561 stateful TCP) in order to determine the available path bandwidth. 562 These measurements SHOULD be conducted in both directions, 563 especially in asymmetrical access networks (e.g. ADSL access). 564 These tests should obviously be performed at various intervals 565 throughout a business day or even across a week. Ideally, the 566 bandwidth tests should produce logged outputs of the achieved 567 bandwidths across the complete test duration. 569 There are many well established techniques available to provide 570 estimated measures of bandwidth over a network. It is a common 571 practice for network providers to conduct Layer 2/3 bandwidth 572 capacity tests using [RFC2544], although it is understood that 573 [RFC2544] was never meant to be used outside a lab environment. 574 Ideally, these bandwidth measurements SHOULD use network capacity 575 techniques as defined in [RFC5136]. 577 3.3. TCP Throughput Tests 579 This methodology specifically defines TCP throughput techniques to 580 verify maximum achievable TCP performance in a managed business 581 class IP network, as defined in section 2.1. This document defines 582 a method to conduct these maximum achievable TCP throughput tests 583 as well as guidelines on the predicted results. 585 With baseline measurements of Round Trip Time and bandwidth from 586 section 3.2, a series of single and multiple TCP connection 587 throughput tests SHOULD be conducted in order to measure network 588 performance against expectations. The number of trials and the type 589 of testing (i.e. single versus multiple connections) will vary 590 according to the intention of the test. One example would be a 591 single connection test in which the throughput achieved by large 592 Send and Receive Socket Buffer sizes (i.e. 256KB) is to be measured. 593 It would be advisable to test at various times of the business day. 
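As an informal illustration of such a single-connection test, the
following Python sketch opens one TCP connection with explicitly
configured Send and Receive Socket Buffer sizes (256 KBytes here) and
measures the achieved throughput of a bulk transfer. This is only a
sketch under stated assumptions, not a compliant TCP TTD: the
addresses, port, payload and buffer values are illustrative
placeholders, and a real test would also record the metrics defined
in section 3.3.2.

   import socket
   import time

   BUFFER = 256 * 1024          # Send/Receive Socket Buffer size (example)
   BLOCK = 64 * 1024            # application write/read size (example)
   PAYLOAD = 100 * 1024 * 1024  # 100 MBytes of test data (example)

   def receiver(bind_addr=("0.0.0.0", 5001)):
       # The Receive Socket Buffer is set before listen() so that the
       # advertised TCP RWND (and window scaling) reflects it.
       srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUFFER)
       srv.bind(bind_addr)
       srv.listen(1)
       conn, _ = srv.accept()
       received = 0
       while True:
           data = conn.recv(BLOCK)
           if not data:
               break
           received += len(data)
       conn.close()
       return received

   def sender(server_addr):
       # e.g. sender(("192.0.2.10", 5001)) on the opposite end of the NUT
       s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUFFER)
       s.connect(server_addr)
       block = b"\x00" * BLOCK
       sent = 0
       start = time.time()
       while sent < PAYLOAD:
           s.sendall(block)
           sent += len(block)
       s.close()
       elapsed = time.time() - start
       print("Achieved TCP Throughput: %.1f Mbps" % (sent * 8 / elapsed / 1e6))

In practice a tool such as iperf or a dedicated communications test
instrument would be used instead; the sketch only illustrates that
both Socket Buffer sizes are tuned before the connection is
established and that the transfer runs long enough to reach the TCP
Equilibrium state.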
595 It is RECOMMENDED to run the tests in each direction independently 596 first, then run both directions simultaneously. In each case, the 597 TCP Transfer Time, TCP Efficiency, and Buffer Delay Percentage 598 metrics MUST be measured in each direction. These metrics are 599 defined in 3.3.2. 601 3.3.1 Calculate minimum required TCP RWND Size 603 The minimum required TCP RWND Size can be calculated from the 604 bandwidth delay product (BDP), which is: 606 BDP (bits) = RTT (sec) x Bandwidth (bps) 608 Note that the RTT is being used as the "Delay" variable in the 609 BDP calculations. 611 Then, by dividing the BDP by 8, we obtain the minimum required TCP 612 RWND Size in Bytes. For optimal results, the Send Socket Buffer size 613 must be adjusted to the same value at the opposite end of the network 614 path. 616 Minimum required TCP RWND = BDP / 8 618 An example would be a T3 link with 25 msec RTT. The BDP would equal 619 ~1,105,000 bits and the minimum required TCP RWND would be ~138 620 KBytes. 622 Note that separate calculations are required on asymmetrical paths. 623 An asymmetrical path example would be a 90 msec RTT ADSL line with 624 5Mbps downstream and 640Kbps upstream. The downstream BDP would equal 625 ~450,000 bits while the upstream one would be only ~57,600 bits. 627 The following table provides some representative network Link Speeds, 628 RTT, BDP, and their associated minimum required TCP RWND Sizes. 630 Table 3.3.1: Link Speed, RTT, calculated BDP & minimum TCP RWND 632 Link Minimum required 633 Speed* RTT BDP TCP RWND 634 (Mbps) (ms) (bits) (KBytes) 635 --------------------------------------------------------------------- 636 1.536 20 30,720 3.84 637 1.536 50 76,800 9.60 638 1.536 100 153,600 19.20 639 44.210 10 442,100 55.26 640 44.210 15 663,150 82.89 641 44.210 25 1,105,250 138.16 642 100 1 100,000 12.50 643 100 2 200,000 25.00 644 100 5 500,000 62.50 645 1,000 0.1 100,000 12.50 646 1,000 0.5 500,000 62.50 647 1,000 1 1,000,000 125.00 648 10,000 0.05 500,000 62.50 649 10,000 0.3 3,000,000 375.00 651 * Note that link speed is the Bottleneck Bandwidth (BB) for the NUT 653 The following serial link speeds are used: 654 - T1 = 1.536 Mbits/sec (for a B8ZS line encoding facility) 655 - T3 = 44.21 Mbits/sec (for a C-Bit Framing facility) 657 The above table illustrates the minimum required TCP RWND. 658 If a smaller TCP RWND Size is used, then the TCP Throughput 659 can not be optimal. To calculate the TCP Throughput, the following 660 formula is used: TCP Throughput = TCP RWND X 8 / RTT 661 An example could be a 100 Mbps IP path with 5 ms RTT and a TCP RWND 662 of 16KB, then: 664 TCP Throughput = 16 KBytes X 8 bits / 5 ms. 665 TCP Throughput = 128,000 bits / 0.005 sec. 666 TCP Throughput = 25.6 Mbps. 668 Another example for a T3 using the same calculation formula is 669 illustrated on the next page: 671 TCP Throughput = 16 KBytes X 8 bits / 10 ms. 672 TCP Throughput = 128,000 bits / 0.01 sec. 673 TCP Throughput = 12.8 Mbps. 675 When the TCP RWND Size exceeds the BDP (T3 link and 64 KBytes TCP 676 RWND on a 10 ms RTT path), the maximum frames per second limit of 677 3664 is reached and then the formula is: 679 TCP Throughput = Max FPS X MSS X 8. 680 TCP Throughput = 3664 FPS X 1460 Bytes X 8 bits. 681 TCP Throughput = 42.8 Mbps 683 The following diagram compares achievable TCP throughputs on a T3 684 with Send Socket Buffer & TCP RWND Sizes of 16KB vs. 64KB. 
686 Figure 3.3.1a TCP Throughputs on a T3 at different RTTs 688 45| 689 | _______42.8M 690 40| |64KB | 691 TCP | | | 692 Throughput 35| | | 693 in Mbps | | | +-----+34.1M 694 30| | | |64KB | 695 | | | | | 696 25| | | | | 697 | | | | | 698 20| | | | | _______20.5M 699 | | | | | |64KB | 700 15| | | | | | | 701 |12.8M+-----| | | | | | 702 10| |16KB | | | | | | 703 | | | |8.5M+-----| | | | 704 5| | | | |16KB | |5.1M+-----| | 705 |_____|_____|_____|____|_____|_____|____|16KB |_____|_____ 706 10 15 25 707 RTT in milliseconds 709 The following diagram shows the achievable TCP throughput on a 25ms 710 T3 when Send Socket Buffer & TCP RWND Sizes are increased. 712 Figure 3.3.1b TCP Throughputs on a T3 with different TCP RWND 714 45| 715 | 716 40| +-----+40.9M 717 TCP | | | 718 Throughput 35| | | 719 in Mbps | | | 720 30| | | 721 | | | 722 25| | | 723 | | | 724 20| +-----+20.5M | | 725 | | | | | 726 15| | | | | 727 | | | | | 728 10| +-----+10.2M | | | | 729 | | | | | | | 730 5| +-----+5.1M | | | | | | 731 |_____|_____|______|_____|______|_____|_______|_____|_____ 732 16 32 64 128* 733 TCP RWND Size in KBytes 735 * Note that 128KB requires [RFC1323] TCP Window scaling option. 737 3.3.2 Metrics for TCP Throughput Tests 739 This framework focuses on a TCP throughput methodology and also 740 provides several basic metrics to compare results between various 741 throughput tests. It is recognized that the complexity and 742 unpredictability of TCP makes it impossible to develop a complete 743 set of metrics that accounts for the myriad of variables (i.e. RTT 744 variation, loss conditions, TCP implementation, etc.). However, 745 these basic metrics will facilitate TCP throughput comparisons 746 under varying network conditions and between network traffic 747 management techniques. 749 The first metric is the TCP Transfer Time, which is simply the 750 measured time required to transfer a block of data across 751 simultaneous TCP connections. This concept is useful when 752 benchmarking traffic management techniques and when multiple 753 TCP connections are required. 755 TCP Transfer time may also be used to provide a normalized ratio of 756 the actual TCP Transfer Time versus the Ideal Transfer Time. This 757 ratio is called the TCP Transfer Index and is defined as: 759 Actual TCP Transfer Time 760 ------------------------- 761 Ideal TCP Transfer Time 763 The Ideal TCP Transfer time is derived from the network path 764 Bottleneck Bandwidth and Layer 1/2/3/4 overheads associated with the 765 network path. Additionally, both the TCP RWND and the Send Socket 766 Buffer Sizes must be tuned to equal or exceed the bandwidth delay 767 product (BDP) as described in section 3.3.1. 769 The following table illustrates the Ideal TCP Transfer time of a 770 single TCP connection when its TCP RWND and Send Socket Buffer Sizes 771 equals or exceeds the BDP. 773 Table 3.3.2: Link Speed, RTT, BDP, TCP Throughput, and 774 Ideal TCP Transfer time for a 100 MB File 776 Link Maximum Ideal TCP 777 Speed BDP Achievable TCP Transfer time 778 (Mbps) RTT (ms) (KBytes) Throughput(Mbps) (seconds) 779 -------------------------------------------------------------------- 780 1.536 50 9.6 1.4 571 781 44.21 25 138.2 42.8 18 782 100 2 25.0 94.9 9 783 1,000 1 125.0 949.2 1 784 10,000 0.05 62.5 9,492 0.1 786 Transfer times are rounded for simplicity. 
788 For a 100MB file(100 x 8 = 800 Mbits), the Ideal TCP Transfer Time 789 is derived as follows: 791 800 Mbits 792 Ideal TCP Transfer Time = ----------------------------------- 793 Maximum Achievable TCP Throughput 795 The maximum achievable layer 2 throughput on T1 and T3 Interfaces 796 is based on the maximum frames per second (FPS) permitted by the 797 actual layer 1 speed with an MTU of 1500 Bytes. 799 The maximum FPS for a T1 is 127 and the calculation formula is: 800 FPS = T1 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 801 FPS = (1.536M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 802 FPS = (1.536M / (1508 Bytes X 8)) 803 FPS = 1.536 Mbps / 12064 bits 804 FPS = 127 806 The maximum FPS for a T3 is 3664 and the calculation formula is: 807 FPS = T3 Link Speed / ((MTU + PPP + Flags + CRC16) X 8) 808 FPS = (44.21M /((1500 Bytes + 4 Bytes + 2 Bytes + 2 Bytes) X 8 ))) 809 FPS = (44.21M / (1508 Bytes X 8)) 810 FPS = 44.21 Mbps / 12064 bits 811 FPS = 3664 812 The 1508 equates to: 814 MTU + PPP + Flags + CRC16 816 Where the MTU is 1500 Bytes, PPP is 4 Bytes, the 2 Flags are 1 Byte 817 each and the CRC16 is 2 Bytes. 819 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 820 simply use: MSS in Bytes X 8 bits X max FPS. 821 For a T3, the maximum TCP Throughput = 1460 Bytes X 8 bits X 3664 FPS 822 Maximum TCP Throughput = 11680 bits X 3664 FPS 823 Maximum TCP Throughput = 42.8 Mbps. 825 The maximum achievable layer 2 throughput on Ethernet Interfaces is 826 based on the maximum frames per second permitted by the IEEE802.3 827 standard when the MTU is 1500 Bytes. 829 The maximum FPS for 100M Ethernet is 8127 and the calculation is: 830 FPS = (100Mbps /(1538 Bytes X 8 bits)) 832 The maximum FPS for GigE is 81274 and the calculation formula is: 833 FPS = (1Gbps /(1538 Bytes X 8 bits)) 835 The maximum FPS for 10GigE is 812743 and the calculation formula is: 836 FPS = (10Gbps /(1538 Bytes X 8 bits)) 838 The 1538 equates to: 840 MTU + Eth + CRC32 + IFG + Preamble + SFD 841 (IFG = Inter-Frame Gap and SFD = Start of Frame Delimiter) 842 Where MTU is 1500 Bytes, Ethernet is 14 Bytes, CRC32 is 4 Bytes, 843 IFG is 12 Bytes, Preamble is 7 Bytes and SFD is 1 Byte. 845 Note that better results could be obtained with jumbo frames on 846 GigE and 10 GigE. 848 Then, to obtain the Maximum Achievable TCP Throughput (layer 4), we 849 simply use: MSS in Bytes X 8 bits X max FPS. 850 For a 100M, the maximum TCP Throughput = 1460 B X 8 bits X 8127 FPS 851 Maximum TCP Throughput = 11680 bits X 8127 FPS 852 Maximum TCP Throughput = 94.9 Mbps. 854 To illustrate the TCP Transfer Time Index, an example would be the 855 bulk transfer of 100 MB over 5 simultaneous TCP connections (each 856 connection transferring 100 MB). In this example, the Ethernet 857 service provides a Committed Access Rate (CAR) of 500 Mbit/s. Each 858 connection may achieve different throughputs during a test and the 859 overall throughput rate is not always easy to determine (especially 860 as the number of connections increases). 862 The ideal TCP Transfer Time would be ~8 seconds, but in this example, 863 the actual TCP Transfer Time was 12 seconds. The TCP Transfer Index 864 would then be 12/8 = 1.5, which indicates that the transfer across 865 all connections took 1.5 times longer than the ideal. 
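The arithmetic above can be collected into a few helper functions.
The following Python sketch simply restates the formulas of this
section; the function names and the overhead constants are
illustrative conveniences, not normative values.

   MTU = 1500                  # Bytes
   MSS = MTU - 40              # 1460 Bytes (20 IP + 20 TCP, no options)
   ETH_OVERHEAD = 38           # Eth(14) + CRC32(4) + IFG(12) + Preamble(7) + SFD(1)
   PPP_OVERHEAD = 8            # PPP(4) + 2 Flags(2) + CRC16(2), per the T1/T3 examples

   def max_fps(link_speed_bps, l2_overhead):
       # Maximum frames per second permitted by the layer 1/2 rate.
       return link_speed_bps / ((MTU + l2_overhead) * 8.0)

   def max_tcp_throughput_bps(link_speed_bps, l2_overhead):
       # Maximum Achievable TCP Throughput = MSS x 8 bits x max FPS.
       return MSS * 8 * max_fps(link_speed_bps, l2_overhead)

   def ideal_transfer_time_s(file_bytes, link_speed_bps, l2_overhead):
       # Ideal TCP Transfer Time for a file of the given size.
       return file_bytes * 8.0 / max_tcp_throughput_bps(link_speed_bps, l2_overhead)

   def tcp_transfer_time_index(actual_s, ideal_s):
       # Ratio of actual to Ideal TCP Transfer Time.
       return actual_s / ideal_s

   # T3: ~3664 FPS and ~42.8 Mbps, as derived above.
   print(round(max_fps(44.21e6, PPP_OVERHEAD)),
         round(max_tcp_throughput_bps(44.21e6, PPP_OVERHEAD) / 1e6, 1))
   # 100M Ethernet: ~8127 FPS and ~94.9 Mbps.
   print(round(max_fps(100e6, ETH_OVERHEAD)),
         round(max_tcp_throughput_bps(100e6, ETH_OVERHEAD) / 1e6, 1))
   # 100 MB over 5 connections on the 500 Mbit/s CAR example:
   # ideal ~8 seconds, actual 12 seconds.
   print(tcp_transfer_time_index(12, 8))    # 1.5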
867 The second metric is TCP Efficiency, which is the percentage of Bytes 868 that were not retransmitted and is defined as: 870 Transmitted Bytes - Retransmitted Bytes 871 --------------------------------------- x 100 872 Transmitted Bytes 874 Transmitted Bytes are the total number of TCP Bytes to be transmitted 875 including the original and the retransmitted Bytes. This metric 876 provides comparative results between various traffic management and 877 congestion avoidance mechanisms. Performance between different TCP 878 implementations could also be compared. (e.g. Reno, Vegas, etc). 880 As an example, if 100,000 Bytes were sent and 2,000 had to be 881 retransmitted, the TCP Efficiency should be calculated as: 883 102,000 - 2,000 884 ---------------- x 100 = 98.03% 885 102,000 887 Note that the Retransmitted Bytes may have occurred more than once, 888 if so, then these multiple retransmissions are added to the 889 Retransmitted Bytes and to the Transmitted Bytes counts. 891 The third metric is the Buffer Delay Percentage, which represents the 892 increase in RTT during a TCP throughput test versus the inherent or 893 baseline RTT. The baseline RTT is the Round Trip Time inherent to 894 the network path under non-congested conditions. 895 (See 3.2.1 for details concerning the baseline RTT measurements). 897 The Buffer Delay Percentage is defined as: 899 Average RTT during Transfer - Baseline RTT 900 ------------------------------------------ x 100 901 Baseline RTT 903 As an example, consider a network path with a baseline RTT of 25 904 msec. During the course of a TCP transfer, the average RTT across 905 the entire transfer increases to 32 msec. Then, the Buffer Delay 906 Percentage would be calculated as: 908 32 - 25 909 ------- x 100 = 28% 910 25 912 Note that the TCP Transfer Time, TCP Efficiency, and Buffer Delay 913 Percentage MUST be measured during each throughput test. Poor TCP 914 Transfer Time Indexes (TCP Transfer Time greater than Ideal TCP 915 Transfer Times) may be diagnosed by correlating with sub-optimal TCP 916 Efficiency and/or Buffer Delay Percentage metrics. 918 3.3.3 Conducting the TCP Throughput Tests 920 Several TCP tools are currently used in the network world and one of 921 the most common is "iperf". With this tool, hosts are installed at 922 each end of the network path; one acts as client and the other as 923 a server. The Send Socket Buffer and the TCP RWND Sizes of both 924 client and server can be manually set. The achieved throughput can 925 then be measured, either uni-directionally or bi-directionally. For 926 higher BDP situations in lossy networks (long fat networks or 927 satellite links, etc.), TCP options such as Selective Acknowledgment 928 SHOULD be considered and become part of the window size / throughput 929 characterization. 931 Host hardware performance must be well understood before conducting 932 the tests described in the following sections. A dedicated 933 communications test instrument will generally be required, especially 934 for line rates of GigE and 10 GigE. A compliant TCP TTD SHOULD 935 provide a warning message when the expected test throughput will 936 exceed 10% of the network bandwidth capacity. If the throughput test 937 is expected to exceed 10% of the provider bandwidth, then the test 938 should be coordinated with the network provider. This does not 939 include the customer premise bandwidth, the 10% refers directly to 940 the provider's bandwidth (Provider Edge to Provider router). 
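When conducting the tests, the three metrics defined in section 3.3.2
MUST be measured in each direction. For reference, the short Python
sketch below restates how they are computed from quantities that most
test tools or packet captures expose; the variable names are
illustrative only.

   def tcp_transfer_time_index(actual_transfer_s, ideal_transfer_s):
       # First metric: ratio of actual to Ideal TCP Transfer Time.
       return actual_transfer_s / ideal_transfer_s

   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       # Second metric: percentage of Bytes not retransmitted. Note that
       # transmitted_bytes includes original plus all retransmitted Bytes.
       return (transmitted_bytes - retransmitted_bytes) * 100.0 / transmitted_bytes

   def buffer_delay_pct(avg_rtt_during_transfer, baseline_rtt):
       # Third metric: increase of the average RTT during the transfer
       # relative to the baseline (inherent, non-congested) RTT.
       return (avg_rtt_during_transfer - baseline_rtt) * 100.0 / baseline_rtt

   # Examples from section 3.3.2:
   print(tcp_efficiency_pct(102000, 2000))   # about 98%
   print(buffer_delay_pct(32, 25))           # 28.0%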
942 The TCP throughput test should be run over a long enough duration 943 to properly exercise network buffers (i.e. greater than 30 seconds) 944 and should also characterize performance at different times of day. 946 3.3.4 Single vs. Multiple TCP Connection Testing 948 The decision whether to conduct single or multiple TCP connection 949 tests depends upon the size of the BDP in relation to the TCP RWND 950 configured in the end-user environment. For example, if the BDP for 951 a long fat network turns out to be 2MB, then it is probably more 952 realistic to test this network path with multiple connections. 953 Assuming typical host computer TCP RWND Sizes of 64 KB (i.e. Windows 954 XP), using 32 TCP connections would emulate a typical small office 955 scenario. 957 The following table is provided to illustrate the relationship 958 between the TCP RWND and the number of TCP connections required to 959 fill the available capacity of a given BDP. For this example, the 960 network bandwidth is 500 Mbps and the RTT is 5 ms, then the BDP 961 equates to 312.5 KBytes. 963 Table 3.3.4 Number of TCP connections versus TCP RWND 965 Number of TCP Connections 966 TCP RWND to fill available bandwidth 967 ------------------------------------- 968 16KB 20 969 32KB 10 970 64KB 5 971 128KB 3 973 The TCP Transfer Time metric is useful for conducting multiple 974 connection tests. Each connection should be configured to transfer 975 payloads of the same size (i.e. 100 MB), and the TCP Transfer time 976 provides a simple metric to verify the actual versus expected 977 results. 979 Note that the TCP transfer time is the time for all connections to 980 complete the transfer of the configured payload size. From the 981 previous table, the 64KB window is considered. Each of the 5 982 TCP connections would be configured to transfer 100MB, and each one 983 should obtain a maximum of 100 Mb/sec. So for this example, the 984 100MB payload should be transferred across the connections in 985 approximately 8 seconds (which would be the ideal TCP transfer time 986 under these conditions). 988 Additionally, the TCP Efficiency metric MUST be computed for each 989 connection as defined in section 3.3.2. 991 3.3.5 Interpretation of the TCP Throughput Results 993 At the end of this step, the user will document the theoretical BDP 994 and a set of Window size experiments with measured TCP throughput for 995 each TCP window size. For cases where the sustained TCP throughput 996 does not equal the ideal value, some possible causes are: 998 - Network congestion causing packet loss which MAY be inferred from 999 a poor TCP Efficiency % (higher TCP Efficiency % = less packet 1000 loss) 1001 - Network congestion causing an increase in RTT which MAY be inferred 1002 from the Buffer Delay Percentage (i.e., 0% = no increase in RTT 1003 over baseline) 1004 - Intermediate network devices which actively regenerate the TCP 1005 connection and can alter TCP RWND Size, MSS, etc. 1006 - Rate limiting (policing). More details on traffic management 1007 tests follows in section 3.4 1009 3.3.6 High Performance Network Options 1011 For cases where the network outperforms the client/server IP hosts 1012 some possible causes are: 1014 - Maximum TCP Buffer space. All operating systems have a global 1015 mechanism to limit the quantity of system memory to be used by TCP 1016 connections. On some systems, each connection is subject to a memory 1017 limit that is applied to the total memory used for input data, output 1018 data and controls. 
On other systems, there are separate limits for 1019 input and output buffer spaces per connection. Client/server IP 1020 hosts might be configured with Maximum Buffer Space limits that are 1021 far too small for high performance networks. 1023 - Socket Buffer Sizes. Most operating systems support separate per 1024 connection send and receive buffer limits that can be adjusted as 1025 long as they stay within the maximum memory limits. These socket 1026 buffers must be large enough to hold a full BDP of TCP Bytes plus 1027 some overhead. There are several methods that can be used to adjust 1028 socket buffer sizes, but TCP Auto-Tuning automatically adjusts these 1029 as needed to optimally balance TCP performance and memory usage. 1030 It is important to note that Auto-Tuning is enabled by default in 1031 LINUX since the kernel release 2.6.6 and in UNIX since FreeBSD 7.0. 1032 It is also enabled by default in Windows since Vista and in MAC since 1033 OS X version 10.5 (leopard). Over buffering can cause some 1034 applications to behave poorly, typically causing sluggish interactive 1035 response and risk running the system out of memory. Large default 1036 socket buffers have to be considered carefully on multi-user systems. 1038 - TCP Window Scale Option, RFC1323. This option enables TCP to 1039 support large BDP paths. It provides a scale factor which is 1040 required for TCP to support window sizes larger than 64KB. Most 1041 systems automatically request WSCALE under some conditions, such as 1042 when the receive socket buffer is larger than 64KB or when the other 1043 end of the TCP connection requests it first. WSCALE can only be 1044 negotiated during the 3 way handshake. If either end fails to 1045 request WSCALE or requests an insufficient value, it cannot be 1046 renegotiated. Different systems use different algorithms to select 1047 WSCALE, but it is very important to have large enough buffer 1048 sizes. Note that under these constraints, a client application 1049 wishing to send data at high rates may need to set its own receive 1050 buffer to something larger than 64K Bytes before it opens the 1051 connection to ensure that the server properly negotiates WSCALE. 1052 A system administrator might have to explicitly enable RFC1323 1053 extensions. Otherwise, the client/server IP host would not support 1054 TCP window sizes (BDP) larger than 64KB. Most of the time, 1055 performance gains will be obtained by enabling this option in Long 1056 Fat Networks. (i.e., networks with large BDP, see Figure 3.3.1b). 1058 - TCP Timestamps Option, RFC1323. This feature provides better 1059 measurements of the Round Trip Time and protects TCP from data 1060 corruption that might occur if packets are delivered so late that the 1061 sequence numbers wrap before they are delivered. Wrapped sequence 1062 numbers do not pose a serious risk below 100 Mbps, but the risk 1063 increases at higher data rates. Most of the time, performance gains 1064 will be obtained by enabling this option in Gigabit bandwidth 1065 networks. 1067 - TCP Selective Acknowledgments Option (SACK), RFC2018. This allows 1068 a TCP receiver to inform the sender about exactly which data segment 1069 is missing and needs to be retransmitted. Without SACK, TCP has to 1070 estimate which data segment is missing, which works just fine if all 1071 losses are isolated (i.e. only one loss in any given round trip). 1072 Without SACK, TCP takes a very long time to recover after multiple 1073 and consecutive losses. 
SACK is now supported by most operating 1074 systems, but it may have to be explicitly enabled by the system 1075 administrator. In networks with unknown load and error patterns, TCP 1076 SACK will improve throughput performances. On the other hand, 1077 security appliances vendors might have implemented TCP randomization 1078 without considering TCP SACK and under such circumstances, SACK might 1079 need to be disabled in the client/server IP hosts until the vendor 1080 corrects the issue. Also, poorly implemented SACK algorithms might 1081 cause extreme CPU loads and might need to be disabled. 1083 - Path MTU. The client/server IP host system must use the largest 1084 possible MTU for the path. This may require enabling Path MTU 1085 Discovery (RFC1191 & RFC4821). Since RFC1191 is flawed it is 1086 sometimes not enabled by default and may need to be explicitly 1087 enabled by the system administrator. RFC4821 describes a new, more 1088 robust algorithm for MTU discovery and ICMP black hole recovery. 1090 - TOE (TCP Offload Engine). Some recent Network Interface Cards (NIC) 1091 are equipped with drivers that can do part or all of the TCP/IP 1092 protocol processing. TOE implementations require additional work 1093 (i.e. hardware-specific socket manipulation) to set up and tear down 1094 connections. Because TOE NICs configuration parameters are vendor 1095 specific and not necessarily RFC-compliant, they are poorly 1096 integrated with UNIX & LINUX. Occasionally, TOE might need to be 1097 disabled in a server because its NIC does not have enough memory 1098 resources to buffer thousands of connections. 1100 Note that both ends of a TCP connection must be properly tuned. 1102 3.4. Traffic Management Tests 1104 In most cases, the network connection between two geographic 1105 locations (branch offices, etc.) is lower than the network connection 1106 to host computers. An example would be LAN connectivity of GigE 1107 and WAN connectivity of 100 Mbps. The WAN connectivity may be 1108 physically 100 Mbps or logically 100 Mbps (over a GigE WAN 1109 connection). In the later case, rate limiting is used to provide the 1110 WAN bandwidth per the SLA. 1112 Traffic management techniques might be employed and the most common 1113 are: 1115 - Traffic Policing and/or Shaping 1116 - Priority queuing 1117 - Active Queue Management (AQM) 1118 Configuring the end-to-end network with these various traffic 1119 management mechanisms is a complex under-taking. For traffic shaping 1120 and AQM techniques, the end goal is to provide better performance to 1121 bursty traffic. 1123 This section of the methodology provides guidelines to test traffic 1124 shaping and AQM implementations. As in section 3.3, host hardware 1125 performance must be well understood before conducting the traffic 1126 shaping and AQM tests. Dedicated communications test instrument will 1127 generally be REQUIRED for line rates of GigE and 10 GigE. If the 1128 throughput test is expected to exceed 10% of the provider bandwidth, 1129 then the test should be coordinated with the network provider. This 1130 does not include the customer premises bandwidth, the 10% refers to 1131 the provider's bandwidth (Provider Edge to Provider router). Note 1132 that GigE and 10 GigE interfaces might benefit from hold-queue 1133 adjustments in order to prevent the saw-tooth TCP traffic pattern. 
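Before conducting the traffic management tests below (or the
throughput tests of section 3.3), it may be worthwhile to verify the
host options discussed in section 3.3.6. The following Python sketch
is one possible way to do so; it is Linux specific, the /proc paths
assume a reasonably recent Linux kernel, and other operating systems
expose the equivalent settings differently.

   # Linux-specific sketch: each of these files contains "1" when the
   # corresponding TCP option is enabled on the host.
   OPTIONS = {
       "RFC 1323 Window Scaling":    "/proc/sys/net/ipv4/tcp_window_scaling",
       "RFC 1323 Timestamps":        "/proc/sys/net/ipv4/tcp_timestamps",
       "RFC 2018 SACK":              "/proc/sys/net/ipv4/tcp_sack",
       "Receive buffer Auto-Tuning": "/proc/sys/net/ipv4/tcp_moderate_rcvbuf",
   }

   def check_host_tcp_options():
       for name, path in OPTIONS.items():
           try:
               enabled = open(path).read().strip() == "1"
           except OSError:
               enabled = None          # not a Linux host, or option unavailable
           print("%-28s %s" % (name, enabled))

   def show_socket_buffer_limits():
       # min / default / max (Bytes) for the Receive and Send Socket Buffers;
       # the max values must be at least equal to the BDP of the path under test.
       for path in ("/proc/sys/net/ipv4/tcp_rmem", "/proc/sys/net/ipv4/tcp_wmem"):
           print(path, open(path).read().strip())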
3.4.1 Traffic Shaping Tests

For services where the available bandwidth is rate limited, two (2)
techniques can be used: traffic policing or traffic shaping.

Simply stated, traffic policing marks and/or drops packets which
exceed the SLA bandwidth (in most cases, excess traffic is dropped).
Traffic shaping employs the use of queues to smooth the bursty
traffic and then send it out within the SLA bandwidth limit (without
dropping packets unless the traffic shaping queue is exhausted).

Traffic shaping is generally configured for TCP data services and
can provide improved TCP performance since retransmissions are
reduced, which in turn optimizes TCP throughput for the available
bandwidth. Throughout this section, the rate-limited bandwidth shall
be referred to as the "Bottleneck Bandwidth".

Proper traffic shaping is more easily diagnosed when conducting a
multiple TCP connections test. Proper shaping will provide a fair
distribution of the available Bottleneck Bandwidth, while traffic
policing will not.

The traffic shaping tests are built upon the concepts of multiple
connections testing as defined in section 3.3.3. Calculating the BDP
for the Bottleneck Bandwidth is first required before selecting the
number of connections, the Send Socket Buffer and TCP RWND Sizes per
connection.

Similar to the example in section 3.3, a typical test scenario might
be: GigE LAN with a 500 Mbps Bottleneck Bandwidth (rate limited
logical interface) and 5 msec RTT. This would require five (5) TCP
connections of 64 KB Send Socket Buffer and TCP RWND Sizes to evenly
fill the Bottleneck Bandwidth (~100 Mbps per connection).

The traffic shaping test should be run over a long enough duration to
properly exercise network buffers (i.e. greater than 30 seconds) and
should also characterize performance at different times of day. The
throughput of each connection MUST be logged during the entire test,
along with the TCP Transfer Time, TCP Efficiency, and Buffer Delay
Percentage.

3.4.1.1 Interpretation of Traffic Shaping Test Results

By plotting the throughput achieved by each TCP connection, we should
see fair sharing of the bandwidth when traffic shaping is properly
configured. For the previous example of 5 connections sharing 500
Mbps, each connection would consume ~100 Mbps with smooth variations.

If traffic shaping is not configured properly or if traffic policing
is present on the bottleneck interface, the bandwidth sharing may not
be fair. The resulting throughput plot may reveal "spiky" throughput
consumption by the competing TCP connections (due to the high rate of
TCP retransmissions).

3.4.2 AQM Tests

Active Queue Management techniques are specifically targeted to
provide congestion avoidance to TCP traffic. As an example, before
the network element queue "fills" and enters the tail drop state, an
AQM implementation like RED (Random Early Discard) drops packets at
pre-configurable queue depth thresholds. This action causes TCP
connections to back off, which helps prevent tail drops and in turn
helps avoid global TCP synchronization.
1199 RED is just an example and other AQM implementations like WRED 1200 (Weighted Random Early Discard) or REM (Random Exponential Marking) 1201 or AREM (Adaptive Random Exponential Marking), just to name a few, 1202 could be used. 1204 Again, rate limited interfaces may benefit greatly from AQM based 1205 techniques. With a default FIFO queue, bloated buffering is an 1206 increasingly common occurrence and has dire effects on TCP 1207 connections; the main effects are delayed congestion 1208 feedback (poor TCP control loop response) and enormous queuing 1209 delays for all other traffic flows. 1211 In a FIFO based queue, the TCP traffic may not be able to achieve 1212 the full throughput available on the Bottleneck Bandwidth link, 1213 while with an AQM implementation, TCP congestion avoidance would 1214 throttle the connections on the higher speed interface (i.e. LAN) 1215 and could help achieve the full throughput (up to the Bottleneck 1216 Bandwidth). The bursty nature of TCP traffic is a key factor in the 1217 overall effectiveness of AQM techniques; steady state bulk transfer 1218 flows will generally not benefit from AQM because with bulk transfer 1219 flows, network device queues gracefully throttle the effective 1220 throughput rates due to increased delays. 1222 Proper AQM configuration is more easily detected by conducting a 1223 multiple TCP connections test. Multiple 1224 TCP connections provide the bursty sources that emulate the 1225 real-world conditions for which AQM implementations are intended. 1227 AQM testing also builds upon the concepts of multiple connections 1228 testing as defined in section 3.3.3. Calculating the BDP for the 1229 Bottleneck Bandwidth is first required before selecting the number 1230 of connections, the Send Socket Buffer size and the TCP RWND Size 1231 per connection. 1233 For AQM testing, the desired effect is to cause the TCP connections 1234 to burst beyond the Bottleneck Bandwidth so that queue drops will 1235 occur. Using the same example from section 3.4.1 (traffic shaping), 1236 the 500 Mbps Bottleneck Bandwidth requires 5 TCP connections (with a 1237 window size of 64 KB) to fill the capacity. Some experimentation is 1238 required, but it is recommended to start with double the number of 1239 connections in order to stress the network element buffers/queues 1240 (10 connections for this example). 1242 The TCP TTD must be configured to generate these connections as 1243 shorter (bursty) flows versus bulk transfer type flows. These TCP 1244 bursts should stress queue sizes in the 512 KB range. Again, 1245 experimentation will be required; the proper number of TCP 1246 connections, the Send Socket Buffer and TCP RWND Sizes will be 1247 dictated by the size of the network element queue. 1249 3.4.2.1 Interpretation of AQM Results 1251 The default queuing technique for most network devices is FIFO based. 1252 Under heavy traffic conditions, FIFO based queue management may cause 1253 enormous queuing delays plus delayed congestion feedback to all TCP 1254 applications. This can cause excessive loss on all of the TCP 1255 connections and, in the worst cases, global TCP synchronization. 1257 An AQM implementation can be detected by plotting individual and 1258 aggregate throughput results achieved by multiple TCP connections on 1259 the bottleneck interface. Proper AQM operation may be determined if 1260 the TCP throughput is fully utilized (up to the Bottleneck Bandwidth) 1261 and fairly shared between TCP connections. For the previous example 1262 of 10 connections (window = 64 KB) sharing 500 Mbps, each connection 1263 should consume ~50 Mbps. If AQM is not properly enabled on the 1264 interface, the TCP connections will retransmit at higher rates, 1265 and the net effect is that the Bottleneck Bandwidth is not fully 1266 utilized.
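As a rough illustration of this interpretation step, the Python sketch below compares the aggregate throughput of the connections against the Bottleneck Bandwidth and computes Jain's fairness index over the per-connection results. The 0.95 utilization and fairness thresholds are arbitrary values chosen for this sketch, not thresholds defined by this methodology.

   # Illustrative sketch: does a multiple-connection result look like
   # proper AQM behavior, i.e. the Bottleneck Bandwidth is well utilized
   # and fairly shared?  The 0.95 thresholds are arbitrary examples.

   def jain_fairness(throughputs):
       # Jain's fairness index: 1.0 means perfectly equal sharing.
       n = len(throughputs)
       return sum(throughputs) ** 2 / (n * sum(t * t for t in throughputs))

   def looks_like_aqm(throughputs_mbps, bottleneck_mbps):
       utilization = sum(throughputs_mbps) / bottleneck_mbps
       return utilization >= 0.95 and jain_fairness(throughputs_mbps) >= 0.95

   # Example: 10 connections sharing a 500 Mbps Bottleneck Bandwidth.
   print(looks_like_aqm([51, 49, 50, 48, 52, 50, 49, 51, 50, 50], 500))  # True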
1268 Another means to study non-AQM versus AQM implementations is to use 1269 the Buffer Delay Percentage metric for all of the connections. The 1270 Buffer Delay Percentage should be significantly lower in AQM 1271 implementations versus default FIFO queuing. 1273 Additionally, non-AQM implementations may exhibit a lower TCP 1274 Efficiency. 1276 4. Security Considerations 1278 The security considerations that apply to any active measurement of 1279 live networks are relevant here as well. See [RFC4656] and 1280 [RFC5357]. 1282 5. IANA Considerations 1284 This document does not REQUIRE an IANA registration for ports 1285 dedicated to the TCP testing described in this document. 1287 6. Acknowledgments 1289 Thanks to Lars Eggert, Al Morton, Matt Mathis, Matt Zekauskas, 1290 Yaakov Stein, and Loki Jorgenson for many good comments and for 1291 pointing us to great sources of information pertaining to past works 1292 in the TCP capacity area. 1294 7. References 1296 7.1 Normative References 1298 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1299 Requirement Levels", BCP 14, RFC 2119, March 1997. 1301 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 1302 Zekauskas, "A One-way Active Measurement Protocol 1303 (OWAMP)", RFC 4656, September 2006. 1305 [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for 1306 Network Interconnect Devices", RFC 2544, March 1999. 1308 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. 1309 Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", 1310 RFC 5357, October 2008. 1312 [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU 1313 Discovery", RFC 4821, March 2007. 1315 Allman, M., "A Bulk Transfer Capacity Methodology for 1316 Cooperating Hosts", draft-ietf-ippm-btc-cap-00.txt (work 1317 in progress), August 2001. 1319 [RFC2681] Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip 1320 Delay Metric for IPPM", RFC 2681, September 1999. 1322 [RFC4898] Mathis, M., Heffner, J., and R. Raghunarayan, "TCP 1323 Extended Statistics MIB", RFC 4898, May 2007. 1325 [RFC5136] Chimento, P. and J. Ishac, "Defining Network Capacity", 1326 RFC 5136, February 2008. 1328 [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions 1329 for High Performance", RFC 1323, May 1992. 1331 7.2. Informative References 1333 Authors' Addresses 1335 Barry Constantine 1336 JDSU, Test and Measurement Division 1337 One Milestone Center Court 1338 Germantown, MD 20876-7100 1339 USA 1341 Phone: +1 240 404 2227 1342 barry.constantine@jdsu.com 1344 Gilles Forget 1345 Independent Consultant to Bell Canada. 1346 308, rue de Monaco, St-Eustache 1347 Qc. CANADA, Postal Code : J7P-4T5 1349 Phone: (514) 895-8212 1350 gilles.forget@sympatico.ca 1352 Rudiger Geib 1353 Heinrich-Hertz-Strasse 3-7 1354 Darmstadt, Germany, 64295 1356 Phone: +49 6151 6282747 1357 Ruediger.Geib@telekom.de 1359 Reinhard Schrage 1360 Schrage Consulting 1362 Phone: +49 (0) 5137 909540 1363 reinhard@schrageconsult.com