idnits 2.17.1

draft-ietf-tcpm-initcwnd-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest
     one being 1 character in excess of 72.

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of
     the documents in question.

  -- The draft header indicates that this document updates RFC3390, but
     the abstract doesn't seem to directly say this.  It does mention
     RFC3390 though, so this could be OK.

  -- The draft header indicates that this document updates RFC5681, but
     the abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line
     does not match the current year

     (Using the creation date from RFC3390, updated by this document,
     for RFC5378 checks: 2001-05-25)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but
     may have content which was first submitted before 10 November 2008.
     If you have contacted all the original authors and they are all
     willing to grant the BCP78 rights to the IETF Trust, then this is
     fine, and you can ignore this comment.  If not, you may need to add
     the pre-RFC5378 disclaimer.  (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness
     check skipped.
  Checking references for intended status: Full Standard
  ----------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Unused Reference: 'PWSB09' is defined on line 832, but no explicit
     reference was found in the text

  == Unused Reference: 'Sch08' is defined on line 866, but no explicit
     reference was found in the text

  ** Downref: Normative reference to a Proposed Standard RFC: RFC 6298

  ** Downref: Normative reference to a Proposed Standard RFC: RFC 2018

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230,
     RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  ** Downref: Normative reference to a Proposed Standard RFC: RFC 3390

  ** Downref: Normative reference to a Draft Standard RFC: RFC 5681

  ** Downref: Normative reference to an Experimental RFC: RFC 5827

  == Outdated reference: A later version (-08) exists of
     draft-irtf-iccrg-welzl-congestion-control-open-research-05

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 2414
     (Obsoleted by RFC 3390)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)

  == Outdated reference: A later version (-03) exists of
     draft-touch-tcpm-automatic-iw-01

  Summary: 9 errors (**), 0 flaws (~~), 5 warnings (==), 7 comments (--).

  Run idnits with the --verbose option for more detailed information
  about the items above.

--------------------------------------------------------------------------------

Internet Draft                                                    J. Chu
draft-ietf-tcpm-initcwnd-02.txt                             N. Dukkipati
Intended status: Standard                                       Y. Cheng
Updates: 3390, 5681                                            M. Mathis
Creation date: October 16, 2011                             Google, Inc.
Expiration date: April 2012

                     Increasing TCP's Initial Window

Status of this Memo

   Distribution of this memo is unlimited.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire in April 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document proposes an increase in the permitted TCP initial
   window (IW) from between 2 and 4 segments, as specified in RFC 3390,
   to 10 segments.
   It discusses the motivation behind the increase, the advantages and
   disadvantages of the higher initial window, and presents results
   from several large scale experiments showing that the higher initial
   window improves the overall performance of many web services without
   risking congestion collapse.  The document closes with a discussion
   of a list of concerns, and some results from recent studies to
   address those concerns.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
   2. TCP Modification . . . . . . . . . . . . . . . . . . . . . . .  3
   3. Implementation Issues  . . . . . . . . . . . . . . . . . . . .  4
   4. Background . . . . . . . . . . . . . . . . . . . . . . . . . .  5
   5. Advantages of Larger Initial Windows . . . . . . . . . . . . .  6
      5.1 Reducing Latency . . . . . . . . . . . . . . . . . . . . .  6
      5.2 Keeping up with the growth of web object size  . . . . . .  7
      5.3 Recovering faster from loss on under-utilized or wireless
          links  . . . . . . . . . . . . . . . . . . . . . . . . . .  7
   6. Disadvantages of Larger Initial Windows for the Individual
      Connection . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   7. Disadvantages of Larger Initial Windows for the Network  . . .  9
   8. Mitigation of Negative Impact  . . . . . . . . . . . . . . . .  9
   9. Interactions with the Retransmission Timer . . . . . . . . . .  9
   10. Experimental Results From Large Scale Cluster Tests . . . . . 10
      10.1 The benefits  . . . . . . . . . . . . . . . . . . . . . . 10
      10.2 The cost  . . . . . . . . . . . . . . . . . . . . . . . . 11
   11. List of Concerns and Corresponding Test Results . . . . . . . 12
   12. Related Proposals . . . . . . . . . . . . . . . . . . . . . . 14
   14. Conclusion  . . . . . . . . . . . . . . . . . . . . . . . . . 15
   15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
   16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 15
   Normative References  . . . . . . . . . . . . . . . . . . . . . . 16
   Informative References  . . . . . . . . . . . . . . . . . . . . . 16
   Author's Addresses  . . . . . . . . . . . . . . . . . . . . . . . 20
   Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . 20

1. Introduction

   This document updates RFC 3390 to raise the upper bound on TCP's
   initial window (IW) to 10 segments, or roughly 15KB.  It is
   patterned after and borrows heavily from RFC 3390 [RFC3390] and
   earlier work in this area.

   The primary argument in favor of raising IW follows from the
   evolving scale of the Internet.  Ten segments are likely to fit into
   the queue space available at any broadband access link, even when
   there are a reasonable number of concurrent connections.

   Lower speed links can be treated with environment-specific
   configurations, such that they can be protected from being
   overwhelmed by large initial window bursts without imposing a
   suboptimal initial window on the rest of the Internet.

   This document reviews the advantages and disadvantages of using a
   larger initial window, and includes summaries of several large scale
   experiments showing that an initial window of 10 segments provides
   benefits across the board for a variety of BW, RTT, and BDP classes.
   These results show significant benefits of increasing IW for users
   at much smaller data rates than had been previously anticipated.
   However, at initial windows larger than 10, the results are mixed.
   We believe that these mixed results are not intrinsic, but are the
   consequence of various implementation artifacts, including overly
   aggressive applications employing many simultaneous connections.
   We propose that all TCP implementations have a settable TCP IW
   parameter; the default setting may start at 10 segments and should
   be raised as we come to understand and correct the things that
   conflict with it.

   In addition, we introduce a minor revision to RFC 3390 and RFC 5681
   [RFC5681] to eliminate resetting the initial window when the SYN or
   SYN/ACK is lost.

   The document closes with a discussion of a list of concerns that
   have been raised, and some recent test results showing that most of
   the concerns cannot be validated.

   A complementary set of slides for this proposal can be found at
   [CD10].

2. TCP Modification

   This document proposes an increase in the permitted upper bound for
   TCP's initial window (IW) to 10 segments.  This increase is
   optional: a TCP MAY start with a larger initial window of up to 10
   segments.

   This upper bound for the initial window size represents a change
   from RFC 3390 [RFC3390], which specified that the congestion window
   be initialized to between 2 and 4 segments, depending on the MSS.

   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission following the TCP
   three-way handshake.  Neither the SYN/ACK nor its acknowledgment
   (ACK) in the three-way handshake should increase the initial window
   size.

   Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

      "If the SYN or SYN/ACK is lost, the initial window used by a
      sender after a correctly transmitted SYN MUST be one segment
      consisting of MSS bytes."

   The proposed change to reduce the default RTO to 1 second [RFC6298]
   increases the chance of spurious SYN or SYN/ACK retransmission, thus
   unnecessarily penalizing connections with RTT > 1 second if their
   initial window is reduced to 1 segment.
   For this reason, it is RECOMMENDED that implementations refrain from
   resetting the initial window to 1 segment unless either there have
   been multiple SYN or SYN/ACK retransmissions, or true loss has been
   detected.

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the loss
   window).  The change specified in this document affects the value of
   the initial window.  Optionally, a TCP MAY set the restart window to
   the minimum of the value used for the initial window and the current
   value of cwnd (in other words, using a larger value for the restart
   window should never increase the size of cwnd).  These changes do
   NOT change the loss window, which must remain 1 segment of MSS bytes
   (to permit the lowest possible window size in the case of severe
   congestion).

   Furthermore, to limit any negative effect that a larger initial
   window may have on links with limited bandwidth or buffer space,
   implementations SHOULD fall back to RFC 3390 for the restart window
   (RW) if any packet loss is detected during either the initial window
   or a restart window, and more than 4KB of data is sent.

3. Implementation Issues

   [Need to decide if a different formula is needed for PMTU != 1500.]

   The HTTP 1.1 specification allows only two simultaneous connections
   per domain, yet web browsers open more simultaneous TCP connections
   [Ste08], partly to circumvent the small initial window in order to
   speed up the loading of web pages as described above.

   When web browsers open simultaneous TCP connections to the same
   destination, they are working against TCP's congestion control
   mechanisms [FF99].
   Combining this behavior with larger initial windows further
   increases the burstiness and unfairness to other traffic in the
   network.  A larger initial window will incentivize applications to
   use fewer concurrent TCP connections.

   Some implementations advertise a small initial receive window (Table
   2 in [Duk10]), effectively limiting how much window a remote host
   may use.  In order to realize the full benefit of the large initial
   window, implementations are encouraged to advertise an initial
   receive window of at least 10 segments, except for circumstances
   where a larger initial window is deemed harmful.  (See the
   Mitigation section below.)

   The TCP SACK option [RFC2018] was thought to be required in order
   for the larger initial window to perform well.  But measurements
   from both a testbed and live tests showed that IW=10 without the
   SACK option still beats the performance of IW=3 with the SACK option
   [CW10].

4. Background

   The TCP congestion window was introduced as part of the congestion
   control algorithm by Van Jacobson in 1988 [Jac88].  The initial
   value of one segment was used as the starting point for newly
   established connections to probe the available bandwidth on the
   network.

   Today's Internet is dominated by web traffic running on top of
   short-lived TCP connections [IOR2009].  The relatively small initial
   window has become a limiting factor for the performance of many web
   applications.

   The global Internet has continued to grow, both in speed and
   penetration.  According to the latest report from Akamai [AKAM10],
   global broadband (> 2Mbps) adoption has surpassed 50%, propelling
   the average connection speed to reach 1.7Mbps, while narrowband
   (< 256Kbps) usage has dropped to 5%.
   In contrast, TCP's initial window has remained 4KB for a decade
   [RFC2414], corresponding to a bandwidth utilization of less than
   200Kbps per connection, assuming an RTT of 200ms.

   A large proportion of flows on the Internet are short web
   transactions over TCP, and complete before exiting TCP slow start.
   Speeding up the TCP flow startup phase, including circumventing the
   initial window limit, has been an area of active research [PWSB09,
   Sch08].  Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98].
   Some require router support [RFC4782, PK98], hence are not practical
   for the public Internet.  Others suggest bold but often radical
   ideas, likely requiring more years of research before
   standardization and deployment.

   In the meantime, applications have responded to TCP's "slow" start.
   Web sites use multiple sub-domains [Bel10] to circumvent the HTTP
   1.1 regulation of two connections per physical host [RFC2616].  As
   of today, major web browsers open multiple connections to the same
   site (up to six connections per domain [Ste08], and the number is
   growing).  This trend is a remedy for HTTP's serialized downloads,
   to achieve parallelism and higher performance.  But it also implies
   that most access links today are severely under-utilized, hence
   having multiple TCP connections improves performance most of the
   time.  While raising the initial congestion window may cause
   congestion for certain users of these browsers, we argue that
   browsers and other applications need to respect the HTTP 1.1
   regulation and stop increasing the number of simultaneous TCP
   connections.  We believe a modest increase of the initial window
   will help to stop this trend, provide the best interim solution for
   improving overall user performance, and reduce the server, client,
   and network load.
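   The utilization figure cited above follows from simple arithmetic:
   a transfer that fits entirely in the initial window can deliver at
   most IW bytes per round trip.  A minimal sketch of that bound (the
   1460-byte MSS used for the IW=10 comparison is an assumption for
   illustration, not a figure from this document):

```python
# Upper bound on per-connection throughput when the whole transfer
# fits in the initial window: at most IW bytes leave the sender per RTT.
def iw_throughput_kbps(iw_bytes, rtt_s):
    return iw_bytes * 8 / rtt_s / 1000

# RFC 2414/3390 era: 4KB initial window, 200ms RTT
old = iw_throughput_kbps(4096, 0.200)       # ~164 Kbps, "less than 200Kbps"

# IW=10 with an assumed 1460-byte MSS, same RTT
new = iw_throughput_kbps(10 * 1460, 0.200)  # ~584 Kbps
```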
   Note that persistent connections and pipelining are designed to
   address some of the issues with HTTP above [RFC2616].  Their
   presence does not diminish the need for a larger initial window.
   E.g., data from the Chrome browser show that 35% of HTTP requests
   are made on new TCP connections.  Our test data also confirm
   significant latency reduction with the large initial window even
   with these two HTTP features ([Duk10]).

   Also note that packet pacing has been suggested as an effective
   mechanism to avoid large bursts and their associated damage [VH97].
   We do not require pacing in our proposal due to our strong
   preference for a simple solution.  We suspect that for packet bursts
   of a moderate size, packet pacing will not be necessary.  This seems
   to be confirmed by our test results.

   More discussion of the increase in the initial window, including the
   choice of 10 segments, can be found in [Duk10, CD10].

5. Advantages of Larger Initial Windows

5.1 Reducing Latency

   An increase of the initial window from 3 segments to 10 segments
   reduces the total transfer time for data sets greater than 4KB by up
   to 4 round trips.

   The table below compares the number of round trips between IW=3 and
   IW=10 for different transfer sizes, assuming infinite bandwidth, no
   packet loss, and the standard delayed acks with a large delayed-ack
   timer.

       ---------------------------------------
       | total segments |  IW=3  |  IW=10  |
       ---------------------------------------
       |        3       |    1   |    1    |
       |        6       |    2   |    1    |
       |       10       |    3   |    1    |
       |       12       |    3   |    2    |
       |       21       |    4   |    2    |
       |       25       |    5   |    2    |
       |       33       |    5   |    3    |
       |       46       |    6   |    3    |
       |       51       |    6   |    4    |
       |       78       |    7   |    4    |
       |       79       |    8   |    4    |
       |      120       |    8   |    5    |
       |      127       |    9   |    5    |
       ---------------------------------------

   For example, with the larger initial window, a transfer of 25
   segments of data will require only two rather than five round trips
   to complete.
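   The round-trip counts in the table follow from a simple slow-start
   model with delayed ACKs: each round the sender transmits a full
   congestion window, the receiver generates roughly one ACK per two
   segments, and each ACK grows cwnd by one segment.  A sketch of that
   model (an illustration of the table's arithmetic, not code from this
   document; it reproduces the table except for one boundary row where
   ACK rounding differs):

```python
def slow_start_rounds(total_segments, iw):
    """Round trips to transfer total_segments starting from initial
    window iw, assuming one delayed ACK per two segments (so cwnd
    grows by roughly a factor of 1.5 per round)."""
    cwnd, sent, rounds = iw, 0, 0
    while sent < total_segments:
        sent += cwnd          # send a full window this round
        rounds += 1
        cwnd += cwnd // 2     # ~cwnd/2 delayed ACKs, +1 segment each
    return rounds

# Reproducing rows of the table above:
#   21 segments:  IW=3 -> 4 rounds, IW=10 -> 2 rounds
#  127 segments:  IW=3 -> 9 rounds, IW=10 -> 5 rounds
```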
5.2 Keeping up with the growth of web object size

   RFC 3390 stated that the main motivation for increasing the initial
   window to 4KB was to speed up connections that only transmit a small
   amount of data, e.g., email and web.  The majority of transfers back
   then were less than 4KB and could be completed in a single RTT
   [All00].

   Since RFC 3390 was published, web objects have gotten significantly
   larger [Chu09, RJ10].  Today only a small percentage of web objects
   (e.g., 10% of Google's search responses) can fit in the 4KB initial
   window.  The average HTTP response size of gmail.com, a highly
   scripted web-site, is 8KB (Figure 1 in [Duk10]).  The average web
   page, including all static and dynamic scripted web objects on the
   page, has seen even greater growth in size [RJ10].  HTTP pipelining
   [RFC2616] and new web transport protocols like SPDY [SPDY] allow
   multiple web objects to be sent in a single transaction, potentially
   requiring an even larger initial window in order to transfer a whole
   web page in one round trip.

5.3 Recovering faster from loss on under-utilized or wireless links

   A greater-than-3-segment initial window increases the chance of
   recovering a packet loss through Fast Retransmit rather than the
   lengthy initial RTO [RFC5681].  This is because the fast retransmit
   algorithm requires three duplicate acks as an indication that a
   segment has been lost rather than reordered.  While newer loss
   recovery techniques such as Limited Transmit [RFC3042] and Early
   Retransmit [RFC5827] have been proposed to help speed up loss
   recovery from a smaller window, both algorithms can still benefit
   from the larger initial window because of a better chance to receive
   more ACKs to react upon.

6. Disadvantages of Larger Initial Windows for the Individual
   Connection

   The larger bursts from an increase in the initial window may cause
   buffer overrun and packet drop in routers with small buffers, or in
   routers experiencing congestion.  This could result in unnecessary
   retransmit timeouts.  For a large-window connection that is able to
   recover without a retransmit timeout, this could result in an
   unnecessarily early transition from the slow-start to the
   congestion-avoidance phase of the window increase algorithm.  [Note:
   knowing the large initial window may cause premature segment drop,
   should one make an exception for it, i.e., by allowing ssthresh to
   remain unchanged if loss is from an enlarged initial window?]

   Premature segment drops are unlikely to occur in uncongested
   networks with sufficient buffering, or in moderately-congested
   networks where the congested router uses active queue management
   (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

   Insufficient buffering is more likely to exist in the access routers
   connecting slower links.  A recent study of access router buffer
   size [DGHS07] reveals that the majority of access routers provision
   enough buffer for 130ms or longer, sufficient to cover a burst of
   more than 10 packets at 1Mbps speed, but possibly not sufficient for
   browsers opening simultaneous connections.

   A testbed study [CW10] on the effect of the larger initial window
   with five simultaneously opened connections revealed that, even with
   limited buffer size on slow links, IW=10 still reduced the total
   latency of web transactions, although at the cost of higher packet
   drop rates as compared to IW=3.

   Some TCP connections will receive better performance with the larger
   initial window even if the burstiness of the initial window results
   in premature segment drops.
   This will be true if (1) the TCP connection recovers from the
   segment drop without a retransmit timeout, and (2) the TCP
   connection is ultimately limited to a small congestion window by
   either network congestion or by the receiver's advertised window.

7. Disadvantages of Larger Initial Windows for the Network

   An increase in the initial window may increase congestion in a
   network.  However, since the increase is one-time only (at the
   beginning of a connection), and the rest of TCP's congestion backoff
   mechanism remains in place, it is highly unlikely that the increase
   will put a network into a persistent state of congestion, let alone
   congestion collapse.  This seems to have been confirmed by our large
   scale experiments described later.

   Some of the discussions from RFC 3390 are still valid for IW=10.
   Moreover, it is worth noting that although TCP NewReno increases the
   chance of duplicate segments when trying to recover multiple packet
   losses from a large window [RFC3782], the wide support of the TCP
   Selective Acknowledgment (SACK) option [RFC2018] in all major OSes
   today should keep the volume of duplicate segments in check.

   Recent measurements [Get11] provide evidence of extremely large
   queues (on the order of one second) at access networks of the
   Internet.  While a significant part of this buffer bloat is
   contributed by large downloads/uploads such as video files, emails
   with large attachments, backups, and downloads of movies to disk,
   some of the problem is also caused by web browsing of image-heavy
   sites [Get11].  This queuing delay is generally considered harmful
   for the responsiveness of latency-sensitive traffic such as DNS
   queries, ARP, DHCP, VoIP, and gaming.  IW=10 can exacerbate this
   problem with short downloads such as web browsing.
   The mitigations proposed for the broader problem of buffer bloat are
   also applicable in this case, such as the use of ECN, AQM schemes,
   and traffic classification (QoS).

8. Mitigation of Negative Impact

   Much of the negative impact from an increase in the initial window
   is likely to be felt by users behind slow links with limited
   buffers.  The negative impact can be mitigated by hosts directly
   connected to a low-speed link advertising an initial receive window
   smaller than 10 segments.  This can be achieved either through
   manual configuration by the users, or through the host stack
   auto-detecting the low-bandwidth links.

   More suggestions to improve the end-to-end performance of slow links
   can be found in RFC 3150 [RFC3150].

   [Note: if packet loss is detected during IW through fast retransmit,
   should cwnd back down to 2 rather than FlightSize / 2?]

9. Interactions with the Retransmission Timer

   A large initial window increases the chance of a spurious RTO on a
   low-bandwidth path, because the packet transmission time will
   dominate the round-trip time.  To minimize spurious retransmissions,
   implementations MUST follow RFC 2988 [RFC2988] to restart the
   retransmission timer with the current value of RTO for each ack
   received that acknowledges new data.

10. Experimental Results From Large Scale Cluster Tests

   In this section we summarize our findings from large scale Internet
   experiments with an initial window of 10 segments, conducted via
   Google's front-end infrastructure serving a diverse set of
   applications.
   We present results from two data centers, each chosen because of the
   specific characteristics of the subnets served: AvgDC has connection
   bandwidths closer to the worldwide average reported in [AKAM10],
   with a median connection speed of about 1.7Mbps; SlowDC has a larger
   proportion of traffic from slow-bandwidth subnets, with nearly 20%
   of traffic from connections below 100Kbps, and a third below
   256Kbps.

   Guided by measurement data, we answer two key questions: what is the
   latency benefit when TCP connections start with a higher initial
   window, and, on the flip side, what is the cost?

10.1 The benefits

   The average web search latency improvement over all responses is
   11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC.  We further
   analyzed the data based on traffic characteristics and subnet
   properties such as bandwidth (BW), round-trip time (RTT), and
   bandwidth-delay product (BDP).  The average response latency
   improved across the board for a variety of subnets, with the largest
   benefits of over 20% from high-RTT and high-BDP networks, wherein
   most responses can fit within the pipe.  Correspondingly, responses
   from low-RTT paths experienced the smallest improvements of about
   5%.

   Contrary to what we expected, responses from low-bandwidth subnets
   experienced the best latency improvements (between 10-20%) in the
   0-56Kbps and 56-256Kbps buckets.  We speculate that low-BW networks
   observe improved latency for two plausible reasons: 1) fewer
   slow-start rounds: unlike many large-BW networks, low-BW subnets
   with dial-up modems have inherently large RTTs; and 2) faster loss
   recovery: an initial window larger than 3 segments increases the
   chance of a lost packet being recovered through Fast Retransmit, as
   opposed to a lengthy RTO.
   Responses of different sizes benefited to varying degrees; those
   larger than 3 segments naturally demonstrated larger improvements,
   because they finished in fewer rounds of slow start as compared to
   the baseline.  In our experiments, response sizes <= 3 segments also
   demonstrated small latency benefits.

   To find out how individual subnets performed, we analyzed average
   latency at a /24 subnet level (an approximation to a user base
   offered a similar set of services by a common ISP).  We find that
   even at the subnet granularity, latency improved at all quantiles,
   ranging from 5-11%.

10.2 The cost

   To quantify the cost of raising the initial window, we analyzed the
   data specifically for subnets with low bandwidth and BDP,
   retransmission rates for different kinds of applications, as well as
   latency for applications operating with multiple concurrent TCP
   connections.  From our measurements we found no evidence of negative
   latency impacts that correlate to BW or BDP alone; in fact, both
   kinds of subnets demonstrated latency improvements across averages
   and quantiles.

   As expected, the retransmission rate increased modestly when
   operating with the larger initial congestion window.  The overall
   increase in AvgDC is 0.3% (from 1.98% to 2.29%) and in SlowDC is
   0.7% (from 3.54% to 4.21%).  In our investigation, with the
   exception of one application, the larger window resulted in a
   retransmission increase of < 0.5% for services in the AvgDC.  The
   exception is the Maps application, which operates with multiple
   concurrent TCP connections and increased its retransmission rate by
   0.9% in AvgDC and 1.85% in SlowDC (from 3.94% to 5.79%).

   In our experiments, the percentage of traffic experiencing
   retransmissions did not increase significantly.  E.g.,
   90% of web search and maps traffic experienced zero retransmissions
   in SlowDC (the percentages are higher for AvgDC); a breakdown of
   retransmissions by percentiles indicates that most increases come
   from the portion of traffic already experiencing retransmissions in
   the baseline with an initial window of 3 segments.

   Traffic patterns from applications using multiple concurrent TCP
   connections, all operating with a large initial window, represent
   one of the worst-case scenarios where latency can be adversely
   impacted due to bottleneck buffer overflow.  Our investigation shows
   that such a traffic pattern has not been a problem in AvgDC, where
   all these applications, specifically maps and image thumbnails,
   demonstrated improved latencies varying from 2-20%.  In the case of
   SlowDC, while these applications continued showing a latency
   improvement in the mean, their latencies at higher quantiles (96 and
   above for maps) indicated instances where latency with the larger
   window is worse than the baseline, e.g., the 99% latency for maps
   increased by 2.3% (80ms) when compared to the baseline.  There is no
   evidence from our measurements that such a cost on latency is a
   result of subnet bandwidth alone.  Although we have no way of
   knowing from our data, we conjecture that the amount of buffering at
   bottleneck links plays a key role in the performance of these
   applications.

   Further details on our experiments and analysis can be found in
   [Duk10, DCCM10].

11. List of Concerns and Corresponding Test Results

   Concerns have been raised since we first published our proposal,
   based on a set of large scale experiments.  To better understand the
   impact of a larger initial window in order to confirm or dismiss
   these concerns, we, as well as people outside of Google, have
   conducted numerous additional tests in the past year, using either
   Google's large scale clusters, simulations, or real testbeds.
   The following is a list of concerns and some of the findings.

   A complete list of tests conducted, their results, and related
   studies can be found at [IW10].

   o How complete are our tests in traffic pattern coverage?

     Google today offers a large portfolio of services beyond web
     search.  The list includes Gmail, Google Maps, Photos, News,
     Sites, Images, Videos, etc.  Our tests included most of Google's
     services, covering a wide variety of traffic sizes and patterns.
     One notable exception is YouTube, because we don't think the large
     initial window will have much material impact, either positive or
     negative, on bulk data services.

     [CW10] contains some results from a testbed study on how short
     flows with a larger initial window might affect the throughput
     performance of other co-existing, long-lived, bulk data transfers.

   o Larger bursts from the increase in the initial window cause
     significantly more packet drops

     All the known tests conducted on this subject so far [Duk10,
     Sch11, Sch11-1, CW10] show that, although bursts from the larger
     initial window tend to cause more packet drops, the increase tends
     to be very modest.  The only exception is from our own testbed
     study [CW10] under extremely high load and/or simultaneous opens.
     But both IW=3 and IW=10 suffered very high packet loss rates under
     those conditions.

   o A large initial window may severely impact TCP performance over
     highly multiplexed links still common in developing regions

     Our large scale experiments described in section 10 above also
     covered Africa and South America.  Measurement data from those
     regions [DCCM10] revealed improved latency even for those Google
     services that employ multiple simultaneous connections, at the
     cost of a small increase in the retransmission rate.
      It seems that the round trip savings from a larger initial window
      more than make up for the time spent recovering more lost
      packets.

      A similar phenomenon has also been observed in our testbed study
      [CW10].

   o  Why 10 segments?

      Questions have been raised on how the number 10 was picked.  We
      tried different sizes in our large scale experiments and found
      that 10 segments seem to give most of the benefits for the
      services we tested while not causing a significant increase in
      the retransmission rates.  Going forward, 10 segments may turn
      out to be too small as average web object sizes continue to grow.
      A scheme to right-size the initial window automatically over long
      timescales has been proposed in [Tou10].

   o  Need for more thorough analysis of the impact on slow links.

      Although data from [Duk10] showed that the large initial window
      reduced the average latency even for the dialup link class of
      only 56Kbps in bandwidth, it is only prudent to perform a more
      microscopic analysis of its effect on slow links.  We set up two
      testbeds for this purpose [CW10].

      Both testbeds were used to emulate a 300ms RTT, bottleneck link
      bandwidths as low as 64Kbps, and router queue sizes as low as 40
      packets.  Although we tried a large number of test parameter
      combinations, almost all the tests we ran showed some latency
      improvement from IW=10, with only a modest increase in the packet
      drop rate until a very high load was injected.  The testbed
      results were consistent with both our own large scale data center
      experiments [CD10, DCCM10] and a separate study using NSC
      simulations [Sch11, Sch11-1].

   o  How will the larger initial window affect flows with initial
      windows of 4KB or less?

      Flows with the larger initial window will likely grab more
      bandwidth from a bottleneck link when competing against flows
      with a smaller initial window, at least initially.
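      A toy model of ideal slow start gives a rough sense of how long
      that head start can persist.  The sketch below is ours rather
      than part of this document's measurements; it assumes loss-free
      slow start in which the congestion window doubles every round
      trip (per RFC 5681) and ignores delayed ACKs and receiver window
      limits.

```python
# Hedged sketch (an illustration, not from this document's experiments):
# track the congestion window of two idealized, loss-free flows over a
# few round trips.  cwnd doubling per RTT is the textbook slow-start
# behavior of RFC 5681; real stacks deviate (delayed ACKs, losses,
# receiver window limits).

def cwnd_trajectory(iw, rtts):
    """Return cwnd (in segments) at the start of each of `rtts` round
    trips of ideal slow start, beginning from initial window `iw`."""
    trajectory = []
    cwnd = iw
    for _ in range(rtts):
        trajectory.append(cwnd)
        cwnd *= 2  # slow start: one extra segment per ACK => doubling
    return trajectory

print(cwnd_trajectory(3, 5))   # IW=3 flow
print(cwnd_trajectory(10, 5))  # IW=10 flow
```

      Under these assumptions the IW=3 flow's window (3, 6, 12, 24,
      ...) passes 10 segments by its third round trip, so any bandwidth
      advantage from IW=10 is confined to the first couple of RTTs
      unless losses intervene.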
      How long will this "unfairness" last?  Will there be any "capture
      effect" where flows with a larger initial window possess a
      disproportionate share of bandwidth beyond just a few round
      trips?

      If there is any "unfairness" issue from flows with different
      initial windows, it did not show up in our large scale
      experiments: the average latency for the bucket of all responses
      < 4KB did not seem to be affected by the presence of many other
      larger responses employing the large initial window.  As a matter
      of fact, they seemed to benefit from the large initial window
      too, as shown in Figure 7 of [Duk10].

      The same phenomenon seems to exist in our testbed experiments.
      Flows with IW=3 suffered only slightly when competing against
      flows with IW=10 under light to medium loads.  Under high load,
      the latency of both kinds of flows improved when they were mixed
      together.  Also, long-lived, background bulk-data flows seemed to
      enjoy higher throughput when running against many foreground
      short flows with IW=10 than against short flows with IW=3.  One
      plausible explanation is that IW=10 enabled the short flows to
      complete sooner, leaving more room for the long-lived, background
      flows.

      An independent study using the NSC simulator has also concluded
      that IW=10 works rather well and is quite fair against IW=3
      [Sch11, Sch11-1].

   o  How will a larger initial window perform over cellular networks?

      Some simulation studies [JNDK10, JNDK10-1] have been conducted on
      the effect of a larger initial window on wireless links in 2G to
      4G networks (EDGE/HSPA/LTE).  The overall result seems mixed in
      both raw performance and the fairness index.

      There have been ongoing studies by people from Nokia on the
      effect of a larger initial window on GPRS and HSDPA networks.
      Initial test results seem to show little or no improvement from
      flows with a larger initial window.  More studies are needed to
      understand why.

12. Related Proposals

   Two other proposals [All10, Tou10] have been made with the goal of
   raising TCP's initial window size over a long timescale.  Both aim
   at addressing the concern about the uncertain impact of raising the
   initial window size on an Internet wide scale.  Moreover, [Tou10]
   seeks an algorithm to automate the adjustment of IW safely over the
   long haul.

   Based on our test results from the past couple of years, we believe
   our proposal, a modest, static increase of IW to 10, to be the best
   near-term solution that is both simple and effective.  The other
   proposals, with their added complexity and much longer deployment
   cycles, seem best suited for growing IW beyond 10 in the long run.

13. Security Considerations

   This document discusses the initial congestion window permitted for
   TCP connections.  Changing this value does not raise any known new
   security issues with TCP.

14. Conclusion

   This document suggests a simple change to TCP that will reduce
   application latency over short-lived TCP connections or links with
   long RTTs (saving several RTTs during the initial slow-start phase)
   with little or no negative impact on other flows.  Extensive tests
   have been conducted through both testbeds and large data centers,
   with most results showing improved latency with only a small
   increase in the packet retransmission rate.  Based on these results
   we believe a modest increase of IW to 10 is the best near-term
   proposal, while other proposals [All10, Tou10] may be best suited to
   grow IW beyond 10 in the long run.

15. IANA Considerations

   None.

16. Acknowledgments

   Many people at Google have helped to make the set of large scale
   tests possible.  We would especially like to acknowledge Amit
   Agarwal, Tom Herbert, Arvind Jain and Tiziana Refice for their major
   contributions.

Normative References

   [RFC6298]  Paxson, V., Allman, M., Chu, J.
              and M. Sargent, "Computing TCP's Retransmission Timer",
              RFC 6298, June 2011.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
              Selective Acknowledgement Options", RFC 2018, October
              1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC3390]  Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
              Initial Window", RFC 3390, October 2002.

   [RFC5681]  Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and
              P. Hurtig, "Early Retransmit for TCP and SCTP", RFC 5827,
              April 2010.

Informative References

   [AKAM10]   "The State of the Internet, 3rd Quarter 2009", Akamai
              Technologies, Inc., January 2010.

   [All00]    Allman, M., "A Web Server's View of the Transport Layer",
              ACM Computer Communication Review, 30(5), October 2000.

   [All10]    Allman, M., "Initial Congestion Window Specification",
              Internet-draft draft-allman-tcpm-bump-initcwnd-00.txt,
              work in progress.

   [Bel10]    Belshe, M., "A Client-Side Argument For Changing TCP Slow
              Start", January 2010.  URL
              http://sites.google.com/a/chromium.org/dev/spdy/
              An_Argument_For_Changing_TCP_Slow_Start.pdf

   [CD10]     Chu, J. and N. Dukkipati, "Increasing TCP's Initial
              Window", Presented to the 77th IRTF ICCRG and IETF TCPM
              working group meetings, March 2010.  URL
              http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf

   [Chu09]    Chu, J., "Tuning TCP Parameters for the 21st Century",
              Presented to the 75th IETF TCPM working group meeting,
              July 2009.
              URL http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

   [CW10]     Chu, J. and Y. Wang, "A Testbed Study on IW10 vs IW3",
              Presented to the 79th IETF TCPM working group meeting,
              November 2010.  URL
              http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

   [DCCM10]   Dukkipati, N., Cheng, Y., Chu, J. and M. Mathis,
              "Increasing TCP initial window", Presented to the 78th
              IRTF ICCRG working group meeting, July 2010.  URL
              http://www.ietf.org/proceedings/78/slides/iccrg-3.pdf

   [DGHS07]   Dischinger, M., Gummadi, K., Haeberlen, A. and S. Saroiu,
              "Characterizing Residential Broadband Networks", Internet
              Measurement Conference, October 24-26, 2007.

   [Duk10]    Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin, N.,
              Agarwal, A., Herbert, T. and A. Jain, "An Argument for
              Increasing TCP's Initial Congestion Window", ACM SIGCOMM
              Computer Communications Review, vol. 40 (2010), pp.
              27-33, July 2010.  URL
              http://www.google.com/research/pubs/pub36640.html

   [FF99]     Floyd, S. and K. Fall, "Promoting the Use of End-to-End
              Congestion Control in the Internet", IEEE/ACM
              Transactions on Networking, August 1999.

   [FJ93]     Floyd, S. and V. Jacobson, "Random Early Detection
              gateways for Congestion Avoidance", IEEE/ACM Transactions
              on Networking, V.1 N.4, August 1993, pp. 397-413.

   [Get11]    Gettys, J., "Bufferbloat: Dark buffers in the Internet",
              Presented to the 80th IETF TSV Area meeting, March 2011.
              URL
              http://www.ietf.org/proceedings/80/slides/tsvarea-1.pdf

   [IOR2009]  Labovitz, C., Iekel-Johnson, S., McPherson, D.,
              Oberheide, J., Jahanian, F. and M. Karir, "Atlas Internet
              Observatory 2009 Annual Report", 47th NANOG Conference,
              October 2009.

   [IW10]     "TCP IW10 links", URL
              http://code.google.com/speed/protocols/tcpm-IW10.html

   [Jac88]    Jacobson, V., "Congestion Avoidance and Control",
              Computer Communication Review, vol. 18, no. 4, pp.
              314-329, August 1988.

   [JNDK10]   Jarvinen, I., Nyrhinen,
              A., Ding, A. and M. Kojo, "A Simulation Study on
              Increasing TCP's IW", Presented to the 78th IRTF ICCRG
              working group meeting, July 2010.  URL
              http://www.ietf.org/proceedings/78/slides/iccrg-7.pdf

   [JNDK10-1] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo,
              "Effect of IW and Initial RTO changes", Presented to the
              79th IETF TCPM working group meeting, November 2010.  URL
              http://www.ietf.org/proceedings/79/slides/tcpm-1.pdf

   [LAJW07]   Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion
              Control Without a Startup Phase", Protocols for Fast,
              Long Distance Networks (PFLDnet) Workshop, February 2007.
              URL
              http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

   [PK98]     Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A
              technique for speeding up web transfers", in Proceedings
              of IEEE Globecom '98 Internet Mini-Conference, 1998.

   [PRAKS02]  Partridge, C., Rockwell, D., Allman, M., Krishnan, R.
              and J. Sterbenz, "A Swifter Start for TCP", Technical
              Report No. 8339, BBN Technologies, March 2002.

   [PWSB09]   Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe,
              "Open Research Issues in Internet Congestion Control",
              section 3.4, Internet-draft draft-irtf-iccrg-welzl-
              congestion-control-open-research-05.txt, work in
              progress.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B.,
              Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
              Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
              K., Shenker, S., Wroclawski, J. and L. Zhang,
              "Recommendations on Queue Management and Congestion
              Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2414]  Allman, M., Floyd, S. and C. Partridge, "Increasing
              TCP's Initial Window", RFC 2414, September 1998.

   [RFC3042]  Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

   [RFC3150]  Dawkins, S., Montenegro, G., Kojo, M. and V.
              Magret, "End-to-end Performance Implications of Slow
              Links", RFC 3150, July 2001.

   [RFC3782]  Floyd, S., Henderson, T. and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC
              3782, April 2004.

   [RFC4782]  Floyd, S., Allman, M., Jain, A. and P. Sarolahti,
              "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [RJ10]     Ramachandran, S. and A. Jain, "Aggregate Statistics of
              Size Related Metrics of Web Pages", 2010.  URL
              http://code.google.com/speed/articles/web-metrics.html

   [Sch08]    Scharf, M., "Quick-Start, Jump-Start, and Other Fast
              Startup Approaches", November 17, 2008.  URL
              http://www.ietf.org/old/2009/proceedings/08nov/slides/
              iccrg-2.pdf

   [Sch11]    Scharf, M., "Performance and Fairness Evaluation of IW10
              and Other Fast Startup Schemes", Presented to the 80th
              IRTF ICCRG working group meeting, November 2010.  URL
              http://www.ietf.org/proceedings/80/slides/iccrg-1.pdf

   [Sch11-1]  Scharf, M., "Comparison of end-to-end and network-
              supported fast startup congestion control schemes",
              Computer Networks, February 2011.  URL
              http://dx.doi.org/10.1016/j.comnet.2011.02.002

   [SPDY]     "SPDY: An experimental protocol for a faster web", URL
              http://dev.chromium.org/spdy

   [Ste08]    Souders, S., "Roundup on Parallel Connections", High
              Performance Web Sites blog.  URL
              http://www.stevesouders.com/blog/2008/03/20/roundup-on-
              parallel-connections

   [Tou10]    Touch, J., "Automating the Initial Window in TCP",
              Internet-draft draft-touch-tcpm-automatic-iw-01.txt,
              work in progress.

   [VH97]     Visweswaraiah, V. and J. Heidemann, "Improving Restart
              of Idle TCP Connections", Technical Report 97-661,
              University of Southern California, November 1997.

Authors' Addresses

   Jerry Chu
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: hkchu@google.com

   Nandita Dukkipati
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: nanditad@google.com

   Yuchung Cheng
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: ycheng@google.com

   Matt Mathis
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: mattmathis@google.com

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.