Internet Draft                                                    J. Chu
draft-hkchu-tcpm-initcwnd-01.txt                            N. Dukkipati
Intended status: Standard                                       Y. Cheng
Updates: 3390, 5681                                            M. Mathis
Creation date: July 12, 2010                                Google, Inc.
Expiration date: January 2011

                     Increasing TCP's Initial Window

Status of this Memo

   Distribution of this memo is unlimited.
   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire in January 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document proposes an increase in the permitted TCP initial
   window (IW) from between 2 and 4 segments, as specified in RFC 3390,
   to 10 segments.  It discusses the motivation behind the increase,
   the advantages and disadvantages of the higher initial window, and
   presents results from several large scale experiments showing that
   the higher initial window improves the overall performance of many
   web services without risking congestion collapse.  Finally, it
   outlines a list of concerns to be addressed in future tests.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Editor's Note

   This draft aims to update RFC 3390 and thus follows RFC 3390's
   layout closely.  Much of the analysis from RFC 3390 remains valid.
   Some non-critical details are intentionally excluded from this
   draft; the intent is to publish the draft early in order to solicit
   feedback.  All the excluded pieces will be supplied in later
   revisions.

   The choice of 10 segments for the initial window may not be the
   "optimal" one.  Our most recent tests gave better performance with
   IW=16 than with IW=10, while still showing little negative impact.
   We are still completing the latest large scale tests and analysis
   for IW=16, which might be a better, more future-proof choice.

1. Introduction

   The TCP congestion window was introduced as part of the congestion
   control algorithm by Van Jacobson in 1988 [Jac88].  The initial
   value of one segment was used as the starting point for newly
   established connections to probe the available bandwidth on the
   network.

   The default value was increased to roughly 4KB more than a decade
   ago [RFC2414].  Since then, the Internet has continued to grow, both
   in speed and penetration [AKAM10].  Today's Internet is dominated by
   web traffic running on top of short-lived TCP connections [IOR2009].
   The relatively small initial window has become a limiting factor for
   the performance of many web applications.

   This document proposes an optional standard to allow TCP's initial
   window to start at 10 segments, or roughly 15KB, updating RFC 3390
   [RFC3390].  It discusses the motivation and the advantages and
   disadvantages of the higher initial window, and includes test
   results from several large scale experiments showing improved
   latency across the board for a variety of BW, RTT, and BDP classes.

   It also discusses potential negative impacts and suggests
   mitigations.  A minor change to RFC 3390 and RFC 5681 [RFC5681] is
   proposed for resetting the initial window when the SYN or SYN/ACK is
   lost.

   The document closes with a discussion of remaining concerns and of
   future tests to further validate the higher initial window.

2. TCP Modification

   This document proposes an increase in the permitted upper bound for
   TCP's initial window (IW) to 10 segments.  This increase is
   optional: a TCP MAY start with a larger initial window of up to 10
   segments.

   This upper bound for the initial window size represents a change
   from RFC 3390 [RFC3390], which specified that the congestion window
   be initialized to between 2 and 4 segments depending on the MSS.

   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission following the TCP
   three-way handshake.  Neither the SYN/ACK nor its acknowledgment
   (ACK) in the three-way handshake should increase the initial window
   size beyond 10 segments.

   Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

      "If the SYN or SYN/ACK is lost, the initial window used by a
      sender after a correctly transmitted SYN MUST be one segment
      consisting of MSS bytes."
   The proposed change to reduce the default RTO to 1 second [PAC10]
   increases the chance of spurious SYN or SYN/ACK retransmission, thus
   unnecessarily penalizing connections with RTT > 1 second if their
   initial window is reduced to 1 segment.  For this reason, it is
   RECOMMENDED that implementations refrain from resetting the initial
   window to 1 segment unless there have been multiple SYN or SYN/ACK
   retransmissions or a genuine loss has been detected.

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the loss
   window).  The change specified in this document affects the value of
   the initial window.  Optionally, a TCP MAY set the restart window to
   the minimum of the value used for the initial window and the current
   value of cwnd (in other words, using a larger value for the restart
   window should never increase the size of cwnd).  These changes do
   NOT change the loss window, which must remain 1 segment of MSS bytes
   (to permit the lowest possible window size in the case of severe
   congestion).

   Furthermore, to limit any negative effect that a larger initial
   window may have on links with limited bandwidth or buffer space,
   implementations SHOULD fall back to RFC 3390 for the restart window
   (RW) if any packet loss is detected during either an initial window
   or a restart window in which more than 4KB of data is sent.

3. Motivation

   The global Internet has continued to grow, both in speed and
   penetration.  According to the latest report from Akamai [AKAM10],
   global broadband (> 2Mbps) adoption has surpassed 50%, propelling
   the average connection speed to 1.7Mbps, while narrowband
   (< 256Kbps) usage has dropped to 5%.  In contrast, TCP's initial
   window has remained 4KB for a decade, corresponding to a bandwidth
   utilization of less than 200Kbps per connection, assuming an RTT of
   200ms.

   A large proportion of flows on the Internet are short web
   transactions over TCP that complete before exiting TCP slow start.
   Speeding up the TCP flow startup phase, including circumventing the
   initial window limit, has been an area of active research [PWSB09,
   Sch08].  Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98].
   Some require router support [RFC4782, PK98] and hence are not
   practical for the public Internet.  Others suggest bold but often
   radical ideas, likely requiring more years of research before
   standardization and deployment.

   In the meantime, applications have responded to TCP's "slow" start.
   Web sites use multiple sub-domains [Bel10] to circumvent the HTTP
   1.1 limit of two connections per physical host [RFC2616].  As of
   today, major web browsers open multiple connections to the same site
   (up to six connections per domain [Ste08], and the number is
   growing).  This trend remedies HTTP's serialized downloads to
   achieve parallelism and higher performance.  But it also implies
   that today most access links are severely under-utilized, so opening
   multiple TCP connections improves performance most of the time.
   While raising the initial congestion window may cause congestion for
   certain users of these browsers, we argue that browsers and other
   applications need to respect the HTTP 1.1 limit and stop increasing
   the number of simultaneous TCP connections.  We believe a modest
   increase of the initial window will help to stop this trend, provide
   the best interim solution to improve overall user performance, and
   reduce the server, client, and network load.

   Note that persistent connections and pipelining are designed to
   address some of the HTTP issues above [RFC2616].  Their presence
   does not diminish the need for a larger initial window: the first
   data chunk in a response is often the largest and will easily hit
   the initial window limit.  Our test data confirm significant latency
   reduction with the large initial window even with these two HTTP
   features ([Duk10]).

   Also note that packet pacing has been suggested as an effective
   mechanism to avoid large bursts and their associated damage [VH97].
   We do not require pacing in our proposal due to our strong
   preference for a simple solution.  We suspect that for packet bursts
   of a moderate size, packet pacing will not be necessary.  This seems
   to be confirmed by our test results.

   More discussion of the increase in the initial window, including the
   choice of 10 segments, can be found in [Duk10].

4. Implementation Issues

   [Need to decide if a different formula is needed for PMTU != 1500.]

   The HTTP 1.1 specification allows only two simultaneous connections
   per domain, while web browsers open more simultaneous TCP
   connections [Ste08], partly to circumvent the small initial window
   in order to speed up the loading of web pages, as described above.

   When web browsers open simultaneous TCP connections to the same
   destination, they are working against TCP's congestion control
   mechanisms [FF99].  Combining this behavior with larger initial
   windows further increases the burstiness and unfairness to other
   traffic in the network.  A larger initial window will give
   applications an incentive to use fewer concurrent TCP connections.

   Some implementations advertise a small initial receive window
   (Table 2 in [Duk10]), effectively limiting how much window a remote
   host may use.  In order to realize the full benefit of the large
   initial window, implementations are encouraged to advertise an
   initial receive window of at least 10 segments, except in
   circumstances where a larger initial window is deemed harmful.
   (See the Mitigation section below.)

5. Advantages of Larger Initial Windows

   1. Reducing Latency

      An increase of the initial window from 3 segments to 10 segments
      reduces the total transfer time for data sets greater than 4KB by
      up to 4 round trips.

      The table below compares the number of round trips between IW=3
      and IW=10 for different transfer sizes, assuming infinite
      bandwidth, no packet loss, and standard delayed ACKs with a large
      delayed-ACK timer.

         ---------------------------------------
         |  total segments  |  IW=3  |  IW=10  |
         ---------------------------------------
         |        3         |    1   |    1    |
         |        6         |    2   |    1    |
         |       10         |    3   |    1    |
         |       12         |    3   |    2    |
         |       21         |    4   |    2    |
         |       25         |    5   |    2    |
         |       32         |    5   |    3    |
         |       46         |    6   |    3    |
         |       51         |    6   |    4    |
         |       79         |    7   |    4    |
         |      121         |    8   |    5    |
         |      128         |    9   |    5    |
         ---------------------------------------

      For example, with the larger initial window, a transfer of 32KB
      of data will require only two rather than five round trips to
      complete.

   2. Keeping up with the growth of web object size

      RFC 3390 stated that the main motivation for increasing the
      initial window to 4KB was to speed up connections that only
      transmit a small amount of data, e.g., email and web.  The
      majority of transfers back then were less than 4KB and could be
      completed in a single RTT [All00].

      Since RFC 3390 was published, web objects have gotten
      significantly larger [Chu09, RJ10].  A large percentage of web
      objects today no longer fit in the 4KB initial window and require
      more than one round trip to transfer.  E.g., only 10% of Google's
      search responses can fit in 4KB, while 90% can fit in 10 segments
      (15KB).  The average HTTP response size of gmail.com, a highly
      scripted website, is 8KB (Figure 1 in [Duk10]).
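As a cross-check of the table in item 1 above, the round counts follow from a simplified slow-start model in which delayed ACKs acknowledge every second segment, so cwnd grows by roughly cwnd/2 per RTT. The sketch below is our own illustration, not part of RFC 3390 or RFC 5681; it assumes infinite bandwidth and no loss, as the table does:

```python
def rounds_to_send(total_segments, iw):
    """Round trips needed to deliver total_segments in slow start,
    assuming no loss, infinite bandwidth, and delayed ACKs that
    acknowledge every second segment (cwnd grows by cwnd // 2 per RTT)."""
    cwnd, sent, rounds = iw, 0, 0
    while sent < total_segments:
        sent += cwnd
        rounds += 1
        cwnd += cwnd // 2
    return rounds

# Reproduce two rows of the comparison table above.
print(rounds_to_send(32, 3), rounds_to_send(32, 10))    # 5 3
print(rounds_to_send(128, 3), rounds_to_send(128, 10))  # 9 5
```

Under this model the per-round send capacities are 3, 4, 6, 9, 13, ... for IW=3 and 10, 15, 22, 33, 49, ... for IW=10, which matches every row of the table.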
      During the same period, the average web page, including all
      static and dynamic scripted web objects on the page, has seen
      even greater growth in size [RJ10].  HTTP pipelining [RFC2616]
      and new web transport protocols like SPDY [SPDY] allow multiple
      web objects to be sent in a single transaction, potentially
      requiring an even larger initial window in order to transfer a
      whole web page in one round trip.

   3. Recovering faster from loss on under-utilized or wireless links

      A greater-than-3-segment initial window increases the chance of
      recovering from packet loss through Fast Retransmit rather than
      through the lengthy initial RTO [RFC5681], because the fast
      retransmit algorithm requires three duplicate ACKs as an
      indication that a segment has been lost rather than reordered.
      While newer loss recovery techniques such as Limited Transmit
      [RFC3042] and Early Retransmit [AAABH10] have been proposed to
      help speed up loss recovery from a smaller window, both
      algorithms can still benefit from the larger initial window
      because of the better chance of receiving more ACKs to react
      upon.

6. Disadvantages of Larger Initial Windows for the Individual
   Connection

   The larger bursts from an increase in the initial window may cause
   buffer overrun and packet drops in routers with small buffers, or in
   routers experiencing congestion.  This could result in unnecessary
   retransmit timeouts.  For a large-window connection that is able to
   recover without a retransmit timeout, it could result in an
   unnecessarily early transition from the slow-start to the
   congestion-avoidance phase of the window increase algorithm.  [Note:
   knowing the large initial window may cause premature segment drops,
   should one make an exception for it, i.e., by allowing ssthresh to
   remain unchanged if loss is from an enlarged initial window?]
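Whether an initial burst actually overruns a bottleneck depends on the spare queue room when the burst arrives. The toy drop-tail model below is our own sketch; the function and its parameters are illustrative and not taken from this draft:

```python
def burst_drops(burst_segments, free_buffer_segments, drained_during_burst=0):
    """Segments dropped when a back-to-back burst hits a drop-tail
    queue with free_buffer_segments of spare room.  drained_during_burst
    credits segments the bottleneck forwards while the burst is still
    arriving (near 0 when the outgoing link is slow)."""
    return max(0, burst_segments - free_buffer_segments - drained_during_burst)

# An IW=10 burst into a queue with room for 6 segments loses 4 segments;
# the same burst into a queue with room for 12 segments loses none.
print(burst_drops(10, 6))   # 4
print(burst_drops(10, 12))  # 0
```

In this simplified picture an IW=3 burst is absorbed by almost any queue, while an IW=10 burst is dropped only when fewer than 10 segments of room remain, which frames the buffer-provisioning discussion below.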
   Premature segment drops are unlikely to occur in uncongested
   networks with sufficient buffering, or in moderately congested
   networks where the congested router uses active queue management
   (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

   Insufficient buffering is more likely to exist in the access routers
   connecting slower links.  A recent study of access router buffer
   sizes [DGHS07] reveals that the majority of access routers provision
   enough buffer for 130ms or longer, sufficient to cover a burst of
   more than 10 packets at 1Mbps, but possibly not sufficient for
   browsers opening simultaneous connections.

   Some TCP connections will receive better performance with the larger
   initial window even if the burstiness of the initial window results
   in premature segment drops.  This will be true if (1) the TCP
   connection recovers from the segment drop without a retransmit
   timeout, and (2) the TCP connection is ultimately limited to a small
   congestion window by either network congestion or the receiver's
   advertised window.

7. Disadvantages of Larger Initial Windows for the Network

   An increase in the initial window may increase congestion in a
   network.  However, since the increase is one-time only (at the
   beginning of a connection), and the rest of TCP's congestion backoff
   mechanism remains in place, it is highly unlikely that the increase
   will leave a network in a persistent state of congestion, let alone
   cause congestion collapse.  This seems to have been confirmed by our
   large scale experiments described later.

   Some of the discussion in RFC 3390 remains valid for IW=10.
   Moreover, it is worth noting that although TCP NewReno increases the
   chance of duplicate segments when trying to recover multiple packet
   losses from a large window [RFC3782], the wide support of the TCP
   Selective Acknowledgment (SACK) option [RFC2018] in all major
   operating systems today should keep the volume of duplicate segments
   in check.

8. Mitigation of Negative Impact

   Much of the negative impact from an increase in the initial window
   is likely to be felt by users behind slow links with limited
   buffers.  The negative impact can be mitigated by hosts directly
   connected to a low-speed link advertising an initial receive window
   smaller than 10 segments.  This can be achieved either through
   manual configuration by the users or through the host stack
   automatically detecting low-bandwidth links.

   More suggestions to improve the end-to-end performance of slow links
   can be found in RFC 3150 [RFC3150].

   [Note: if packet loss is detected during IW through fast retransmit,
   should cwnd back down to 2 rather than FlightSize / 2?]

9. Interactions with the Retransmission Timer

   A large initial window increases the chance of a spurious RTO on a
   low-bandwidth path because the packet transmission time will
   dominate the round-trip time.  To minimize spurious retransmissions,
   implementations MUST follow RFC 2988 [RFC2988] and restart the
   retransmission timer with the current value of RTO for each ACK
   received that acknowledges new data.

10. Experimental Results

   In this section we summarize our findings from large scale Internet
   experiments with an initial window of 10 segments, conducted via
   Google's front-end infrastructure serving a diverse set of
   applications.  We present results from two datacenters, each chosen
   because of the specific characteristics of the subnets it serves:
   AvgDC has connection bandwidths closer to the worldwide average
   reported in [AKAM10], with a median connection speed of about
   1.7Mbps; SlowDC has a larger proportion of traffic from slow
   subnets, with nearly 20% of its traffic from connections below
   100Kbps and a third below 256Kbps.

   Guided by the measurement data, we answer two key questions: what is
   the latency benefit when TCP connections start with a higher initial
   window, and, on the flip side, what is the cost?

10.1 The benefits

   The average web search latency improvement over all responses is
   11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC.  We further
   analyzed the data based on traffic characteristics and subnet
   properties such as bandwidth (BW), round-trip time (RTT), and
   bandwidth-delay product (BDP).  The average response latency
   improved across the board for a variety of subnets, with the largest
   benefits of over 20% coming from high-RTT and high-BDP networks,
   wherein most responses can fit within the pipe.  Correspondingly,
   responses from low-RTT paths experienced the smallest improvements,
   of about 5%.

   Contrary to what we expected, responses from low-bandwidth subnets
   experienced the best latency improvements (between 10 and 20%), in
   the 0-56Kbps and 56-256Kbps buckets.  We speculate that low-BW
   networks observe improved latency for two plausible reasons:
   1) fewer slow-start rounds: unlike many large-BW networks, low-BW
   subnets with dial-up modems have inherently large RTTs; and
   2) faster loss recovery: an initial window larger than 3 segments
   increases the chances of a lost packet being recovered through Fast
   Retransmit as opposed to a lengthy RTO.
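The second effect can be made concrete with a small sketch (our own illustration, assuming a single loss in the initial flight, one duplicate ACK per segment sent after it, and the three-duplicate-ACK threshold of RFC 5681):

```python
def recovery_mode(iw, lost_index):
    """Recovery path for a single loss within the initial window:
    each segment sent after the lost one elicits one duplicate ACK,
    and fast retransmit fires at 3 duplicate ACKs (RFC 5681)."""
    dupacks = iw - lost_index - 1  # segments sent after the lost one
    return "fast retransmit" if dupacks >= 3 else "RTO"

# With IW=3, a single loss can never produce 3 duplicate ACKs, so the
# sender must wait out the retransmission timer; with IW=10, a loss
# anywhere among the first 7 segments recovers via fast retransmit.
print(recovery_mode(3, 0))   # RTO
print(recovery_mode(10, 0))  # fast retransmit
```

This simplified model ignores Limited Transmit and Early Retransmit, which narrow but do not close the gap between the two window sizes.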
422 Responses of different sizes benefited to varying degrees; those 423 larger than 3 segments naturally demonstrated larger improvements, 424 because they finished in fewer rounds in slow start as compared to 425 the baseline. In our experiments, response sizes <= 3 segments also 426 demonstrated small latency benefits. 428 To find out how individual subnets performed, we analyzed average 429 latency at a /24 subnet level (an approximation to a user base 430 offered similar set of services by a common ISP). We find even at the 431 subnet granularity, latency improved at all quantiles ranging from 5- 432 11%. 434 10.2 The cost 436 To quantify the cost of raising the initial window, we analyzed the 437 data specifically for subnets with low bandwidth and BDP, 438 retransmission rates for different kinds of applications, as well as 439 latency for applications operating with multiple concurrent TCP 440 connections. From our measurements we found no evidence of a negative 441 latency impacts that correlate to BW or BDP alone, but in fact both 442 kinds of subnets demonstrated latency improvements across averages 443 and quantiles. 445 As expected, the retransmission rate increased modestly when 446 operating with larger initial congestion window. The overall increase 447 in AvgDC is 0.3% (from 1.98% to 2.29%) and in SlowDC is 0.7% (from 448 3.54% to 4.21%). In our investigation, with the exception of one 449 application, the larger window resulted in a retransmission increase 450 of < 0.5% for services in the AvgDC. The exception is the Maps 451 application that operates with multiple concurrent TCP connections, 452 which increased its retransmission rate by 0.9% in AvgDC and 1.85% in 453 SlowDC (from 3.94% to 5.79%). 455 In our experiments, the percentage of traffic experiencing 456 retransmissions did not increase significantly. E.g. 
90% of web 457 search and maps experienced zero retransmissions in SlowDC 458 (percentages are higher for AvgDC); a break up of retransmissions by 459 percentiles indicate that most increases come from portion of traffic 460 already experiencing retransmissions in the baseline with initial 461 window of 3 segments. 463 Traffic patterns from applications using multiple concurrent TCP 464 connections all operating with a large initial window represent one 465 of the worst case scenarios where latency can be adversely impacted 466 due to bottleneck buffer overflow. Our investigation shows that such 467 a traffic pattern has not been a problem in AvgDC, where all these 468 applications, specifically maps and image thumbnails, demonstrated 469 improved latencies varying from 2-20%. In the case of SlowDC, while 470 these applications continued showing a latency improvement in the 471 mean, their latencies in higher quantiles (96 and above for maps) 472 indicated instances where latency with larger window is worse than 473 the baseline, e.g. the 99% latency for maps has increased by 2.3% 474 (80ms) when compared to the baseline. There is no evidence from our 475 measurements that such a cost on latency is a result of subnet 476 bandwidth alone. Although we have no way of knowing from our data, we 477 conjecture that the amount of buffering at bottleneck links plays a 478 key role in performance of these applications. 480 Further details on our experiments and analysis can be found in 481 [Duk10]. 483 11. List of Concerns and Future Tests 485 Although we were a little hard pressed to find negative impact from 486 the initial window increase in our large scale tests, we don't 487 contend our test coverage is complete. The following is an attempt to 488 compile a list of concerns and to suggest future tests. Ultimately we 489 would like to enlist the help from the TCP community at IETF to study 490 and address any concern that may come up. 492 1. 
How complete are our tests in traffic pattern coverage? 494 Google today offers a large portfolio of services beyond web 495 search. The list includes Gmail, Google Maps, Photos, News, 496 Sites, Images, Videos,..., etc. Our tests included most of 497 Google's services, covering a wide variety of traffic sizes and 498 patterns. One notable exception is YouTube because we don't think 499 the large initial window will have much material impact, either 500 positive or negative, on bulk data services. 502 2. Larger bursts from the increase in the initial window cause 503 significantly more packet drops 505 Let the max burst capacity of an end-to-end path be the largest 506 burst of packets a given path can absorb before packet is 507 dropped. To analyze the impact from the larger initial window, it 508 helps to study the distribution of the max burst capacity of the 509 current Internet. 511 In the past similar studies were conducted by actively probing, 512 e.g., through the TCP echo/discard ports from a large set of 513 endhosts. However, most endhosts today are behind firewall 514 enabled NAT boxes, making active probing infeasible. 516 Our plan is to monitor TCP connections used to carry Google's 517 bulk data services like YouTube, and infer the max burst capacity 518 on a per-client basis from TCP internal connection parameters 519 such as ssthresh, max cwnd, and packet drop pattern. 521 3. Need more thorough analysis of the impact on slow links 523 Although our data showed the large initial window reduced the 524 average latency even for the dialup link class of only 56Kbps in 525 bandwidth, it is only prudent to perform more microscopic 526 analysis on its effect on slow links. Moreover, data from the 527 YouTube study above will likely be biased toward broadband users, 528 leaving out users behind slow links. 530 The narrowband classes here should include 56Kbps dialup modem, 531 2.5G and GPRS mobile network. 533 4. 
How will the larger initial window affect flows with initial 534 windows 4KB or less? 536 Flows with the larger initial window will likely grab more 537 bandwidth from a bottleneck link when competing against flows 538 with smaller initial window, at least initially. How long will 539 this "unfairness" last? Will there be any "capture effect" where 540 flows with larger initial window possess a disproportional share 541 of bandwidth beyond just a few round trips? 543 If there is any "unfairness" issue from flows with different 544 initial windows, it did not show up in our large scale 545 experiments, as the average latency for the bucket of all 546 responses < 4KB did not seem to be affected by the presence of 547 many other larger responses employing large initial window. As a 548 matter of fact they seemed to benefit from the large initial 549 window too, as shown in Figure 7 of [Duk10]. 551 More study can be done through simulation, similar to the set 552 described in RFC 2415 [RFC2415]. 554 12. Security Considerations 556 This document discusses the initial congestion window permitted for 557 TCP connections. Changing this value does not raise any known new 558 security issues with TCP. 560 13. Conclusion 562 This document suggests a change to TCP that will likely be beneficial 563 to short-lived TCP connections and those over links with long RTTs 564 (saving several RTTs during the initial slow-start phase). However, 565 more tests are likely needed to fully understand its impact to the 566 Internet. We welcome any help from the TCP community at IETF in 567 moving this proposal forward. 569 14. IANA Considerations 571 None 573 Acknowledgments 575 Many people at Google have helped to make the set of large scale 576 tests possible. We would especially like to acknowledge Amit Agarwal, 577 Tom Herbert, Arvind Jain and Tiziana Refice for their major 578 contributions. 580 Normative References 582 [PAC10] Paxson, V., Allman, M., and J. 
Chu, "Computing TCP's
             Retransmission Timer", Internet-draft
             draft-paxson-tcpm-rfc2988bis-00, work in progress,
             February 2010.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
             Selective Acknowledgement Options", RFC 2018, October
             1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
             Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
             Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's
             Retransmission Timer", RFC 2988, November 2000.

   [RFC3390] Allman, M., Floyd, S. and C. Partridge, "Increasing
             TCP's Initial Window", RFC 3390, October 2002.

   [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
             Control", RFC 5681, September 2009.

Informative References

   [AAABH10] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and
             P. Hurtig, "Early Retransmit for TCP and SCTP",
             Internet-draft draft-ietf-tcpm-early-rexmt-04.txt, work
             in progress.

   [AKAM10]  "The State of the Internet, 3rd Quarter 2009", Akamai
             Technologies, Inc., January 2010.

   [All00]   Allman, M., "A Web Server's View of the Transport
             Layer", ACM Computer Communication Review, 30(5),
             October 2000.

   [Bel10]   Belshe, M., "A Client-Side Argument For Changing TCP
             Slow Start", January 2010. URL
             http://sites.google.com/a/chromium.org/dev/spdy/
             An_Argument_For_Changing_TCP_Slow_Start.pdf

   [Chu09]   Chu, J., "Tuning TCP Parameters for the 21st Century",
             presented to the 75th IETF TCPM working group meeting,
             July 2009.
             http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

   [DGHS07]  Dischinger, M., Gummadi, K., Haeberlen, A. and S.
             Saroiu, "Characterizing Residential Broadband Networks",
             Internet Measurement Conference, October 24-26, 2007.
   [Duk10]   Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin,
             N., Agarwal, A., Herbert, T. and A. Jain, "An Argument
             for Increasing TCP's Initial Congestion Window", March
             2010. URL http://code.google.com/speed/articles/
             tcp_initcwnd_paper.pdf

   [FF99]    Floyd, S. and K. Fall, "Promoting the Use of End-to-End
             Congestion Control in the Internet", IEEE/ACM
             Transactions on Networking, August 1999.

   [FJ93]    Floyd, S. and V. Jacobson, "Random Early Detection
             Gateways for Congestion Avoidance", IEEE/ACM
             Transactions on Networking, V.1 N.4, August 1993, pp.
             397-413.

   [IOR2009] Labovitz, C., Iekel-Johnson, S., McPherson, D.,
             Oberheide, J., Jahanian, F. and M. Karir, "Atlas
             Internet Observatory 2009 Annual Report", 47th NANOG
             Conference, October 2009.

   [Jac88]   Jacobson, V., "Congestion Avoidance and Control",
             Computer Communication Review, vol. 18, no. 4, pp.
             314-329, August 1988.

   [LAJW07]  Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion
             Control Without a Startup Phase", Protocols for Fast,
             Long Distance Networks (PFLDnet) Workshop, February
             2007. URL
             http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

   [PK98]    Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A
             Technique for Speeding Up Web Transfers", in Proceedings
             of the IEEE Globecom '98 Internet Mini-Conference, 1998.

   [PRAKS02] Partridge, C., Rockwell, D., Allman, M., Krishnan, R.
             and J. Sterbenz, "A Swifter Start for TCP", Technical
             Report No. 8339, BBN Technologies, March 2002.

   [PWSB09]  Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe,
             "Open Research Issues in Internet Congestion Control",
             section 3.4, Internet-draft draft-irtf-iccrg-welzl-
             congestion-control-open-research-05.txt, work in
             progress.
   [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B.,
             Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
             Minshall, G., Partridge, C., Peterson, L.,
             Ramakrishnan, K., Shenker, S., Wroclawski, J. and L.
             Zhang, "Recommendations on Queue Management and
             Congestion Avoidance in the Internet", RFC 2309, April
             1998.

   [RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing
             TCP's Initial Window", RFC 2414, September 1998.

   [RFC2415] Poduri, K. and K. Nichols, "Simulation Studies of
             Increased Initial TCP Window Size", RFC 2415, September
             1998.

   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
             TCP's Loss Recovery Using Limited Transmit", RFC 3042,
             January 2001.

   [RFC3150] Dawkins, S., Montenegro, G., Kojo, M. and V. Magret,
             "End-to-end Performance Implications of Slow Links",
             RFC 3150, July 2001.

   [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
             Modification to TCP's Fast Recovery Algorithm", RFC
             3782, April 2004.

   [RFC4782] Floyd, S., Allman, M., Jain, A. and P. Sarolahti,
             "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [RJ10]    Ramachandran, S. and A. Jain, "Aggregate Statistics of
             Size Related Metrics of Web Pages", 2010. URL
             http://code.google.com/speed/articles/web-metrics.html

   [Sch08]   Scharf, M., "Quick-Start, Jump-Start, and Other Fast
             Startup Approaches", November 17, 2008. URL
             http://www.ietf.org/old/2009/proceedings/08nov/slides/
             iccrg-2.pdf

   [SPDY]    "SPDY: An experimental protocol for a faster web", URL
             http://dev.chromium.org/spdy

   [Ste08]   Souders, S., "Roundup on Parallel Connections", High
             Performance Web Sites blog. URL
             http://www.stevesouders.com/blog/2008/03/20/roundup-on-
             parallel-connections

   [VH97]    Visweswaraiah, V. and J. Heidemann, "Improving Restart
             of Idle TCP Connections", Technical Report 97-661,
             University of Southern California, November 1997.
Authors' Addresses

   H.K. Jerry Chu
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: hkchu@google.com

   Nandita Dukkipati
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: nanditad@google.com

   Yuchung Cheng
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: ycheng@google.com

   Matt Mathis
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: mattmathis@google.com

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.