idnits 2.17.1

draft-ietf-tcpm-initcwnd-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest
     one being 1 character in excess of 72.

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of
     the documents in question.

  -- The draft header indicates that this document updates RFC3390, but
     the abstract doesn't seem to directly say this.  It does mention
     RFC3390 though, so this could be OK.

  -- The draft header indicates that this document updates RFC5681, but
     the abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line
     does not match the current year

     (Using the creation date from RFC3390, updated by this document,
     for RFC5378 checks: 2001-05-25)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but
     may have content which was first submitted before 10 November 2008.
     If you have contacted all the original authors and they are all
     willing to grant the BCP78 rights to the IETF Trust, then this is
     fine, and you can ignore this comment.  If not, you may need to add
     the pre-RFC5378 disclaimer.  (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness
     check skipped.
  Checking references for intended status: Full Standard
  ----------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Unused Reference: 'PWSB09' is defined on line 832, but no explicit
     reference was found in the text

  == Unused Reference: 'Sch08' is defined on line 866, but no explicit
     reference was found in the text

  ** Downref: Normative reference to a Proposed Standard RFC: RFC 6298

  ** Downref: Normative reference to a Proposed Standard RFC: RFC 2018

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230,
     RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  ** Downref: Normative reference to a Proposed Standard RFC: RFC 3390

  ** Downref: Normative reference to a Draft Standard RFC: RFC 5681

  ** Downref: Normative reference to an Experimental RFC: RFC 5827

  == Outdated reference: A later version (-08) exists of
     draft-irtf-iccrg-welzl-congestion-control-open-research-05

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 2414
     (Obsoleted by RFC 3390)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)

  == Outdated reference: A later version (-03) exists of
     draft-touch-tcpm-automatic-iw-01

  Summary: 9 errors (**), 0 flaws (~~), 5 warnings (==), 7 comments (--).

  Run idnits with the --verbose option for more detailed information
  about the items above.

--------------------------------------------------------------------------------

Internet Draft                                                    J. Chu
draft-ietf-tcpm-initcwnd-02.txt                             N. Dukkipati
Intended status: Standard                                       Y. Cheng
Updates: 3390, 5681                                            M. Mathis
Creation date: October 16, 2011                             Google, Inc.
Expiration date: April 2012

                     Increasing TCP's Initial Window

Status of this Memo

   Distribution of this memo is unlimited.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire in April 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document proposes an increase in the permitted TCP initial
   window (IW) from between 2 and 4 segments, as specified in RFC 3390,
   to 10 segments.
   It discusses the motivation behind the increase, the advantages and
   disadvantages of the higher initial window, and presents results
   from several large scale experiments showing that the higher initial
   window improves the overall performance of many web services without
   risking congestion collapse.  The document closes with a discussion
   of a list of concerns, and some results from recent studies to
   address those concerns.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
   2. TCP Modification . . . . . . . . . . . . . . . . . . . . . . .  3
   3. Implementation Issues  . . . . . . . . . . . . . . . . . . . .  4
   4. Background . . . . . . . . . . . . . . . . . . . . . . . . . .  5
   5. Advantages of Larger Initial Windows . . . . . . . . . . . . .  6
      5.1 Reducing Latency . . . . . . . . . . . . . . . . . . . . .  6
      5.2 Keeping up with the growth of web object size  . . . . . .  7
      5.3 Recovering faster from loss on under-utilized or wireless
          links  . . . . . . . . . . . . . . . . . . . . . . . . . .  7
   6. Disadvantages of Larger Initial Windows for the Individual
      Connection . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   7. Disadvantages of Larger Initial Windows for the Network  . . .  9
   8. Mitigation of Negative Impact  . . . . . . . . . . . . . . . .  9
   9. Interactions with the Retransmission Timer . . . . . . . . . .  9
   10. Experimental Results From Large Scale Cluster Tests . . . . . 10
      10.1 The benefits  . . . . . . . . . . . . . . . . . . . . . . 10
      10.2 The cost  . . . . . . . . . . . . . . . . . . . . . . . . 11
   11. List of Concerns and Corresponding Test Results . . . . . . . 12
   12. Related Proposals . . . . . . . . . . . . . . . . . . . . . . 14
   14. Conclusion  . . . . . . . . . . . . . . . . . . . . . . . . . 15
   15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
   16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 15
   Normative References  . . . . . . . . . . . . . . . . . . . . . . 16
   Informative References  . . . . . . . . . . . . . . . . . . . . . 16
   Author's Addresses  . . . . . . . . . . . . . . . . . . . . . . . 20
   Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . 20

1. Introduction

   This document updates RFC 3390 to raise the upper bound on TCP's
   initial window (IW) to 10 segments, or roughly 15KB.  It is
   patterned after and borrows heavily from RFC 3390 [RFC3390] and
   earlier work in this area.

   The primary argument in favor of raising IW follows from the
   evolving scale of the Internet.  Ten segments are likely to fit into
   the queue space available at any broadband access link, even when
   there are a reasonable number of concurrent connections.

   Lower speed links can be treated with environment-specific
   configurations, such that they can be protected from being
   overwhelmed by large initial window bursts without imposing a
   suboptimal initial window on the rest of the Internet.

   This document reviews the advantages and disadvantages of using a
   larger initial window, and includes summaries of several large scale
   experiments showing that an initial window of 10 segments provides
   benefits across the board for a variety of BW, RTT, and BDP classes.
   These results show significant benefits of increasing IW for users
   at much smaller data rates than had been previously anticipated.
   However, at initial windows larger than 10, the results are mixed.
   We believe that these mixed results are not intrinsic, but are the
   consequence of various implementation artifacts, including overly
   aggressive applications employing many simultaneous connections.
   We propose that all TCP implementations have a settable TCP IW
   parameter; the default setting may start at 10 segments and should
   be raised as we come to understand and correct the things that
   conflict with it.

   In addition, we introduce a minor revision to RFC 3390 and RFC 5681
   [RFC5681] to eliminate resetting the initial window when the SYN or
   SYN/ACK is lost.

   The document closes with a discussion of a list of concerns that
   have been raised, and some recent test results showing that most of
   the concerns cannot be validated.

   A complementary set of slides for this proposal can be found at
   [CD10].

2. TCP Modification

   This document proposes an increase in the permitted upper bound for
   TCP's initial window (IW) to 10 segments.  This increase is
   optional: a TCP MAY start with a larger initial window of up to 10
   segments.

   This upper bound for the initial window size represents a change
   from RFC 3390 [RFC3390], which specified that the congestion window
   be initialized to between 2 and 4 segments, depending on the MSS.

   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission following the TCP
   three-way handshake.  Neither the SYN/ACK nor its acknowledgment
   (ACK) in the three-way handshake should increase the initial window
   size.

   Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

      "If the SYN or SYN/ACK is lost, the initial window used by a
      sender after a correctly transmitted SYN MUST be one segment
      consisting of MSS bytes."

   The proposed change to reduce the default RTO to 1 second [RFC6298]
   increases the chance of spurious SYN or SYN/ACK retransmission, thus
   unnecessarily penalizing connections with RTT > 1 second if their
   initial window is reduced to 1 segment.
   For this reason, it is RECOMMENDED that implementations refrain from
   resetting the initial window to 1 segment unless either there have
   been multiple SYN or SYN/ACK retransmissions, or true loss has been
   detected.

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the loss
   window).  The change specified in this document affects the value of
   the initial window.  Optionally, a TCP MAY set the restart window to
   the minimum of the value used for the initial window and the current
   value of cwnd (in other words, using a larger value for the restart
   window should never increase the size of cwnd).  These changes do
   NOT change the loss window, which must remain 1 segment of MSS bytes
   (to permit the lowest possible window size in the case of severe
   congestion).

   Furthermore, to limit any negative effect that a larger initial
   window may have on links with limited bandwidth or buffer space,
   implementations SHOULD fall back to RFC 3390 for the restart window
   (RW) if any packet loss is detected during either the initial window
   or a restart window, and more than 4KB of data is sent.

3. Implementation Issues

   [Need to decide if a different formula is needed for PMTU != 1500.]

   The HTTP 1.1 specification allows only two simultaneous connections
   per domain, yet web browsers open more simultaneous TCP connections
   [Ste08], partly to circumvent the small initial window in order to
   speed up the loading of web pages as described above.

   When web browsers open simultaneous TCP connections to the same
   destination, they are working against TCP's congestion control
   mechanisms [FF99].
   Combining this behavior with larger initial windows further
   increases the burstiness and unfairness to other traffic in the
   network.  A larger initial window will incentivize applications to
   use fewer concurrent TCP connections.

   Some implementations advertise a small initial receive window (Table
   2 in [Duk10]), effectively limiting how much window a remote host
   may use.  In order to realize the full benefit of the large initial
   window, implementations are encouraged to advertise an initial
   receive window of at least 10 segments, except for circumstances
   where a larger initial window is deemed harmful.  (See the
   Mitigation section below.)

   The TCP SACK option [RFC2018] was thought to be required in order
   for the larger initial window to perform well.  But measurements
   from both a testbed and live tests showed that IW=10 without the
   SACK option still beats the performance of IW=3 with the SACK option
   [CW10].

4. Background

   The TCP congestion window was introduced as part of the congestion
   control algorithm by Van Jacobson in 1988 [Jac88].  The initial
   value of one segment was used as the starting point for newly
   established connections to probe the available bandwidth on the
   network.

   Today's Internet is dominated by web traffic running on top of
   short-lived TCP connections [IOR2009].  The relatively small initial
   window has become a limiting factor for the performance of many web
   applications.

   The global Internet has continued to grow, both in speed and
   penetration.  According to the latest report from Akamai [AKAM10],
   global broadband (> 2Mbps) adoption has surpassed 50%, propelling
   the average connection speed to reach 1.7Mbps, while narrowband
   (< 256Kbps) usage has dropped to 5%.
   In contrast, TCP's initial window has remained 4KB for a decade
   [RFC2414], corresponding to a bandwidth utilization of less than
   200Kbps per connection, assuming an RTT of 200ms.

   A large proportion of flows on the Internet are short web
   transactions over TCP, and complete before exiting TCP slow start.
   Speeding up the TCP flow startup phase, including circumventing the
   initial window limit, has been an area of active research [PWSB09,
   Sch08].  Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98].
   Some require router support [RFC4782, PK98], hence are not practical
   for the public Internet.  Others suggest bold but often radical
   ideas, likely requiring more years of research before
   standardization and deployment.

   In the meantime, applications have responded to TCP's "slow" start.
   Web sites use multiple sub-domains [Bel10] to circumvent the HTTP
   1.1 regulation of two connections per physical host [RFC2616].  As
   of today, major web browsers open multiple connections to the same
   site (up to six connections per domain [Ste08], and the number is
   growing).  This trend is a remedy for HTTP's serialized downloads,
   to achieve parallelism and higher performance.  But it also implies
   that most access links today are severely under-utilized, hence
   having multiple TCP connections improves performance most of the
   time.  While raising the initial congestion window may cause
   congestion for certain users of these browsers, we argue that
   browsers and other applications need to respect the HTTP 1.1
   regulation and stop increasing the number of simultaneous TCP
   connections.  We believe a modest increase of the initial window
   will help to stop this trend, provide the best interim solution for
   improving overall user performance, and reduce the server, client,
   and network load.
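   The utilization figure cited above follows from simple arithmetic:
   a transfer that fits entirely in the initial window can deliver at
   most IW bytes per round trip.  A minimal sketch of that bound (the
   1460-byte MSS used for the IW=10 comparison is an assumption for
   illustration, not a figure from this document):

```python
# Upper bound on per-connection throughput when the whole transfer
# fits in the initial window: at most IW bytes leave the sender per RTT.
def iw_throughput_kbps(iw_bytes, rtt_s):
    return iw_bytes * 8 / rtt_s / 1000

# RFC 2414/3390 era: 4KB initial window, 200ms RTT
old = iw_throughput_kbps(4096, 0.200)       # ~164 Kbps, "less than 200Kbps"

# IW=10 with an assumed 1460-byte MSS, same RTT
new = iw_throughput_kbps(10 * 1460, 0.200)  # ~584 Kbps
```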
   Note that persistent connections and pipelining are designed to
   address some of the issues with HTTP above [RFC2616].  Their
   presence does not diminish the need for a larger initial window.
   E.g., data from the Chrome browser show that 35% of HTTP requests
   are made on new TCP connections.  Our test data also confirm
   significant latency reduction with the large initial window even
   with these two HTTP features ([Duk10]).

   Also note that packet pacing has been suggested as an effective
   mechanism to avoid large bursts and their associated damage [VH97].
   We do not require pacing in our proposal due to our strong
   preference for a simple solution.  We suspect that for packet bursts
   of a moderate size, packet pacing will not be necessary.  This seems
   to be confirmed by our test results.

   More discussion of the increase in the initial window, including the
   choice of 10 segments, can be found in [Duk10, CD10].

5. Advantages of Larger Initial Windows

5.1 Reducing Latency

   An increase of the initial window from 3 segments to 10 segments
   reduces the total transfer time for data sets greater than 4KB by up
   to 4 round trips.

   The table below compares the number of round trips between IW=3 and
   IW=10 for different transfer sizes, assuming infinite bandwidth, no
   packet loss, and the standard delayed acks with a large delayed-ack
   timer.

       ---------------------------------------
       | total segments |  IW=3  |  IW=10  |
       ---------------------------------------
       |        3       |    1   |    1    |
       |        6       |    2   |    1    |
       |       10       |    3   |    1    |
       |       12       |    3   |    2    |
       |       21       |    4   |    2    |
       |       25       |    5   |    2    |
       |       33       |    5   |    3    |
       |       46       |    6   |    3    |
       |       51       |    6   |    4    |
       |       78       |    7   |    4    |
       |       79       |    8   |    4    |
       |      120       |    8   |    5    |
       |      127       |    9   |    5    |
       ---------------------------------------

   For example, with the larger initial window, a transfer of 25
   segments of data will require only two rather than five round trips
   to complete.
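   The round-trip counts in the table follow from a simple slow-start
   model with delayed ACKs: each round the sender transmits a full
   congestion window, the receiver generates roughly one ACK per two
   segments, and each ACK grows cwnd by one segment.  A sketch of that
   model (an illustration of the table's arithmetic, not code from this
   document; it reproduces the table except for one boundary row where
   ACK rounding differs):

```python
def slow_start_rounds(total_segments, iw):
    """Round trips to transfer total_segments starting from initial
    window iw, assuming one delayed ACK per two segments (so cwnd
    grows by roughly a factor of 1.5 per round)."""
    cwnd, sent, rounds = iw, 0, 0
    while sent < total_segments:
        sent += cwnd          # send a full window this round
        rounds += 1
        cwnd += cwnd // 2     # ~cwnd/2 delayed ACKs, +1 segment each
    return rounds

# Reproducing rows of the table above:
#   21 segments:  IW=3 -> 4 rounds, IW=10 -> 2 rounds
#  127 segments:  IW=3 -> 9 rounds, IW=10 -> 5 rounds
```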
5.2 Keeping up with the growth of web object size

   RFC 3390 stated that the main motivation for increasing the initial
   window to 4KB was to speed up connections that only transmit a small
   amount of data, e.g., email and web.  The majority of transfers back
   then were less than 4KB and could be completed in a single RTT
   [All00].

   Since RFC 3390 was published, web objects have gotten significantly
   larger [Chu09, RJ10].  Today only a small percentage of web objects
   (e.g., 10% of Google's search responses) can fit in the 4KB initial
   window.  The average HTTP response size of gmail.com, a highly
   scripted web-site, is 8KB (Figure 1 in [Duk10]).  The average web
   page, including all static and dynamic scripted web objects on the
   page, has seen even greater growth in size [RJ10].  HTTP pipelining
   [RFC2616] and new web transport protocols like SPDY [SPDY] allow
   multiple web objects to be sent in a single transaction, potentially
   requiring an even larger initial window in order to transfer a whole
   web page in one round trip.

5.3 Recovering faster from loss on under-utilized or wireless links

   A greater-than-3-segment initial window increases the chance of
   recovering a packet loss through Fast Retransmit rather than the
   lengthy initial RTO [RFC5681].  This is because the fast retransmit
   algorithm requires three duplicate acks as an indication that a
   segment has been lost rather than reordered.  While newer loss
   recovery techniques such as Limited Transmit [RFC3042] and Early
   Retransmit [RFC5827] have been proposed to help speed up loss
   recovery from a smaller window, both algorithms can still benefit
   from the larger initial window because of a better chance to receive
   more ACKs to react upon.

6. Disadvantages of Larger Initial Windows for the Individual
   Connection

   The larger bursts from an increase in the initial window may cause
   buffer overrun and packet drop in routers with small buffers, or in
   routers experiencing congestion.  This could result in unnecessary
   retransmit timeouts.  For a large-window connection that is able to
   recover without a retransmit timeout, this could result in an
   unnecessarily early transition from the slow-start to the
   congestion-avoidance phase of the window increase algorithm.  [Note:
   knowing the large initial window may cause premature segment drop,
   should one make an exception for it, i.e., by allowing ssthresh to
   remain unchanged if loss is from an enlarged initial window?]

   Premature segment drops are unlikely to occur in uncongested
   networks with sufficient buffering, or in moderately-congested
   networks where the congested router uses active queue management
   (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

   Insufficient buffering is more likely to exist in the access routers
   connecting slower links.  A recent study of access router buffer
   size [DGHS07] reveals that the majority of access routers provision
   enough buffer for 130ms or longer, sufficient to cover a burst of
   more than 10 packets at 1Mbps speed, but possibly not sufficient for
   browsers opening simultaneous connections.

   A testbed study [CW10] on the effect of the larger initial window
   with five simultaneously opened connections revealed that, even with
   limited buffer size on slow links, IW=10 still reduced the total
   latency of web transactions, although at the cost of higher packet
   drop rates as compared to IW=3.

   Some TCP connections will receive better performance with the larger
   initial window even if the burstiness of the initial window results
   in premature segment drops.
   This will be true if (1) the TCP connection recovers from the
   segment drop without a retransmit timeout, and (2) the TCP
   connection is ultimately limited to a small congestion window by
   either network congestion or by the receiver's advertised window.

7. Disadvantages of Larger Initial Windows for the Network

   An increase in the initial window may increase congestion in a
   network.  However, since the increase is one-time only (at the
   beginning of a connection), and the rest of TCP's congestion backoff
   mechanism remains in place, it is highly unlikely that the increase
   will put a network into a persistent state of congestion, let alone
   congestion collapse.  This seems to have been confirmed by our large
   scale experiments described later.

   Some of the discussions from RFC 3390 are still valid for IW=10.
   Moreover, it is worth noting that although TCP NewReno increases the
   chance of duplicate segments when trying to recover multiple packet
   losses from a large window [RFC3782], the wide support of the TCP
   Selective Acknowledgment (SACK) option [RFC2018] in all major OSes
   today should keep the volume of duplicate segments in check.

   Recent measurements [Get11] provide evidence of extremely large
   queues (on the order of one second) at access networks of the
   Internet.  While a significant part of this buffer bloat is
   contributed by large downloads/uploads such as video files, emails
   with large attachments, backups, and downloads of movies to disk,
   some of the problem is also caused by web browsing of image-heavy
   sites [Get11].  This queuing delay is generally considered harmful
   for the responsiveness of latency-sensitive traffic such as DNS
   queries, ARP, DHCP, VoIP, and gaming.  IW=10 can exacerbate this
   problem with short downloads such as web browsing.
   The mitigations proposed for the broader problem of buffer bloat are
   also applicable in this case, such as the use of ECN, AQM schemes,
   and traffic classification (QoS).

8. Mitigation of Negative Impact

   Much of the negative impact from an increase in the initial window
   is likely to be felt by users behind slow links with limited
   buffers.  The negative impact can be mitigated by hosts directly
   connected to a low-speed link advertising an initial receive window
   smaller than 10 segments.  This can be achieved either through
   manual configuration by the users, or through the host stack
   auto-detecting the low-bandwidth links.

   More suggestions to improve the end-to-end performance of slow links
   can be found in RFC 3150 [RFC3150].

   [Note: if packet loss is detected during IW through fast retransmit,
   should cwnd back down to 2 rather than FlightSize / 2?]

9. Interactions with the Retransmission Timer

   A large initial window increases the chance of a spurious RTO on a
   low-bandwidth path, because the packet transmission time will
   dominate the round-trip time.  To minimize spurious retransmissions,
   implementations MUST follow RFC 2988 [RFC2988] to restart the
   retransmission timer with the current value of RTO for each ack
   received that acknowledges new data.

10. Experimental Results From Large Scale Cluster Tests

   In this section we summarize our findings from large scale Internet
   experiments with an initial window of 10 segments, conducted via
   Google's front-end infrastructure serving a diverse set of
   applications.
   We present results from two data centers, each chosen because of the
   specific characteristics of the subnets served: AvgDC has connection
   bandwidths closer to the worldwide average reported in [AKAM10],
   with a median connection speed of about 1.7Mbps; SlowDC has a larger
   proportion of traffic from slow-bandwidth subnets, with nearly 20%
   of traffic from connections below 100Kbps, and a third below
   256Kbps.

   Guided by measurement data, we answer two key questions: what is the
   latency benefit when TCP connections start with a higher initial
   window, and, on the flip side, what is the cost?

10.1 The benefits

   The average web search latency improvement over all responses is
   11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC.  We further
   analyzed the data based on traffic characteristics and subnet
   properties such as bandwidth (BW), round-trip time (RTT), and
   bandwidth-delay product (BDP).  The average response latency
   improved across the board for a variety of subnets, with the largest
   benefits of over 20% from high-RTT and high-BDP networks, wherein
   most responses can fit within the pipe.  Correspondingly, responses
   from low-RTT paths experienced the smallest improvements of about
   5%.

   Contrary to what we expected, responses from low-bandwidth subnets
   experienced the best latency improvements (between 10-20%) in the
   0-56Kbps and 56-256Kbps buckets.  We speculate that low-BW networks
   observe improved latency for two plausible reasons: 1) fewer
   slow-start rounds: unlike many large-BW networks, low-BW subnets
   with dial-up modems have inherently large RTTs; and 2) faster loss
   recovery: an initial window larger than 3 segments increases the
   chance of a lost packet being recovered through Fast Retransmit, as
   opposed to a lengthy RTO.
   Responses of different sizes benefited to varying degrees; those
   larger than 3 segments naturally demonstrated larger improvements,
   because they finished in fewer rounds of slow start as compared to
   the baseline.  In our experiments, response sizes <= 3 segments also
   demonstrated small latency benefits.

   To find out how individual subnets performed, we analyzed average
   latency at a /24 subnet level (an approximation to a user base
   offered a similar set of services by a common ISP).  We find that
   even at the subnet granularity, latency improved at all quantiles,
   ranging from 5-11%.

10.2 The cost

   To quantify the cost of raising the initial window, we analyzed the
   data specifically for subnets with low bandwidth and BDP,
   retransmission rates for different kinds of applications, as well as
   latency for applications operating with multiple concurrent TCP
   connections.  From our measurements we found no evidence of negative
   latency impacts that correlate to BW or BDP alone; in fact, both
   kinds of subnets demonstrated latency improvements across averages
   and quantiles.

   As expected, the retransmission rate increased modestly when
   operating with the larger initial congestion window.  The overall
   increase in AvgDC is 0.3% (from 1.98% to 2.29%) and in SlowDC is
   0.7% (from 3.54% to 4.21%).  In our investigation, with the
   exception of one application, the larger window resulted in a
   retransmission increase of < 0.5% for services in the AvgDC.  The
   exception is the Maps application, which operates with multiple
   concurrent TCP connections and increased its retransmission rate by
   0.9% in AvgDC and 1.85% in SlowDC (from 3.94% to 5.79%).

   In our experiments, the percentage of traffic experiencing
   retransmissions did not increase significantly.  E.g.,
   90% of web search and maps traffic experienced zero retransmissions
   in SlowDC (the percentages are higher for AvgDC); a breakdown of
   retransmissions by percentiles indicates that most increases come
   from the portion of traffic already experiencing retransmissions in
   the baseline with an initial window of 3 segments.

   Traffic patterns from applications using multiple concurrent TCP
   connections, all operating with a large initial window, represent
   one of the worst-case scenarios where latency can be adversely
   impacted due to bottleneck buffer overflow.  Our investigation shows
   that such a traffic pattern has not been a problem in AvgDC, where
   all these applications, specifically maps and image thumbnails,
   demonstrated improved latencies varying from 2-20%.  In the case of
   SlowDC, while these applications continued showing a latency
   improvement in the mean, their latencies at higher quantiles (96 and
   above for maps) indicated instances where latency with the larger
   window is worse than the baseline, e.g., the 99% latency for maps
   increased by 2.3% (80ms) when compared to the baseline.  There is no
   evidence from our measurements that such a cost on latency is a
   result of subnet bandwidth alone.  Although we have no way of
   knowing from our data, we conjecture that the amount of buffering at
   bottleneck links plays a key role in the performance of these
   applications.

   Further details on our experiments and analysis can be found in
   [Duk10, DCCM10].

11. List of Concerns and Corresponding Test Results

   Concerns have been raised since we first published our proposal,
   based on a set of large scale experiments.  To better understand the
   impact of a larger initial window in order to confirm or dismiss
   these concerns, we, as well as people outside of Google, have
   conducted numerous additional tests in the past year, using either
   Google's large scale clusters, simulations, or real testbeds.
   The following is a list of concerns and some of the findings.

   A complete list of tests conducted, their results, and related
   studies can be found at [IW10].

   o How complete are our tests in traffic pattern coverage?

     Google today offers a large portfolio of services beyond web
     search.  The list includes Gmail, Google Maps, Photos, News,
     Sites, Images, Videos, etc.  Our tests included most of Google's
     services, covering a wide variety of traffic sizes and patterns.
     One notable exception is YouTube, because we don't think the large
     initial window will have much material impact, either positive or
     negative, on bulk data services.

     [CW10] contains some results from a testbed study on how short
     flows with a larger initial window might affect the throughput
     performance of other co-existing, long-lived, bulk data transfers.

   o Larger bursts from the increase in the initial window cause
     significantly more packet drops

     All the known tests conducted on this subject so far [Duk10,
     Sch11, Sch11-1, CW10] show that, although bursts from the larger
     initial window tend to cause more packet drops, the increase tends
     to be very modest.  The only exception is from our own testbed
     study [CW10] under extremely high load and/or simultaneous opens.
     But both IW=3 and IW=10 suffered very high packet loss rates under
     those conditions.

   o A large initial window may severely impact TCP performance over
     highly multiplexed links still common in developing regions

     Our large scale experiments described in section 10 above also
     covered Africa and South America.  Measurement data from those
     regions [DCCM10] revealed improved latency even for those Google
     services that employ multiple simultaneous connections, at the
     cost of a small increase in the retransmission rate.
      It seems that the round trip savings from a larger initial window
      more than make up for the time spent recovering more lost
      packets.

      A similar phenomenon has also been observed in our testbed study
      [CW10].

   o  Why 10 segments?

      Questions have been raised on how the number 10 was picked.  We
      tried different sizes in our large scale experiments and found
      that 10 segments seem to give most of the benefits for the
      services we tested while not causing a significant increase in
      the retransmission rates.  Going forward, 10 segments may turn
      out to be too small as average web object sizes continue to grow.
      A scheme to right-size the initial window automatically over long
      timescales has been proposed in [Tou10].

   o  Need for more thorough analysis of the impact on slow links.

      Although data from [Duk10] showed that the large initial window
      reduced the average latency even for the dialup link class of
      only 56Kbps in bandwidth, it is only prudent to perform a more
      microscopic analysis of its effect on slow links.  We set up two
      testbeds for this purpose [CW10].

      Both testbeds were used to emulate a 300ms RTT, bottleneck link
      bandwidths as low as 64Kbps, and router queue sizes as low as 40
      packets.  Although we tried a large number of test parameter
      combinations, almost all the tests we ran showed some latency
      improvement from IW=10, with only a modest increase in the packet
      drop rate until a very high load was injected.  The testbed
      results were consistent with both our own large scale data center
      experiments [CD10, DCCM10] and a separate study using NSC
      simulations [Sch11, Sch11-1].

   o  How will the larger initial window affect flows with initial
      windows of 4KB or less?

      Flows with the larger initial window will likely grab more
      bandwidth from a bottleneck link when competing against flows
      with a smaller initial window, at least initially.
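      A toy model of ideal slow start gives a rough sense of how long
      that head start can persist.  The sketch below is ours rather
      than part of this document's measurements; it assumes loss-free
      slow start in which the congestion window doubles every round
      trip (per RFC 5681) and ignores delayed ACKs and receiver window
      limits.

```python
# Hedged sketch (an illustration, not from this document's experiments):
# track the congestion window of two idealized, loss-free flows over a
# few round trips.  cwnd doubling per RTT is the textbook slow-start
# behavior of RFC 5681; real stacks deviate (delayed ACKs, losses,
# receiver window limits).

def cwnd_trajectory(iw, rtts):
    """Return cwnd (in segments) at the start of each of `rtts` round
    trips of ideal slow start, beginning from initial window `iw`."""
    trajectory = []
    cwnd = iw
    for _ in range(rtts):
        trajectory.append(cwnd)
        cwnd *= 2  # slow start: one extra segment per ACK => doubling
    return trajectory

print(cwnd_trajectory(3, 5))   # IW=3 flow
print(cwnd_trajectory(10, 5))  # IW=10 flow
```

      Under these assumptions the IW=3 flow's window (3, 6, 12, 24,
      ...) passes 10 segments by its third round trip, so any bandwidth
      advantage from IW=10 is confined to the first couple of RTTs
      unless losses intervene.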
      How long will this "unfairness" last?  Will there be any "capture
      effect" where flows with a larger initial window possess a
      disproportionate share of bandwidth beyond just a few round
      trips?

      If there is any "unfairness" issue from flows with different
      initial windows, it did not show up in our large scale
      experiments: the average latency for the bucket of all responses
      < 4KB did not seem to be affected by the presence of many other
      larger responses employing the large initial window.  As a matter
      of fact, they seemed to benefit from the large initial window
      too, as shown in Figure 7 of [Duk10].

      The same phenomenon seems to exist in our testbed experiments.
      Flows with IW=3 suffered only slightly when competing against
      flows with IW=10 under light to medium loads.  Under high load,
      the latency of both kinds of flows improved when they were mixed
      together.  Also, long-lived, background bulk-data flows seemed to
      enjoy higher throughput when running against many foreground
      short flows with IW=10 than against short flows with IW=3.  One
      plausible explanation is that IW=10 enabled the short flows to
      complete sooner, leaving more room for the long-lived, background
      flows.

      An independent study using the NSC simulator has also concluded
      that IW=10 works rather well and is quite fair against IW=3
      [Sch11, Sch11-1].

   o  How will a larger initial window perform over cellular networks?

      Some simulation studies [JNDK10, JNDK10-1] have been conducted on
      the effect of a larger initial window on wireless links in 2G to
      4G networks (EDGE/HSPA/LTE).  The overall result seems mixed in
      both raw performance and the fairness index.

      There have been ongoing studies by people from Nokia on the
      effect of a larger initial window on GPRS and HSDPA networks.
      Initial test results seem to show little or no improvement from
      flows with a larger initial window.  More studies are needed to
      understand why.

12. Related Proposals

   Two other proposals [All10, Tou10] have been made with the goal of
   raising TCP's initial window size over a long timescale.  Both aim
   at addressing the concern about the uncertain impact of raising the
   initial window size on an Internet wide scale.  Moreover, [Tou10]
   seeks an algorithm to automate the adjustment of IW safely over the
   long haul.

   Based on our test results from the past couple of years, we believe
   our proposal, a modest, static increase of IW to 10, to be the best
   near-term solution that is both simple and effective.  The other
   proposals, with their added complexity and much longer deployment
   cycles, seem best suited for growing IW beyond 10 in the long run.

13. Security Considerations

   This document discusses the initial congestion window permitted for
   TCP connections.  Changing this value does not raise any known new
   security issues with TCP.

14. Conclusion

   This document suggests a simple change to TCP that will reduce
   application latency over short-lived TCP connections or links with
   long RTTs (saving several RTTs during the initial slow-start phase)
   with little or no negative impact on other flows.  Extensive tests
   have been conducted through both testbeds and large data centers,
   with most results showing improved latency with only a small
   increase in the packet retransmission rate.  Based on these results
   we believe a modest increase of IW to 10 is the best near-term
   proposal, while other proposals [All10, Tou10] may be best suited to
   grow IW beyond 10 in the long run.

15. IANA Considerations

   None.

16. Acknowledgments

   Many people at Google have helped to make the set of large scale
   tests possible.  We would especially like to acknowledge Amit
   Agarwal, Tom Herbert, Arvind Jain and Tiziana Refice for their major
   contributions.

Normative References

   [RFC6298]  Paxson, V., Allman, M., Chu, J.
              and M. Sargent, "Computing TCP's Retransmission Timer",
              RFC 6298, June 2011.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
              Selective Acknowledgement Options", RFC 2018, October
              1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC3390]  Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
              Initial Window", RFC 3390, October 2002.

   [RFC5681]  Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and
              P. Hurtig, "Early Retransmit for TCP and SCTP", RFC 5827,
              April 2010.

Informative References

   [AKAM10]   "The State of the Internet, 3rd Quarter 2009", Akamai
              Technologies, Inc., January 2010.

   [All00]    Allman, M., "A Web Server's View of the Transport Layer",
              ACM Computer Communication Review, 30(5), October 2000.

   [All10]    Allman, M., "Initial Congestion Window Specification",
              Internet-draft draft-allman-tcpm-bump-initcwnd-00.txt,
              work in progress.

   [Bel10]    Belshe, M., "A Client-Side Argument For Changing TCP Slow
              Start", January 2010.  URL
              http://sites.google.com/a/chromium.org/dev/spdy/
              An_Argument_For_Changing_TCP_Slow_Start.pdf

   [CD10]     Chu, J. and N. Dukkipati, "Increasing TCP's Initial
              Window", Presented to the 77th IRTF ICCRG and IETF TCPM
              working group meetings, March 2010.  URL
              http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf

   [Chu09]    Chu, J., "Tuning TCP Parameters for the 21st Century",
              Presented to the 75th IETF TCPM working group meeting,
              July 2009.
              URL http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

   [CW10]     Chu, J. and Y. Wang, "A Testbed Study on IW10 vs IW3",
              Presented to the 79th IETF TCPM working group meeting,
              November 2010.  URL
              http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

   [DCCM10]   Dukkipati, N., Cheng, Y., Chu, J. and M. Mathis,
              "Increasing TCP initial window", Presented to the 78th
              IRTF ICCRG working group meeting, July 2010.  URL
              http://www.ietf.org/proceedings/78/slides/iccrg-3.pdf

   [DGHS07]   Dischinger, M., Gummadi, K., Haeberlen, A. and S. Saroiu,
              "Characterizing Residential Broadband Networks", Internet
              Measurement Conference, October 24-26, 2007.

   [Duk10]    Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin, N.,
              Agarwal, A., Herbert, T. and A. Jain, "An Argument for
              Increasing TCP's Initial Congestion Window", ACM SIGCOMM
              Computer Communications Review, vol. 40 (2010), pp.
              27-33, July 2010.  URL
              http://www.google.com/research/pubs/pub36640.html

   [FF99]     Floyd, S. and K. Fall, "Promoting the Use of End-to-End
              Congestion Control in the Internet", IEEE/ACM
              Transactions on Networking, August 1999.

   [FJ93]     Floyd, S. and V. Jacobson, "Random Early Detection
              gateways for Congestion Avoidance", IEEE/ACM Transactions
              on Networking, V.1 N.4, August 1993, pp. 397-413.

   [Get11]    Gettys, J., "Bufferbloat: Dark buffers in the Internet",
              Presented to the 80th IETF TSV Area meeting, March 2011.
              URL
              http://www.ietf.org/proceedings/80/slides/tsvarea-1.pdf

   [IOR2009]  Labovitz, C., Iekel-Johnson, S., McPherson, D.,
              Oberheide, J., Jahanian, F. and M. Karir, "Atlas Internet
              Observatory 2009 Annual Report", 47th NANOG Conference,
              October 2009.

   [IW10]     "TCP IW10 links", URL
              http://code.google.com/speed/protocols/tcpm-IW10.html

   [Jac88]    Jacobson, V., "Congestion Avoidance and Control",
              Computer Communication Review, vol. 18, no. 4, pp.
              314-329, August 1988.

   [JNDK10]   Jarvinen, I., Nyrhinen,
              A., Ding, A. and M. Kojo, "A Simulation Study on
              Increasing TCP's IW", Presented to the 78th IRTF ICCRG
              working group meeting, July 2010.  URL
              http://www.ietf.org/proceedings/78/slides/iccrg-7.pdf

   [JNDK10-1] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo,
              "Effect of IW and Initial RTO changes", Presented to the
              79th IETF TCPM working group meeting, November 2010.  URL
              http://www.ietf.org/proceedings/79/slides/tcpm-1.pdf

   [LAJW07]   Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion
              Control Without a Startup Phase", Protocols for Fast,
              Long Distance Networks (PFLDnet) Workshop, February 2007.
              URL
              http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

   [PK98]     Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A
              technique for speeding up web transfers", in Proceedings
              of IEEE Globecom '98 Internet Mini-Conference, 1998.

   [PRAKS02]  Partridge, C., Rockwell, D., Allman, M., Krishnan, R.
              and J. Sterbenz, "A Swifter Start for TCP", Technical
              Report No. 8339, BBN Technologies, March 2002.

   [PWSB09]   Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe,
              "Open Research Issues in Internet Congestion Control",
              section 3.4, Internet-draft draft-irtf-iccrg-welzl-
              congestion-control-open-research-05.txt, work in
              progress.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B.,
              Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
              Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
              K., Shenker, S., Wroclawski, J. and L. Zhang,
              "Recommendations on Queue Management and Congestion
              Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2414]  Allman, M., Floyd, S. and C. Partridge, "Increasing
              TCP's Initial Window", RFC 2414, September 1998.

   [RFC3042]  Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

   [RFC3150]  Dawkins, S., Montenegro, G., Kojo, M. and V.
              Magret, "End-to-end Performance Implications of Slow
              Links", RFC 3150, July 2001.

   [RFC3782]  Floyd, S., Henderson, T. and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC
              3782, April 2004.

   [RFC4782]  Floyd, S., Allman, M., Jain, A. and P. Sarolahti,
              "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [RJ10]     Ramachandran, S. and A. Jain, "Aggregate Statistics of
              Size Related Metrics of Web Pages", 2010.  URL
              http://code.google.com/speed/articles/web-metrics.html

   [Sch08]    Scharf, M., "Quick-Start, Jump-Start, and Other Fast
              Startup Approaches", November 17, 2008.  URL
              http://www.ietf.org/old/2009/proceedings/08nov/slides/
              iccrg-2.pdf

   [Sch11]    Scharf, M., "Performance and Fairness Evaluation of IW10
              and Other Fast Startup Schemes", Presented to the 80th
              IRTF ICCRG working group meeting, November 2010.  URL
              http://www.ietf.org/proceedings/80/slides/iccrg-1.pdf

   [Sch11-1]  Scharf, M., "Comparison of end-to-end and network-
              supported fast startup congestion control schemes",
              Computer Networks, February 2011.  URL
              http://dx.doi.org/10.1016/j.comnet.2011.02.002

   [SPDY]     "SPDY: An experimental protocol for a faster web", URL
              http://dev.chromium.org/spdy

   [Ste08]    Souders, S., "Roundup on Parallel Connections", High
              Performance Web Sites blog.  URL
              http://www.stevesouders.com/blog/2008/03/20/roundup-on-
              parallel-connections

   [Tou10]    Touch, J., "Automating the Initial Window in TCP",
              Internet-draft draft-touch-tcpm-automatic-iw-01.txt,
              work in progress.

   [VH97]     Visweswaraiah, V. and J. Heidemann, "Improving Restart
              of Idle TCP Connections", Technical Report 97-661,
              University of Southern California, November 1997.

Authors' Addresses

   Jerry Chu
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: hkchu@google.com

   Nandita Dukkipati
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: nanditad@google.com

   Yuchung Cheng
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: ycheng@google.com

   Matt Mathis
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: mattmathis@google.com

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.