Internet Draft                                                   J. Chu
draft-ietf-tcpm-initcwnd-03.txt                             N. Dukkipati
Intended status: Experimental                                   Y. Cheng
Updates: 3390, 5681                                            M. Mathis
Expiration date: August 2012                                Google, Inc.
                                                       February 26, 2012

                     Increasing TCP's Initial Window

Status of this Memo

Distribution of this memo is unlimited.

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on August 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Abstract

This document proposes an increase in the permitted TCP initial window (IW) from between 2 and 4 segments, as specified in RFC 3390, to 10 segments. It also relaxes the requirement in RFC 3390 and RFC 5681 to reset the initial window to 1 segment when the SYN or SYN/ACK is lost. It discusses the motivation behind the increase, the advantages and disadvantages of the higher initial window, and presents results from several large scale experiments showing that the higher initial window improves the overall performance of many web services without risking congestion collapse. The document closes with usage and deployment recommendations from the IETF TCP Maintenance and Minor Extensions (TCPM) working group.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
2. TCP Modification . . . . . . . . . . . . . . . . . . . . . . .  4
3. Implementation Issues  . . . . . . . . . . . . . . . . . . . .  5
4. Background . . . . . . . . . . . . . . . . . . . . . . . . . .  5
5. Advantages of Larger Initial Windows . . . . . . . . . . . . .  7
   5.1 Reducing Latency . . . . . . . . . . . . . . . . . . . . .  7
   5.2 Keeping up with the growth of web object size  . . . . . .  7
   5.3 Recovering faster from loss on under-utilized or wireless
       links  . . . . . . . . . . . . . . . . . . . . . . . . . .  8
6. Disadvantages of Larger Initial Windows for the Individual
   Connection . . . . . . . . . . . . . . . . . . . . . . . . . .  8
7. Disadvantages of Larger Initial Windows for the Network  . . .  9
8. Mitigation of Negative Impact  . . . . . . . . . . . . . . . . 10
9. Interactions with the Retransmission Timer . . . . . . . . . . 10
10. Experimental Results From Large Scale Cluster Tests . . . . . 10
   10.1 The benefits  . . . . . . . . . . . . . . . . . . . . . . 10
   10.2 The cost  . . . . . . . . . . . . . . . . . . . . . . . . 11
11. Other Studies . . . . . . . . . . . . . . . . . . . . . . . . 12
12. Usage and Deployment Recommendations  . . . . . . . . . . . . 13
13. Related Proposals . . . . . . . . . . . . . . . . . . . . . . 13
14. Security Considerations . . . . . . . . . . . . . . . . . . . 14
15. Conclusion  . . . . . . . . . . . . . . . . . . . . . . . . . 14
16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
17. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 14
Normative References . . . . . . . . . . . . . . . . . . . . . . 15
Informative References . . . . . . . . . . . . . . . . . . . . . 15
Appendix A - List of Concerns and Corresponding Test Results . . 19
Author's Addresses . . . . . . . . . . . . . . . . . . . . . . . 22
Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . 22

1. Introduction

This document proposes to update RFC 3390 by raising the upper bound on TCP's initial window (IW) to 10 segments, or roughly 15KB. It is patterned after, and borrows heavily from, RFC 3390 [RFC3390] and earlier work in this area.
Due to lingering concerns about possible side effects, some of the recommendations are conditional on additional monitoring and evaluation.

The primary argument in favor of raising IW follows from the evolving scale of the Internet. Ten segments are likely to fit into the queue space available at any broadband access link, even when there is a reasonable number of concurrent connections.

Lower-speed links can be treated with environment-specific configurations, so that they are protected from being overwhelmed by large initial-window bursts without imposing a suboptimal initial window on the rest of the Internet.

This document reviews the advantages and disadvantages of using a larger initial window, and includes summaries of several large scale experiments showing that an initial window of 10 segments provides benefits across the board for a variety of bandwidth (BW), round-trip time (RTT), and bandwidth-delay product (BDP) classes. These results show significant benefits from increasing IW for users at much smaller data rates than had previously been anticipated. However, at initial windows larger than 10 segments the results are mixed. We believe these mixed results are not intrinsic, but are the consequence of various implementation artifacts, including overly aggressive applications employing many simultaneous connections.

We recommend that all TCP implementations have a settable TCP IW parameter. As of 2011, an appropriate value for this parameter is 10 segments, as long as there is a reasonable effort to monitor for possible interactions with other Internet applications and services. Furthermore, it is understood that the appropriate value for IW is likely to continue to rise in the future, but this document does not include any supporting evidence for larger values of IW.
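The ceiling recommended here, given formally as equation (1) in Section 2, can be sketched in a few lines. This is an illustrative, non-normative sketch; the function names are ours:

```python
def initial_window(mss: int) -> int:
    # Upper bound on the initial window in bytes, per equation (1)
    # in Section 2 of this document.
    return min(10 * mss, max(2 * mss, 14600))

def rfc3390_initial_window(mss: int) -> int:
    # The previous upper bound, from RFC 3390, for comparison.
    return min(4 * mss, max(2 * mss, 4380))
```

For the regular Ethernet-derived MSS of 1460 bytes, the new bound works out to exactly 10 segments (14600 bytes), versus 3 segments (4380 bytes) under RFC 3390; for jumbo-frame MSS values the 2*MSS term governs.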
In addition, we introduce a minor revision to RFC 3390 and RFC 5681 [RFC5681] to eliminate resetting the initial window when the SYN or SYN/ACK is lost.

The document closes with a discussion of the consensus from the TCPM working group on the near-term usage and deployment of IW10 in the Internet.

A complementary set of slides for this proposal can be found at [CD10].

2. TCP Modification

This document proposes an increase in the permitted upper bound for TCP's initial window (IW) to 10 segments, depending on the MSS. This increase is optional: a TCP MAY start with an initial window smaller than 10 segments.

More precisely, the upper bound for the initial window will be

   min (10*MSS, max (2*MSS, 14600))                             (1)

This upper bound for the initial window size represents a change from RFC 3390 [RFC3390], which specified that the congestion window be initialized to between 2 and 4 segments depending on the MSS.

This change applies to the initial window of the connection in the first round-trip time (RTT) of data transmission during or following the TCP three-way handshake. Neither the SYN/ACK nor its acknowledgment (ACK) in the three-way handshake should increase the initial window size.

Note that all the test results described in this document were based on the regular Ethernet MTU of 1500 bytes. Future study of the effect of a different MTU may be needed to fully validate (1) above.

Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

   "If the SYN or SYN/ACK is lost, the initial window used by a
   sender after a correctly transmitted SYN MUST be one segment
   consisting of MSS bytes."

The proposed change to reduce the default RTO to 1 second [RFC6298] increases the chance of spurious SYN or SYN/ACK retransmission, thus unnecessarily penalizing connections with RTT > 1 second if their initial window is reduced to 1 segment.
For this reason, it is RECOMMENDED that implementations refrain from resetting the initial window to 1 segment unless either there have been multiple SYN or SYN/ACK retransmissions or true loss has been detected.

TCP implementations use slow start in as many as three different ways: (1) to start a new connection (the initial window); (2) to restart transmission after a long idle period (the restart window); and (3) to restart transmission after a retransmit timeout (the loss window). The change specified in this document affects the value of the initial window. Optionally, a TCP MAY set the restart window to the minimum of the value used for the initial window and the current value of cwnd (in other words, using a larger value for the restart window should never increase the size of cwnd). These changes do NOT change the loss window, which must remain 1 segment of MSS bytes (to permit the lowest possible window size in the case of severe congestion).

Furthermore, to limit any negative effect that a larger initial window may have on links with limited bandwidth or buffer space, implementations SHOULD fall back to RFC 3390 for the restart window (RW) if any packet loss is detected during either the initial window or a restart window, and more than 4KB of data is sent.

3. Implementation Issues

The HTTP 1.1 specification allows only two simultaneous connections per domain, while web browsers open more simultaneous TCP connections [Ste08], partly to circumvent the small initial window in order to speed up the loading of web pages as described above.

When web browsers open simultaneous TCP connections to the same destination, they are working against TCP's congestion control mechanisms [FF99]. Combining this behavior with larger initial windows further increases the burstiness and unfairness to other traffic in the network.
A larger initial window will incentivize applications to use fewer concurrent TCP connections.

Some implementations advertise a small initial receive window (Table 2 in [Duk10]), effectively limiting how much window a remote host may use. In order to realize the full benefit of the large initial window, implementations are encouraged to advertise an initial receive window of at least 10 segments, except in circumstances where a larger initial window is deemed harmful. (See the Mitigation section below.)

The TCP SACK option [RFC2018] was thought to be required for the larger initial window to perform well. But measurements from both a testbed and live tests showed that IW=10 without the SACK option outperforms IW=3 with the SACK option [CW10].

4. Background

The TCP congestion window was introduced as part of the congestion control algorithm by Van Jacobson in 1988 [Jac88]. The initial value of one segment was used as the starting point for newly established connections to probe the available bandwidth on the network.

Today's Internet is dominated by web traffic running on top of short-lived TCP connections [IOR2009]. The relatively small initial window has become a limiting factor for the performance of many web applications.

The global Internet has continued to grow, both in speed and penetration. According to the latest report from Akamai [AKAM10], global broadband (> 2Mbps) adoption has surpassed 50%, propelling the average connection speed to 1.7Mbps, while narrowband (< 256Kbps) usage has dropped to 5%. In contrast, TCP's initial window has remained 4KB for a decade [RFC2414], corresponding to a bandwidth utilization of less than 200Kbps per connection, assuming an RTT of 200ms.

A large proportion of flows on the Internet are short web transactions over TCP that complete before exiting TCP slow start.
Speeding up the TCP flow startup phase, including circumventing the initial window limit, has been an area of active research [RFC6077, Sch08]. Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98]. Some require router support [RFC4782, PK98] and hence are not practical for the public Internet. Others suggest bold but often radical ideas, likely requiring more years of research before standardization and deployment.

In the meantime, applications have responded to TCP's "slow" start. Web sites use multiple sub-domains [Bel10] to circumvent the HTTP 1.1 limit of two connections per physical host [RFC2616]. As of today, major web browsers open multiple connections to the same site (up to six connections per domain [Ste08], and the number is growing). This trend remedies HTTP's serialized downloads to achieve parallelism and higher performance. But it also implies that today most access links are severely under-utilized, hence opening multiple TCP connections improves performance most of the time. While raising the initial congestion window may cause congestion for certain users of these browsers, we argue that browsers and other applications need to respect the HTTP 1.1 limit and stop increasing the number of simultaneous TCP connections. We believe a modest increase of the initial window will help to stop this trend and provide the best interim solution to improve overall user performance and reduce the server, client, and network load.

Note that persistent connections and pipelining are designed to address some of the above issues with HTTP [RFC2616]. Their presence does not diminish the need for a larger initial window. For example, data from the Chrome browser show that 35% of HTTP requests are made on new TCP connections.
Our test data also show significant latency reduction with the large initial window, even in conjunction with these two HTTP features ([Duk10]).

Also note that packet pacing has been suggested as a possible mechanism to avoid large bursts and their associated harm [VH97]. Pacing is not required in this proposal due to a strong preference for a simple solution. We suspect that for packet bursts of moderate size, packet pacing will not be necessary. This seems to be confirmed by our test results.

More discussion of the increase in the initial window, including the choice of 10 segments, can be found in [Duk10, CD10].

5. Advantages of Larger Initial Windows

5.1 Reducing Latency

An increase of the initial window from 3 segments to 10 segments reduces the total transfer time for data sets greater than 4KB by up to 4 round trips.

The table below compares the number of round trips needed with IW=3 and IW=10 for different transfer sizes, assuming infinite bandwidth, no packet loss, and standard delayed ACKs with a large delayed-ACK timer.

   ---------------------------------------
   | total segments |  IW=3   |  IW=10   |
   ---------------------------------------
   |        3       |    1    |     1    |
   |        6       |    2    |     1    |
   |       10       |    3    |     1    |
   |       12       |    3    |     2    |
   |       21       |    4    |     2    |
   |       25       |    5    |     2    |
   |       33       |    5    |     3    |
   |       46       |    6    |     3    |
   |       51       |    6    |     4    |
   |       78       |    7    |     4    |
   |       79       |    8    |     4    |
   |      120       |    8    |     5    |
   |      127       |    9    |     5    |
   ---------------------------------------

For example, with the larger initial window, a transfer of 32 segments of data will require only three rather than five round trips to complete.

5.2 Keeping up with the growth of web object size

RFC 3390 stated that the main motivation for increasing the initial window to 4KB was to speed up connections that only transmit a small amount of data, e.g., email and web.
The majority of transfers back then were less than 4KB and could be completed in a single RTT [All00].

Since RFC 3390 was published, web objects have grown significantly larger [Chu09, RJ10]. Today only a small percentage of web objects (e.g., 10% of Google's search responses) can fit in the 4KB initial window. The average HTTP response size of gmail.com, a highly scripted web site, is 8KB (Figure 1 in [Duk10]). The average web page, including all static and dynamic scripted web objects on the page, has seen even greater growth in size [RJ10]. HTTP pipelining [RFC2616] and new web transport protocols such as SPDY [SPDY] allow multiple web objects to be sent in a single transaction, potentially benefiting from an even larger initial window in order to transfer an entire web page in a small number of round trips.

5.3 Recovering faster from loss on under-utilized or wireless links

A greater-than-3-segment initial window increases the chance of recovering from packet loss through Fast Retransmit rather than the lengthy initial RTO [RFC5681]. This is because the fast retransmit algorithm requires three duplicate ACKs as an indication that a segment has been lost rather than reordered. While newer loss recovery techniques such as Limited Transmit [RFC3042] and Early Retransmit [RFC5827] have been proposed to help speed up loss recovery from a smaller window, both algorithms can still benefit from the larger initial window because of a better chance to receive more ACKs to react upon.

6. Disadvantages of Larger Initial Windows for the Individual Connection

The larger bursts from an increase in the initial window may cause buffer overrun and packet drops in routers with small buffers, or in routers experiencing congestion. This could result in unnecessary retransmit timeouts.
For a large-window connection that is able to recover without a retransmit timeout, this could result in an unnecessarily early transition from the slow-start to the congestion-avoidance phase of the window increase algorithm.

Premature segment drops are unlikely to occur in uncongested networks with sufficient buffering, or in moderately congested networks where the congested router uses active queue management (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

Insufficient buffering is more likely to exist in the access routers connecting slower links. A recent study of access router buffer size [DGHS07] reveals that the majority of access routers provision enough buffer for 130ms or longer, sufficient to cover a burst of more than 10 packets at 1Mbps, but possibly not sufficient for browsers opening simultaneous connections.

A testbed study [CW10] on the effect of the larger initial window with five simultaneously opened connections revealed that, even with limited buffer size on slow links, IW=10 still reduced the total latency of web transactions, although at the cost of higher packet drop rates than IW=3.

Some TCP connections will receive better performance with the larger initial window even if the burstiness of the initial window results in premature segment drops. This will be true if (1) the TCP connection recovers from the segment drop without a retransmit timeout, and (2) the TCP connection is ultimately limited to a small congestion window by either network congestion or by the receiver's advertised window.

7. Disadvantages of Larger Initial Windows for the Network

An increase in the initial window may increase congestion in a network.
However, since the increase is one-time only (at the beginning of a connection), and the rest of TCP's congestion backoff mechanism remains in place, it is highly unlikely that the increase will leave a network in a persistent state of congestion, or lead to congestion collapse. This seems to have been confirmed by the large scale experiments described later.

Until this proposal is widely deployed, a fairness issue may exist between flows adopting a larger initial window and flows that are RFC3390-compliant. Although no severe unfairness has been detected in any of the known tests so far, further study on this topic may be warranted.

Some of the discussions from RFC 3390 are still valid for IW=10. Moreover, it is worth noting that although TCP NewReno increases the chance of duplicate segments when trying to recover multiple packet losses from a large window [RFC3782], the wide support of the TCP Selective Acknowledgment (SACK) option [RFC2018] in all major OSes today should keep the volume of duplicate segments in check.

Recent measurements [Get11] provide evidence of extremely large queues (on the order of one second or more) in access networks of the Internet. While a significant part of this buffer bloat is contributed by large downloads/uploads, such as video files, emails with large attachments, backups, and downloads of movies to disk, some of the problem is also caused by web browsing of image-heavy sites [Get11]. This queuing delay is generally considered harmful to the responsiveness of latency-sensitive traffic such as DNS queries, ARP, DHCP, VoIP, and gaming. IW=10 can exacerbate this problem for short downloads such as web browsing [Get11-1]. The mitigations proposed for the broader buffer bloat problem, such as the use of ECN, AQM schemes, and traffic classification (QoS), are also applicable in this case.

8. Mitigation of Negative Impact

Much of the negative impact from an increase in the initial window is likely to be felt by users behind slow links with limited buffers. The negative impact can be mitigated by having hosts directly connected to a low-speed link advertise an initial receive window smaller than 10 segments. This can be achieved either through manual configuration by the users or through the host stack auto-detecting low-bandwidth links.

Additional suggestions to improve the end-to-end performance of slow links can be found in RFC 3150 [RFC3150].

9. Interactions with the Retransmission Timer

A large initial window increases the chance of spurious RTO on a low-bandwidth path, because the packet transmission time will dominate the round-trip time. To minimize spurious retransmissions, implementations MUST follow RFC 6298 [RFC6298] and restart the retransmission timer with the current value of RTO for each ACK received that acknowledges new data.

10. Experimental Results From Large Scale Cluster Tests

In this section we summarize our findings from large scale Internet experiments with an initial window of 10 segments, conducted via Google's front-end infrastructure serving a diverse set of applications. We present results from two data centers, each chosen because of the specific characteristics of the subnets it serves: AvgDC has connection bandwidths close to the worldwide average reported in [AKAM10], with a median connection speed of about 1.7Mbps; SlowDC has a larger proportion of traffic from slow-bandwidth subnets, with nearly 20% of traffic from connections below 100Kbps and a third below 256Kbps.

Guided by the measurement data, we answer two key questions: what is the latency benefit when TCP connections start with a higher initial window, and, on the flip side, what is the cost?
10.1 The benefits

The average web search latency improvement over all responses is 11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC. We further analyzed the data based on traffic characteristics and subnet properties such as bandwidth (BW), round-trip time (RTT), and bandwidth-delay product (BDP). The average response latency improved across the board for a variety of subnets, with the largest benefits of over 20% from high-RTT and high-BDP networks, wherein most responses can fit within the pipe. Correspondingly, responses on low-RTT paths experienced the smallest improvements, of about 5%.

Contrary to what we expected, responses from low-bandwidth subnets experienced the best latency improvements (between 10-20%) in the 0-56Kbps and 56-256Kbps buckets. We speculate that low-BW networks observe improved latency for two plausible reasons: 1) fewer slow-start rounds: unlike many large-BW networks, low-BW subnets with dial-up modems have inherently large RTTs; and 2) faster loss recovery: an initial window larger than 3 segments increases the chances of a lost packet being recovered through Fast Retransmit as opposed to a lengthy RTO.

Responses of different sizes benefited to varying degrees; those larger than 3 segments naturally demonstrated larger improvements, because they finished in fewer slow-start rounds compared to the baseline. In our experiments, response sizes <= 3 segments also demonstrated small latency benefits.

To find out how individual subnets performed, we analyzed average latency at the /24 subnet level (an approximation of a user base offered a similar set of services by a common ISP). We found that even at subnet granularity, latency improved at all quantiles, ranging from 5-11%.
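The "fewer slow-start rounds" effect noted above can be illustrated with a simplified model of slow start (no loss, unlimited bandwidth, one delayed ACK per pair of segments, with the odd segment's ACK held by a large delayed-ACK timer). This model is our own approximation for illustration; it reproduces most, though not necessarily all, rows of the table in Section 5.1:

```python
def rounds_to_complete(segments: int, iw: int) -> int:
    # Count the round trips slow start needs to deliver `segments`
    # full-sized segments, starting from an initial window of `iw`.
    cwnd, sent, rounds = iw, 0, 0
    while sent < segments:
        rounds += 1
        burst = min(cwnd, segments - sent)  # send one full window
        sent += burst
        cwnd += burst // 2                  # one ACK (+1 MSS) per segment pair
    return rounds
```

Under this model a 25-segment response needs 5 round trips with IW=3 but only 2 with IW=10, matching the corresponding rows of the table in Section 5.1.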
10.2 The cost

To quantify the cost of raising the initial window, we analyzed the data specifically for subnets with low bandwidth and low BDP, retransmission rates for different kinds of applications, and latency for applications operating with multiple concurrent TCP connections. From our measurements we found no evidence of a negative latency impact that correlates with BW or BDP alone; in fact, both kinds of subnets demonstrated latency improvements across averages and quantiles.

As expected, the retransmission rate increased modestly when operating with the larger initial congestion window. The overall increase is 0.3% in AvgDC (from 1.98% to 2.29%) and 0.7% in SlowDC (from 3.54% to 4.21%). In our investigation, with the exception of one application, the larger window resulted in a retransmission increase of < 0.5% for services in AvgDC. The exception is the Maps application, which operates with multiple concurrent TCP connections and increased its retransmission rate by 0.9% in AvgDC and 1.85% in SlowDC (from 3.94% to 5.79%).

In our experiments, the percentage of traffic experiencing retransmissions did not increase significantly. E.g., 90% of web search and Maps traffic experienced zero retransmissions in SlowDC (the percentages are higher for AvgDC); a breakdown of retransmissions by percentile indicates that most of the increase comes from the portion of traffic already experiencing retransmissions in the baseline with an initial window of 3 segments.

Traffic patterns from applications using multiple concurrent TCP connections, all operating with a large initial window, represent one of the worst-case scenarios where latency can be adversely impacted due to bottleneck buffer overflow.
Our investigation shows that such 543 a traffic pattern has not been a problem in AvgDC, where all these 544 applications, specifically maps and image thumbnails, demonstrated 545 improved latencies varying from 2-20%. In the case of SlowDC, while 546 these applications continued showing a latency improvement in the 547 mean, their latencies in higher quantiles (96 and above for maps) 548 indicated instances where latency with larger window is worse than 549 the baseline, e.g. the 99% latency for maps has increased by 2.3% 550 (80ms) when compared to the baseline. There is no evidence from our 551 measurements that such a cost on latency is a result of subnet 552 bandwidth alone. Although we have no way of knowing from our data, we 553 conjecture that the amount of buffering at bottleneck links plays a 554 key role in performance of these applications. 556 Further details on our experiments and analysis can be found in 557 [Duk10, DCCM10]. 559 11. Other Studies 561 Besides the large scale Internet experiments described above, a 562 number of other studies have been conducted on the effects of IW10 in 563 various environments. These tests were summarized below, with more 564 discussion in Appendix A. 566 A complete list of tests conducted, with their results and related 567 studies can be found at the [IW10] link. 569 1. [Sch08] described an earlier evaluation of various Fast Startup 570 approaches, including the "Initial-Start" of 10 MSS. 572 2. [DCCM10] presented the result from Google's large scale IW10 573 experiments, with a focus on areas with highly multiplexed links or 574 limited broadband deployment such as Africa and South America. 576 3. [CW10] contained a testbed study on IW10 performance over slow 577 links. It also studied how short flows with a larger initial window 578 might affect the throughput performance of other co-existing, long 579 lived, bulk data transfers. 581 4. 
   [Sch11] compared IW10 against a number of other fast startup
   schemes, and concluded that IW10 works rather well and is also quite
   fair.

   5. [JNDK10] and later [JNDK10-1] studied the effect of IW10 over
   cellular networks.

   6. [AERG11] studied the effect of larger ICW sizes, among other
   things, on end users' page load time from Yahoo!'s Content Delivery
   Network.

12. Usage and Deployment Recommendations

   Further experiments are required before a larger initial window can
   be enabled by default in the Internet. The existing measurement
   results indicate that this does not cause significant harm to other
   traffic. However, widespread use in the Internet could reveal issues
   not yet known, e.g., regarding fairness or impact on latency-
   sensitive traffic such as VoIP.

   Therefore, special care is needed when using this experimental TCP
   extension, in particular on large-scale systems originating a
   significant amount of Internet traffic. Anyone (stack vendors,
   network administrators, etc.) turning on a larger initial window
   SHOULD ensure that the performance is monitored before and after
   that change. Relevant performance metrics include the percentages of
   packet losses or segment retransmissions as well as application-
   level metrics such as data transfer completion times or media
   quality. Note that it is also important to take into account hosts
   that do not implement a larger initial window. Furthermore, non-TCP
   traffic (such as VoIP) should be monitored as well. If users observe
   any significant deterioration of performance, they SHOULD fall back
   to an initial window as allowed by RFC 3390 for safety reasons. An
   increased initial window SHOULD NOT be turned on by default on
   systems without such monitoring capabilities.
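   As a rough illustration of the fallback recommendation above, the
   following sketch (illustrative only, not normative) computes the
   initial window upper bounds of RFC 3390 and of this specification,
   together with a toy before/after monitoring check. The fallback
   threshold is a hypothetical example value, not taken from this
   document.

```python
# Illustrative sketch: initial window upper bounds in bytes, per
# RFC 3390 and per this specification, plus a toy monitoring rule.

def iw_rfc3390(mss):
    # RFC 3390 upper bound: min(4*MSS, max(2*MSS, 4380 bytes))
    return min(4 * mss, max(2 * mss, 4380))

def iw10(mss):
    # This specification: min(10*MSS, max(2*MSS, 14600 bytes))
    return min(10 * mss, max(2 * mss, 14600))

def should_fall_back(retx_before, retx_after, max_increase=0.01):
    # Hypothetical monitoring rule: fall back to the RFC 3390 window
    # if the retransmission rate rose by more than max_increase after
    # enabling the larger window.  The threshold is illustrative only.
    return (retx_after - retx_before) > max_increase

if __name__ == "__main__":
    mss = 1460
    print(iw_rfc3390(mss))                   # 4380 bytes (3 segments)
    print(iw10(mss))                         # 14600 bytes (10 segments)
    # The AvgDC figures from Section 10.2 (1.98% -> 2.29%):
    print(should_fall_back(0.0198, 0.0229))  # False
```

   With a 1460-byte MSS the two bounds work out to 3 and 10 segments
   respectively, which is the comparison studied throughout this
   document.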
   The IETF TCPM working group is very much interested in further
   reports from experiments with this specification and encourages the
   publication of such measurement data. If no significant harm is
   reported, a follow-up document may revisit the question of whether a
   larger initial window can be safely used by default in all Internet
   hosts.

13. Related Proposals

   Two other proposals [All10, Tou12] have been published on raising
   TCP's initial window size over a large timescale. Both aim at
   reducing the uncertain impact of a larger initial window at an
   Internet-wide scale. Moreover, [Tou12] seeks an algorithm to
   automate the adjustment of IW safely over a long timescale.

   Although a modest, static increase of IW to 10 may address the near-
   term need for better web performance, much work is needed from the
   TCP research community to find a long term solution to the TCP flow
   startup problem.

14. Security Considerations

   This document discusses the initial congestion window permitted for
   TCP connections. Although changing this value may cause more packet
   loss, it is highly unlikely to lead to a persistent state of network
   congestion or even a congestion collapse. Hence it does not raise
   any known new security issues with TCP.

15. Conclusion

   This document suggests a simple change to TCP that will reduce
   application latency over short-lived TCP connections or links with
   long RTTs (saving several RTTs during the initial slow-start phase),
   with little or no negative impact on other flows. Extensive tests
   have been conducted through both testbeds and large data centers,
   with most results showing improved latency with only a small
   increase in the packet retransmission rate.
   Based on these results we believe a modest increase of IW to 10 is
   the best solution for near-term deployment, while scaling IW over
   the long run remains a challenge for the TCP research community.

16. IANA Considerations

   None.

17. Acknowledgments

   Many people at Google have helped to make the set of large scale
   tests possible. We would especially like to acknowledge Amit
   Agarwal, Tom Herbert, Arvind Jain and Tiziana Refice for their major
   contributions.

Normative References

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
             Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
             Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
             Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3390] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
             Initial Window", RFC 3390, October 2002.

   [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
             Control", RFC 5681, September 2009.

   [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and
             P. Hurtig, "Early Retransmit for TCP and SCTP", RFC 5827,
             May 2010.

   [RFC6298] Paxson, V., Allman, M., Chu, J. and M. Sargent, "Computing
             TCP's Retransmission Timer", RFC 6298, June 2011.

Informative References

   [AKAM10] "The State of the Internet, 3rd Quarter 2009", Akamai
            Technologies, Inc., January 2010. URL=
            http://www.akamai.com/html/about/press/releases/2010/press_011310_1.html

   [AERG11] Al-Fares, M., Elmeleegy, K., Reed, B. and I. Gashinsky,
            "Overclocking the Yahoo! CDN for Faster Web Page Loads",
            Internet Measurement Conference, November 2011.

   [All00] Allman, M., "A Web Server's View of the Transport Layer",
           ACM Computer Communication Review, 30(5), October 2000.
   [All10] Allman, M., "Initial Congestion Window Specification",
           Internet-draft draft-allman-tcpm-bump-initcwnd-00.txt, work
           in progress, last updated November 2010.

   [Bel10] Belshe, M., "A Client-Side Argument For Changing TCP Slow
           Start", January 2010. URL
           http://sites.google.com/a/chromium.org/dev/spdy/An_Argument_For_Changing_TCP_Slow_Start.pdf

   [CD10] Chu, J. and N. Dukkipati, "Increasing TCP's Initial Window",
          Presented to 77th IRTF ICCRG & IETF TCPM working group
          meetings, March 2010. URL
          http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf

   [Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century",
           Presented to 75th IETF TCPM working group meeting, July
           2009. URL http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

   [CW10] Chu, J. and Y. Wang, "A Testbed Study on IW10 vs IW3",
          Presented to 79th IETF TCPM working group meeting, Nov. 2010.
          URL http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

   [DCCM10] Dukkipati, N., Cheng, Y., Chu, J. and M. Mathis,
            "Increasing TCP initial window", Presented to 78th IRTF
            ICCRG working group meeting, July 2010. URL
            http://www.ietf.org/proceedings/78/slides/iccrg-3.pdf

   [DGHS07] Dischinger, M., Gummadi, K., Haeberlen, A. and S. Saroiu,
            "Characterizing Residential Broadband Networks", Internet
            Measurement Conference, October 24-26, 2007.

   [Duk10] Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin, N.,
           Agarwal, A., Herbert, T. and A. Jain, "An Argument for
           Increasing TCP's Initial Congestion Window", ACM SIGCOMM
           Computer Communications Review, vol. 40 (2010), pp. 27-33,
           July 2010.

   [FF99] Floyd, S. and K. Fall, "Promoting the Use of End-to-End
          Congestion Control in the Internet", IEEE/ACM Transactions on
          Networking, August 1999.

   [FJ93] Floyd, S. and V. Jacobson, "Random Early Detection gateways
          for Congestion Avoidance", IEEE/ACM Transactions on
          Networking, V.1 N.4, August 1993, pp. 397-413.

   [Get11] Gettys, J., "Bufferbloat: Dark buffers in the Internet",
           Presented to 80th IETF TSV Area meeting, March 2011. URL
           http://www.ietf.org/proceedings/80/slides/tsvarea-1.pdf

   [Get11-1] Gettys, J., "IW10 Considered Harmful", Internet-draft
             draft-gettys-iw10-considered-harmful-00, work in progress,
             August 2011.

   [IOR2009] Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide,
             J., Jahanian, F. and M. Karir, "Atlas Internet Observatory
             2009 Annual Report", 47th NANOG Conference, October 2009.

   [IW10] "TCP IW10 links", URL
          http://code.google.com/speed/protocols/tcpm-IW10.html

   [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
           Communication Review, vol. 18, no. 4, pp. 314-329, Aug.
           1988.

   [JNDK10] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo, "A
            Simulation Study on Increasing TCP's IW", Presented to 78th
            IRTF ICCRG working group meeting, July 2010. URL
            http://www.ietf.org/proceedings/78/slides/iccrg-7.pdf

   [JNDK10-1] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo, "Effect
              of IW and Initial RTO changes", Presented to 79th IETF
              TCPM working group meeting, Nov. 2010. URL
              http://www.ietf.org/proceedings/79/slides/tcpm-1.pdf

   [LAJW07] Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion
            Control Without a Startup Phase", Protocols for Fast, Long
            Distance Networks (PFLDnet) Workshop, February 2007. URL
            http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

   [PK98] Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A technique
          for speeding up web transfers", in Proceedings of IEEE
          Globecom '98 Internet Mini-Conference, 1998.

   [PRAKS02] Partridge, C., Rockwell, D., Allman, M., Krishnan, R. and
             J. Sterbenz, "A Swifter Start for TCP", Technical Report
             No. 8339, BBN Technologies, March 2002.

   [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
             S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
             Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
             S., Wroclawski, J. and L. Zhang, "Recommendations on Queue
             Management and Congestion Avoidance in the Internet", RFC
             2309, April 1998.

   [RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
             Initial Window", RFC 2414, September 1998.

   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
             TCP's Loss Recovery Using Limited Transmit", RFC 3042,
             January 2001.

   [RFC3150] Dawkins, S., Montenegro, G., Kojo, M. and V. Magret,
             "End-to-end Performance Implications of Slow Links", BCP
             48, RFC 3150, July 2001.

   [RFC3782] Floyd, S., Henderson, T. and A. Gurtov, "The NewReno
             Modification to TCP's Fast Recovery Algorithm", RFC 3782,
             April 2004.

   [RFC4782] Floyd, S., Allman, M., Jain, A. and P. Sarolahti, "Quick-
             Start for TCP and IP", RFC 4782, January 2007.

   [RFC6077] Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe,
             "Open Research Issues in Internet Congestion Control",
             section 3.4, RFC 6077, February 2011.

   [RJ10] Ramachandran, S. and A. Jain, "Aggregate Statistics of Size-
          Related Metrics of Web Pages", May 2010. URL
          http://code.google.com/speed/articles/web-metrics.html

   [Sch08] Scharf, M., "Quick-Start, Jump-Start, and Other Fast Startup
           Approaches", Internet Research Task Force ICCRG, November
           17, 2008. URL
           http://www.ietf.org/proceedings/73/slides/iccrg-2.pdf

   [Sch11] Scharf, M., "Performance and Fairness Evaluation of IW10 and
           Other Fast Startup Schemes", Internet Research Task Force
           ICCRG, March 2011. URL
           http://www.ietf.org/proceedings/80/slides/iccrg-1.pdf

   [Sch11-1] Scharf, M., "Comparison of end-to-end and network-
             supported fast startup congestion control schemes",
             Computer Networks, Feb. 2011. URL
             http://dx.doi.org/10.1016/j.comnet.2011.02.002

   [SPDY] "SPDY: An experimental protocol for a faster web", URL
          http://dev.chromium.org/spdy

   [Ste08] Souders, S., "Roundup on Parallel Connections", High
           Performance Web Sites blog, March 2008. URL
           http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections

   [Tou12] Touch, J., "Automating the Initial Window in TCP",
           Internet-draft draft-touch-tcpm-automatic-iw-02.txt, work in
           progress, January 2012.

   [VH97] Visweswaraiah, V. and J. Heidemann, "Improving Restart of
          Idle TCP Connections", Technical Report 97-661, University of
          Southern California, November 1997.

Appendix A - List of Concerns and Corresponding Test Results

   Concerns have been raised since this proposal was first published
   based on a set of large scale experiments. To better understand the
   impact of a larger initial window and to confirm or dismiss these
   concerns, additional tests have been conducted using large scale
   clusters, simulations, or real testbeds. The following attempts to
   compile the list of concerns and summarize findings from relevant
   tests.

   o How complete are various tests in covering many different traffic
     patterns?

     The large scale Internet experiments conducted at Google's front-
     end infrastructure covered a large portfolio of services beyond
     web search, including Gmail, Google Maps, Photos, News, Sites and
     Images, covering a wide variety of traffic sizes and patterns. One
     notable exception is YouTube, because we don't think the large
     initial window will have much material impact, either positive or
     negative, on bulk data services.

     [CW10] contains some results from a testbed study on how short
     flows with a larger initial window might affect the throughput
     performance of other co-existing, long lived, bulk data transfers.
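   The principal risk posed by short flows bursting a larger initial
   window, as Section 10.2 notes, is bottleneck buffer overflow. As a
   deliberately simple picture (a toy drop-tail sketch, not a model
   used in any of the cited studies), a back-to-back burst into an
   initially empty queue of Q packets loses whatever exceeds Q,
   assuming the burst arrives much faster than the link drains:

```python
# Toy drop-tail illustration (not from the cited studies): packets
# dropped from a back-to-back burst into an initially empty bottleneck
# queue, assuming arrival is much faster than the drain rate (the
# worst case on a slow link).

def burst_drops(burst_segments, queue_packets):
    # Everything beyond the queue capacity is dropped.
    return max(0, burst_segments - queue_packets)

if __name__ == "__main__":
    # One flow's initial burst against a 40-packet queue, the smallest
    # queue used in the [CW10] testbed:
    print(burst_drops(10, 40))      # 0: IW=10 alone does not overflow
    # Eight simultaneous opens with IW=10 into the same empty queue:
    print(burst_drops(8 * 10, 40))  # 40 packets dropped
```

   Even this crude model suggests why a single IW=10 flow is benign on
   the tested queue sizes while many simultaneous opens are not, which
   matches the testbed observations summarized below.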
   o Larger bursts from the increase in the initial window cause
     significantly more packet drops

     All the tests conducted on this subject [Duk10, Sch11, Sch11-1,
     CW10] so far have shown only a modest increase in packet drops.
     The only exception is from the testbed study [CW10] under
     extremely high load and/or simultaneous opens, but under those
     conditions both IW=3 and IW=10 suffered very high packet loss
     rates.

   o A large initial window may severely impact TCP performance over
     highly multiplexed links still common in developing regions

     Our large scale experiments described in section 10 above also
     covered Africa and South America. Measurement data from those
     regions [DCCM10] revealed improved latency even for those services
     that employ multiple simultaneous connections, at the cost of a
     small increase in the retransmission rate. It seems that the round
     trip savings from a larger initial window more than make up for
     the time spent recovering more lost packets.

     A similar phenomenon has also been observed in the testbed study
     [CW10].

   o Why 10 segments?

     Questions have been raised on how the number 10 was picked. We
     tried different sizes in our large scale experiments, and found
     that 10 segments seem to give most of the benefits for the
     services we tested while not causing a significant increase in the
     retransmission rates. Going forward, 10 segments may turn out to
     be too small if average web object sizes continue to grow, but a
     scheme to right-size the initial window automatically over long
     timescales has yet to be developed.

   o Need more thorough analysis of the impact on slow links

     Although [Duk10] showed that the large initial window reduced the
     average latency even for the dialup link class of only 56 Kbps in
     bandwidth, more studies were needed to understand the effect of
     IW10 on slow links at the microscopic level.
     [CW10] was conducted for this purpose.

     Testbeds in [CW10] emulated a 300 ms RTT, bottleneck link
     bandwidth as low as 64 Kbps, and router queue sizes as low as 40
     packets. A large number of test parameter combinations was used.
     Almost all tests showed varying degrees of latency improvement
     from IW=10, with only a modest increase in the packet drop rate
     until a very high load was injected. The testbed results were
     consistent with both the large scale data center experiments
     [CD10, DCCM10] and a separate study using NSC simulations [Sch11,
     Sch11-1].

   o How will the larger initial window affect flows with initial
     windows of 4 KB or less?

     Flows with the larger initial window will likely grab more
     bandwidth from a bottleneck link when competing against flows with
     a smaller initial window, at least initially. How long will this
     "unfairness" last? Will there be any "capture effect" where flows
     with the larger initial window possess a disproportionate share of
     bandwidth beyond just a few round trips?

     If there is any "unfairness" issue from flows with different
     initial windows, it did not show up in the large scale
     experiments: the average latency for the bucket of all responses
     < 4 KB did not seem to be affected by the presence of many other
     larger responses employing the larger initial window. As a matter
     of fact, they seemed to benefit from the larger initial window
     too, as shown in Figure 7 of [Duk10].

     The same phenomenon seems to exist in the testbed experiments
     [CW10]. Flows with IW=3 only suffered slightly when competing
     against flows with IW=10 under light to moderate loads. Under high
     load, both flows' latency improved when mixed together. Also,
     long-lived, background bulk-data flows seemed to enjoy higher
     throughput when running against many foreground short flows of
     IW=10 than against short flows of IW=3.
     One plausible explanation is that IW=10 enabled short flows to
     complete sooner, leaving more room for the long-lived, background
     flows.

     A study using the NSC simulator also concluded that IW=10 works
     rather well and is quite fair against IW=3 [Sch11, Sch11-1].

   o How will a larger initial window perform over cellular networks?

     Some simulation studies [JNDK10, JNDK10-1] have been conducted on
     the effect of a larger initial window over wireless links in 2G to
     4G networks (EDGE/HSPA/LTE). The overall result seems mixed, in
     both raw performance and the fairness index.

Author's Addresses

   Jerry Chu
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: hkchu@google.com

   Nandita Dukkipati
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: nanditad@google.com

   Yuchung Cheng
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: ycheng@google.com

   Matt Mathis
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: mattmathis@google.com

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.