Internet Draft                                                   J. Chu
draft-ietf-tcpm-initcwnd-08.txt                            N. Dukkipati
Intended status: Experimental                                  Y. Cheng
                                                              M. Mathis
Expiration date: August 2013                               Google, Inc.
                                                       February 22, 2013

                      Increasing TCP's Initial Window

Status of this Memo

Distribution of this memo is unlimited.

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire in August 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Abstract

This document proposes an experiment to increase the permitted TCP initial window (IW) from between 2 and 4 segments, as specified in RFC 3390, to 10 segments, with a fallback to the existing recommendation when performance issues are detected. It discusses the motivation behind the increase, the advantages and disadvantages of the higher initial window, and presents results from several large scale experiments showing that the higher initial window improves the overall performance of many web services without resulting in a congestion collapse. The document closes with a discussion of usage and deployment for further experimental purposes recommended by the IETF TCP Maintenance and Minor Extensions (TCPM) working group.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

1. Introduction
2. TCP Modification
3. Implementation Issues
4. Background
5. Advantages of Larger Initial Windows
   5.1 Reducing Latency
   5.2 Keeping up with the growth of web object size
   5.3 Recovering faster from loss on under-utilized or wireless links
6. Disadvantages of Larger Initial Windows for the Individual Connection
7. Disadvantages of Larger Initial Windows for the Network
8. Mitigation of Negative Impact
9. Interactions with the Retransmission Timer
10. Experimental Results From Large Scale Cluster Tests
    10.1 The benefits
    10.2 The cost
11. Other Studies
12. Usage and Deployment Recommendations
13. Related Proposals
14. Security Considerations
15. Conclusion
16. IANA Considerations
17. Acknowledgments
Normative References
Informative References
Appendix A - List of Concerns and Corresponding Test Results
Authors' Addresses
Acknowledgment

1. Introduction

This document proposes to raise the upper bound on TCP's initial window (IW) to 10 segments (maximum 14600 bytes). It is patterned after and borrows heavily from RFC 3390 [RFC3390] and earlier work in this area.
Due to lingering concerns about possible side effects to other flows sharing the same network bottleneck, some of the recommendations are conditional on additional monitoring and evaluation.

The primary argument in favor of raising IW follows from the evolving scale of the Internet. Ten segments are likely to fit into the queue space available at any broadband access link, even when there is a reasonable number of concurrent connections.

Lower speed links can be treated with environment-specific configurations, so that they can be protected from being overwhelmed by large initial window bursts without imposing a suboptimal initial window on the rest of the Internet.

This document reviews the advantages and disadvantages of using a larger initial window, and includes summaries of several large scale experiments showing that an initial window of 10 segments provides benefits across the board for a variety of bandwidth (BW), round-trip time (RTT), and bandwidth-delay product (BDP) classes. These results show significant benefits from increasing IW for users at much smaller data rates than had been previously anticipated. However, at initial windows larger than 10, the results are mixed. We believe that these mixed results are not intrinsic, but are the consequence of various implementation artifacts, including overly aggressive applications employing many simultaneous connections.

We recommend that all TCP implementations have a settable TCP IW parameter, as long as there is a reasonable effort to monitor for possible interactions with other Internet applications and services as described in Section 12. Furthermore, Section 10 details why 10 segments may be an appropriate value, and while that value may continue to rise in the future, this document does not include any supporting evidence for values of IW larger than 10.

In addition, we introduce a minor revision to RFC 3390 and RFC 5681 [RFC5681] to eliminate resetting the initial window when the SYN or SYN/ACK is lost.

The document closes with a discussion of the consensus from the TCPM working group on the near-term usage and deployment of IW10 in the Internet.

A complementary set of slides for this proposal can be found at [CD10].

2. TCP Modification

This document proposes an increase in the permitted upper bound for TCP's initial window (IW) to 10 segments, depending on the MSS. This increase is optional: a TCP MAY start with an initial window that is smaller than 10 segments.

More precisely, the upper bound for the initial window will be

   min (10*MSS, max (2*MSS, 14600))                              (1)

This upper bound for the initial window size represents a change from RFC 3390 [RFC3390], which specified that the congestion window be initialized between 2 and 4 segments, depending on the MSS.

This change applies to the initial window of the connection in the first round-trip time (RTT) of data transmission during or following the TCP three-way handshake. Neither the SYN/ACK nor its acknowledgment (ACK) in the three-way handshake should increase the initial window size.

Note that all the test results described in this document were based on the regular Ethernet MTU of 1500 bytes. Future study of the effect of a different MTU may be needed to fully validate (1) above.
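For illustration only, the following non-normative C sketch evaluates the upper bound in (1) for a given MSS. The function name and the sample values in the comments are ours, not part of any specification or stack.

   /* Non-normative sketch: evaluate equation (1) for a given MSS. */
   static unsigned int iw_upper_bound_bytes(unsigned int mss)
   {
       unsigned int cap = (2 * mss > 14600) ? 2 * mss : 14600;
                                               /* max(2*MSS, 14600) */
       unsigned int ten = 10 * mss;
       return (ten < cap) ? ten : cap;         /* min(10*MSS, cap)  */
   }

   /* Examples: with a 1460-byte Ethernet MSS the bound is 14600
    * bytes, i.e., exactly 10 segments; with a hypothetical
    * 9000-byte jumbo MSS it is 18000 bytes, i.e., only 2 segments.
    */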
Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

   "If the SYN or SYN/ACK is lost, the initial window used by a
   sender after a correctly transmitted SYN MUST be one segment
   consisting of MSS bytes."

The proposed change to reduce the default RTO to 1 second [RFC6298] increases the chance of a spurious SYN or SYN/ACK retransmission, thus unnecessarily penalizing connections with RTT > 1 second if their initial window is reduced to 1 segment. For this reason, it is RECOMMENDED that implementations refrain from resetting the initial window to 1 segment unless either there has been more than one SYN or SYN/ACK retransmission or true loss has been detected.

TCP implementations use slow start in as many as three different ways: (1) to start a new connection (the initial window); (2) to restart transmission after a long idle period (the restart window); and (3) to restart transmission after a retransmit timeout (the loss window). The change specified in this document affects the value of the initial window. Optionally, a TCP MAY set the restart window to the minimum of the value used for the initial window and the current value of cwnd (in other words, using a larger value for the restart window should never increase the size of cwnd). These changes do NOT change the loss window, which must remain 1 segment of MSS bytes (to permit the lowest possible window size in the case of severe congestion).

Furthermore, to limit any negative effect that a larger initial window may have on links with limited bandwidth or buffer space, implementations SHOULD fall back to RFC 3390 for the restart window (RW) if any packet loss is detected during either the initial window or a restart window, and more than 4KB of data is sent. Implementations must also follow RFC 6298 [RFC6298] in order to avoid spurious RTOs, as described in Section 9.
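The window rules above can be summarized in the following non-normative C sketch. The structure and names are ours, as is the interpretation of "more than 4KB" as 4096 bytes; the loss window used after a retransmit timeout remains 1 MSS and is unchanged by this document.

   #include <stdbool.h>

   static unsigned int min_u(unsigned int a, unsigned int b)
   { return a < b ? a : b; }

   static unsigned int max_u(unsigned int a, unsigned int b)
   { return a > b ? a : b; }

   /* Equation (1) and the RFC 3390 upper bound, in bytes. */
   static unsigned int iw10_bytes(unsigned int mss)
   { return min_u(10 * mss, max_u(2 * mss, 14600)); }

   static unsigned int rfc3390_bytes(unsigned int mss)
   { return min_u(4 * mss, max_u(2 * mss, 4380)); }

   /* Restart window after a long idle period: no larger than the
    * initial window or the current cwnd.  Fall back to the RFC 3390
    * value if loss was detected during the initial or a restart
    * window after more than 4KB had been sent.  The loss window
    * (1 MSS after an RTO) is not shown here.
    */
   static unsigned int restart_window_bytes(unsigned int mss,
                                            unsigned int cwnd_bytes,
                                            bool loss_in_iw_or_rw,
                                            unsigned int bytes_sent)
   {
       unsigned int iw = (loss_in_iw_or_rw && bytes_sent > 4096)
                             ? rfc3390_bytes(mss)
                             : iw10_bytes(mss);
       return min_u(iw, cwnd_bytes);
   }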
3. Implementation Issues

The HTTP/1.1 specification allows only two simultaneous connections per domain, while web browsers open more simultaneous TCP connections [Ste08], partly to circumvent the small initial window in order to speed up the loading of web pages as described above.

When web browsers open simultaneous TCP connections to the same destination, they are working against TCP's congestion control mechanisms [FF99]. Combining this behavior with larger initial windows further increases the burstiness and unfairness to other traffic in the network. If a larger initial window causes harm to any other flows, local application tuning will reveal that fewer concurrent connections yield better performance for some users. Any content provider deploying IW10 in conjunction with content distributed across multiple domains is explicitly encouraged to perform measurement experiments to detect such problems, and to consider reducing the number of concurrent connections used to retrieve their content.

Some implementations advertise a small initial receive window (Table 2 in [Duk10]), effectively limiting how much window a remote host may use. In order to realize the full benefit of the large initial window, implementations are encouraged to advertise an initial receive window of at least 10 segments, except for circumstances where a larger initial window is deemed harmful. (See the Mitigation section below.)

The TCP SACK option [RFC2018] was thought to be required for the larger initial window to perform well. But measurements from both a testbed and live tests showed that IW=10 without the SACK option outperforms IW=3 with the SACK option [CW10].

4. Background

The TCP congestion window was introduced as part of the congestion control algorithm by Van Jacobson in 1988 [Jac88]. The initial value of one segment was used as the starting point for newly established connections to probe the available bandwidth on the network.

Today's Internet is dominated by web traffic running on top of short-lived TCP connections [IOR2009]. The relatively small initial window has become a limiting factor for the performance of many web applications.

The global Internet has continued to grow, both in speed and penetration. According to the latest report from Akamai [AKAM10], global broadband (> 2Mbps) adoption has surpassed 50%, propelling the average connection speed to 1.7Mbps, while narrowband (< 256Kbps) usage has dropped to 5%. In contrast, TCP's initial window has remained 4KB for a decade [RFC2414], corresponding to a bandwidth utilization of less than 200Kbps per connection, assuming an RTT of 200ms.

A large proportion of flows on the Internet are short web transactions over TCP that complete before exiting TCP slow start. Speeding up the TCP flow startup phase, including circumventing the initial window limit, has been an area of active research [RFC6077, Sch08]. Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98]. Some require router support [RFC4782, PK98] and hence are not practical for the public Internet. Others suggest bold but often radical ideas, likely requiring more years of research before standardization and deployment.

In the meantime, applications have responded to TCP's "slow" start. Web sites use multiple sub-domains [Bel10] to circumvent the HTTP/1.1 limit of two connections per physical host [RFC2616]. As of today, major web browsers open multiple connections to the same site (up to six connections per domain [Ste08], and the number is growing). This trend is meant to remedy HTTP's serialized downloads and achieve parallelism and higher performance. But it also implies that today most access links are severely under-utilized, hence having multiple TCP connections improves performance most of the time. While raising the initial congestion window may cause congestion for certain users of these browsers, we argue that browsers and other applications need to respect the HTTP/1.1 limit and stop increasing the number of simultaneous TCP connections. We believe a modest increase of the initial window will help to stop this trend, and provide the best interim solution to improve overall user performance and reduce the server, client, and network load.

Note that persistent connections and pipelining are designed to address some of the above issues with HTTP [RFC2616]. Their presence does not diminish the need for a larger initial window. E.g., data from the Chrome browser shows that 35% of HTTP requests are made on new TCP connections. Our test data also shows significant latency reduction with the large initial window even in conjunction with these two HTTP features [Duk10].
Also note that packet pacing has been suggested as a possible mechanism to avoid large bursts and their associated harm [VH97]. Pacing is not required in this proposal due to a strong preference for a simple solution. We suspect that for packet bursts of a moderate size, packet pacing will not be necessary. This seems to be confirmed by our test results.

More discussion of the increase in the initial window, including the choice of 10 segments, can be found in [Duk10, CD10].

5. Advantages of Larger Initial Windows

5.1 Reducing Latency

An increase of the initial window from 3 segments to 10 segments reduces the total transfer time for data sets greater than 4KB by up to 4 round trips.

The table below compares the number of round trips needed with IW=3 and IW=10 for different transfer sizes, assuming infinite bandwidth, no packet loss, and standard delayed ACKs with a large delayed-ACK timer.

   ---------------------------------------
   | total segments |  IW=3  |  IW=10   |
   ---------------------------------------
   |        3       |    1   |     1    |
   |        6       |    2   |     1    |
   |       10       |    3   |     1    |
   |       12       |    3   |     2    |
   |       21       |    4   |     2    |
   |       25       |    5   |     2    |
   |       33       |    5   |     3    |
   |       46       |    6   |     3    |
   |       51       |    6   |     4    |
   |       78       |    7   |     4    |
   |       79       |    8   |     4    |
   |      120       |    8   |     5    |
   |      127       |    9   |     5    |
   ---------------------------------------

For example, with the larger initial window, a transfer of 32 segments of data will require only two rather than five round trips to complete.
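The round-trip counts above can be approximated with a simple model in which the receiver generates roughly one ACK per two segments and each ACK grows cwnd by one MSS during slow start. The following non-normative C sketch implements that model; boundary rows in the table may differ by one round trip depending on the exact delayed-ACK behavior assumed.

   /* Non-normative estimate of the round trips needed to transfer a
    * given number of segments, assuming no loss, unlimited bandwidth,
    * and about one delayed ACK per two segments (so cwnd grows by
    * roughly cwnd/2 segments per round during slow start).
    */
   static unsigned int rounds_to_transfer(unsigned int segments,
                                          unsigned int iw)
   {
       unsigned int cwnd = iw, sent = 0, rounds = 0;

       while (sent < segments) {
           sent += cwnd;        /* one full window per round trip   */
           cwnd += cwnd / 2;    /* slow-start growth, delayed ACKs  */
           rounds++;
       }
       return rounds;
   }

   /* E.g., rounds_to_transfer(25, 3) == 5 while
    * rounds_to_transfer(25, 10) == 2, matching the table above.
    */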
5.2 Keeping up with the growth of web object size

RFC 3390 stated that the main motivation for increasing the initial window to 4KB was to speed up connections that only transmit a small amount of data, e.g., email and web. The majority of transfers back then were less than 4KB and could be completed in a single RTT [All00].

Since RFC 3390 was published, web objects have gotten significantly larger [Chu09, RJ10]. Today only a small percentage of web objects (e.g., 10% of Google's search responses) can fit in the 4KB initial window. The average HTTP response size of gmail.com, a highly scripted website, is 8KB (Figure 1 in [Duk10]). The average web page, including all static and dynamic scripted web objects on the page, has seen even greater growth in size [RJ10]. HTTP pipelining [RFC2616] and new web transport protocols such as SPDY [SPDY] allow multiple web objects to be sent in a single transaction, potentially benefiting from an even larger initial window in order to transfer an entire web page in a small number of round trips.

5.3 Recovering faster from loss on under-utilized or wireless links

A greater-than-3-segment initial window increases the chance of recovering from packet loss through Fast Retransmit rather than the lengthy initial RTO [RFC5681]. This is because the fast retransmit algorithm requires three duplicate ACKs as an indication that a segment has been lost rather than reordered. While newer loss recovery techniques such as Limited Transmit [RFC3042] and Early Retransmit [RFC5827] have been proposed to help speed up loss recovery from a smaller window, both algorithms can still benefit from the larger initial window because of the better chance to receive more ACKs to react upon.

6. Disadvantages of Larger Initial Windows for the Individual Connection

The larger bursts from an increase in the initial window may cause buffer overrun and packet drops in routers with small buffers, or in routers experiencing congestion. This could result in unnecessary retransmit timeouts. For a large-window connection that is able to recover without a retransmit timeout, this could result in an unnecessarily early transition from the slow-start to the congestion-avoidance phase of the window increase algorithm.

Premature segment drops are unlikely to occur in uncongested networks with sufficient buffering, or in moderately congested networks where the congested router uses active queue management (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

Insufficient buffering is more likely to exist in the access routers connecting slower links. A recent study of access router buffer size [DGHS07] reveals that the majority of access routers provision enough buffer for 130ms or longer, sufficient to cover a burst of more than 10 packets at 1Mbps, but possibly not sufficient for browsers opening simultaneous connections.

A testbed study [CW10] on the effect of the larger initial window with five simultaneously opened connections revealed that, even with limited buffer size on slow links, IW=10 still reduced the total latency of web transactions, although at the cost of higher packet drop rates compared to IW=3.

Some TCP connections will receive better performance with the larger initial window even if the burstiness of the initial window results in premature segment drops. This will be true if (1) the TCP connection recovers from the segment drop without a retransmit timeout, and (2) the TCP connection is ultimately limited to a small congestion window by either network congestion or by the receiver's advertised window.

7. Disadvantages of Larger Initial Windows for the Network

An increase in the initial window may increase congestion in a network. However, since the increase is one-time only (at the beginning of a connection) and the rest of TCP's congestion backoff mechanism remains in place, it is unlikely that the increase by itself will leave a network in a persistent state of congestion, or even congestion collapse. This seems to have been confirmed by the large scale web experiments described later.

It should be noted that the above may not hold if applications open a large number of simultaneous connections.

Until this proposal is widely deployed, a fairness issue may exist between flows adopting the larger initial window and flows that are RFC 3390-compliant. Although no severe unfairness has been detected in any of the known tests so far, further study on this topic may be warranted.

Some of the discussions from RFC 3390 are still valid for IW=10.

Moreover, it is worth noting that although TCP NewReno increases the chance of duplicate segments when trying to recover multiple packet losses from a large window, the wide support of the TCP Selective Acknowledgment (SACK) option [RFC2018] in all major OSes today should keep the volume of duplicate segments in check.

Recent measurements [Get11] provide evidence of extremely large queues (on the order of one second or more) at access networks of the Internet.
While a significant part of this buffer bloat is contributed by large downloads and uploads, such as video files, emails with large attachments, backups, and downloads of movies to disk, some of the problem is also caused by web browsing of image-heavy sites [Get11]. This queuing delay is generally considered harmful for the responsiveness of latency-sensitive traffic such as DNS queries, ARP, DHCP, VoIP, and gaming. IW=10 can exacerbate this problem for short downloads such as web browsing [Get11-1]. The mitigations proposed for the broader buffer bloat problem are also applicable in this case, such as the use of ECN, AQM schemes [CoDel], and traffic classification (QoS).

8. Mitigation of Negative Impact

Much of the negative impact from an increase in the initial window is likely to be felt by users behind slow links with limited buffers. The negative impact can be mitigated by hosts directly connected to a low-speed link advertising a smaller initial receive window than 10 segments. This can be achieved either through manual configuration by the users or through the host stack auto-detecting low-bandwidth links.

Additional suggestions to improve the end-to-end performance of slow links can be found in RFC 3150 [RFC3150].

9. Interactions with the Retransmission Timer

A large initial window increases the chance of spurious RTOs on a low-bandwidth path, because the packet transmission time will dominate the round-trip time. To minimize spurious retransmissions, implementations MUST follow RFC 6298 [RFC6298] to restart the retransmission timer with the current value of RTO for each ACK received that acknowledges new data.

For a more detailed discussion, see RFC 3390, Section 6.
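For illustration, the timer rule referenced above might look as follows in a non-normative C sketch. restart_rto_timer() stands in for whatever timer facility a host stack provides; it and the structure names are assumptions, not defined by RFC 6298.

   /* Assumed host timer facility; illustrative only. */
   extern void restart_rto_timer(unsigned int rto_ms);

   struct tcp_conn {
       unsigned int snd_una;  /* oldest unacknowledged sequence no. */
       unsigned int rto_ms;   /* current retransmission timeout     */
   };

   /* Per RFC 6298: on each ACK that acknowledges new data, restart
    * the retransmission timer with the current value of RTO.
    * Sequence number wraparound is ignored for brevity.
    */
   static void on_ack_received(struct tcp_conn *tc, unsigned int ack_seq)
   {
       if (ack_seq > tc->snd_una) {      /* the ACK covers new data */
           tc->snd_una = ack_seq;
           restart_rto_timer(tc->rto_ms);
       }
   }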
10. Experimental Results From Large Scale Cluster Tests

In this section we summarize our findings from large scale Internet experiments with an initial window of 10 segments, conducted via Google's front-end infrastructure serving a diverse set of applications. We present results from two data centers, each chosen because of the specific characteristics of the subnets served: AvgDC has connection bandwidths closer to the worldwide average reported in [AKAM10], with a median connection speed of about 1.7Mbps; SlowDC has a larger proportion of traffic from slow-bandwidth subnets, with nearly 20% of traffic from connections below 100Kbps and a third below 256Kbps.

Guided by measurement data, we answer two key questions: what is the latency benefit when TCP connections start with a higher initial window, and, on the flip side, what is the cost?

10.1 The benefits

The average web search latency improvement over all responses is 11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC. We further analyzed the data based on traffic characteristics and subnet properties such as bandwidth (BW), round-trip time (RTT), and bandwidth-delay product (BDP). The average response latency improved across the board for a variety of subnets, with the largest benefits of over 20% from high-RTT and high-BDP networks, wherein most responses can fit within the pipe. Correspondingly, responses from low-RTT paths experienced the smallest improvements, of about 5%.

Contrary to what we expected, responses from low-bandwidth subnets experienced the best latency improvements (between 10-20%) in the 0-56Kbps and 56-256Kbps buckets. We speculate that low-BW networks observe improved latency for two plausible reasons: 1) fewer slow-start rounds: unlike many large-BW networks, low-BW subnets with dial-up modems have inherently large RTTs; and 2) faster loss recovery: an initial window larger than 3 segments increases the chance of a lost packet being recovered through Fast Retransmit as opposed to a lengthy RTO.

Responses of different sizes benefited to varying degrees; those larger than 3 segments naturally demonstrated larger improvements, because they finished in fewer slow-start rounds compared to the baseline. In our experiments, response sizes <= 3 segments also demonstrated small latency benefits.

To find out how individual subnets performed, we analyzed average latency at the /24 subnet level (an approximation of a user base offered a similar set of services by a common ISP). We find that even at the subnet granularity, latency improved at all quantiles, ranging from 5-11%.

10.2 The cost

To quantify the cost of raising the initial window, we analyzed the data specifically for subnets with low bandwidth and low BDP, retransmission rates for different kinds of applications, and latency for applications operating with multiple concurrent TCP connections. From our measurements we found no evidence of negative latency impacts that correlate with BW or BDP alone; in fact, both kinds of subnets demonstrated latency improvements across averages and quantiles.

As expected, the retransmission rate increased modestly when operating with the larger initial congestion window. The overall increase is 0.3% in AvgDC (from 1.98% to 2.29%) and 0.7% in SlowDC (from 3.54% to 4.21%). In our investigation, with the exception of one application, the larger window resulted in a retransmission increase of < 0.5% for services in AvgDC. The exception is the Maps application, which operates with multiple concurrent TCP connections and increased its retransmission rate by 0.9% in AvgDC and 1.85% in SlowDC (from 3.94% to 5.79%).

In our experiments, the percentage of traffic experiencing retransmissions did not increase significantly. E.g., 90% of web search and Maps traffic experienced zero retransmissions in SlowDC (the percentages are higher for AvgDC); a breakup of retransmissions by percentile indicates that most of the increase comes from the portion of traffic already experiencing retransmissions in the baseline with an initial window of 3 segments.

Traffic patterns from applications using multiple concurrent TCP connections, all operating with a large initial window, represent one of the worst-case scenarios where latency can be adversely impacted due to bottleneck buffer overflow. Our investigation shows that such a traffic pattern has not been a problem in AvgDC, where all these applications, specifically Maps and image thumbnails, demonstrated improved latencies varying from 2-20%. In the case of SlowDC, while these applications continued showing a latency improvement in the mean, their latencies in the higher quantiles (96 and above for Maps) indicated instances where latency with the larger window is worse than the baseline; e.g., the 99th-percentile latency for Maps increased by 2.3% (80ms) when compared to the baseline. There is no evidence from our measurements that such a latency cost is a result of subnet bandwidth alone.
Although we have no way of knowing from our data, we conjecture that the amount of buffering at bottleneck links plays a key role in the performance of these applications.

Further details on our experiments and analysis can be found in [Duk10, DCCM10].

11. Other Studies

Besides the large scale Internet experiments described above, a number of other studies have been conducted on the effects of IW10 in various environments. These tests are summarized below, with more discussion in Appendix A.

A complete list of tests conducted, with their results and related studies, can be found at the [IW10] link.

1. [Sch08] described an earlier evaluation of various fast startup approaches, including the "Initial-Start" of 10 MSS.

2. [DCCM10] presented the results from Google's large scale IW10 experiments, with a focus on areas with highly multiplexed links or limited broadband deployment, such as Africa and South America.

3. [CW10] contained a testbed study of IW10 performance over slow links. It also studied how short flows with a larger initial window might affect the throughput performance of other co-existing, long-lived, bulk data transfers.

4. [Sch11] compared IW10 against a number of other fast startup schemes, and concluded that IW10 works rather well and is also quite fair.

5. [JNDK10] and later [JNDK10-1] studied the effect of IW10 over cellular networks.

6. [AERG11] studied the effect of larger ICW sizes, among other things, on end users' page load time from Yahoo!'s Content Delivery Network.

12. Usage and Deployment Recommendations

Further experiments are required before a larger initial window can be enabled by default in the Internet. The existing measurement results indicate that this does not cause significant harm to other traffic. However, widespread use in the Internet could reveal issues not yet known, e.g., regarding fairness or impact on latency-sensitive traffic such as VoIP.

Therefore, special care is needed when using this experimental TCP extension, in particular on large-scale systems originating a significant amount of Internet traffic, or on large numbers of individual consumer-level systems that have a similar aggregate impact. Anyone (stack vendors, network administrators, etc.) turning on a larger initial window SHOULD ensure that the performance is monitored before and after that change. A key metric to monitor is the rate of packet losses, ECN markings, or segment retransmissions during the initial burst. The sender SHOULD cache such information about connection setups using an initial window larger than allowed by RFC 3390, and new connections SHOULD fall back to the initial window allowed by RFC 3390 if there is evidence of performance issues. Further experiments are needed on the design of such a cache and the corresponding heuristics.
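As noted, the design of such a cache is an open question. Purely as an illustration of the idea, the non-normative C sketch below keeps per-destination-prefix loss statistics for initial bursts and falls back to the RFC 3390 window when they look poor; the hash, the sample floor, and the 10% loss threshold are arbitrary placeholders, not recommendations.

   #include <stdbool.h>

   #define IW_CACHE_SLOTS 1024

   struct iw_cache_entry {
       unsigned int prefix;   /* e.g., destination /24 prefix       */
       unsigned int samples;  /* initial bursts sent with larger IW */
       unsigned int losses;   /* of those, how many saw loss/marks  */
   };

   static struct iw_cache_entry iw_cache[IW_CACHE_SLOTS];

   static struct iw_cache_entry *iw_cache_slot(unsigned int prefix)
   {
       /* Toy direct-mapped hash; collisions simply share a slot. */
       return &iw_cache[prefix % IW_CACHE_SLOTS];
   }

   /* Record the outcome of one initial burst sent with the larger
    * initial window.
    */
   static void iw_cache_record(unsigned int prefix, bool saw_loss)
   {
       struct iw_cache_entry *e = iw_cache_slot(prefix);
       e->prefix = prefix;
       e->samples++;
       if (saw_loss)
           e->losses++;
   }

   /* For a new connection: use the larger IW unless enough samples
    * exist and more than 10% of them showed loss during the initial
    * burst, in which case fall back to the RFC 3390 window.
    */
   static bool iw_cache_allows_larger_iw(unsigned int prefix)
   {
       struct iw_cache_entry *e = iw_cache_slot(prefix);
       return !(e->samples >= 10 && e->losses * 10 > e->samples);
   }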
Other relevant metrics that may indicate a need to reduce the IW include an increased overall percentage of packet loss or segment retransmissions, as well as application-level metrics such as reduced data transfer completion times or impaired media quality.

It is also important to take into account hosts that do not implement a larger initial window. Furthermore, any deployment of IW10 should be aware that there are potential side effects on real-time traffic (such as VoIP). If users observe any significant deterioration of performance, they SHOULD fall back to an initial window as allowed by RFC 3390 for safety reasons. An increased initial window MUST NOT be turned on by default on systems without such monitoring capabilities.

The IETF TCPM working group is very much interested in further reports from experiments with this specification and encourages the publication of such measurement data. To date, there are no adequate studies available that either prove or disprove the impact of IW10 on real-time traffic. Further experimentation in this direction is encouraged.

If no significant harm is reported, a follow-up document may revisit the question of whether a larger initial window can be safely used by default in all Internet hosts. Resolution of these experiments and tighter specifications of the suggestions here might be grounds for a future standards track document on the same topic.

13. Related Proposals

Two other proposals [All10, Tou12] have been published on raising TCP's initial window size over a longer timescale. Both aim at reducing the uncertain impact of a larger initial window at an Internet-wide scale. Moreover, [Tou12] seeks an algorithm to automate the adjustment of IW safely over the long haul.

Although a modest, static increase of IW to 10 may address the near-term need for better web performance, much work is needed from the TCP research community to find a long-term solution to the TCP flow startup problem.

14. Security Considerations

This document discusses the initial congestion window permitted for TCP connections. Although changing this value may cause more packet loss, it is highly unlikely to lead to a persistent state of network congestion or even a congestion collapse. Hence it does not raise any known new security issues with TCP.

15. Conclusion

This document suggests a simple change to TCP that will reduce application latency over short-lived TCP connections or links with long RTTs (saving several RTTs during the initial slow-start phase), with little or no negative impact on other flows. Extensive tests have been conducted through both testbeds and large data centers, with most results showing improved latency and only a small increase in the packet retransmission rate. Based on these results we believe a modest increase of IW to 10 is the best solution for near-term deployment, while scaling IW over the long run remains a challenge for the TCP research community.

16. IANA Considerations

None.

17. Acknowledgments

Many people at Google have helped to make the set of large scale tests possible. We would especially like to acknowledge Amit Agarwal, Tom Herbert, Arvind Jain, and Tiziana Refice for their major contributions.

Normative References

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC3390] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's Initial Window", RFC 3390, October 2002.

[RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.
[RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and P. Hurtig, "Early Retransmit for TCP and SCTP", RFC 5827, May 2010.

[RFC6298] Paxson, V., Allman, M., Chu, J. and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, June 2011.

Informative References

[AKAM10] "The State of the Internet, 3rd Quarter 2009", Akamai Technologies, Inc., January 2010. URL http://www.akamai.com/html/about/press/releases/2010/press_011310_1.html

[AERG11] Al-Fares, M., Elmeleegy, K., Reed, B. and I. Gashinsky, "Overclocking the Yahoo! CDN for Faster Web Page Loads", Internet Measurement Conference, November 2011.

[All00] Allman, M., "A Web Server's View of the Transport Layer", ACM Computer Communication Review, 30(5), October 2000.

[All10] Allman, M., "Initial Congestion Window Specification", Internet-draft draft-allman-tcpm-bump-initcwnd-00.txt, work in progress, last updated November 2010.

[Bel10] Belshe, M., "A Client-Side Argument For Changing TCP Slow Start", January 2010. URL http://sites.google.com/a/chromium.org/dev/spdy/An_Argument_For_Changing_TCP_Slow_Start.pdf

[CD10] Chu, J. and N. Dukkipati, "Increasing TCP's Initial Window", presented to the 77th IRTF ICCRG and IETF TCPM working group meetings, March 2010. URL http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf

[Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century", presented to the 75th IETF TCPM working group meeting, July 2009. URL http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", ACM Queue, May 6, 2012.

[CW10] Chu, J. and Y. Wang, "A Testbed Study on IW10 vs IW3", presented to the 79th IETF TCPM working group meeting, November 2010. URL http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

[DCCM10] Dukkipati, N., Cheng, Y., Chu, J. and M. Mathis, "Increasing TCP initial window", presented to the 78th IRTF ICCRG working group meeting, July 2010. URL http://www.ietf.org/proceedings/78/slides/iccrg-3.pdf

[DGHS07] Dischinger, M., Gummadi, K., Haeberlen, A. and S. Saroiu, "Characterizing Residential Broadband Networks", Internet Measurement Conference, October 24-26, 2007.

[Duk10] Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin, N., Agarwal, A., Herbert, T. and A. Jain, "An Argument for Increasing TCP's Initial Congestion Window", ACM SIGCOMM Computer Communications Review, vol. 40 (2010), pp. 27-33, July 2010.

[FF99] Floyd, S. and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, August 1999.

[FJ93] Floyd, S. and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, pp. 397-413.

[Get11] Gettys, J., "Bufferbloat: Dark buffers in the Internet", presented to the 80th IETF TSV Area meeting, March 2011. URL http://www.ietf.org/proceedings/80/slides/tsvarea-1.pdf

[Get11-1] Gettys, J., "IW10 Considered Harmful", Internet-draft draft-gettys-iw10-considered-harmful-00, work in progress, August 2011.

[IOR2009] Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, J., Jahanian, F. and M. Karir, "Atlas Internet Observatory 2009 Annual Report", 47th NANOG Conference, October 2009.
[IW10] "TCP IW10 links", URL http://code.google.com/speed/protocols/tcpm-IW10.html

[Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer Communication Review, vol. 18, no. 4, pp. 314-329, August 1988.

[JNDK10] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo, "A Simulation Study on Increasing TCP's IW", presented to the 78th IRTF ICCRG working group meeting, July 2010. URL http://www.ietf.org/proceedings/78/slides/iccrg-7.pdf

[JNDK10-1] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo, "Effect of IW and Initial RTO changes", presented to the 79th IETF TCPM working group meeting, November 2010. URL http://www.ietf.org/proceedings/79/slides/tcpm-1.pdf

[LAJW07] Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion Control Without a Startup Phase", Protocols for Fast, Long Distance Networks (PFLDnet) Workshop, February 2007. URL http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

[PK98] Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A technique for speeding up web transfers", Proceedings of the IEEE Globecom '98 Internet Mini-Conference, 1998.

[PRAKS02] Partridge, C., Rockwell, D., Allman, M., Krishnan, R. and J. Sterbenz, "A Swifter Start for TCP", Technical Report No. 8339, BBN Technologies, March 2002.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J. and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's Initial Window", RFC 2414, September 1998.

[RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

[RFC3150] Dawkins, S., Montenegro, G., Kojo, M. and V. Magret, "End-to-end Performance Implications of Slow Links", BCP 48, RFC 3150, July 2001.

[RFC4782] Floyd, S., Allman, M., Jain, A. and P. Sarolahti, "Quick-Start for TCP and IP", RFC 4782, January 2007.

[RFC6077] Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe, "Open Research Issues in Internet Congestion Control", Section 3.4, RFC 6077, February 2011.

[RJ10] Ramachandran, S. and A. Jain, "Aggregate Statistics of Size Related Metrics of Web Pages", May 2010. URL http://code.google.com/speed/articles/web-metrics.html

[Sch08] Scharf, M., "Quick-Start, Jump-Start, and Other Fast Startup Approaches", Internet Research Task Force ICCRG, November 17, 2008. URL http://www.ietf.org/proceedings/73/slides/iccrg-2.pdf

[Sch11] Scharf, M., "Performance and Fairness Evaluation of IW10 and Other Fast Startup Schemes", Internet Research Task Force ICCRG, March 2011. URL http://www.ietf.org/proceedings/80/slides/iccrg-1.pdf

[Sch11-1] Scharf, M., "Comparison of end-to-end and network-supported fast startup congestion control schemes", Computer Networks, February 2011. URL http://dx.doi.org/10.1016/j.comnet.2011.02.002

[SPDY] "SPDY: An experimental protocol for a faster web", URL http://dev.chromium.org/spdy

[Ste08] Souders, S., "Roundup on Parallel Connections", High Performance Web Sites blog, March 2008. URL http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections
[Tou12] Touch, J., "Automating the Initial Window in TCP", Internet-draft draft-touch-tcpm-automatic-iw-03.txt, work in progress, July 16, 2012.

[VH97] Visweswaraiah, V. and J. Heidemann, "Improving Restart of Idle TCP Connections", Technical Report 97-661, University of Southern California, November 1997.

Appendix A - List of Concerns and Corresponding Test Results

Concerns have been raised since this proposal was first published, based on a set of large scale experiments. To better understand the impact of a larger initial window, in order to confirm or dismiss these concerns, additional tests have been conducted using large scale clusters, simulations, and real testbeds. The following attempts to compile the list of concerns and summarize the findings from the relevant tests.

o How complete are the various tests in covering many different traffic patterns?

The large scale Internet experiments conducted on Google's front-end infrastructure covered a large portfolio of services beyond web search. These include Gmail, Google Maps, Photos, News, Sites, Images, etc., covering a wide variety of traffic sizes and patterns. One notable exception is YouTube, because we don't think the large initial window will have much material impact, either positive or negative, on bulk data services.

[CW10] contains some results from a testbed study on how short flows with a larger initial window might affect the throughput performance of other co-existing, long-lived, bulk data transfers.

o Larger bursts from the increase in the initial window cause significantly more packet drops.

All the tests conducted on this subject [Duk10, Sch11, Sch11-1, CW10] so far have shown only a modest increase in packet drops. The only exception is from the testbed study [CW10] under extremely high load and/or simultaneous opens, but under those conditions both IW=3 and IW=10 suffered very high packet loss rates.

o A large initial window may severely impact TCP performance over highly multiplexed links, still common in developing regions.

Our large scale experiments described in Section 10 above also covered Africa and South America. Measurement data from those regions [DCCM10] revealed improved latency even for those services that employ multiple simultaneous connections, at the cost of a small increase in the retransmission rate. It seems that the round trip savings from a larger initial window more than make up for the time spent recovering more lost packets.

A similar phenomenon has also been observed in the testbed study [CW10].

o Why 10 segments?

Questions have been raised about how the number 10 was picked. We tried different sizes in our large scale experiments, and found that 10 segments seem to give most of the benefits for the services we tested while not causing a significant increase in the retransmission rates. Going forward, 10 segments may turn out to be too small as average web object sizes continue to grow. But a scheme to right-size the initial window automatically over long timescales has yet to be developed.
o Need more thorough analysis of the impact on slow links.

Although [Duk10] showed that the large initial window reduced the average latency even for the dial-up link class of only 56Kbps in bandwidth, more studies were needed in order to understand the effect of IW10 on slow links at the microscopic level. [CW10] was conducted for this purpose.

The testbeds in [CW10] emulated a 300ms RTT, bottleneck link bandwidths as low as 64Kbps, and router queue sizes as low as 40 packets. A large combination of test parameters was used. Almost all tests showed varying degrees of latency improvement from IW=10, with only a modest increase in the packet drop rate until a very high load was injected. The testbed results were consistent with both the large scale data center experiments [CD10, DCCM10] and a separate study using NSC simulations [Sch11, Sch11-1].

o How will the larger initial window affect flows with initial windows of 4KB or less?

Flows with the larger initial window will likely grab more bandwidth from a bottleneck link when competing against flows with a smaller initial window, at least initially. How long will this "unfairness" last? Will there be any "capture effect" where flows with the larger initial window possess a disproportionate share of bandwidth beyond just a few round trips?

If there is any "unfairness" issue among flows with different initial windows, it did not show up in the large scale experiments, as the average latency for the bucket of all responses < 4KB did not seem to be affected by the presence of many other larger responses employing the large initial window. As a matter of fact, they seemed to benefit from the large initial window too, as shown in Figure 7 of [Duk10].

The same phenomenon seems to exist in the testbed experiments [CW10]. Flows with IW=3 suffered only slightly when competing against flows with IW=10 under light to median loads. Under high load, both flows' latency improved when mixed together. Also, long-lived, background bulk-data flows seemed to enjoy higher throughput when running against many foreground short flows with IW=10 than against short flows with IW=3. One plausible explanation is that IW=10 enabled short flows to complete sooner, leaving more room for the long-lived, background flows.

A study using the NSC simulator also concluded that IW=10 works rather well and is quite fair against IW=3 [Sch11, Sch11-1].

o How will a larger initial window perform over cellular networks?

Some simulation studies [JNDK10, JNDK10-1] have been conducted to study the effect of a larger initial window on wireless links from 2G to 4G networks (EDGE/HSPA/LTE). The overall result seems mixed, in both raw performance and the fairness index.

Authors' Addresses

Jerry Chu
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: hkchu@google.com

Nandita Dukkipati
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: nanditad@google.com

Yuchung Cheng
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: ycheng@google.com

Matt Mathis
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: mattmathis@google.com

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.