idnits 2.17.1 draft-ietf-tcpm-initcwnd-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 2 instances of too long lines in the document, the longest
     one being 17 characters in excess of 72.

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of
     the documents in question.

  -- The draft header indicates that this document updates RFC3390, but
     the abstract doesn't seem to directly say this.  It does mention
     RFC3390 though, so this could be OK.

  -- The draft header indicates that this document updates RFC5681, but
     the abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  == Line 953 has weird spacing: '...gree of laten...'

  (Using the creation date from RFC3390, updated by this document, for
  RFC5378 checks: 2001-05-25)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If
     you have contacted all the original authors and they are all willing
     to grant the BCP78 rights to the IETF Trust, then this is fine, and
     you can ignore this comment.  If not, you may need to add the
     pre-RFC5378 disclaimer.  (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)
  -- The document date (June 28, 2012) is 4291 days in the past.  Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC6077' is defined on line 843, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230,
     RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 2414
     (Obsoleted by RFC 3390)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)

  == Outdated reference: A later version (-03) exists of
     draft-touch-tcpm-automatic-iw-02

     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

Internet Draft                                                   J. Chu
draft-ietf-tcpm-initcwnd-04.txt                            N. Dukkipati
Intended status: Experimental                                  Y. Cheng
Updates: 3390, 5681                                           M. Mathis
Expiration date: December 2012                             Google, Inc.
                                                           June 28, 2012

                     Increasing TCP's Initial Window

Status of this Memo

   Distribution of this memo is unlimited.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire in December 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document proposes an experiment to increase the permitted TCP
   initial window (IW) from between 2 and 4 segments, as specified in
   RFC 3390, to 10 segments, with a fallback to the existing
   recommendation when performance issues are detected.  It discusses
   the motivation behind the increase, the advantages and disadvantages
   of the higher initial window, and presents results from several
   large scale experiments showing that the higher initial window
   improves the overall performance of many web services without
   resulting in a congestion collapse.  The document closes with a
   discussion of usage and deployment for further experimental
   purposes, as recommended by the IETF TCP Maintenance and Minor
   Extensions (TCPM) working group.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  TCP Modification . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Implementation Issues  . . . . . . . . . . . . . . . . . . . .  5
   4.  Background . . . . . . . . . . . . . . . . . . . . . . . . . .  5
   5.  Advantages of Larger Initial Windows . . . . . . . . . . . . .  7
   5.1   Reducing Latency . . . . . . . . . . . . . . . . . . . . . .  7
   5.2   Keeping up with the growth of web object size  . . . . . . .  8
   5.3   Recovering faster from loss on under-utilized or wireless
         links  . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   6.  Disadvantages of Larger Initial Windows for the Individual
       Connection . . . . . . . . . . . . . . . . . . . . . . . . . .  9
   7.  Disadvantages of Larger Initial Windows for the Network  . . .  9
   8.  Mitigation of Negative Impact  . . . . . . . . . . . . . . . . 10
   9.  Interactions with the Retransmission Timer . . . . . . . . . . 10
   10. Experimental Results From Large Scale Cluster Tests  . . . . . 10
   10.1  The benefits . . . . . . . . . . . . . . . . . . . . . . . . 11
   10.2  The cost . . . . . . . . . . . . . . . . . . . . . . . . . . 11
   11. Other Studies  . . . . . . . . . . . . . . . . . . . . . . . . 12
   12. Usage and Deployment Recommendations . . . . . . . . . . . . . 13
   13. Related Proposals  . . . . . . . . . . . . . . . . . . . . . . 14
   14. Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   15. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 14
   16. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 15
   17. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 15
   Normative References . . . . . . . . . . . . . . . . . . . . . . . 16
   Informative References . . . . . . . . . . . . . . . . . . . . . . 16
   Appendix A - List of Concerns and Corresponding Test Results . . . 21
   Author's Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
   Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1. Introduction

   This document proposes to raise the upper bound on TCP's initial
   window (IW) to 10 segments, or roughly 15KB.  It is patterned after
   and borrows heavily from RFC 3390 [RFC3390] and earlier work in this
   area.  Due to lingering concerns about possible side effects on
   other flows sharing the same network bottleneck, some of the
   recommendations are conditional on additional monitoring and
   evaluation.

   The primary argument in favor of raising IW follows from the
   evolving scale of the Internet.  Ten segments are likely to fit into
   the queue space available at any broadband access link, even when
   there is a reasonable number of concurrent connections.

   Lower speed links can be treated with environment specific
   configurations, such that they can be protected from being
   overwhelmed by large initial window bursts without imposing a
   suboptimal initial window on the rest of the Internet.

   This document reviews the advantages and disadvantages of using a
   larger initial window, and includes summaries of several large
   scale experiments showing that an initial window of 10 segments
   provides benefits across the board for a variety of BW, RTT, and
   BDP classes.  These results show significant benefits of increasing
   IW for users at much smaller data rates than had previously been
   anticipated.  However, at initial windows larger than 10, the
   results are mixed.  We believe these mixed results are not
   intrinsic, but are the consequence of various implementation
   artifacts, including overly aggressive applications employing many
   simultaneous connections.

   We recommend that all TCP implementations have a settable TCP IW
   parameter, as long as there is a reasonable effort to monitor for
   possible interactions with other Internet applications and services
   as described in Section 12.
   Furthermore, Section 10 details why 10 segments may be an
   appropriate value, and while that value may continue to rise in the
   future, this document does not include any supporting evidence for
   values of IW larger than 10.

   In addition, we introduce a minor revision to RFC 3390 and RFC 5681
   [RFC5681] to eliminate resetting the initial window when the SYN or
   SYN/ACK is lost.

   The document closes with a discussion of the consensus from the
   TCPM working group on the near-term usage and deployment of IW10 in
   the Internet.

   A complementary set of slides for this proposal can be found at
   [CD10].

2. TCP Modification

   This document proposes an increase in the permitted upper bound for
   TCP's initial window (IW) to 10 segments, depending on the MSS.
   This increase is optional: a TCP MAY start with an initial window
   that is smaller than 10 segments.

   More precisely, the upper bound for the initial window will be

      min (10*MSS, max (2*MSS, 14600))                             (1)

   This upper bound for the initial window size represents a change
   from RFC 3390 [RFC3390], which specified that the congestion window
   be initialized between 2 and 4 segments depending on the MSS.

   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission during or
   following the TCP three-way handshake.  Neither the SYN/ACK nor its
   acknowledgment (ACK) in the three-way handshake should increase the
   initial window size.

   Note that all the test results described in this document were
   based on the regular Ethernet MTU of 1500 bytes.  Future study of
   the effect of a different MTU may be needed to fully validate (1)
   above.

   Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

      "If the SYN or SYN/ACK is lost, the initial window used by a
      sender after a correctly transmitted SYN MUST be one segment
      consisting of MSS bytes."
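   Returning to equation (1) above, the upper bound can be expressed
   directly in code.  The sketch below is illustrative only (the
   function name and the choice of bytes as the unit are ours, not
   part of the specification); it returns the permitted initial window
   in bytes for a given sender MSS:

```python
def initial_window_bytes(mss):
    """Permitted TCP initial window in bytes, per equation (1).

    The 14600-byte cap corresponds to 10 segments at the regular
    Ethernet-derived MSS of 1460 bytes; the max() term guarantees
    at least 2 segments even for very large MSS values.
    """
    return min(10 * mss, max(2 * mss, 14600))

# 10 segments at the common Ethernet-derived MSS of 1460 bytes:
print(initial_window_bytes(1460))  # 14600
# For a jumbo-frame MSS of 9000, the 2-segment lower bound applies:
print(initial_window_bytes(9000))  # 18000
# For a small MSS of 536, ten segments stay well under the byte cap:
print(initial_window_bytes(536))   # 5360
```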
   The proposed change to reduce the default RTO to 1 second [RFC6298]
   increases the chance of spurious SYN or SYN/ACK retransmission,
   thus unnecessarily penalizing connections with RTT > 1 second if
   their initial window is reduced to 1 segment.  For this reason, it
   is RECOMMENDED that implementations refrain from resetting the
   initial window to 1 segment unless either there have been multiple
   SYN or SYN/ACK retransmissions, or true loss detection has been
   made.

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the
   loss window).  The change specified in this document affects the
   value of the initial window.  Optionally, a TCP MAY set the restart
   window to the minimum of the value used for the initial window and
   the current value of cwnd (in other words, using a larger value for
   the restart window should never increase the size of cwnd).  These
   changes do NOT change the loss window, which must remain 1 segment
   of MSS bytes (to permit the lowest possible window size in the case
   of severe congestion).

   Furthermore, to limit any negative effect that a larger initial
   window may have on links with limited bandwidth or buffer space,
   implementations SHOULD fall back to RFC 3390 for the restart window
   (RW) if any packet loss is detected during either the initial
   window or a restart window, and more than 4KB of data is sent.

3. Implementation Issues

   The HTTP 1.1 specification allows only two simultaneous connections
   per domain, while web browsers open more simultaneous TCP
   connections [Ste08], partly to circumvent the small initial window
   in order to speed up the loading of web pages as described above.
   When web browsers open simultaneous TCP connections to the same
   destination, they are working against TCP's congestion control
   mechanisms [FF99].  Combining this behavior with larger initial
   windows further increases the burstiness and unfairness to other
   traffic in the network.  A larger initial window will incentivize
   applications to use fewer concurrent TCP connections.

   Some implementations advertise a small initial receive window
   (Table 2 in [Duk10]), effectively limiting how much window a remote
   host may use.  In order to realize the full benefit of the large
   initial window, implementations are encouraged to advertise an
   initial receive window of at least 10 segments, except in
   circumstances where a larger initial window is deemed harmful.
   (See the Mitigation section below.)

   The TCP SACK option [RFC2018] was thought to be required in order
   for the larger initial window to perform well.  But measurements
   from both a testbed and live tests showed that IW=10 without the
   SACK option outperforms IW=3 with the SACK option [CW10].

4. Background

   The TCP congestion window was introduced as part of the congestion
   control algorithm by Van Jacobson in 1988 [Jac88].  The initial
   value of one segment was used as the starting point for newly
   established connections to probe the available bandwidth on the
   network.

   Today's Internet is dominated by web traffic running on top of
   short-lived TCP connections [IOR2009].  The relatively small
   initial window has become a limiting factor for the performance of
   many web applications.

   The global Internet has continued to grow, both in speed and
   penetration.  According to the latest report from Akamai [AKAM10],
   global broadband (> 2Mbps) adoption has surpassed 50%, propelling
   the average connection speed to 1.7Mbps, while narrowband
   (< 256Kbps) usage has dropped to 5%.  In contrast, TCP's initial
   window has remained 4KB for a decade [RFC2414], corresponding to a
   bandwidth utilization of less than 200Kbps per connection, assuming
   an RTT of 200ms.

   A large proportion of flows on the Internet are short web
   transactions over TCP, and complete before exiting TCP slow start.
   Speeding up the TCP flow startup phase, including circumventing the
   initial window limit, has been an area of active research [RFC6077,
   Sch08].  Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98].
   Some require router support [RFC4782, PK98], and hence are not
   practical for the public Internet.  Others suggest bold, but often
   radical, ideas that will likely require more years of research
   before standardization and deployment.

   In the meantime, applications have responded to TCP's "slow" start.
   Web sites use multiple sub-domains [Bel10] to circumvent the HTTP
   1.1 limit of two connections per physical host [RFC2616].  As of
   today, major web browsers open multiple connections to the same
   site (up to six connections per domain [Ste08], and the number is
   growing).  This trend is meant to remedy HTTP's serialized
   downloads and so achieve parallelism and higher performance.  But
   it also implies that today most access links are severely
   under-utilized, hence having multiple TCP connections improves
   performance most of the time.  While raising the initial congestion
   window may cause congestion for certain users of these browsers, we
   argue that browsers and other applications need to respect the HTTP
   1.1 limit and stop increasing the number of simultaneous TCP
   connections.  We believe a modest increase of the initial window
   will help to stop this trend, provide the best interim solution to
   improve overall user performance, and reduce the server, client,
   and network load.
   Note that persistent connections and pipelining are designed to
   address some of the above issues with HTTP [RFC2616].  Their
   presence does not diminish the need for a larger initial window.
   E.g., data from the Chrome browser shows that 35% of HTTP requests
   are made on new TCP connections.  Our test data also shows
   significant latency reduction with the large initial window even in
   conjunction with these two HTTP features ([Duk10]).

   Also note that packet pacing has been suggested as a possible
   mechanism to avoid large bursts and their associated harm [VH97].
   Pacing is not required in this proposal due to a strong preference
   for a simple solution.  We suspect that for packet bursts of
   moderate size, packet pacing will not be necessary.  This seems to
   be confirmed by our test results.

   More discussion of the increase in initial window, including the
   choice of 10 segments, can be found in [Duk10, CD10].

5. Advantages of Larger Initial Windows

5.1 Reducing Latency

   An increase of the initial window from 3 segments to 10 segments
   reduces the total transfer time for data sets greater than 4KB by
   up to 4 round trips.

   The table below compares the number of round trips between IW=3 and
   IW=10 for different transfer sizes, assuming infinite bandwidth, no
   packet loss, and the standard delayed ACKs with a large delayed-ACK
   timer.

      ---------------------------------------
      | total segments |  IW=3   |  IW=10   |
      ---------------------------------------
      |       3        |    1    |    1     |
      |       6        |    2    |    1     |
      |      10        |    3    |    1     |
      |      12        |    3    |    2     |
      |      21        |    4    |    2     |
      |      25        |    5    |    2     |
      |      33        |    5    |    3     |
      |      46        |    6    |    3     |
      |      51        |    6    |    4     |
      |      78        |    7    |    4     |
      |      79        |    8    |    4     |
      |     120        |    8    |    5     |
      |     127        |    9    |    5     |
      ---------------------------------------

   For example, with the larger initial window, a transfer of 32
   segments of data will require only two rather than five round trips
   to complete.
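   The table above can be approximated with a simplified slow-start
   model: with delayed ACKs, roughly one ACK arrives per two segments,
   and each ACK grows cwnd by one segment, so cwnd grows by about 50%
   per round trip.  The sketch below is ours, not part of the
   specification, and rows near a round boundary may differ slightly
   from the table depending on the exact delayed-ACK assumptions:

```python
def slow_start_rounds(total_segments, iw):
    """Round trips needed to deliver total_segments, starting at iw.

    Simplified model: each round sends the full cwnd, then cwnd
    grows by cwnd // 2 (one ACK per two segments, +1 segment per
    ACK), i.e., roughly 1.5x growth per RTT with delayed ACKs.
    """
    cwnd, sent, rounds = iw, 0, 0
    while sent < total_segments:
        sent += cwnd
        rounds += 1
        cwnd += cwnd // 2  # delayed ACKs: ~1.5x growth per RTT
    return rounds

# Compare IW=3 against IW=10 for a few transfer sizes:
for size in (3, 6, 10, 21):
    print(size, slow_start_rounds(size, iw=3), slow_start_rounds(size, iw=10))
```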
5.2 Keeping up with the growth of web object size

   RFC 3390 stated that the main motivation for increasing the initial
   window to 4KB was to speed up connections that only transmit a
   small amount of data, e.g., email and web.  The majority of
   transfers back then were less than 4KB and could be completed in a
   single RTT [All00].

   Since RFC 3390 was published, web objects have gotten significantly
   larger [Chu09, RJ10].  Today only a small percentage of web objects
   (e.g., 10% of Google's search responses) can fit in the 4KB initial
   window.  The average HTTP response size of gmail.com, a highly
   scripted web-site, is 8KB (Figure 1 in [Duk10]).  The average web
   page, including all static and dynamic scripted web objects on the
   page, has seen even greater growth in size [RJ10].  HTTP pipelining
   [RFC2616] and new web transport protocols such as SPDY [SPDY] allow
   multiple web objects to be sent in a single transaction,
   potentially benefiting from an even larger initial window in order
   to transfer an entire web page in a small number of round trips.

5.3 Recovering faster from loss on under-utilized or wireless links

   A greater-than-3-segment initial window increases the chance of
   recovering from packet loss through Fast Retransmit rather than the
   lengthy initial RTO [RFC5681].  This is because the fast retransmit
   algorithm requires three duplicate ACKs as an indication that a
   segment has been lost rather than reordered.  While newer loss
   recovery techniques such as Limited Transmit [RFC3042] and Early
   Retransmit [RFC5827] have been proposed to help speed up loss
   recovery from a smaller window, both algorithms can still benefit
   from the larger initial window because of a better chance to
   receive more ACKs to react upon.

6. Disadvantages of Larger Initial Windows for the Individual
   Connection

   The larger bursts from an increase in the initial window may cause
   buffer overrun and packet drop in routers with small buffers, or in
   routers experiencing congestion.  This could result in unnecessary
   retransmit timeouts.  For a large-window connection that is able to
   recover without a retransmit timeout, this could result in an
   unnecessarily early transition from the slow-start to the
   congestion-avoidance phase of the window increase algorithm.

   Premature segment drops are unlikely to occur in uncongested
   networks with sufficient buffering, or in moderately congested
   networks where the congested router uses active queue management
   (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

   Insufficient buffering is more likely to exist in the access
   routers connecting slower links.  A recent study of access router
   buffer size [DGHS07] reveals that the majority of access routers
   provision enough buffer for 130ms or longer, sufficient to cover a
   burst of more than 10 packets at 1Mbps, but possibly not sufficient
   for browsers opening simultaneous connections.

   A testbed study [CW10] on the effect of the larger initial window
   with five simultaneously opened connections revealed that, even
   with limited buffer size on slow links, IW=10 still reduced the
   total latency of web transactions, although at the cost of higher
   packet drop rates than IW=3.

   Some TCP connections will receive better performance with the
   larger initial window even if the burstiness of the initial window
   results in premature segment drops.  This will be true if (1) the
   TCP connection recovers from the segment drop without a retransmit
   timeout, and (2) the TCP connection is ultimately limited to a
   small congestion window by either network congestion or by the
   receiver's advertised window.

7. Disadvantages of Larger Initial Windows for the Network

   An increase in the initial window may increase congestion in a
   network.  However, since the increase is one-time only (at the
   beginning of a connection), and the rest of TCP's congestion
   backoff mechanism remains in place, it is unlikely that the
   increase by itself will leave a network in a persistent state of
   congestion, let alone congestion collapse.  This seems to have been
   confirmed by the large scale web experiments described later.

   It should be noted that the above may not hold if applications open
   a large number of simultaneous connections.

   Until this proposal is widely deployed, a fairness issue may exist
   between flows adopting a larger initial window and flows that are
   RFC3390-compliant.  Although no severe unfairness has been detected
   in any of the known tests so far, further study on this topic may
   be warranted.

   Some of the discussions from RFC 3390 are still valid for IW=10.
   Moreover, it is worth noting that although TCP NewReno increases
   the chance of duplicate segments when trying to recover multiple
   packet losses from a large window [RFC3782], the wide support of
   the TCP Selective Acknowledgment (SACK) option [RFC2018] in all
   major OSes today should keep the volume of duplicate segments in
   check.

   Recent measurements [Get11] provide evidence of extremely large
   queues (on the order of one second or more) in access networks of
   the Internet.  While a significant part of this buffer bloat is
   contributed by large downloads/uploads such as video files, emails
   with large attachments, backups, and downloads of movies to disk,
   some of the problem is also caused by web browsing of image-heavy
   sites [Get11].  This queuing delay is generally considered harmful
   to the responsiveness of latency-sensitive traffic such as DNS
   queries, ARP, DHCP, VoIP, and gaming.
   IW=10 can exacerbate this problem when doing short downloads such
   as web browsing [Get11-1].  The mitigations proposed for the
   broader buffer bloat problem are also applicable in this case, such
   as the use of ECN, AQM schemes [CoDel], and traffic classification
   (QoS).

8. Mitigation of Negative Impact

   Much of the negative impact from an increase in the initial window
   is likely to be felt by users behind slow links with limited
   buffers.  The negative impact can be mitigated by hosts directly
   connected to a low-speed link advertising a smaller initial receive
   window than 10 segments.  This can be achieved either through
   manual configuration by the users, or through the host stack
   auto-detecting the low-bandwidth links.

   Additional suggestions to improve the end-to-end performance of
   slow links can be found in RFC 3150 [RFC3150].

9. Interactions with the Retransmission Timer

   A large initial window increases the chance of spurious RTO on a
   low-bandwidth path because the packet transmission time will
   dominate the round-trip time.  To minimize spurious
   retransmissions, implementations MUST follow RFC 6298 [RFC6298] to
   restart the retransmission timer with the current value of RTO for
   each ACK received that acknowledges new data.

10. Experimental Results From Large Scale Cluster Tests

   In this section we summarize our findings from large scale Internet
   experiments with an initial window of 10 segments, conducted via
   Google's front-end infrastructure serving a diverse set of
   applications.  We present results from two data centers, each
   chosen because of the specific characteristics of the subnets
   served: AvgDC has connection bandwidths closer to the worldwide
   average reported in [AKAM10], with a median connection speed of
   about 1.7Mbps; SlowDC has a larger proportion of traffic from slow
   bandwidth subnets, with nearly 20% of traffic from connections
   below 100Kbps, and a third below 256Kbps.

   Guided by the measurement data, we answer two key questions: what
   is the latency benefit when TCP connections start with a higher
   initial window, and, on the flip side, what is the cost?

10.1 The benefits

   The average web search latency improvement over all responses is
   11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC.  We further
   analyzed the data based on traffic characteristics and subnet
   properties such as bandwidth (BW), round-trip time (RTT), and
   bandwidth-delay product (BDP).  The average response latency
   improved across the board for a variety of subnets, with the
   largest benefits of over 20% from high RTT and high BDP networks,
   wherein most responses can fit within the pipe.  Correspondingly,
   responses from low RTT paths experienced the smallest improvements
   of about 5%.

   Contrary to what we expected, responses from low bandwidth subnets
   experienced the best latency improvements (between 10-20%) in the
   0-56Kbps and 56-256Kbps buckets.  We speculate that low BW networks
   observe improved latency for two plausible reasons: 1) fewer slow-
   start rounds: unlike many large BW networks, low BW subnets with
   dial-up modems have inherently large RTTs; and 2) faster loss
   recovery: an initial window larger than 3 segments increases the
   chance of a lost packet being recovered through Fast Retransmit as
   opposed to a lengthy RTO.
505 Responses of different sizes benefited to varying degrees; those 506 larger than 3 segments naturally demonstrated larger improvements, 507 because they finished in fewer rounds in slow start as compared to 508 the baseline. In our experiments, response sizes <= 3 segments also 509 demonstrated small latency benefits. 511 To find out how individual subnets performed, we analyzed average 512 latency at a /24 subnet level (an approximation to a user base 513 offered similar set of services by a common ISP). We find even at the 514 subnet granularity, latency improved at all quantiles ranging from 5- 515 11%. 517 10.2 The cost 519 To quantify the cost of raising the initial window, we analyzed the 520 data specifically for subnets with low bandwidth and BDP, 521 retransmission rates for different kinds of applications, as well as 522 latency for applications operating with multiple concurrent TCP 523 connections. From our measurements we found no evidence of a negative 524 latency impacts that correlate to BW or BDP alone, but in fact both 525 kinds of subnets demonstrated latency improvements across averages 526 and quantiles. 528 As expected, the retransmission rate increased modestly when 529 operating with larger initial congestion window. The overall increase 530 in AvgDC is 0.3% (from 1.98% to 2.29%) and in SlowDC is 0.7% (from 531 3.54% to 4.21%). In our investigation, with the exception of one 532 application, the larger window resulted in a retransmission increase 533 of < 0.5% for services in the AvgDC. The exception is the Maps 534 application that operates with multiple concurrent TCP connections, 535 which increased its retransmission rate by 0.9% in AvgDC and 1.85% in 536 SlowDC (from 3.94% to 5.79%). 538 In our experiments, the percentage of traffic experiencing 539 retransmissions did not increase significantly. E.g. 
90% of web 540 search and maps experienced zero retransmission in SlowDC 541 (percentages are higher for AvgDC); a break up of retransmissions by 542 percentiles indicate that most increases come from portion of traffic 543 already experiencing retransmissions in the baseline with initial 544 window of 3 segments. 546 Traffic patterns from applications using multiple concurrent TCP 547 connections all operating with a large initial window represent one 548 of the worst case scenarios where latency can be adversely impacted 549 due to bottleneck buffer overflow. Our investigation shows that such 550 a traffic pattern has not been a problem in AvgDC, where all these 551 applications, specifically maps and image thumbnails, demonstrated 552 improved latencies varying from 2-20%. In the case of SlowDC, while 553 these applications continued showing a latency improvement in the 554 mean, their latencies in higher quantiles (96 and above for maps) 555 indicated instances where latency with larger window is worse than 556 the baseline, e.g. the 99% latency for maps has increased by 2.3% 557 (80ms) when compared to the baseline. There is no evidence from our 558 measurements that such a cost on latency is a result of subnet 559 bandwidth alone. Although we have no way of knowing from our data, we 560 conjecture that the amount of buffering at bottleneck links plays a 561 key role in performance of these applications. 563 Further details on our experiments and analysis can be found in 564 [Duk10, DCCM10]. 566 11. Other Studies 568 Besides the large scale Internet experiments described above, a 569 number of other studies have been conducted on the effects of IW10 in 570 various environments. These tests were summarized below, with more 571 discussion in Appendix A. 573 A complete list of tests conducted, with their results and related 574 studies can be found at the [IW10] link. 576 1. 
[Sch08] described an earlier evaluation of various Fast Startup
approaches, including the "Initial-Start" of 10 MSS.

2. [DCCM10] presented the results from Google's large scale IW10
experiments, with a focus on areas with highly multiplexed links or
limited broadband deployment, such as Africa and South America.

3. [CW10] contained a testbed study of IW10 performance over slow
links. It also studied how short flows with a larger initial window
might affect the throughput performance of other co-existing, long
lived, bulk data transfers.

4. [Sch11] compared IW10 against a number of other fast startup
schemes, and concluded that IW10 works rather well and is also quite
fair.

5. [JNDK10] and later [JNDK10-1] studied the effect of IW10 over
cellular networks.

6. [AERG11] studied the effect of larger ICW sizes, among other
things, on end users' page load times from Yahoo!'s Content Delivery
Network.

12. Usage and Deployment Recommendations

Further experiments are required before a larger initial window can
be enabled by default in the Internet. The existing measurement
results indicate that this does not cause significant harm to other
traffic. However, widespread use in the Internet could reveal issues
not yet known, e.g., regarding fairness or the impact on
latency-sensitive traffic such as VoIP.

Therefore, special care is needed when using this experimental TCP
extension, in particular on large-scale systems originating a
significant amount of Internet traffic, or on large numbers of
individual consumer-level systems that have a similar aggregate
impact. Anyone (stack vendors, network administrators, etc.) turning
on a larger initial window SHOULD ensure that the performance is
monitored before and after that change. A key metric to monitor is
the rate of packet losses, ECN marking, or segment retransmissions
during the initial burst.
The sender SHOULD cache such information about
connection setups using an initial window larger than allowed by
RFC 3390, and new connections SHOULD fall back to the initial window
allowed by RFC 3390 if there is evidence of performance issues.
Further experiments are needed on the design of such a cache and the
corresponding heuristics.

Other relevant metrics that may indicate a need to reduce the IW
include an increased overall percentage of packet loss or segment
retransmissions, as well as application-level metrics such as
increased data transfer completion times or impaired media quality.

It is also important to take into account hosts that do not
implement a larger initial window. Furthermore, non-TCP traffic
(such as VoIP) should be monitored as well. If users observe any
significant deterioration of performance, they SHOULD fall back to
an initial window as allowed by RFC 3390 for safety reasons. An
increased initial window MUST NOT be turned on by default on systems
without such monitoring capabilities.

The IETF TCPM working group is very much interested in further
reports from experiments with this specification and encourages the
publication of such measurement data. If no significant harm is
reported, a follow-up document may revisit the question of whether a
larger initial window can be safely used by default in all Internet
hosts.

13. Related Proposals

Two other proposals [All10, Tou12] have been published on raising
TCP's initial window size over large timescales. Both aim at
reducing the uncertain impact of a larger initial window at an
Internet-wide scale. Moreover, [Tou12] seeks an algorithm to
automate the adjustment of IW safely over long timescales.
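The cache-and-fallback heuristic recommended in Section 12 could be
sketched roughly as follows. This is a minimal illustration only, not
part of this specification: the per-destination granularity, the
loss-fraction threshold, and all names are assumptions chosen for the
example.

```python
# Sketch of the Section 12 recommendation: remember how connection
# setups using a large initial window fared per destination, and fall
# back to the window allowed by RFC 3390 when recent history shows
# losses in the initial burst. Threshold and history depth are
# illustrative assumptions.

IW_LARGE = 10   # experimental initial window (segments)
IW_RFC3390 = 3  # window allowed by RFC 3390 for common MSS values

class IWCache:
    def __init__(self, loss_threshold=0.05, history_depth=100):
        self.loss_threshold = loss_threshold  # tolerated fraction of bad setups
        self.history_depth = history_depth    # setups remembered per destination
        self.history = {}                     # destination -> list of bools

    def record_setup(self, dest, lost_in_initial_burst):
        """Record whether a connection's initial burst saw loss or
        retransmission."""
        h = self.history.setdefault(dest, [])
        h.append(lost_in_initial_burst)
        if len(h) > self.history_depth:
            h.pop(0)  # keep only recent history

    def initial_window(self, dest):
        """Choose the initial window for a new connection to dest."""
        h = self.history.get(dest, [])
        if h and sum(h) / len(h) > self.loss_threshold:
            return IW_RFC3390  # evidence of performance issues: fall back
        return IW_LARGE
```

A real implementation would live inside the TCP stack (for example
keyed on destination prefixes, with entries aged out over time); the
draft deliberately leaves the cache design and heuristics open for
further experimentation.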
Although a modest, static increase of IW to 10 may address the
near-term need for better web performance, much work is needed from
the TCP research community to find a long term solution to the TCP
flow startup problem.

14. Security Considerations

This document discusses the initial congestion window permitted for
TCP connections. Although changing this value may cause more packet
loss, it is highly unlikely to lead to a persistent state of network
congestion or even a congestion collapse. Hence it does not raise
any known new security issues with TCP.

15. Conclusion

This document suggests a simple change to TCP that will reduce
application latency over short-lived TCP connections or links with
long RTTs (saving several RTTs during the initial slow-start phase)
with little or no negative impact on other flows. Extensive tests
have been conducted through both testbeds and large data centers,
with most results showing improved latency with only a small
increase in the packet retransmission rate. Based on these results
we believe a modest increase of IW to 10 is the best solution for
near-term deployment, while scaling IW over the long run remains a
challenge for the TCP research community.

16. IANA Considerations

None

17. Acknowledgments

Many people at Google have helped to make the set of large scale
tests possible. We would especially like to acknowledge Amit
Agarwal, Tom Herbert, Arvind Jain and Tiziana Refice for their major
contributions.

Normative References

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
          Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
          Masinter, L., Leach, P. and T.
Berners-Lee, "Hypertext Transfer
          Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC3390] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
          Initial Window", RFC 3390, October 2002.

[RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
          Control", RFC 5681, September 2009.

[RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and
          P. Hurtig, "Early Retransmit for TCP and SCTP", RFC 5827,
          May 2010.

[RFC6298] Paxson, V., Allman, M., Chu, J. and M. Sargent, "Computing
          TCP's Retransmission Timer", RFC 6298, June 2011.

Informative References

[AKAM10]  "The State of the Internet, 3rd Quarter 2009", Akamai
          Technologies, Inc., January 2010. URL
          http://www.akamai.com/html/about/press/releases/2010/press_011310_1.html

[AERG11]  Al-Fares, M., Elmeleegy, K., Reed, B. and I. Gashinsky,
          "Overclocking the Yahoo! CDN for Faster Web Page Loads",
          Internet Measurement Conference, November 2011.

[All00]   Allman, M., "A Web Server's View of the Transport Layer",
          ACM Computer Communication Review, 30(5), October 2000.

[All10]   Allman, M., "Initial Congestion Window Specification",
          Internet-draft draft-allman-tcpm-bump-initcwnd-00.txt,
          work in progress, last updated November 2010.

[Bel10]   Belshe, M., "A Client-Side Argument For Changing TCP Slow
          Start", January 2010. URL
          http://sites.google.com/a/chromium.org/dev/spdy/An_Argument_For_Changing_TCP_Slow_Start.pdf

[CD10]    Chu, J. and N. Dukkipati, "Increasing TCP's Initial
          Window", Presented to 77th IRTF ICCRG & IETF TCPM working
          group meetings, March 2010. URL
          http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf

[Chu09]   Chu, J., "Tuning TCP Parameters for the 21st Century",
          Presented to 75th IETF TCPM working group meeting, July
          2009. URL
          http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

[CoDel]   Nichols, K. and V. Jacobson, "Controlling Queue Delay",
          ACM QUEUE, May 6, 2012.
[CW10]    Chu, J. and Y. Wang, "A Testbed Study on IW10 vs IW3",
          Presented to 79th IETF TCPM working group meeting, Nov.
          2010. URL
          http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

[DCCM10]  Dukkipati, N., Cheng, Y., Chu, J. and M. Mathis,
          "Increasing TCP initial window", Presented to 78th IRTF
          ICCRG working group meeting, July 2010. URL
          http://www.ietf.org/proceedings/78/slides/iccrg-3.pdf

[DGHS07]  Dischinger, M., Gummadi, K., Haeberlen, A. and S. Saroiu,
          "Characterizing Residential Broadband Networks", Internet
          Measurement Conference, October 24-26, 2007.

[Duk10]   Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin, N.,
          Agarwal, A., Herbert, T. and A. Jain, "An Argument for
          Increasing TCP's Initial Congestion Window", ACM SIGCOMM
          Computer Communications Review, vol. 40 (2010), pp. 27-33,
          July 2010.

[FF99]    Floyd, S. and K. Fall, "Promoting the Use of End-to-End
          Congestion Control in the Internet", IEEE/ACM Transactions
          on Networking, August 1999.

[FJ93]    Floyd, S. and V. Jacobson, "Random Early Detection
          gateways for Congestion Avoidance", IEEE/ACM Transactions
          on Networking, V.1 N.4, August 1993, p. 397-413.

[Get11]   Gettys, J., "Bufferbloat: Dark buffers in the Internet",
          Presented to 80th IETF TSV Area meeting, March 2011. URL
          http://www.ietf.org/proceedings/80/slides/tsvarea-1.pdf

[Get11-1] Gettys, J., "IW10 Considered Harmful", Internet-draft
          draft-gettys-iw10-considered-harmful-00, work in progress,
          August 2011.

[IOR2009] Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide,
          J., Jahanian, F. and M. Karir, "Atlas Internet Observatory
          2009 Annual Report", 47th NANOG Conference, October 2009.

[IW10]    "TCP IW10 links", URL
          http://code.google.com/speed/protocols/tcpm-IW10.html

[Jac88]   Jacobson, V., "Congestion Avoidance and Control", Computer
          Communication Review, vol. 18, no. 4, pp. 314-329, Aug.
          1988.
[JNDK10]  Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo, "A
          Simulation Study on Increasing TCP's IW", Presented to
          78th IRTF ICCRG working group meeting, July 2010. URL
          http://www.ietf.org/proceedings/78/slides/iccrg-7.pdf

[JNDK10-1] Jarvinen, I., Nyrhinen, A., Ding, A. and M. Kojo,
          "Effect of IW and Initial RTO changes", Presented to 79th
          IETF TCPM working group meeting, Nov. 2010. URL
          http://www.ietf.org/proceedings/79/slides/tcpm-1.pdf

[LAJW07]  Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion
          Control Without a Startup Phase", Protocols for Fast, Long
          Distance Networks (PFLDnet) Workshop, February 2007. URL
          http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

[PK98]    Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A
          technique for speeding up web transfers", in Proceedings
          of IEEE Globecom '98 Internet Mini-Conference, 1998.

[PRAKS02] Partridge, C., Rockwell, D., Allman, M., Krishnan, R. and
          J. Sterbenz, "A Swifter Start for TCP", Technical Report
          No. 8339, BBN Technologies, March 2002.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
          S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
          Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
          S., Wroclawski, J. and L. Zhang, "Recommendations on Queue
          Management and Congestion Avoidance in the Internet", RFC
          2309, April 1998.

[RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
          Initial Window", RFC 2414, September 1998.

[RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
          TCP's Loss Recovery Using Limited Transmit", RFC 3042,
          January 2001.

[RFC3150] Dawkins, S., Montenegro, G., Kojo, M. and V. Magret,
          "End-to-end Performance Implications of Slow Links", BCP
          48, July 2001.

[RFC3782] Floyd, S., Henderson, T. and A. Gurtov, "The NewReno
          Modification to TCP's Fast Recovery Algorithm", RFC 3782,
          April 2004.
[RFC4782] Floyd, S., Allman, M., Jain, A. and P. Sarolahti,
          "Quick-Start for TCP and IP", RFC 4782, January 2007.

[RFC6077] Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe,
          "Open Research Issues in Internet Congestion Control",
          section 3.4, RFC 6077, February 2011.

[RJ10]    Ramachandran, S. and A. Jain, "Aggregate Statistics of
          Size Related Metrics of Web Pages", May 2010. URL
          http://code.google.com/speed/articles/web-metrics.html

[Sch08]   Scharf, M., "Quick-Start, Jump-Start, and Other Fast
          Startup Approaches", Internet Research Task Force ICCRG,
          November 17, 2008. URL
          http://www.ietf.org/proceedings/73/slides/iccrg-2.pdf

[Sch11]   Scharf, M., "Performance and Fairness Evaluation of IW10
          and Other Fast Startup Schemes", Internet Research Task
          Force ICCRG, March 2011. URL
          http://www.ietf.org/proceedings/80/slides/iccrg-1.pdf

[Sch11-1] Scharf, M., "Comparison of end-to-end and network-
          supported fast startup congestion control schemes",
          Computer Networks, Feb. 2011. URL
          http://dx.doi.org/10.1016/j.comnet.2011.02.002

[SPDY]    "SPDY: An experimental protocol for a faster web", URL
          http://dev.chromium.org/spdy

[Ste08]   Souders, S., "Roundup on Parallel Connections", High
          Performance Web Sites blog, March 2008. URL
          http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections

[Tou12]   Touch, J., "Automating the Initial Window in TCP",
          Internet-draft draft-touch-tcpm-automatic-iw-02.txt, work
          in progress, January 2012.

[VH97]    Visweswaraiah, V. and J. Heidemann, "Improving Restart of
          Idle TCP Connections", Technical Report 97-661, University
          of Southern California, November 1997.

Appendix A - List of Concerns and Corresponding Test Results

Concerns have been raised since this proposal was first published
based on a set of large scale experiments.
To better understand
the impact of a larger initial window and to confirm or dismiss
these concerns, additional tests have been conducted using large
scale clusters, simulations, or real testbeds. The following
attempts to compile the list of concerns and summarize findings from
the relevant tests.

o How complete are the various tests in covering many different
  traffic patterns?

  The large scale Internet experiments conducted on Google's
  front-end infrastructure covered a large portfolio of services
  beyond web search, including Gmail, Google Maps, Photos, News,
  Sites, and Images, covering a wide variety of traffic sizes and
  patterns. One notable exception is YouTube, because we don't think
  the large initial window will have much material impact, either
  positive or negative, on bulk data services.

  [CW10] contains some results from a testbed study on how short
  flows with a larger initial window might affect the throughput
  performance of other co-existing, long lived, bulk data transfers.

o Larger bursts from the increase in the initial window cause
  significantly more packet drops

  All the tests conducted on this subject [Duk10, Sch11, Sch11-1,
  CW10] so far have shown only a modest increase in packet drops.
  The only exception is from the testbed study [CW10] under
  extremely high load and/or simultaneous opens. But under those
  conditions both IW=3 and IW=10 suffered very high packet loss
  rates.

o A large initial window may severely impact TCP performance over
  highly multiplexed links still common in developing regions

  Our large scale experiments described in section 10 above also
  covered Africa and South America. Measurement data from those
  regions [DCCM10] revealed improved latency even for those services
  that employ multiple simultaneous connections, at the cost of a
  small increase in the retransmission rate.
It seems that the round trip
  savings from a larger initial window more than make up for the
  time spent recovering more lost packets.

  A similar phenomenon has also been observed in the testbed study
  [CW10].

o Why 10 segments?

  Questions have been raised about how the number 10 was picked. We
  tried different sizes in our large scale experiments, and found
  that 10 segments seem to give most of the benefits for the
  services we tested while not causing a significant increase in the
  retransmission rates. Going forward, 10 segments may turn out to
  be too small as average web object sizes continue to grow. But a
  scheme to right-size the initial window automatically over long
  timescales has yet to be developed.

o Need more thorough analysis of the impact on slow links

  Although [Duk10] showed that the large initial window reduced the
  average latency even for the dialup link class of only 56Kbps in
  bandwidth, more studies were needed to understand the effect of
  IW10 on slow links at the microscopic level. [CW10] was conducted
  for this purpose.

  Testbeds in [CW10] emulated a 300ms RTT, bottleneck link
  bandwidths as low as 64Kbps, and router queue sizes as low as 40
  packets. A large combination of test parameters was used. Almost
  all tests showed varying degrees of latency improvement from
  IW=10, with only a modest increase in the packet drop rate until a
  very high load was injected. The testbed results were consistent
  with both the large scale data center experiments [CD10, DCCM10]
  and a separate study using NSC simulations [Sch11, Sch11-1].

o How will the larger initial window affect flows with initial
  windows of 4KB or less?

  Flows with the larger initial window will likely grab more
  bandwidth from a bottleneck link when competing against flows with
  a smaller initial window, at least initially. How long will this
  "unfairness" last?
Will there be any "capture effect" where flows
  with the larger initial window possess a disproportionate share of
  bandwidth beyond just a few round trips?

  If there is any "unfairness" issue from flows with different
  initial windows, it did not show up in the large scale
  experiments, as the average latency for the bucket of all
  responses < 4KB did not seem to be affected by the presence of
  many other larger responses employing the large initial window. As
  a matter of fact, they seemed to benefit from the large initial
  window too, as shown in Figure 7 of [Duk10].

  The same phenomenon seems to exist in the testbed experiments
  [CW10]. Flows with IW=3 suffered only slightly when competing
  against flows with IW=10 under light to moderate loads. Under high
  load, both flows' latencies improved when mixed together. Also,
  long-lived, background bulk-data flows seemed to enjoy higher
  throughput when running against many foreground short flows with
  IW=10 than against short flows with IW=3. One plausible
  explanation is that IW=10 enabled short flows to complete sooner,
  leaving more room for the long-lived, background flows.

  A study using the NSC simulator also concluded that IW=10 works
  rather well and is quite fair against IW=3 [Sch11, Sch11-1].

o How will a larger initial window perform over cellular networks?

  Some simulation studies [JNDK10, JNDK10-1] have been conducted to
  study the effect of a larger initial window on wireless links from
  2G to 4G networks (EDGE/HSPA/LTE). The overall result seems mixed
  in both raw performance and the fairness index.

Author's Addresses

Jerry Chu
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: hkchu@google.com

Nandita Dukkipati
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: nanditad@google.com

Yuchung Cheng
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: ycheng@google.com

Matt Mathis
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
EMail: mattmathis@google.com

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.