Internet Draft                                                    J. Chu
draft-hkchu-tcpm-initcwnd-01.txt                            N. Dukkipati
Intended status: Standard                                       Y. Cheng
Updates: 3390, 5681                                            M. Mathis
Creation date: July 12, 2010                                Google, Inc.
Expiration date: January 2011

                     Increasing TCP's Initial Window

Status of this Memo

   Distribution of this memo is unlimited.
   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire in January 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document proposes an increase in the permitted TCP initial
   window (IW) from between 2 and 4 segments, as specified in RFC 3390,
   to 10 segments.  It discusses the motivation behind the increase,
   the advantages and disadvantages of the higher initial window, and
   presents results from several large scale experiments showing that
   the higher initial window improves the overall performance of many
   web services without risking congestion collapse.  Finally, it
   outlines a list of concerns to be addressed in future tests.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Editor's Note

   This draft aims to update RFC 3390 and thus follows RFC 3390's
   layout closely.  Much of the analysis from RFC 3390 remains valid.
   Some non-critical details are intentionally excluded from this
   draft; the intent is to publish the draft early in order to solicit
   feedback.  All the excluded pieces will be supplied in later
   revisions.

   The choice of 10 segments for the initial window may not be the
   "optimal" one.  Our most recent tests gave better performance with
   IW=16 than with IW=10, while still showing little negative impact.
   We are still completing the latest large scale tests and analysis
   for IW=16, which might be a better, more future-proof choice.

1. Introduction

   The TCP congestion window was introduced as part of the congestion
   control algorithm by Van Jacobson in 1988 [Jac88].  The initial
   value of one segment was used as the starting point for newly
   established connections to probe the available bandwidth on the
   network.

   The default value was increased to roughly 4KB more than a decade
   ago [RFC2414].  Since then, the Internet has continued to grow, both
   in speed and penetration [AKAM10].  Today's Internet is dominated by
   web traffic running on top of short-lived TCP connections [IOR2009].
   The relatively small initial window has become a limiting factor for
   the performance of many web applications.

   This document proposes an optional standard to allow TCP's initial
   window to start at 10 segments, or roughly 15KB, updating RFC 3390
   [RFC3390].  It discusses the motivation and the advantages and
   disadvantages of the higher initial window, and includes test
   results from several large scale experiments showing improved
   latency across the board for a variety of BW, RTT, and BDP classes.

   It also discusses potential negative impacts and suggests
   mitigations.  A minor change to RFC 3390 and RFC 5681 [RFC5681] is
   proposed for resetting the initial window when the SYN or SYN/ACK is
   lost.

   The document closes with a discussion of remaining concerns and of
   future tests to further validate the higher initial window.

2. TCP Modification

   This document proposes an increase in the permitted upper bound for
   TCP's initial window (IW) to 10 segments.  This increase is
   optional: a TCP MAY start with a larger initial window of up to 10
   segments.

   This upper bound for the initial window size represents a change
   from RFC 3390 [RFC3390], which specified that the congestion window
   be initialized to between 2 and 4 segments depending on the MSS.

   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission following the TCP
   three-way handshake.  Neither the SYN/ACK nor its acknowledgment
   (ACK) in the three-way handshake should increase the initial window
   size beyond 10 segments.

   Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

      "If the SYN or SYN/ACK is lost, the initial window used by a
      sender after a correctly transmitted SYN MUST be one segment
      consisting of MSS bytes."
   The proposed change to reduce the default RTO to 1 second [PAC10]
   increases the chance of spurious SYN or SYN/ACK retransmission, thus
   unnecessarily penalizing connections with RTT > 1 second if their
   initial window is reduced to 1 segment.  For this reason, it is
   RECOMMENDED that implementations refrain from resetting the initial
   window to 1 segment unless there have been multiple SYN or SYN/ACK
   retransmissions or a genuine loss has been detected.

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the loss
   window).  The change specified in this document affects the value of
   the initial window.  Optionally, a TCP MAY set the restart window to
   the minimum of the value used for the initial window and the current
   value of cwnd (in other words, using a larger value for the restart
   window should never increase the size of cwnd).  These changes do
   NOT change the loss window, which must remain 1 segment of MSS bytes
   (to permit the lowest possible window size in the case of severe
   congestion).

   Furthermore, to limit any negative effect that a larger initial
   window may have on links with limited bandwidth or buffer space,
   implementations SHOULD fall back to RFC 3390 for the restart window
   (RW) if any packet loss is detected during either an initial window
   or a restart window in which more than 4KB of data is sent.

3. Motivation

   The global Internet has continued to grow, both in speed and
   penetration.  According to the latest report from Akamai [AKAM10],
   global broadband (> 2Mbps) adoption has surpassed 50%, propelling
   the average connection speed to 1.7Mbps, while narrowband
   (< 256Kbps) usage has dropped to 5%.  In contrast, TCP's initial
   window has remained 4KB for a decade, corresponding to a bandwidth
   utilization of less than 200Kbps per connection, assuming an RTT of
   200ms.

   A large proportion of flows on the Internet are short web
   transactions over TCP that complete before exiting TCP slow start.
   Speeding up the TCP flow startup phase, including circumventing the
   initial window limit, has been an area of active research [PWSB09,
   Sch08].  Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98].
   Some require router support [RFC4782, PK98] and hence are not
   practical for the public Internet.  Others suggest bold but often
   radical ideas, likely requiring more years of research before
   standardization and deployment.

   In the meantime, applications have responded to TCP's "slow" start.
   Web sites use multiple sub-domains [Bel10] to circumvent the HTTP
   1.1 limit of two connections per physical host [RFC2616].  As of
   today, major web browsers open multiple connections to the same site
   (up to six connections per domain [Ste08], and the number is
   growing).  This trend remedies HTTP's serialized downloads to
   achieve parallelism and higher performance.  But it also implies
   that today most access links are severely under-utilized, so opening
   multiple TCP connections improves performance most of the time.
   While raising the initial congestion window may cause congestion for
   certain users of these browsers, we argue that browsers and other
   applications need to respect the HTTP 1.1 limit and stop increasing
   the number of simultaneous TCP connections.  We believe a modest
   increase of the initial window will help to stop this trend, provide
   the best interim solution to improve overall user performance, and
   reduce the server, client, and network load.

   Note that persistent connections and pipelining are designed to
   address some of the HTTP issues above [RFC2616].  Their presence
   does not diminish the need for a larger initial window: the first
   data chunk in a response is often the largest and will easily hit
   the initial window limit.  Our test data confirm significant latency
   reduction with the large initial window even with these two HTTP
   features ([Duk10]).

   Also note that packet pacing has been suggested as an effective
   mechanism to avoid large bursts and their associated damage [VH97].
   We do not require pacing in our proposal due to our strong
   preference for a simple solution.  We suspect that for packet bursts
   of a moderate size, packet pacing will not be necessary.  This seems
   to be confirmed by our test results.

   More discussion of the increase in the initial window, including the
   choice of 10 segments, can be found in [Duk10].

4. Implementation Issues

   [Need to decide if a different formula is needed for PMTU != 1500.]

   The HTTP 1.1 specification allows only two simultaneous connections
   per domain, while web browsers open more simultaneous TCP
   connections [Ste08], partly to circumvent the small initial window
   in order to speed up the loading of web pages, as described above.

   When web browsers open simultaneous TCP connections to the same
   destination, they are working against TCP's congestion control
   mechanisms [FF99].  Combining this behavior with larger initial
   windows further increases the burstiness and unfairness to other
   traffic in the network.  A larger initial window will give
   applications an incentive to use fewer concurrent TCP connections.

   Some implementations advertise a small initial receive window
   (Table 2 in [Duk10]), effectively limiting how much window a remote
   host may use.  In order to realize the full benefit of the large
   initial window, implementations are encouraged to advertise an
   initial receive window of at least 10 segments, except in
   circumstances where a larger initial window is deemed harmful.
   (See the Mitigation section below.)

5. Advantages of Larger Initial Windows

   1. Reducing Latency

      An increase of the initial window from 3 segments to 10 segments
      reduces the total transfer time for data sets greater than 4KB by
      up to 4 round trips.

      The table below compares the number of round trips between IW=3
      and IW=10 for different transfer sizes, assuming infinite
      bandwidth, no packet loss, and standard delayed ACKs with a large
      delayed-ACK timer.

         ---------------------------------------
         |  total segments  |  IW=3  |  IW=10  |
         ---------------------------------------
         |        3         |    1   |    1    |
         |        6         |    2   |    1    |
         |       10         |    3   |    1    |
         |       12         |    3   |    2    |
         |       21         |    4   |    2    |
         |       25         |    5   |    2    |
         |       32         |    5   |    3    |
         |       46         |    6   |    3    |
         |       51         |    6   |    4    |
         |       79         |    7   |    4    |
         |      121         |    8   |    5    |
         |      128         |    9   |    5    |
         ---------------------------------------

      For example, with the larger initial window, a transfer of 32KB
      of data will require only two rather than five round trips to
      complete.

   2. Keeping up with the growth of web object size

      RFC 3390 stated that the main motivation for increasing the
      initial window to 4KB was to speed up connections that only
      transmit a small amount of data, e.g., email and web.  The
      majority of transfers back then were less than 4KB and could be
      completed in a single RTT [All00].

      Since RFC 3390 was published, web objects have gotten
      significantly larger [Chu09, RJ10].  A large percentage of web
      objects today no longer fit in the 4KB initial window and require
      more than one round trip to transfer.  E.g., only 10% of Google's
      search responses can fit in 4KB, while 90% can fit in 10 segments
      (15KB).  The average HTTP response size of gmail.com, a highly
      scripted website, is 8KB (Figure 1 in [Duk10]).
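As a cross-check of the table in item 1 above, the round counts follow from a simplified slow-start model in which delayed ACKs acknowledge every second segment, so cwnd grows by roughly cwnd/2 per RTT. The sketch below is our own illustration, not part of RFC 3390 or RFC 5681; it assumes infinite bandwidth and no loss, as the table does:

```python
def rounds_to_send(total_segments, iw):
    """Round trips needed to deliver total_segments in slow start,
    assuming no loss, infinite bandwidth, and delayed ACKs that
    acknowledge every second segment (cwnd grows by cwnd // 2 per RTT)."""
    cwnd, sent, rounds = iw, 0, 0
    while sent < total_segments:
        sent += cwnd
        rounds += 1
        cwnd += cwnd // 2
    return rounds

# Reproduce two rows of the comparison table above.
print(rounds_to_send(32, 3), rounds_to_send(32, 10))    # 5 3
print(rounds_to_send(128, 3), rounds_to_send(128, 10))  # 9 5
```

Under this model the per-round send capacities are 3, 4, 6, 9, 13, ... for IW=3 and 10, 15, 22, 33, 49, ... for IW=10, which matches every row of the table.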
      During the same period, the average web page, including all
      static and dynamic scripted web objects on the page, has seen
      even greater growth in size [RJ10].  HTTP pipelining [RFC2616]
      and new web transport protocols like SPDY [SPDY] allow multiple
      web objects to be sent in a single transaction, potentially
      requiring an even larger initial window in order to transfer a
      whole web page in one round trip.

   3. Recovering faster from loss on under-utilized or wireless links

      A greater-than-3-segment initial window increases the chance of
      recovering from packet loss through Fast Retransmit rather than
      through the lengthy initial RTO [RFC5681], because the fast
      retransmit algorithm requires three duplicate ACKs as an
      indication that a segment has been lost rather than reordered.
      While newer loss recovery techniques such as Limited Transmit
      [RFC3042] and Early Retransmit [AAABH10] have been proposed to
      help speed up loss recovery from a smaller window, both
      algorithms can still benefit from the larger initial window
      because of the better chance of receiving more ACKs to react
      upon.

6. Disadvantages of Larger Initial Windows for the Individual
   Connection

   The larger bursts from an increase in the initial window may cause
   buffer overrun and packet drops in routers with small buffers, or in
   routers experiencing congestion.  This could result in unnecessary
   retransmit timeouts.  For a large-window connection that is able to
   recover without a retransmit timeout, it could result in an
   unnecessarily early transition from the slow-start to the
   congestion-avoidance phase of the window increase algorithm.  [Note:
   knowing the large initial window may cause premature segment drops,
   should one make an exception for it, i.e., by allowing ssthresh to
   remain unchanged if loss is from an enlarged initial window?]
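Whether an initial burst actually overruns a bottleneck depends on the spare queue room when the burst arrives. The toy drop-tail model below is our own sketch; the function and its parameters are illustrative and not taken from this draft:

```python
def burst_drops(burst_segments, free_buffer_segments, drained_during_burst=0):
    """Segments dropped when a back-to-back burst hits a drop-tail
    queue with free_buffer_segments of spare room.  drained_during_burst
    credits segments the bottleneck forwards while the burst is still
    arriving (near 0 when the outgoing link is slow)."""
    return max(0, burst_segments - free_buffer_segments - drained_during_burst)

# An IW=10 burst into a queue with room for 6 segments loses 4 segments;
# the same burst into a queue with room for 12 segments loses none.
print(burst_drops(10, 6))   # 4
print(burst_drops(10, 12))  # 0
```

In this simplified picture an IW=3 burst is absorbed by almost any queue, while an IW=10 burst is dropped only when fewer than 10 segments of room remain, which frames the buffer-provisioning discussion below.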
   Premature segment drops are unlikely to occur in uncongested
   networks with sufficient buffering, or in moderately congested
   networks where the congested router uses active queue management
   (such as Random Early Detection [FJ93, RFC2309, RFC3150]).

   Insufficient buffering is more likely to exist in the access routers
   connecting slower links.  A recent study of access router buffer
   sizes [DGHS07] reveals that the majority of access routers provision
   enough buffer for 130ms or longer, sufficient to cover a burst of
   more than 10 packets at 1Mbps, but possibly not sufficient for
   browsers opening simultaneous connections.

   Some TCP connections will receive better performance with the larger
   initial window even if the burstiness of the initial window results
   in premature segment drops.  This will be true if (1) the TCP
   connection recovers from the segment drop without a retransmit
   timeout, and (2) the TCP connection is ultimately limited to a small
   congestion window by either network congestion or the receiver's
   advertised window.

7. Disadvantages of Larger Initial Windows for the Network

   An increase in the initial window may increase congestion in a
   network.  However, since the increase is one-time only (at the
   beginning of a connection), and the rest of TCP's congestion backoff
   mechanism remains in place, it is highly unlikely that the increase
   will leave a network in a persistent state of congestion, let alone
   cause congestion collapse.  This seems to have been confirmed by our
   large scale experiments described later.

   Some of the discussion in RFC 3390 remains valid for IW=10.
   Moreover, it is worth noting that although TCP NewReno increases the
   chance of duplicate segments when trying to recover multiple packet
   losses from a large window [RFC3782], the wide support of the TCP
   Selective Acknowledgment (SACK) option [RFC2018] in all major
   operating systems today should keep the volume of duplicate segments
   in check.

8. Mitigation of Negative Impact

   Much of the negative impact from an increase in the initial window
   is likely to be felt by users behind slow links with limited
   buffers.  The negative impact can be mitigated by hosts directly
   connected to a low-speed link advertising an initial receive window
   smaller than 10 segments.  This can be achieved either through
   manual configuration by the users or through the host stack
   automatically detecting low-bandwidth links.

   More suggestions to improve the end-to-end performance of slow links
   can be found in RFC 3150 [RFC3150].

   [Note: if packet loss is detected during IW through fast retransmit,
   should cwnd back down to 2 rather than FlightSize / 2?]

9. Interactions with the Retransmission Timer

   A large initial window increases the chance of a spurious RTO on a
   low-bandwidth path because the packet transmission time will
   dominate the round-trip time.  To minimize spurious retransmissions,
   implementations MUST follow RFC 2988 [RFC2988] and restart the
   retransmission timer with the current value of RTO for each ACK
   received that acknowledges new data.

10. Experimental Results

   In this section we summarize our findings from large scale Internet
   experiments with an initial window of 10 segments, conducted via
   Google's front-end infrastructure serving a diverse set of
   applications.  We present results from two datacenters, each chosen
   because of the specific characteristics of the subnets it serves:
   AvgDC has connection bandwidths closer to the worldwide average
   reported in [AKAM10], with a median connection speed of about
   1.7Mbps; SlowDC has a larger proportion of traffic from slow
   subnets, with nearly 20% of its traffic from connections below
   100Kbps and a third below 256Kbps.

   Guided by the measurement data, we answer two key questions: what is
   the latency benefit when TCP connections start with a higher initial
   window, and, on the flip side, what is the cost?

10.1 The benefits

   The average web search latency improvement over all responses is
   11.7% (68 ms) in AvgDC and 8.7% (72 ms) in SlowDC.  We further
   analyzed the data based on traffic characteristics and subnet
   properties such as bandwidth (BW), round-trip time (RTT), and
   bandwidth-delay product (BDP).  The average response latency
   improved across the board for a variety of subnets, with the largest
   benefits of over 20% coming from high-RTT and high-BDP networks,
   wherein most responses can fit within the pipe.  Correspondingly,
   responses from low-RTT paths experienced the smallest improvements,
   of about 5%.

   Contrary to what we expected, responses from low-bandwidth subnets
   experienced the best latency improvements (between 10 and 20%), in
   the 0-56Kbps and 56-256Kbps buckets.  We speculate that low-BW
   networks observe improved latency for two plausible reasons:
   1) fewer slow-start rounds: unlike many large-BW networks, low-BW
   subnets with dial-up modems have inherently large RTTs; and
   2) faster loss recovery: an initial window larger than 3 segments
   increases the chances of a lost packet being recovered through Fast
   Retransmit as opposed to a lengthy RTO.
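The second effect can be made concrete with a small sketch (our own illustration, assuming a single loss in the initial flight, one duplicate ACK per segment sent after it, and the three-duplicate-ACK threshold of RFC 5681):

```python
def recovery_mode(iw, lost_index):
    """Recovery path for a single loss within the initial window:
    each segment sent after the lost one elicits one duplicate ACK,
    and fast retransmit fires at 3 duplicate ACKs (RFC 5681)."""
    dupacks = iw - lost_index - 1  # segments sent after the lost one
    return "fast retransmit" if dupacks >= 3 else "RTO"

# With IW=3, a single loss can never produce 3 duplicate ACKs, so the
# sender must wait out the retransmission timer; with IW=10, a loss
# anywhere among the first 7 segments recovers via fast retransmit.
print(recovery_mode(3, 0))   # RTO
print(recovery_mode(10, 0))  # fast retransmit
```

This simplified model ignores Limited Transmit and Early Retransmit, which narrow but do not close the gap between the two window sizes.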
422 Responses of different sizes benefited to varying degrees; those 423 larger than 3 segments naturally demonstrated larger improvements, 424 because they finished in fewer rounds in slow start as compared to 425 the baseline. In our experiments, response sizes <= 3 segments also 426 demonstrated small latency benefits. 428 To find out how individual subnets performed, we analyzed average 429 latency at a /24 subnet level (an approximation to a user base 430 offered similar set of services by a common ISP). We find even at the 431 subnet granularity, latency improved at all quantiles ranging from 5- 432 11%. 434 10.2 The cost 436 To quantify the cost of raising the initial window, we analyzed the 437 data specifically for subnets with low bandwidth and BDP, 438 retransmission rates for different kinds of applications, as well as 439 latency for applications operating with multiple concurrent TCP 440 connections. From our measurements we found no evidence of a negative 441 latency impacts that correlate to BW or BDP alone, but in fact both 442 kinds of subnets demonstrated latency improvements across averages 443 and quantiles. 445 As expected, the retransmission rate increased modestly when 446 operating with larger initial congestion window. The overall increase 447 in AvgDC is 0.3% (from 1.98% to 2.29%) and in SlowDC is 0.7% (from 448 3.54% to 4.21%). In our investigation, with the exception of one 449 application, the larger window resulted in a retransmission increase 450 of < 0.5% for services in the AvgDC. The exception is the Maps 451 application that operates with multiple concurrent TCP connections, 452 which increased its retransmission rate by 0.9% in AvgDC and 1.85% in 453 SlowDC (from 3.94% to 5.79%). 455 In our experiments, the percentage of traffic experiencing 456 retransmissions did not increase significantly. E.g. 
90% of web 457 search and maps experienced zero retransmissions in SlowDC 458 (percentages are higher for AvgDC); a break up of retransmissions by 459 percentiles indicate that most increases come from portion of traffic 460 already experiencing retransmissions in the baseline with initial 461 window of 3 segments. 463 Traffic patterns from applications using multiple concurrent TCP 464 connections all operating with a large initial window represent one 465 of the worst case scenarios where latency can be adversely impacted 466 due to bottleneck buffer overflow. Our investigation shows that such 467 a traffic pattern has not been a problem in AvgDC, where all these 468 applications, specifically maps and image thumbnails, demonstrated 469 improved latencies varying from 2-20%. In the case of SlowDC, while 470 these applications continued showing a latency improvement in the 471 mean, their latencies in higher quantiles (96 and above for maps) 472 indicated instances where latency with larger window is worse than 473 the baseline, e.g. the 99% latency for maps has increased by 2.3% 474 (80ms) when compared to the baseline. There is no evidence from our 475 measurements that such a cost on latency is a result of subnet 476 bandwidth alone. Although we have no way of knowing from our data, we 477 conjecture that the amount of buffering at bottleneck links plays a 478 key role in performance of these applications. 480 Further details on our experiments and analysis can be found in 481 [Duk10]. 483 11. List of Concerns and Future Tests 485 Although we were a little hard pressed to find negative impact from 486 the initial window increase in our large scale tests, we don't 487 contend our test coverage is complete. The following is an attempt to 488 compile a list of concerns and to suggest future tests. Ultimately we 489 would like to enlist the help from the TCP community at IETF to study 490 and address any concern that may come up. 492 1. 
How complete are our tests in traffic pattern coverage? 494 Google today offers a large portfolio of services beyond web 495 search. The list includes Gmail, Google Maps, Photos, News, 496 Sites, Images, Videos,..., etc. Our tests included most of 497 Google's services, covering a wide variety of traffic sizes and 498 patterns. One notable exception is YouTube because we don't think 499 the large initial window will have much material impact, either 500 positive or negative, on bulk data services. 502 2. Larger bursts from the increase in the initial window cause 503 significantly more packet drops 505 Let the max burst capacity of an end-to-end path be the largest 506 burst of packets a given path can absorb before packet is 507 dropped. To analyze the impact from the larger initial window, it 508 helps to study the distribution of the max burst capacity of the 509 current Internet. 511 In the past similar studies were conducted by actively probing, 512 e.g., through the TCP echo/discard ports from a large set of 513 endhosts. However, most endhosts today are behind firewall 514 enabled NAT boxes, making active probing infeasible. 516 Our plan is to monitor TCP connections used to carry Google's 517 bulk data services like YouTube, and infer the max burst capacity 518 on a per-client basis from TCP internal connection parameters 519 such as ssthresh, max cwnd, and packet drop pattern. 521 3. Need more thorough analysis of the impact on slow links 523 Although our data showed the large initial window reduced the 524 average latency even for the dialup link class of only 56Kbps in 525 bandwidth, it is only prudent to perform more microscopic 526 analysis on its effect on slow links. Moreover, data from the 527 YouTube study above will likely be biased toward broadband users, 528 leaving out users behind slow links. 530 The narrowband classes here should include 56Kbps dialup modem, 531 2.5G and GPRS mobile network. 533 4. 
How will the larger initial window affect flows with initial 534 windows 4KB or less? 536 Flows with the larger initial window will likely grab more 537 bandwidth from a bottleneck link when competing against flows 538 with smaller initial window, at least initially. How long will 539 this "unfairness" last? Will there be any "capture effect" where 540 flows with larger initial window possess a disproportional share 541 of bandwidth beyond just a few round trips? 543 If there is any "unfairness" issue from flows with different 544 initial windows, it did not show up in our large scale 545 experiments, as the average latency for the bucket of all 546 responses < 4KB did not seem to be affected by the presence of 547 many other larger responses employing large initial window. As a 548 matter of fact they seemed to benefit from the large initial 549 window too, as shown in Figure 7 of [Duk10]. 551 More study can be done through simulation, similar to the set 552 described in RFC 2415 [RFC2415]. 554 12. Security Considerations 556 This document discusses the initial congestion window permitted for 557 TCP connections. Changing this value does not raise any known new 558 security issues with TCP. 560 13. Conclusion 562 This document suggests a change to TCP that will likely be beneficial 563 to short-lived TCP connections and those over links with long RTTs 564 (saving several RTTs during the initial slow-start phase). However, 565 more tests are likely needed to fully understand its impact to the 566 Internet. We welcome any help from the TCP community at IETF in 567 moving this proposal forward. 569 14. IANA Considerations 571 None 573 Acknowledgments 575 Many people at Google have helped to make the set of large scale 576 tests possible. We would especially like to acknowledge Amit Agarwal, 577 Tom Herbert, Arvind Jain and Tiziana Refice for their major 578 contributions. 580 Normative References 582 [PAC10] Paxson, V., Allman, M., and J. 
Chu, "Computing TCP's
             Retransmission Timer", Internet-draft
             draft-paxson-tcpm-rfc2988bis-00, work in progress,
             February 2010.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
             Selective Acknowledgement Options", RFC 2018, October
             1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
             Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
             Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's
             Retransmission Timer", RFC 2988, November 2000.

   [RFC3390] Allman, M., Floyd, S. and C. Partridge, "Increasing
             TCP's Initial Window", RFC 3390, October 2002.

   [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
             Control", RFC 5681, September 2009.

Informative References

   [AAABH10] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and
             P. Hurtig, "Early Retransmit for TCP and SCTP",
             Internet-draft draft-ietf-tcpm-early-rexmt-04.txt, work
             in progress.

   [AKAM10]  "The State of the Internet, 3rd Quarter 2009", Akamai
             Technologies, Inc., January 2010.

   [All00]   Allman, M., "A Web Server's View of the Transport
             Layer", ACM Computer Communication Review, 30(5),
             October 2000.

   [Bel10]   Belshe, M., "A Client-Side Argument For Changing TCP
             Slow Start", January 2010. URL
             http://sites.google.com/a/chromium.org/dev/spdy/
             An_Argument_For_Changing_TCP_Slow_Start.pdf

   [Chu09]   Chu, J., "Tuning TCP Parameters for the 21st Century",
             presented to the 75th IETF TCPM working group meeting,
             July 2009.
             http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

   [DGHS07]  Dischinger, M., Gummadi, K., Haeberlen, A. and S.
             Saroiu, "Characterizing Residential Broadband Networks",
             Internet Measurement Conference, October 24-26, 2007.
   [Duk10]   Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Sutin,
             N., Agarwal, A., Herbert, T. and A. Jain, "An Argument
             for Increasing TCP's Initial Congestion Window", March
             2010. URL http://code.google.com/speed/articles/
             tcp_initcwnd_paper.pdf

   [FF99]    Floyd, S. and K. Fall, "Promoting the Use of End-to-End
             Congestion Control in the Internet", IEEE/ACM
             Transactions on Networking, August 1999.

   [FJ93]    Floyd, S. and V. Jacobson, "Random Early Detection
             Gateways for Congestion Avoidance", IEEE/ACM
             Transactions on Networking, V.1 N.4, August 1993, pp.
             397-413.

   [IOR2009] Labovitz, C., Iekel-Johnson, S., McPherson, D.,
             Oberheide, J., Jahanian, F. and M. Karir, "Atlas
             Internet Observatory 2009 Annual Report", 47th NANOG
             Conference, October 2009.

   [Jac88]   Jacobson, V., "Congestion Avoidance and Control",
             Computer Communication Review, vol. 18, no. 4, pp.
             314-329, August 1988.

   [LAJW07]  Liu, D., Allman, M., Jin, S. and L. Wang, "Congestion
             Control Without a Startup Phase", Protocols for Fast,
             Long Distance Networks (PFLDnet) Workshop, February
             2007. URL
             http://www.icir.org/mallman/papers/jumpstart-pfldnet07.pdf

   [PK98]    Padmanabhan, V.N. and R. Katz, "TCP Fast Start: A
             Technique for Speeding Up Web Transfers", in Proceedings
             of the IEEE Globecom '98 Internet Mini-Conference, 1998.

   [PRAKS02] Partridge, C., Rockwell, D., Allman, M., Krishnan, R.
             and J. Sterbenz, "A Swifter Start for TCP", Technical
             Report No. 8339, BBN Technologies, March 2002.

   [PWSB09]  Papadimitriou, D., Welzl, M., Scharf, M. and B. Briscoe,
             "Open Research Issues in Internet Congestion Control",
             section 3.4, Internet-draft draft-irtf-iccrg-welzl-
             congestion-control-open-research-05.txt, work in
             progress.
   [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B.,
             Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
             Minshall, G., Partridge, C., Peterson, L.,
             Ramakrishnan, K., Shenker, S., Wroclawski, J. and L.
             Zhang, "Recommendations on Queue Management and
             Congestion Avoidance in the Internet", RFC 2309, April
             1998.

   [RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing
             TCP's Initial Window", RFC 2414, September 1998.

   [RFC2415] Poduri, K. and K. Nichols, "Simulation Studies of
             Increased Initial TCP Window Size", RFC 2415, September
             1998.

   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing
             TCP's Loss Recovery Using Limited Transmit", RFC 3042,
             January 2001.

   [RFC3150] Dawkins, S., Montenegro, G., Kojo, M. and V. Magret,
             "End-to-end Performance Implications of Slow Links",
             RFC 3150, July 2001.

   [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
             Modification to TCP's Fast Recovery Algorithm", RFC
             3782, April 2004.

   [RFC4782] Floyd, S., Allman, M., Jain, A. and P. Sarolahti,
             "Quick-Start for TCP and IP", RFC 4782, January 2007.

   [RJ10]    Ramachandran, S. and A. Jain, "Aggregate Statistics of
             Size Related Metrics of Web Pages", 2010. URL
             http://code.google.com/speed/articles/web-metrics.html

   [Sch08]   Scharf, M., "Quick-Start, Jump-Start, and Other Fast
             Startup Approaches", November 17, 2008. URL
             http://www.ietf.org/old/2009/proceedings/08nov/slides/
             iccrg-2.pdf

   [SPDY]    "SPDY: An experimental protocol for a faster web", URL
             http://dev.chromium.org/spdy

   [Ste08]   Souders, S., "Roundup on Parallel Connections", High
             Performance Web Sites blog. URL
             http://www.stevesouders.com/blog/2008/03/20/roundup-on-
             parallel-connections

   [VH97]    Visweswaraiah, V. and J. Heidemann, "Improving Restart
             of Idle TCP Connections", Technical Report 97-661,
             University of Southern California, November 1997.
Authors' Addresses

   H.K. Jerry Chu
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: hkchu@google.com

   Nandita Dukkipati
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: nanditad@google.com

   Yuchung Cheng
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: ycheng@google.com

   Matt Mathis
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   USA
   EMail: mattmathis@google.com

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.