idnits 2.17.1 draft-ietf-tsvwg-initwin-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 9 instances of too long lines in the document, the longest one being 11 characters in excess of 72. ** There is 1 instance of lines with control characters in the document. ** The abstract seems to contain references ([RFC2119]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2481' is mentioned on line 573, but not defined ** Obsolete undefined reference: RFC 2481 (Obsoleted by RFC 3168) == Missing Reference: 'MMFR96' is mentioned on line 585, but not defined == Unused Reference: 'FF96' is defined on line 461, but no explicit reference was found in the text == Unused Reference: 'FJ93' is defined on line 470, but no explicit reference was found in the text == Unused Reference: 'Flo96' is defined on line 477, but no explicit reference was found in the text == Unused Reference: 'RFC2309' is defined on line 516, but no explicit reference was found in the text == Unused Reference: 'RFC3168' is defined on line 541, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'AHO98' -- Possible downref: Non-RFC (?) normative reference: ref. 'All97a' -- Possible downref: Non-RFC (?) normative reference: ref. 'All97b' -- Possible downref: Non-RFC (?) normative reference: ref. 'All00' -- Possible downref: Non-RFC (?) normative reference: ref. 'FF96' -- Possible downref: Non-RFC (?) normative reference: ref. 'FF98' -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93' -- Possible downref: Non-RFC (?) normative reference: ref. 'Flo94' -- Possible downref: Non-RFC (?) normative reference: ref. 
'Flo96' -- Possible downref: Non-RFC (?) normative reference: ref. 'Flo97' -- Possible downref: Non-RFC (?) normative reference: ref. 'KAGT98' -- Possible downref: Non-RFC (?) normative reference: ref. 'Mor97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Nic97' ** Obsolete normative reference: RFC 821 (ref. 'Pos82') (Obsoleted by RFC 2821) ** Downref: Normative reference to an Informational RFC: RFC 1945 ** Obsolete normative reference: RFC 2068 (Obsoleted by RFC 2616) ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567) ** Obsolete normative reference: RFC 2414 (Obsoleted by RFC 3390) ** Downref: Normative reference to an Informational RFC: RFC 2415 ** Downref: Normative reference to an Informational RFC: RFC 2416 ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298) Summary: 18 errors (**), 0 flaws (~~), 9 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Mark Allman 3 INTERNET DRAFT BBN/NASA GRC 4 File: draft-ietf-tsvwg-initwin-03.txt April, 2002 5 Expires: October, 2002 6 Sally Floyd 7 ICIR 8 Craig Partridge 9 BBN Technologies 11 Increasing TCP's Initial Window 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as 21 Internet-Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six 24 months and may be updated, replaced, or obsoleted by other documents 25 at any time. It is inappropriate to use Internet- Drafts as 26 reference material or to cite them other than as "work in progress." 
28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 Abstract 36 This document specifies an optional standard for TCP to increase the 37 permitted initial window from one segment to roughly 4K bytes, 38 replacing RFC 2414. This document discusses the advantages and 39 disadvantages of the higher initial window. The document includes 40 discussion of experiments and simulations showing that the higher 41 initial window does not lead to congestion collapse. Finally, the 42 document provides guidance on implementation issues. 44 Terminology 46 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 47 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 48 document are to be interpreted as described in RFC 2119 [RFC2119]. 50 1. TCP Modification 52 This document updates [RFC2414] and specifies an increase in the 53 permitted upper bound for TCP's initial window from one segment to 54 between two and four segments. In most cases, this change results 55 in an upper bound on the initial window of roughly 4K bytes 56 (although given a large segment size, the permitted initial window 57 of two segments may be significantly larger than 4K bytes). The 58 upper bound for the initial window is given more precisely in (1): 60 min (4*MSS, max (2*MSS, 4380 bytes)) (1) 62 Equivalently, the upper bound for the initial window size is based 63 on the maximum segment size (MSS), as follows: 65 If (MSS <= 1095 bytes) 66 then win <= 4 * MSS; 67 If (1095 bytes < MSS < 2190 bytes) 68 then win <= 4380; 69 If (2190 bytes <= MSS) 70 then win <= 2 * MSS; 72 This increased initial window is optional: a TCP MAY start with 73 a larger initial window. However, we expect that most 74 general-purpose TCP implementations would choose to use the larger 75 initial congestion window given in equation (1) above.
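As an illustration only, equation (1) and the equivalent three-case rule can be written as a short Python sketch. The function names are hypothetical and are not part of this specification; the sketch simply checks that the two formulations agree on a few common MSS values.

```python
def initial_window_upper_bound(mss: int) -> int:
    """Upper bound on the initial window in bytes, following equation (1)."""
    if mss <= 0:
        raise ValueError("MSS must be a positive number of bytes")
    return min(4 * mss, max(2 * mss, 4380))


def initial_window_piecewise(mss: int) -> int:
    """The same bound, written as the three MSS cases listed in the text."""
    if mss <= 0:
        raise ValueError("MSS must be a positive number of bytes")
    if mss <= 1095:
        return 4 * mss      # small segments: four segments allowed
    if mss < 2190:
        return 4380         # mid-sized segments: capped at 4380 bytes
    return 2 * mss          # large segments: two segments allowed


if __name__ == "__main__":
    for mss in (536, 1095, 1460, 2190, 4096):
        bound = initial_window_upper_bound(mss)
        # Both formulations should always agree.
        assert bound == initial_window_piecewise(mss)
        print(f"MSS={mss:5d} bytes -> initial window <= {bound} bytes")
```

For a 1460-byte MSS the bound is 4380 bytes, that is, three full-sized segments; for a 536-byte MSS it is four segments (2144 bytes).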
77 This upper bound for the initial window size represents a change 78 from RFC 2581 [RFC2581], which specified that the congestion window 79 be initialized to one or two segments. 81 This change applies to the initial window of the connection in the 82 first round trip time (RTT) of data transmission following the TCP three- 83 way handshake. Neither the SYN/ACK nor its acknowledgment (ACK) in 84 the three-way handshake should increase the initial window size 85 above that outlined in equation (1). If the SYN or SYN/ACK is lost, 86 the initial window used by a sender after a correctly transmitted 87 SYN MUST be one segment consisting of MSS bytes. 89 TCP implementations use slow start in as many as three different 90 ways: (1) to start a new connection (the initial window); (2) to 91 restart transmission after a long idle period (the restart window); 92 and (3) to restart transmission after a retransmit timeout (the loss 93 window). The change specified in this document affects the value of 94 the initial window. Optionally, a TCP MAY set the restart window to 95 the minimum of the value used for the initial window and the current 96 value of cwnd (in other words, using a larger value for the restart 97 window should never increase the size of cwnd). These changes do 98 NOT change the loss window, which must remain 1 segment of MSS bytes 99 (to permit the lowest possible window size in the case of severe 100 congestion). 102 2. Implementation Issues 104 When larger initial windows are implemented along with Path MTU 105 Discovery [RFC1191], and the MSS being used is found to be too large, 106 the congestion window `cwnd' SHOULD be reduced to prevent large 107 bursts of smaller segments. Specifically, `cwnd' SHOULD be reduced 108 by the ratio of the old segment size to the new segment size. 
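The window rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: all function names are hypothetical, cwnd is assumed to be kept in bytes, and "reduced by the ratio of the old segment size to the new segment size" is read here as dividing cwnd by old_mss/new_mss, so that the number of segments in flight does not grow when the MSS shrinks.

```python
def initial_window(mss: int) -> int:
    """Upper bound on the initial window in bytes, from equation (1)."""
    return min(4 * mss, max(2 * mss, 4380))


def restart_window(cwnd: int, mss: int) -> int:
    """Optional restart window after a long idle period.

    Taking the minimum ensures a restart never increases cwnd.
    """
    return min(initial_window(mss), cwnd)


def loss_window(mss: int) -> int:
    """Window after a retransmit timeout: unchanged, one segment."""
    return mss


def rescale_cwnd_after_pmtu(cwnd: int, old_mss: int, new_mss: int) -> int:
    """Shrink cwnd (bytes) when Path MTU Discovery lowers the MSS.

    Assumption: cwnd is divided by old_mss / new_mss, which keeps the
    segment count roughly constant instead of releasing a larger burst
    of smaller segments.
    """
    if new_mss >= old_mss:
        return cwnd  # MSS did not shrink; nothing to do
    return cwnd * new_mss // old_mss


if __name__ == "__main__":
    # Three 1460-byte segments become three 512-byte segments,
    # rather than a burst of eight 512-byte segments.
    print(rescale_cwnd_after_pmtu(4380, 1460, 512))
    print(restart_window(20000, 1460), restart_window(2000, 1460))
```

With these assumptions, a 4380-byte window and an MSS reduced from 1460 to 512 bytes yields a cwnd of 1536 bytes, i.e. still three segments.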
110 When larger initial windows are implemented along with Path MTU 111 Discovery [RFC1191], alternatives are to set the "Don't Fragment" 112 (DF) bit in all segments in the initial window, or to set the "Don't 113 Fragment" (DF) bit in one of the segments. It is an open question 114 which of these two alternatives is best; we would hope that 115 implementation experiences will shed light on this question. In the 116 first case of setting the DF bit in all segments, if the initial 117 packets are too large, then all of the initial packets will be 118 dropped in the network. In the second case of setting the DF bit in 119 only one segment, if the initial packets are too large, then all but 120 one of the initial packets will be fragmented in the network. When 121 the second case is followed, setting the DF bit in the last segment 122 in the initial window provides the least chance for needless 123 retransmissions when the initial segment size is found to be too 124 large, because it minimizes the chances of duplicate ACKs triggering 125 a Fast Retransmit. However, more attention needs to be paid to the 126 interaction between larger initial windows and Path MTU Discovery. 128 The larger initial window specified in this document is not intended 129 as encouragement for web browsers to open multiple simultaneous 130 TCP connections all with large initial windows. When web browsers 131 open simultaneous TCP connections to the same destination, this 132 works against TCP's congestion control mechanisms [FF98], regardless 133 of the size of the initial window. Combining this behavior with 134 larger initial windows further increases the unfairness to other 135 traffic in the network. 137 3. Advantages of Larger Initial Windows 139 1. When the initial window is one segment, a receiver employing 140 delayed ACKs [RFC1122] is forced to wait for a timeout before 141 generating an ACK. 
With an initial window of at least two 142 segments, the receiver will generate an ACK after the second 143 data segment arrives. This eliminates the wait on the timeout 144 (often up to 200 msec, and possibly up to 500 msec [RFC1122]). 146 2. For connections transmitting only a small amount of data, a 147 larger initial window reduces the transmission time (assuming at 148 most moderate segment drop rates). For many email (SMTP 149 [Pos82]) and web page (HTTP [RFC1945, RFC2068]) transfers that 150 are less than 4K bytes, the larger initial window would reduce 151 the data transfer time to a single RTT. 153 3. For connections that will be able to use large congestion 154 windows, this modification eliminates up to three RTTs and a 155 delayed ACK timeout during the initial slow-start phase. This 156 will be of particular benefit for high-bandwidth large- 157 propagation-delay TCP connections, such as those over satellite 158 links. 160 4. Disadvantages of Larger Initial Windows for the Individual 161 Connection 163 In high-congestion environments, particularly for routers that have 164 a bias against bursty traffic (as in the typical Drop Tail router 165 queues), a TCP connection can sometimes be better off starting with 166 an initial window of one segment. There are scenarios where a TCP 167 connection slow-starting from an initial window of one segment might 168 not have segments dropped, while a TCP connection starting with an 169 initial window of four segments might experience unnecessary 170 retransmits due to the inability of the router to handle small 171 bursts. This could result in an unnecessary retransmit timeout. 172 For a large-window connection that is able to recover without a 173 retransmit timeout, this could result in an unnecessarily-early 174 transition from the slow-start to the congestion-avoidance phase of 175 the window increase algorithm. 
These premature segment drops are 176 unlikely to occur in uncongested networks with sufficient buffering 177 or in moderately-congested networks where the congested router uses 178 active queue management (such as Random Early Detection [FJ93, 179 RFC2309]). 181 Some TCP connections will receive better performance with the larger 182 initial window even if the burstiness of the initial window results 183 in premature segment drops. This will be true if (1) the TCP 184 connection recovers from the segment drop without a retransmit 185 timeout, and (2) the TCP connection is ultimately limited to a small 186 congestion window by either network congestion or by the receiver's 187 advertised window. 189 5. Disadvantages of Larger Initial Windows for the Network 191 In terms of the potential for congestion collapse, we consider two 192 separate potential dangers for the network. The first danger would 193 be a scenario where a large number of segments on congested links 194 were duplicate segments that had already been received at the 195 receiver. The second danger would be a scenario where a large 196 number of segments on congested links were segments that would be 197 dropped later in the network before reaching their final 198 destination. 200 In terms of the negative effect on other traffic in the network, a 201 potential disadvantage of larger initial windows would be that they 202 increase the general packet drop rate in the network. We discuss 203 these three issues below. 205 Duplicate segments: 207 As described in the previous section, the larger initial window 208 could occasionally result in a segment dropped from the initial 209 window, when that segment might not have been dropped if the 210 sender had slow-started from an initial window of one segment. 211 However, Appendix A shows that even in this case, the larger 212 initial window would not result in the transmission of a large 213 number of duplicate segments. 
215 Segments dropped later in the network: 217 How much would the larger initial window for TCP increase the 218 number of segments on congested links that would be dropped 219 before reaching their final destination? This is a problem that 220 can only occur for connections with multiple congested links, 221 where some segments might use scarce bandwidth on the first 222 congested link along the path, only to be dropped later along 223 the path. 225 First, many of the TCP connections will have only one congested 226 link along the path. Segments dropped from these connections do 227 not "waste" scarce bandwidth, and do not contribute to 228 congestion collapse. 230 However, some network paths will have multiple congested links, 231 and segments dropped from the initial window could use scarce 232 bandwidth along the earlier congested links before ultimately 233 being dropped on subsequent congested links. To the extent that 234 the drop rate is independent of the initial window used by TCP 235 segments, the problem of congested links carrying segments that 236 will be dropped before reaching their destination will be 237 similar for TCP connections that start by sending four segments 238 or one segment. 240 An increased packet drop rate: 242 For a network with a high segment drop rate, increasing the TCP 243 initial window could increase the segment drop rate even 244 further. This is in part because routers with Drop Tail queue 245 management have difficulties with bursty traffic in times of 246 congestion. However, given uncorrelated arrivals for TCP 247 connections, the larger TCP initial window should not 248 significantly increase the segment drop rate. Simulation-based 249 explorations of these issues are discussed in Section 8.2. 251 These potential dangers for the network are explored in simulations 252 and experiments described in the section below.
Our judgment is that 253 while there are dangers of congestion collapse in the current 254 Internet (see [FF98] for a discussion of the dangers of congestion 255 collapse from an increased deployment of UDP connections without 256 end-to-end congestion control), there is no such danger to the 257 network from increasing the TCP initial window to 4K bytes. 259 6. Interactions with the Retransmission Timer 261 Using a larger initial burst of data can exacerbate existing 262 problems with spurious retransmit timeouts on low-bandwidth paths, 263 assuming the standard algorithm for determining the TCP 264 retransmission timeout (RTO) [RFC2988]. The problem is that across 265 low-bandwidth network paths on which the transmission time of a 266 packet is a large portion of the round-trip time, the small packets 267 used to establish a TCP connection do not seed the RTO estimator appropriately. 268 When the first window of data packets is transmitted, the sender's 269 retransmit timer could expire before the acknowledgments for those 270 packets are received. As each acknowledgment arrives, the 271 retransmit timer is generally reset. Thus, the retransmit timer 272 will not expire as long as an acknowledgment arrives at least once 273 a second, given the one-second minimum on the RTO recommended in RFC 274 2988. 276 For instance, consider a 9.6 Kbps link. The initial RTT measurement 277 will be on the order of 67 msec, if we simply consider the 278 transmission time of 2 packets (the SYN and SYN-ACK) each consisting 279 of 40 bytes. Using the RTO estimator given in [RFC2988], this 280 yields an initial RTO of 201 msec (67 + 4*(67/2)). However, we 281 round the RTO to 1 second as specified in RFC 2988. Then assume we 282 send an initial window of one or more 1500-byte packets (1460 data 283 bytes plus overhead). Each packet will take on the order of 1.25 284 seconds to transmit. 
Clearly the RTO will fire before the ACK for 285 the first packet returns, causing a spurious timeout. In this case, 286 a larger initial window of three or four packets exacerbates the 287 problems caused by this spurious timeout. 289 One way to deal with this problem is to make the RTO algorithm more 290 conservative. During the initial window of data, for instance, we 291 could update the RTO for each acknowledgment received. In 292 addition, if the retransmit timer expires for some packet lost in 293 the first window of data, we could leave the exponential-backoff of 294 the retransmit timer engaged until at least one valid RTT measurement is 295 received that involves a data packet. 297 Another method would be to refrain from taking an RTT sample during 298 connection establishment, leaving the default RTO in place until TCP 299 takes a sample from a data segment and the corresponding ACK. While 300 this method likely helps prevent spurious retransmits, it also slows 301 the data transfer down if loss occurs before the RTO is seeded. 303 This specification leaves the decision about what to do (if 304 anything) with regard to the RTO when using a larger initial window 305 to the implementer. 307 7. Typical Levels of Burstiness for TCP Traffic 309 Larger TCP initial windows would not dramatically increase the 310 burstiness of TCP traffic in the Internet today, because such 311 traffic is already fairly bursty. Bursts of two and three segments 312 are already typical of TCP [Flo97]: a delayed ACK (covering two 313 previously unacknowledged segments) received during congestion 314 avoidance causes the congestion window to slide and two segments to 315 be sent. The same delayed ACK received during slow start causes the 316 window to slide by two segments and then be incremented by one 317 segment, resulting in a three-segment burst. While not necessarily 318 typical, bursts of four and five segments for TCP are not rare.
319 Assuming delayed ACKs, a single dropped ACK causes the subsequent 320 ACK to cover four previously unacknowledged segments. During 321 congestion avoidance this leads to a four-segment burst and during 322 slow start a five-segment burst is generated. 324 There are also changes in progress that reduce the performance 325 problems posed by moderate traffic bursts. One such change is the 326 deployment of higher-speed links in some parts of the network, where 327 a burst of 4K bytes can represent a small quantity of data. A 328 second change, for routers with sufficient buffering, is the 329 deployment of queue management mechanisms such as RED, which is 330 designed to be tolerant of transient traffic bursts. 332 8. Simulations and Experimental Results 334 8.1 Studies of TCP Connections using the Larger Initial Window 336 This section surveys simulations and experiments that have been used 337 to explore the effect of larger initial windows on TCP 338 connections. The first set of experiments 339 explores performance over satellite links. Larger initial windows 340 have been shown to improve performance of TCP connections over 341 satellite channels [All97b]. In this study, an initial window of 342 four segments (512 byte MSS) resulted in throughput improvements of 343 up to 30% (depending upon transfer size). [KAGT98] shows that the 344 use of larger initial windows results in a decrease in transfer time 345 in HTTP tests over the ACTS satellite system. A study involving 346 simulations of a large number of HTTP transactions over hybrid fiber 347 coax (HFC) indicates that the use of larger initial windows 348 decreases the time required to load WWW pages [Nic97]. 350 A second set of experiments has explored TCP performance over dialup 351 modem links. In experiments over a 28.8 Kbps dialup channel [All97a, 352 AHO98], a four-segment initial window decreased the transfer time of 353 a 16KB file by roughly 10%, with no accompanying increase in the 354 drop rate.
A particular area of concern has been TCP performance 355 over low speed tail circuits (e.g., dialup modem links) with routers 356 with small buffers. A simulation study [RFC2416] investigated the 357 effects of using a larger initial window on a host connected by a 358 slow modem link and a router with a 3 packet buffer. The study 359 concluded that for the scenario investigated, the use of larger 360 initial windows was not harmful to TCP performance. Questions have 361 been raised concerning the effects of larger initial windows on the 362 transfer time for short transfers in this environment, but these 363 effects have not been quantified. A question has also been raised 364 concerning the possible effect on existing TCP connections sharing 365 the link. 367 Finally, [All00] illustrates that the percentage of connections at a 368 particular web server that experience loss in the initial window of 369 data transmission increases with the size of the initial congestion 370 window. However, the increase is in line with what would be 371 expected from sending a larger burst into the network. 373 8.2 Studies of Networks using Larger Initial Windows 375 This section surveys simulations and experiments investigating the 376 impact of the larger window on other TCP connections sharing the 377 path. Experiments in [All97a, AHO98] show that for 16 KB transfers 378 to 100 Internet hosts, four-segment initial windows resulted in a 379 small increase in the drop rate of 0.04 segments/transfer. While 380 the drop rate increased slightly, the transfer time was reduced by 381 roughly 25% for transfers using the four-segment (512 byte MSS) 382 initial window when compared to an initial window of one segment. 384 One scenario of concern is heavily loaded links. For instance, 385 several years ago one of the trans-Atlantic links was so heavily 386 loaded that the correct congestion window size for each connection was 387 about one segment. 
In this environment, new connections using 388 larger initial windows would be starting with windows that were four 389 times too big. What would the effects be? Do connections thrash? 391 A simulation study in [RFC2415] explores the impact of a larger initial 392 window on competing network traffic. In this investigation, HTTP 393 and FTP flows share a single congested gateway (where the number of 394 HTTP and FTP flows varies from one simulation set to another). For 395 each simulation set, the paper examines aggregate link utilization 396 and packet drop rates, median web page delay, and network power for 397 the FTP transfers. The larger initial window generally resulted in 398 increased throughput, slightly-increased packet drop rates, and an 399 increase in overall network power. With the exception of one 400 scenario, the larger initial window resulted in an increase in the 401 drop rate of less than 1% above the loss rate experienced when using 402 a one-segment initial window; in this scenario, the drop rate 403 increased from 3.5% with one-segment initial windows, to 4.5% with 404 four-segment initial windows. The overall conclusions were that 405 increasing the TCP initial window to three packets (or 4380 bytes) 406 helps to improve perceived performance. 408 Morris [Mor97] investigated larger initial windows in a very 409 congested network with transfers of size 20K. The loss rate in 410 networks where all TCP connections use an initial window of four 411 segments is shown to be 1-2% greater than in a network where all 412 connections use an initial window of one segment. This relationship 413 held in scenarios where the loss rates with one-segment initial 414 windows ranged from 1% to 11%. In addition, in networks where 415 connections used an initial window of four segments, TCP connections 416 spent more time waiting for the retransmit timer (RTO) to expire to 417 resend a segment than was spent when using an initial window of one 418 segment. 
The time spent waiting for the RTO timer to expire 419 represents idle time when no useful work was being accomplished for 420 that connection. These results show that in a very congested 421 environment, where each connection's share of the bottleneck 422 bandwidth is close to one segment, using a larger initial window can 423 cause a perceptible increase in both loss rates and retransmit 424 timeouts. 426 9. Security Considerations 428 This document discusses the initial congestion window permitted for 429 TCP connections. Changing this value does not raise any known new 430 security issues with TCP. 432 10. Conclusion 433 This document specifies a small change to TCP that will likely be beneficial 434 to short-lived TCP connections and those over links with long RTTs 435 (saving several RTTs during the initial slow-start phase). 437 11. Acknowledgments 439 We would like to acknowledge Vern Paxson, Tim Shepard, members of 440 the End-to-End-Interest Mailing List, and members of the IETF TCP 441 Implementation Working Group for continuing discussions of these 442 issues and for feedback on this document. 444 12. References 446 [AHO98] Mark Allman, Chris Hayes, and Shawn Ostermann, An Evaluation 447 of TCP with Larger Initial Windows, March 1998. Submitted to 448 ACM Computer Communication Review. URL: 449 "http://roland.lerc.nasa.gov/~mallman/papers/initwin.ps". 451 [All97a] Mark Allman. An Evaluation of TCP with Larger Initial 452 Windows. 40th IETF Meeting -- TCP Implementations WG. 453 December, 1997. Washington, DC. 455 [All97b] Mark Allman. Improving TCP Performance Over Satellite 456 Channels. Master's thesis, Ohio University, June 1997. 458 [All00] Mark Allman. A Web Server's View of the Transport Layer. ACM 459 Computer Communication Review, 30(5), October 2000. 461 [FF96] Fall, K., and Floyd, S., Simulation-based Comparisons of 462 Tahoe, Reno, and SACK TCP. Computer Communication Review, 463 26(3), July 1996.
465 [FF98] Sally Floyd, Kevin Fall. Promoting the Use of End-to-End 466 Congestion Control in the Internet. Submitted to IEEE 467 Transactions on Networking. URL "http://www- 468 nrg.ee.lbl.gov/floyd/end2end-paper.html". 470 [FJ93] Floyd, S., and Jacobson, V., Random Early Detection gateways 471 for Congestion Avoidance. IEEE/ACM Transactions on Networking, 472 V.1 N.4, August 1993, p. 397-413. 474 [Flo94] Floyd, S., TCP and Explicit Congestion Notification. 475 Computer Communication Review, 24(5):10-23, October 1994. 477 [Flo96] Floyd, S., Issues of TCP with SACK. Technical report, 478 January 1996. Available from http://www-nrg.ee.lbl.gov/floyd/. 480 [Flo97] Floyd, S., Increasing TCP's Initial Window. Viewgraphs, 481 40th IETF Meeting - TCP Implementations WG. December, 1997. URL 482 "ftp://ftp.ee.lbl.gov/talks/sf-tcp-ietf97.ps". 484 [KAGT98] Hans Kruse, Mark Allman, Jim Griner, Diepchi Tran. HTTP 485 Page Transfer Rates Over Geo-Stationary Satellite Links. March 486 1998. Proceedings of the Sixth International Conference on 487 Telecommunication Systems. URL 488 "http://roland.lerc.nasa.gov/~mallman/papers/nash98.ps". 490 [Mor97] Robert Morris. Private communication, 1997. Cited for 491 acknowledgement purposes only. 493 [Nic97] Kathleen Nichols. Improving Network Simulation with 494 Feedback. Com21, Inc. Technical Report. Available from 495 http://www.com21.com/pages/papers/068.pdf. 497 [Pos82] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 498 821, August 1982. 500 [RFC1122] Braden, R., "Requirements for Internet Hosts -- 501 Communication Layers", STD 3, RFC 1122, October 1989. 503 [RFC1191] Mogul, J., and S. Deering, "Path MTU Discovery", RFC 1191, 504 November 1990. 506 [RFC1945] Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext 507 Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996. 509 [RFC2068] Fielding, R., Mogul, J., Gettys, J., Frystyk, H., and T. 
510 Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 511 2068, January 1997. 513 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 514 Requirement Levels", BCP 14, RFC 2119, March 1997. 516 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 517 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 518 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., 519 Wroclawski, J., and L. Zhang, "Recommendations on Queue 520 Management and Congestion Avoidance in the Internet", RFC 2309, 521 April 1998. 523 [RFC2414] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's 524 Initial Window", RFC 2414, September 1998. 526 [RFC2415] Poduri, K., and K. Nichols, "Simulation Studies of 527 Increased Initial TCP Window Size", RFC 2415, September 1998. 529 [RFC2416] Shepard, T., and C. Partridge, "When TCP Starts Up With 530 Four Packets Into Only Three Buffers", RFC 2416, September 1998. 532 [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP 533 Congestion Control, April 1999. RFC 2581. 535 [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission 536 Timer, November 2000. RFC 2988. 538 [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, Enhancing TCP's 539 Loss Recovery Using Limited Transmit, RFC 3042, January 2001. 541 [RFC3168] Ramakrishnan, K.K., Floyd, S., and Black, D., "The 542 Addition of Explicit Congestion Notification (ECN) to IP", RFC 543 3168, September 2001. 545 13. Author's Addresses 547 Mark Allman 548 BBN Technologies/NASA Glenn Research Center 549 21000 Brookpark Road 550 MS 54-5 551 Cleveland, OH 44135 552 EMail: mallman@bbn.com 553 http://roland.lerc.nasa.gov/~mallman/ 555 Sally Floyd 556 ICSI Center for Internet Research 557 1947 Center St, Suite 600 558 Berkeley, CA 94704 559 Phone: +1 (510) 666-2989 560 EMail: floyd@icir.org 561 http://www.icir.org/floyd/ 563 Craig Partridge 564 BBN Technologies 565 10 Moulton Street 566 Cambridge, MA 02138 568 EMail: craig@bbn.com 570 14. 
Appendix - Duplicate Segments

572 In the current environment (without Explicit Congestion Notification
573 [Flo94] [RFC2481]), all TCPs use segment drops as indications from
574 the network about the limits of available bandwidth. We argue here
575 that the change to a larger initial window should not result in the
576 sender retransmitting a large number of duplicate segments that have
577 already arrived at the receiver.

579 If one segment is dropped from the initial window, there are three
580 different ways for TCP to recover: (1) Slow-starting from a window
581 of one segment, as is done after a retransmit timeout, or after Fast
582 Retransmit in Tahoe TCP; (2) Fast Recovery without selective
583 acknowledgments (SACK), as is done after three duplicate ACKs in
584 Reno TCP; and (3) Fast Recovery with SACK, for TCP where both the
585 sender and the receiver support the SACK option [MMFR96]. In all
586 three cases, if a single segment is dropped from the initial window,
587 no duplicate segments (i.e., segments that have already been
588 received at the receiver) are transmitted. Note that for a TCP
589 sending four 512-byte segments in the initial window, a single
590 segment drop will not require a retransmit timeout, but can be
591 recovered from using the Fast Retransmit algorithm (unless the
592 retransmit timer expires prematurely). In addition, a single
593 segment dropped from an initial window of three segments might be
594 repaired using the Fast Retransmit algorithm, depending on which
595 segment is dropped and whether or not delayed ACKs are used. For
596 example, dropping the first segment of a three-segment initial
597 window will always require waiting for a timeout, in the absence of
598 Limited Transmit [RFC3042]. However, dropping the third segment
599 will always allow recovery via the Fast Retransmit algorithm, as
600 long as no ACKs are lost.
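
The single-drop cases above can be checked with a small counting model.
The following Python sketch is illustrative only and is not part of the
original analysis. It assumes that the receiver ACKs every arriving
segment immediately (no delayed ACKs), that no ACKs are lost, that the
retransmit timer does not expire prematurely, that Limited Transmit
[RFC3042] is not used, that the sender always has new data to send, and
that each ACK for new data lets the sender transmit two new segments
during slow start (one for the segment that left the network, one for
the window increase). The function names and the threshold constant are
hypothetical names chosen for this sketch.

```python
# Illustrative model only; it relies on the assumptions stated above.
DUPACK_THRESHOLD = 3  # duplicate ACKs needed to trigger Fast Retransmit


def dup_acks_after_single_drop(window: int, dropped: int) -> int:
    """Duplicate ACKs the sender receives when segment `dropped` (1-based)
    of an initial window of `window` segments is lost."""
    if not 1 <= dropped <= window:
        raise ValueError("dropped must be between 1 and window")
    # Segments before the hole arrive in order and are ACKed as new data.
    new_acks = dropped - 1
    # Segments after the hole arrive out of order: one duplicate ACK each.
    dup_acks = window - dropped
    # Assumption: each new ACK releases two new segments in slow start,
    # and each of those also arrives behind the hole.
    dup_acks += 2 * new_acks
    return dup_acks


def fast_retransmit_possible(window: int, dropped: int) -> bool:
    """True if enough duplicate ACKs arrive to avoid waiting for a timeout."""
    return dup_acks_after_single_drop(window, dropped) >= DUPACK_THRESHOLD


if __name__ == "__main__":
    for window in (3, 4):
        for dropped in range(1, window + 1):
            count = dup_acks_after_single_drop(window, dropped)
            how = "Fast Retransmit" if count >= DUPACK_THRESHOLD else "timeout"
            print(f"window={window} dropped={dropped}: {count} dup ACKs, {how}")
```

Under these assumptions the model agrees with the text: every single
drop from a four-segment window yields at least three duplicate ACKs,
while dropping the first segment of a three-segment window yields only
two and so requires a timeout. With delayed ACKs the counts can be
lower, which is why the three-segment case depends on the receiver's
ACK policy.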
602 Next we consider scenarios where the initial window contains two to
603 four segments, and at least two of those segments are dropped. If
604 all segments in the initial window are dropped, then clearly no
605 duplicate segments are retransmitted, as the receiver has not yet
606 received any segments. (It is still a possibility that these
607 dropped segments used scarce bandwidth on the way to their drop
608 point; this issue was discussed in Section 5.)

610 When two segments are dropped from an initial window of three
611 segments, the sender will only send a duplicate segment if the first
612 two of the three segments were dropped, and the sender does not
613 receive a packet with the SACK option acknowledging the third
614 segment.

616 When two segments are dropped from an initial window of four
617 segments, an examination of the six possible scenarios (which we
618 do not go through here) shows that, depending on the position of the
619 dropped segments, in the absence of SACK the sender might send one
620 duplicate segment. There are no scenarios in which the sender sends
621 two duplicate segments.

623 When three segments are dropped from an initial window of four
624 segments, then, in the absence of SACK, it is possible that one
625 duplicate segment will be sent, depending on the position of the
626 dropped segments.

628 The summary is that in the absence of SACK, there are some scenarios
629 with multiple segment drops from the initial window where one
630 duplicate segment will be transmitted. There are no scenarios where
631 more than one duplicate segment will be transmitted. Our conclusion
632 is that the number of duplicate segments transmitted as a result of
633 a larger initial window should be small.

635 15. Full Copyright Statement

637 Copyright (C) The Internet Society (2001). All Rights Reserved.
639 This document and translations of it may be copied and furnished to
640 others, and derivative works that comment on or otherwise explain it
641 or assist in its implementation may be prepared, copied, published
642 and distributed, in whole or in part, without restriction of any
643 kind, provided that the above copyright notice and this paragraph are
644 included on all such copies and derivative works. However, this
645 document itself may not be modified in any way, such as by removing
646 the copyright notice or references to the Internet Society or other
647 Internet organizations, except as needed for the purpose of
648 developing Internet standards in which case the procedures for
649 copyrights defined in the Internet Standards process must be
650 followed, or as required to translate it into languages other than
651 English.

653 The limited permissions granted above are perpetual and will not be
654 revoked by the Internet Society or its successors or assigns.

656 This document and the information contained herein is provided on an
657 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
658 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
659 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
660 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
661 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.