Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                              BBN/NASA GRC
File: draft-ietf-tsvwg-initwin-02.txt                        March, 2002
                                                Expires: September, 2002
                                                             Sally Floyd
                                                                    ICIR
                                                         Craig Partridge
                                                        BBN Technologies

                    Increasing TCP's Initial Window

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies an increase in the permitted initial window
   for TCP from one segment to roughly 4K bytes. This document also
   discusses the advantages and disadvantages of the change, outlining
   experimental results that indicate the costs and benefits.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1. TCP Modification

   This document specifies an increase in the permitted upper bound for
   TCP's initial window from one segment to between two and four
   segments. In most cases, this change results in an upper bound on
   the initial window of roughly 4K bytes (although given a large
   segment size, the permitted initial window of two segments may be
   significantly larger than 4K bytes). The upper bound for the
   initial window is given more precisely in (1):

         min (4*MSS, max (2*MSS, 4380 bytes))               (1)

   Equivalently, the upper bound for the initial window size is based
   on the maximum segment size (MSS), as follows:

        If (MSS <= 1095 bytes)
            then win <= 4 * MSS;
        If (1095 bytes < MSS < 2190 bytes)
            then win <= 4380;
        If (2190 bytes <= MSS)
            then win <= 2 * MSS;

   This increased initial window is optional: a TCP MAY start with
   a larger initial window. However, we expect that most
   general-purpose TCP implementations would choose to use the larger
   initial congestion window given in equation (1) above.

   This upper bound for the initial window size represents a change
   from RFC 2581 [RFC2581], which specified that the congestion window
   be initialized to one or two segments.

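   As an illustration of equation (1), the following Python sketch
   computes the upper bound on the initial window for a given MSS.
   It is a minimal sketch rather than a reference implementation: the
   function name and the sample MSS values are our own assumptions,
   and only the formula itself is taken from the text above.

```python
def initial_window_upper_bound(mss: int) -> int:
    """Upper bound on the initial window in bytes, per equation (1).

    `mss` is the maximum segment size in bytes. The name of this
    function is illustrative only; it is not defined by this document.
    """
    if mss <= 0:
        raise ValueError("MSS must be a positive number of bytes")
    return min(4 * mss, max(2 * mss, 4380))


if __name__ == "__main__":
    # Sample MSS values chosen to exercise all three cases of the
    # equivalent If/then form: small, mid-range, and large segments.
    for mss in (536, 1095, 1460, 2190, 4096):
        print(f"MSS={mss:5d} bytes -> initial window <= "
              f"{initial_window_upper_bound(mss)} bytes")
```

   For an MSS of 536 bytes the bound is four segments (2144 bytes),
   for an MSS of 1460 bytes it is 4380 bytes (three segments), and
   for an MSS of 4096 bytes it is two segments (8192 bytes).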
   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission following the TCP
   three-way handshake. Neither the SYN/ACK nor its acknowledgment
   (ACK) in the three-way handshake should increase the initial window
   size above that outlined in equation (1). If the SYN or SYN/ACK is
   lost, the initial window used by a sender after a correctly
   transmitted SYN MUST be one segment consisting of MSS bytes.

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the loss
   window). The change specified in this document affects the value of
   the initial window. Optionally, a TCP MAY set the restart window to
   the minimum of the value used for the initial window and the current
   value of cwnd (in other words, using a larger value for the restart
   window should never increase the size of cwnd). These changes do
   NOT change the loss window, which must remain 1 segment of MSS bytes
   (to permit the lowest possible window size in the case of severe
   congestion).

2. Implementation Issues

   When larger initial windows are implemented along with Path MTU
   Discovery [RFC1191], and the MSS being used is found to be too large,
   the congestion window `cwnd' SHOULD be reduced to prevent large
   bursts of smaller segments. Specifically, `cwnd' SHOULD be reduced
   by the ratio of the old segment size to the new segment size.

   When larger initial windows are implemented along with Path MTU
   Discovery [RFC1191], alternatives are to set the "Don't Fragment"
   (DF) bit in all segments in the initial window, or to set the "Don't
   Fragment" (DF) bit in one of the segments.
   It is an open question which of these two alternatives is best; we
   would hope that implementation experiences will shed light on this
   question. In the first case of setting the DF bit in all segments,
   if the initial packets are too large, then all of the initial
   packets will be dropped in the network. In the second case of
   setting the DF bit in only one segment, if the initial packets are
   too large, then all but one of the initial packets will be
   fragmented in the network. When the second case is followed,
   setting the DF bit in the last segment in the initial window
   provides the least chance for needless retransmissions when the
   initial segment size is found to be too large, because it minimizes
   the chances of duplicate ACKs triggering a Fast Retransmit.
   However, more attention needs to be paid to the interaction between
   larger initial windows and Path MTU Discovery.

   The larger initial window specified in this document is not intended
   as encouragement for web browsers to open multiple simultaneous
   TCP connections all with large initial windows. When web browsers
   open simultaneous TCP connections to the same destination, this
   works against TCP's congestion control mechanisms [FF98], regardless
   of the size of the initial window. Combining this behavior with
   larger initial windows further increases the unfairness to other
   traffic in the network.

3. Advantages of Larger Initial Windows

   1.  When the initial window is one segment, a receiver employing
       delayed ACKs [RFC1122] is forced to wait for a timeout before
       generating an ACK. With an initial window of at least two
       segments, the receiver will generate an ACK after the second
       data segment arrives. This eliminates the wait on the timeout
       (often up to 200 msec, and possibly up to 500 msec [RFC1122]).

   2.  For connections transmitting only a small amount of data, a
       larger initial window reduces the transmission time (assuming at
       most moderate segment drop rates). For many email (SMTP
       [Pos82]) and web page (HTTP [RFC1945, RFC2068]) transfers that
       are less than 4K bytes, the larger initial window would reduce
       the data transfer time to a single RTT.

   3.  For connections that will be able to use large congestion
       windows, this modification eliminates up to three RTTs and a
       delayed ACK timeout during the initial slow-start phase. This
       will be of particular benefit for high-bandwidth
       large-propagation-delay TCP connections, such as those over
       satellite links.

4. Disadvantages of Larger Initial Windows for the Individual
   Connection

   In high-congestion environments, particularly for routers that have
   a bias against bursty traffic (as in the typical Drop Tail router
   queues), a TCP connection can sometimes be better off starting with
   an initial window of one segment. There are scenarios where a TCP
   connection slow-starting from an initial window of one segment might
   not have segments dropped, while a TCP connection starting with an
   initial window of four segments might experience unnecessary
   retransmits due to the inability of the router to handle small
   bursts. This could result in an unnecessary retransmit timeout.
   For a large-window connection that is able to recover without a
   retransmit timeout, this could result in an unnecessarily-early
   transition from the slow-start to the congestion-avoidance phase of
   the window increase algorithm. These premature segment drops are
   unlikely to occur in uncongested networks with sufficient buffering
   or in moderately-congested networks where the congested router uses
   active queue management (such as Random Early Detection [FJ93,
   RFC2309]).

   Some TCP connections will receive better performance with the larger
   initial window even if the burstiness of the initial window results
   in premature segment drops. This will be true if (1) the TCP
   connection recovers from the segment drop without a retransmit
   timeout, and (2) the TCP connection is ultimately limited to a small
   congestion window by either network congestion or by the receiver's
   advertised window.

5. Disadvantages of Larger Initial Windows for the Network

   In terms of the potential for congestion collapse, we consider two
   separate potential dangers for the network. The first danger would
   be a scenario where a large number of segments on congested links
   were duplicate segments that had already been received at the
   receiver. The second danger would be a scenario where a large
   number of segments on congested links were segments that would be
   dropped later in the network before reaching their final
   destination.

   In terms of the negative effect on other traffic in the network, a
   potential disadvantage of larger initial windows would be that they
   increase the general packet drop rate in the network. We discuss
   these three issues below.

   Duplicate segments:

       As described in the previous section, the larger initial window
       could occasionally result in a segment dropped from the initial
       window, when that segment might not have been dropped if the
       sender had slow-started from an initial window of one segment.
       However, Appendix A shows that even in this case, the larger
       initial window would not result in the transmission of a large
       number of duplicate segments.

   Segments dropped later in the network:

       How much would the larger initial window for TCP increase the
       number of segments on congested links that would be dropped
       before reaching their final destination?
       This is a problem that can only occur for connections with
       multiple congested links, where some segments might use scarce
       bandwidth on the first congested link along the path, only to
       be dropped later along the path.

       First, many of the TCP connections will have only one congested
       link along the path. Segments dropped from these connections do
       not "waste" scarce bandwidth, and do not contribute to
       congestion collapse.

       However, some network paths will have multiple congested links,
       and segments dropped from the initial window could use scarce
       bandwidth along the earlier congested links before ultimately
       being dropped on subsequent congested links. To the extent that
       the drop rate is independent of the initial window used by TCP
       segments, the problem of congested links carrying segments that
       will be dropped before reaching their destination will be
       similar for TCP connections that start by sending four segments
       or one segment.

   An increased packet drop rate:

       For a network with a high segment drop rate, increasing the TCP
       initial window could increase the segment drop rate even
       further. This is in part because routers with Drop Tail queue
       management have difficulties with bursty traffic in times of
       congestion. However, given uncorrelated arrivals for TCP
       connections, the larger TCP initial window should not
       significantly increase the segment drop rate. Simulation-based
       explorations of these issues are discussed in Section 8.2.

   These potential dangers for the network are explored in simulations
   and experiments described in Section 8.
   Our judgment is that while there are dangers of congestion collapse
   in the current Internet (see [FF98] for a discussion of the dangers
   of congestion collapse from an increased deployment of UDP
   connections without end-to-end congestion control), there is no such
   danger to the network from increasing the TCP initial window to 4K
   bytes.

6. Interactions with the Retransmission Timer

   Using a larger initial burst of data can exacerbate existing
   problems with spurious retransmit timeouts on low-bandwidth paths,
   assuming the standard algorithm for determining the TCP
   retransmission timeout (RTO) [RFC2988]. The problem is that across
   low-bandwidth network paths on which the transmission time of a
   packet is a large portion of the round-trip time, the small packets
   used to establish a TCP connection do not seed the RTO estimator
   appropriately. When the first window of data packets is
   transmitted, the sender's retransmit timer could expire before the
   acknowledgments for those packets are received. As each
   acknowledgment arrives, the retransmit timer is generally reset.
   Thus, the retransmit timer will not expire as long as an
   acknowledgment arrives at least once a second, given the one-second
   minimum on the RTO recommended in RFC 2988.

   For instance, consider a 9.6 Kbps link. The initial RTT measurement
   will be on the order of 67 msec, if we simply consider the
   transmission time of 2 packets (the SYN and SYN-ACK) each consisting
   of 40 bytes. Using the RTO estimator given in [RFC2988], this
   yields an initial RTO of 201 msec (67 + 4*(67/2)). However, we
   round the RTO to 1 second as specified in RFC 2988. Then assume we
   send an initial window of one or more 1500-byte packets (1460 data
   bytes plus overhead). Each packet will take on the order of 1.25
   seconds to transmit.
   Clearly the RTO will fire before the ACK for the first packet
   returns, causing a spurious timeout. In this case, a larger initial
   window of three or four packets exacerbates the problems caused by
   this spurious timeout.

   One way to deal with this problem is to make the RTO algorithm more
   conservative. During the initial window of data, for instance, we
   could update the RTO for each acknowledgment received. In
   addition, if the retransmit timer expires for some packet lost in
   the first window of data, we could leave the exponential-backoff of
   the retransmit timer engaged until at least one valid RTT
   measurement is received that involves a data packet.

   Another method would be to refrain from taking an RTT sample during
   connection establishment, leaving the default RTO in place until TCP
   takes a sample from a data segment and the corresponding ACK. While
   this method likely helps prevent spurious retransmits, it also slows
   the data transfer down if loss occurs before the RTO is seeded.

   This specification leaves the decision about what to do (if
   anything) with regard to the RTO when using a larger initial window
   to the implementer.

7. Typical Levels of Burstiness for TCP Traffic.

   Larger TCP initial windows would not dramatically increase the
   burstiness of TCP traffic in the Internet today, because such
   traffic is already fairly bursty. Bursts of two and three segments
   are already typical of TCP [Flo97]. A delayed ACK (covering two
   previously unacknowledged segments) received during congestion
   avoidance causes the congestion window to slide and two segments to
   be sent. The same delayed ACK received during slow start causes the
   window to slide by two segments and then be incremented by one
   segment, resulting in a three-segment burst. While not necessarily
   typical, bursts of four and five segments for TCP are not rare.
   Assuming delayed ACKs, a single dropped ACK causes the subsequent
   ACK to cover four previously unacknowledged segments. During
   congestion avoidance this leads to a four-segment burst and during
   slow start a five-segment burst is generated.

   There are also changes in progress that reduce the performance
   problems posed by moderate traffic bursts. One such change is the
   deployment of higher-speed links in some parts of the network, where
   a burst of 4K bytes can represent a small quantity of data. A
   second change, for routers with sufficient buffering, is the
   deployment of queue management mechanisms such as RED, which is
   designed to be tolerant of transient traffic bursts.

8. Simulations and Experimental Results

8.1 Studies of TCP Connections using the Larger Initial Window

   This section surveys simulations and experiments that have been used
   to explore the effect of larger initial windows on TCP connections.
   The first set of experiments explores performance over satellite
   links. Larger initial windows have been shown to improve
   performance of TCP connections over satellite channels [All97b]. In
   this study, an initial window of four segments (512 byte MSS)
   resulted in throughput improvements of up to 30% (depending upon
   transfer size). [KAGT98] shows that the use of larger initial
   windows results in a decrease in transfer time in HTTP tests over
   the ACTS satellite system. A study involving simulations of a large
   number of HTTP transactions over hybrid fiber coax (HFC) indicates
   that the use of larger initial windows decreases the time required
   to load WWW pages [Nic97].

   A second set of experiments has explored TCP performance over dialup
   modem links. In experiments over a 28.8 Kbps dialup channel [All97a,
   AHO98], a four-segment initial window decreased the transfer time of
   a 16KB file by roughly 10%, with no accompanying increase in the
   drop rate.
   A particular area of concern has been TCP performance over low speed
   tail circuits (e.g., dialup modem links) with routers with small
   buffers. A simulation study [RFC2416] investigated the effects of
   using a larger initial window on a host connected by a slow modem
   link and a router with a 3 packet buffer. The study concluded that
   for the scenario investigated, the use of larger initial windows was
   not harmful to TCP performance. Questions have been raised
   concerning the effects of larger initial windows on the transfer
   time for short transfers in this environment, but these effects have
   not been quantified. A question has also been raised concerning the
   possible effect on existing TCP connections sharing the link.

   Finally, [All00] illustrates that the percentage of connections at a
   particular web server that experience loss in the initial window of
   data transmission increases with the size of the initial congestion
   window. However, the increase is in line with what would be
   expected from sending a larger burst into the network.

8.2 Studies of Networks using Larger Initial Windows

   This section surveys simulations and experiments investigating the
   impact of the larger window on other TCP connections sharing the
   path. Experiments in [All97a, AHO98] show that for 16 KB transfers
   to 100 Internet hosts, four-segment initial windows resulted in a
   small increase in the drop rate of 0.04 segments/transfer. While
   the drop rate increased slightly, the transfer time was reduced by
   roughly 25% for transfers using the four-segment (512 byte MSS)
   initial window when compared to an initial window of one segment.

   One scenario of concern is heavily loaded links. For instance,
   several years ago one of the trans-Atlantic links was so heavily
   loaded that the correct congestion window size for each connection
   was about one segment.
   In this environment, new connections using larger initial windows
   would be starting with windows that were four times too big. What
   would the effects be? Do connections thrash?

   A simulation study in [RFC2415] explores the impact of a larger
   initial window on competing network traffic. In this investigation,
   HTTP and FTP flows share a single congested gateway (where the
   number of HTTP and FTP flows varies from one simulation set to
   another). For each simulation set, the paper examines aggregate
   link utilization and packet drop rates, median web page delay, and
   network power for the FTP transfers. The larger initial window
   generally resulted in increased throughput, slightly-increased
   packet drop rates, and an increase in overall network power. With
   the exception of one scenario, the larger initial window resulted in
   an increase in the drop rate of less than 1% above the loss rate
   experienced when using a one-segment initial window; in this
   scenario, the drop rate increased from 3.5% with one-segment initial
   windows, to 4.5% with four-segment initial windows. The overall
   conclusions were that increasing the TCP initial window to three
   packets (or 4380 bytes) helps to improve perceived performance.

   Morris [Mor97] investigated larger initial windows in a very
   congested network with transfers of size 20K. The loss rate in
   networks where all TCP connections use an initial window of four
   segments is shown to be 1-2% greater than in a network where all
   connections use an initial window of one segment. This relationship
   held in scenarios where the loss rates with one-segment initial
   windows ranged from 1% to 11%. In addition, in networks where
   connections used an initial window of four segments, TCP connections
   spent more time waiting for the retransmit timer (RTO) to expire to
   resend a segment than was spent when using an initial window of one
   segment.
   The time spent waiting for the RTO timer to expire represents idle
   time when no useful work was being accomplished for that connection.
   These results show that in a very congested environment, where each
   connection's share of the bottleneck bandwidth is close to one
   segment, using a larger initial window can cause a perceptible
   increase in both loss rates and retransmit timeouts.

9. Security Considerations

   This document discusses the initial congestion window permitted for
   TCP connections. Changing this value does not raise any known new
   security issues with TCP.

10. Conclusion

   This document specifies a small change to TCP that will likely be
   beneficial to short-lived TCP connections and those over links with
   long RTTs (saving several RTTs during the initial slow-start phase).

11. Acknowledgments

   We would like to acknowledge Vern Paxson, Tim Shepard, members of
   the End-to-End-Interest Mailing List, and members of the IETF TCP
   Implementation Working Group for continuing discussions of these
   issues and for feedback on this document.

12. References

   [AHO98] Mark Allman, Chris Hayes, and Shawn Ostermann, An Evaluation
       of TCP with Larger Initial Windows, March 1998. Submitted to
       ACM Computer Communication Review. URL:
       "http://roland.lerc.nasa.gov/~mallman/papers/initwin.ps".

   [All97a] Mark Allman. An Evaluation of TCP with Larger Initial
       Windows. 40th IETF Meeting -- TCP Implementations WG.
       December, 1997. Washington, DC.

   [All97b] Mark Allman. Improving TCP Performance Over Satellite
       Channels. Master's thesis, Ohio University, June 1997.

   [All00] Mark Allman. A Web Server's View of the Transport Layer. ACM
       Computer Communication Review, 30(5), October 2000.

   [FF96] Fall, K., and Floyd, S., Simulation-based Comparisons of
       Tahoe, Reno, and SACK TCP. Computer Communication Review,
       26(3), July 1996.

   [FF98] Sally Floyd, Kevin Fall. Promoting the Use of End-to-End
       Congestion Control in the Internet. Submitted to IEEE
       Transactions on Networking. URL
       "http://www-nrg.ee.lbl.gov/floyd/end2end-paper.html".

   [FJ93] Floyd, S., and Jacobson, V., Random Early Detection gateways
       for Congestion Avoidance. IEEE/ACM Transactions on Networking,
       V.1 N.4, August 1993, p. 397-413.

   [Flo94] Floyd, S., TCP and Explicit Congestion Notification.
       Computer Communication Review, 24(5):10-23, October 1994.

   [Flo96] Floyd, S., Issues of TCP with SACK. Technical report,
       January 1996. Available from http://www-nrg.ee.lbl.gov/floyd/.

   [Flo97] Floyd, S., Increasing TCP's Initial Window. Viewgraphs,
       40th IETF Meeting - TCP Implementations WG. December, 1997. URL
       "ftp://ftp.ee.lbl.gov/talks/sf-tcp-ietf97.ps".

   [KAGT98] Hans Kruse, Mark Allman, Jim Griner, Diepchi Tran. HTTP
       Page Transfer Rates Over Geo-Stationary Satellite Links. March
       1998. Proceedings of the Sixth International Conference on
       Telecommunication Systems. URL
       "http://roland.lerc.nasa.gov/~mallman/papers/nash98.ps".

   [Mor97] Robert Morris. Private communication, 1997. Cited for
       acknowledgement purposes only.

   [Nic97] Kathleen Nichols. Improving Network Simulation with
       Feedback. Com21, Inc. Technical Report. Available from
       http://www.com21.com/pages/papers/068.pdf.

   [Pos82] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
       821, August 1982.

   [RFC1122] Braden, R., "Requirements for Internet Hosts --
       Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC1191] Mogul, J., and S. Deering, "Path MTU Discovery", RFC 1191,
       November 1990.

   [RFC1945] Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
       Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

   [RFC2068] Fielding, R., Mogul, J., Gettys, J., Frystyk, H., and T.
507 Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 508 2068, January 1997. 510 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 511 Requirement Levels", BCP 14, RFC 2119, March 1997. 513 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 514 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 515 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., 516 Wroclawski, J., and L. Zhang, "Recommendations on Queue 517 Management and Congestion Avoidance in the Internet", RFC 2309, 518 April 1998. 520 [RFC2415] Poduri, K., and K. Nichols, "Simulation Studies of 521 Increased Initial TCP Window Size", RFC 2415, September 1998. 523 [RFC2416] Shepard, T., and C. Partridge, "When TCP Starts Up With 524 Four Packets Into Only Three Buffers", RFC 2416, September 1998. 526 [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP 527 Congestion Control, April 1999. RFC 2581. 529 [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission 530 Timer, November 2000. RFC 2988. 532 [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, Enhancing TCP's 533 Loss Recovery Using Limited Transmit, RFC 3042, January 2001. 535 [RFC3168] Ramakrishnan, K.K., Floyd, S., and Black, D., "The 536 Addition of Explicit Congestion Notification (ECN) to IP", RFC 537 3168, September 2001. 539 13. Author's Addresses 540 Mark Allman 541 BBN Technologies/NASA Glenn Research Center 542 21000 Brookpark Road 543 MS 54-5 544 Cleveland, OH 44135 545 EMail: mallman@bbn.com 546 http://roland.lerc.nasa.gov/~mallman/ 548 Sally Floyd 549 ICSI Center for Internet Research 550 1947 Center St, Suite 600 551 Berkeley, CA 94704 552 Phone: +1 (510) 666-2989 553 EMail: floyd@icir.org 554 http://www.icir.org/floyd/ 556 Craig Partridge 557 BBN Technologies 558 10 Moulton Street 559 Cambridge, MA 02138 561 EMail: craig@bbn.com 563 13. 
Appendix - Duplicate Segments

565 In the current environment (without Explicit Congestion Notification
566 [Flo94] [RFC2481]), all TCPs use segment drops as indications from
567 the network about the limits of available bandwidth. We argue here
568 that the change to a larger initial window should not result in the
569 sender retransmitting a large number of duplicate segments that have
570 already arrived at the receiver.

572 If one segment is dropped from the initial window, there are three
573 different ways for TCP to recover: (1) Slow-starting from a window
574 of one segment, as is done after a retransmit timeout, or after Fast
575 Retransmit in Tahoe TCP; (2) Fast Recovery without selective
576 acknowledgments (SACK), as is done after three duplicate ACKs in
577 Reno TCP; and (3) Fast Recovery with SACK, for TCP where both the
578 sender and the receiver support the SACK option [MMFR96]. In all
579 three cases, if a single segment is dropped from the initial window,
580 no duplicate segments (i.e., segments that have already been
581 received at the receiver) are transmitted. Note that for a TCP
582 sending four 512-byte segments in the initial window, a single
583 segment drop will not require a retransmit timeout, but can be
584 recovered from using the Fast Retransmit algorithm (unless the
585 retransmit timer expires prematurely). In addition, a single
586 segment dropped from an initial window of three segments might be
587 repaired using the fast retransmit algorithm, depending on which
588 segment is dropped and whether or not delayed ACKs are used. For
589 example, dropping the first segment of a three segment initial
590 window will always require waiting for a timeout, in the absence of
591 Limited Transmit [RFC3042]. However, dropping the third segment
592 will always allow recovery via the fast retransmit algorithm, as
593 long as no ACKs are lost.
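The single-drop cases above can be checked with a small model. The following is only a sketch under simplifying assumptions that are not part of the draft: the sender always has new data to send, slow start adds one segment to the congestion window per new ACK, no ACKs are lost, Limited Transmit is not used, the retransmit timer does not expire prematurely, and a delayed-ACK receiver acknowledges every second in-order segment but acknowledges out-of-order segments immediately. The function name and its parameters are hypothetical.

```python
from collections import deque

DUP_ACK_THRESHOLD = 3  # duplicate ACKs needed to trigger Fast Retransmit


def fast_retransmit_possible(window, dropped, delayed_acks):
    """Return True if a single drop is repaired by Fast Retransmit.

    window       -- initial window size, in segments
    dropped      -- 1-based position of the one dropped segment
    delayed_acks -- True if the receiver ACKs every second in-order segment
    Returns False when the sender is left waiting for a retransmit timeout.
    """
    # Segments of the initial window that survive, in arrival order.
    in_flight = deque(s for s in range(1, window + 1) if s != dropped)
    cwnd = window          # congestion window, in segments
    snd_una = 0            # highest cumulatively ACKed segment
    snd_nxt = window + 1   # next new segment the sender would transmit
    rcv_in_order = 0       # highest in-order segment held by the receiver
    held = 0               # in-order arrivals whose ACK is being delayed
    dup_acks = 0

    while True:
        ack = None
        if in_flight:
            seg = in_flight.popleft()
            if seg == rcv_in_order + 1:
                # In-order arrival: ACK now, or hold it (delayed ACK).
                rcv_in_order = seg
                held += 1
                if not delayed_acks or held == 2:
                    ack, held = rcv_in_order, 0
            else:
                # Out-of-order arrival: always ACKed immediately.
                ack, held = rcv_in_order, 0
        elif held:
            # Nothing else in flight: the delayed-ACK timer fires.
            ack, held = rcv_in_order, 0
        else:
            return False  # no ACKs left to arrive: timeout (assumption)

        if ack is None:
            continue
        if ack > snd_una:
            # New ACK: slide the window, grow cwnd, send new data.
            snd_una = ack
            cwnd += 1
            dup_acks = 0
            while snd_nxt - 1 - snd_una < cwnd:
                in_flight.append(snd_nxt)
                snd_nxt += 1
        else:
            dup_acks += 1
            if dup_acks == DUP_ACK_THRESHOLD:
                return True


for w in (3, 4):
    for k in range(1, w + 1):
        for d in (False, True):
            print(w, k, d, fast_retransmit_possible(w, k, d))
```

Under these assumptions the model agrees with the text: every single drop from a four-segment window is repaired by Fast Retransmit, a first-segment drop from a three-segment window always waits for a timeout, a second-segment drop from a three-segment window depends on whether delayed ACKs are in use, and a third-segment drop is always repaired by Fast Retransmit.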
595 Next we consider scenarios where the initial window contains two to
596 four segments, and at least two of those segments are dropped. If
597 all segments in the initial window are dropped, then clearly no
598 duplicate segments are retransmitted, as the receiver has not yet
599 received any segments. (It is still a possibility that these
600 dropped segments used scarce bandwidth on the way to their drop
601 point; this issue was discussed in Section 5.)

603 When two segments are dropped from an initial window of three
604 segments, the sender will only send a duplicate segment if the first
605 two of the three segments were dropped, and the sender does not
606 receive a packet with the SACK option acknowledging the third
607 segment.

609 When two segments are dropped from an initial window of four
610 segments, an examination of the six possible scenarios (which we
611 don't go through here) shows that, depending on the position of the
612 dropped packets, in the absence of SACK the sender might send one
613 duplicate segment. There are no scenarios in which the sender sends
614 two duplicate segments.

616 When three segments are dropped from an initial window of four
617 segments, then, in the absence of SACK, it is possible that one
618 duplicate segment will be sent, depending on the position of the
619 dropped segments.

621 The summary is that in the absence of SACK, there are some scenarios
622 with multiple segment drops from the initial window where one
623 duplicate segment will be transmitted. There are no scenarios where
624 more than one duplicate segment will be transmitted. Our conclusion
625 is that the number of duplicate segments transmitted as a result of
626 a larger initial window should be small.
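The multiple-drop scenarios counted above can be listed mechanically. This is a minimal sketch that only enumerates which positions in the initial window are dropped; it does not model TCP recovery or count duplicate segments. The helper name drop_patterns is hypothetical.

```python
from itertools import combinations


def drop_patterns(window, drops):
    """List every set of 1-based segment positions that could be dropped."""
    return list(combinations(range(1, window + 1), drops))


# The three partial-loss cases discussed in the appendix.
for window, drops in [(3, 2), (4, 2), (4, 3)]:
    patterns = drop_patterns(window, drops)
    print(f"window={window} drops={drops}: {len(patterns)} scenarios")
    for p in patterns:
        print("   dropped segments:", p)
```

For two drops from a four-segment window this produces the six scenarios mentioned in the text; each one would still have to be examined by hand, or with a fuller TCP model, to see whether a duplicate segment is sent.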