INTERNET-DRAFT                    Amy Hughes, Joe Touch, John Heidemann
draft-ietf-tcpimpl-restart-00.txt                                   ISI
March 30, 1998
Expires: Sept. 30, 1998

Issues in TCP Slow-Start Restart After Idle

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.

The distribution of this document is unlimited.

Abstract

This draft discusses variations in the TCP 'slow-start restart' (SSR)
algorithm, and the unintended failure of some variations to properly
restart in some environments. SSR is intended to avoid line-rate
bursts after idle periods, where TCP accumulates permission to send
in the form of ACKs, but does not consume that permission
immediately. SSR's original "restart after send is idle" is commonly
implemented as "restart after receive is idle". The latter
unintentionally fails to restart for bidirectional connections where
the sender's burst is triggered by a reverse-path data packet, such
as in persistent HTTP. Both the former and latter are shown to permit
bursts in other circumstances. Three solutions are discussed, and
their implementations evaluated.

This document is a product of the LSAM project at ISI. Comments are
solicited and should be addressed to the authors.

Introduction

Slow-Start Restart (SSR) describes one TCP behavior to respond to
long sending pauses in an open connection.
When a sender becomes idle, the normal ack-clocking mechanism which
regulates traffic is no longer present and the sender may introduce a
burst of packets into the network as large as the current congestion
window (CWND). Such a burst may be too large for the intermediate
routers to handle and may be too large for the receiver to handle at
one time as well.

A send timer was first proposed [JK90] to detect idle sending
periods; the recommended response is to close the congestion window
and perform a new slow-start. However, a footnote to this first
proposed solution noted that send/receive symmetry on the channel
meant that a receive timer could be used instead to achieve the same
results. As this second solution takes advantage of a timer that is
already required (to detect packet loss), it was implemented by
Jacobson and Karels. This solution has been repeated in
implementations which derive from their work.

Bursty connections, such as the persistent connections required in
HTTP/1.1 [FGMFB97], have been found to interact in meaningful ways
with SSR [Hei97]. In fact, it was discovered that SSR never occurs
with HTTP/1.1 [Poo97]. This is because a new request will reset the
receive timer (as suggested in the footnote in [JK90]) and the
sending pause will not be detected [Tou97].

Further, both timer solutions depend on the retransmit timeout (RTO)
and cannot detect send pauses that are shorter than this duration.
In such cases, the sender may transmit a burst as large as the full
congestion window.

Burst Detection

There are several ways of determining whether a connection is at risk
of sending a burst of packets into the channel. We will discuss each
method below, from the least radical to the most radical.

Receive Timer:
The use of a receive timer is the most common burst detection method.
It is attractive because it is simple and makes use of an existing
timer.
However, a receive timer does not properly detect bursts in
HTTP/1.1 because the timer is reset when the request packet is
received. Further, when the connection is idle for less than a full
RTO, a burst cannot be detected. Such a burst can happen when the
connection is "nearly idle" or when acks are lost or reordered.

Send Timer:
A send timer is the reciprocal solution to using a receive timer.
While it requires a new timestamp field to be maintained, it clearly
detects send pauses and corrects the problem presented by HTTP/1.1.
However, as with the receive timer, it cannot detect bursts that
could happen before a full RTO.

Packet Counting:
An alternative method examines the unused portion of the congestion
window to determine if the capacity to burst exists. This method is
simple, it uses existing information to make its decision, and it
solves both the HTTP/1.1 problem and the RTO problem. In addition,
it addresses the problem that needs to be solved (bursts) instead of
a specific circumstance where the problem could happen (send pauses).
However, where timer detection avoids defining a burst (it defines
idle periods instead), here a burst must be defined before it can be
detected. One possible definition is the situation where the
available portion of the sending window is some proportion of the
entire congestion window, say 50%. Another definition places a
numerical limit on the available portion of the congestion window,
say 4 or CWND-1 packets.

Burst Response

Once a burst is detected, there are several different ways to take
action. The different possibilities are listed below, again from
least to most radical.

Full Restart:
One response is the original slow-start restart: reducing the
congestion window to one packet and re-entering slow-start. This was
the solution proposed by Jacobson and Karels.
This is a very conservative response and it defeats most of the
speedup that HTTP/1.1 provides [HOT97]. Current proposals [FAP97]
have suggested increasing the initial window from 1 packet to 4
packets. Further, depending on the method of burst detection, Full
Restart can be far more punitive than it should be. Coupled with a
timer, Full Restart is most likely to respond to a completely empty
congestion window. Coupled with Packet Counting, the response could
close the window too far, even smaller than the amount of
outstanding data.

Window Limiting:
This is a modified version of Full Restart which solves the problem
created by using Packet Counting to detect bursts. With this type of
response, the congestion window is reduced to the amount of
outstanding data plus the slow-start initial window (1, 2, or 4). It
works exactly like Full Restart in the idle case, but is successful
at controlling bursts in an active connection. Further, in an active
connection, it effectively implements a leaky bucket of the initial
window size for the accumulation of send opportunity based on the
receipt of acks. This solution is fairly conservative, especially as
it defaults to Full Restart, but more importantly, sending
opportunity is simply lost if not used, and is not available for
paced output. Also, it forces negative congestion feedback on the
congestion window.

Burst Size Limitation:
When a burst is detected, its effects are limited: the sender may not
send more than a preset number of packets into the network. It is
less conservative than the first two responses in that it does not
affect the size of the congestion window, and it is simple to
implement: count the packets as they are sent and stop when the limit
is reached. Whether to wait for an ack or some other signal to resume
sending is an implementation detail.
Lastly, this burst response can be performed after each ack or with
each send. The behavior is slightly different in each case.

Pacing:
When a burst is detected, packets are dribbled into the network until
the sender starts receiving acks and normal maintenance can be
resumed [VH97]. This solution is very easy on the network and scales
well to paths with a high bandwidth-delay product. However, it
requires a new timer, and parameter tuning requires more research.

Implemented Solutions

Now we will examine combinations of the different detection and
response methods presented above. Each of the solutions below has
been implemented in some form.

BSD Implementation (Jacobson and Karels)
The most common implementation uses a receive timer coupled with Full
Restart. This is the implementation that causes the interaction
problems with HTTP/1.1. The obvious alternative is to implement a
send timer as originally intended and use Full Restart. There are
several drawbacks to this solution. First, a send timer adds state
and serves no purpose other than to correct the bursting behavior
after send pauses. Second, forcing a slow-start in this situation is
problematic for HTTP/1.1. A slow-start for each new user request adds
a delay burden to characteristically small HTTP responses. Further,
the HTTP user request pattern is unpredictable. It is possible for
the user to make a new request before the send timer expires,
triggering a burst that would defeat such a timer.

Maximum Burst Limitation (Floyd)
Floyd has proposed a coupling of Packet Counting with Burst Size
Limitation. This solution has been implemented in ns and it prevents
the sender from transmitting a series of back-to-back packets larger
than the user-configured burst limit (suggested to be 4 packets)
[NS97].
There are several issues involved with recovering from a burst and
the ns implementation doesn't address them consistently. First, it
is not clear when the sender is allowed to send again after sending
the first limited burst of packets. One implementation requires the
sender to wait for the burst timer to expire. Another seems to allow
a series of short bursts. Another issue is how the simulation
implementation and usage translate to a live network situation. The
implementation of this solution can range from simple to more
complex.

Congestion Window Monitoring (Hughes, Touch, and Heidemann)
Our proposed solution combines Packet Counting with Window Limiting.
Whenever (CWND - outstanding data > 4), we reduce CWND to
(outstanding data + 4). The choice of 4 packets is discussed with
the implementation details below. Congestion Window Monitoring (CWM)
allows the congestion window to grow normally but shrinks the
congestion window as the sender becomes idle. It also prevents the
sender from transmitting any bursts larger than 4 packets in response
to a new request. Because CWM is not dependent on any timers, the
loss of an ack or a nearly idle connection cannot cause any bursts.
CWM is similar to Burst Limitation, but avoids the burst by reducing
CWND, rather than by inhibiting the sends directly. As a result, we
avoid the potential problem of sequential calls to TCP_output, which
would cause bursts under Burst Limitation but not under CWM. CWM
also causes TCP to use the feedback of 'not using the CWND fast
enough', which results in a decrease in the CWND.

CWM effectively imposes a leaky bucket type limitation on the
congestion window. The window is allowed to grow and be managed
normally but the sender is not allowed to save up any sending
opportunities. Any opportunity that is not used is lost.
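
As an illustration only, the CWM clamp can be sketched in C as
follows, counting in packets. The names cwm_clamp and CWM_BURST are
hypothetical and are not taken from any TCP implementation; the value
4 is the burst allowance suggested in this draft.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative burst allowance in packets (value suggested in the draft). */
#define CWM_BURST 4u

/*
 * Hypothetical helper: return the congestion window after CWM is applied.
 *   cwnd        - current congestion window, in packets
 *   outstanding - packets sent but not yet acknowledged
 * The window is never raised, only clamped to outstanding + CWM_BURST.
 */
static uint32_t cwm_clamp(uint32_t cwnd, uint32_t outstanding)
{
    /* Guard against wraparound when adding the burst allowance. */
    assert(outstanding <= UINT32_MAX - CWM_BURST);

    uint32_t maxwin = outstanding + CWM_BURST;
    return cwnd > maxwin ? maxwin : cwnd;
}
```

Under these assumptions, a window of 20 packets with 10 outstanding
is reduced to 14, and the same window on a fully drained connection
is reduced to 4. A window already at or below the limit is left
unchanged, so the unused sending opportunity is discarded rather than
saved.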
This property of CWM forces interleaved reception of acks and
processing of sends.

Rate Based Pacing (Visweswaraiah and Heidemann)
Rate Based Pacing (RBP) combines the Pacing response with either a
Send Timer or Packet Counting. It avoids slow-start when resuming
after sending pauses and allows the normal clocking of packets to be
gracefully restarted. When a burst potential is detected, the
algorithm meters a small burst of packets into the channel [VH97].
RBP is the least conservative solution to the bursting problem
because it continues to make use of the pre-pause congestion window.
If network conditions have changed significantly, maintaining the
previous window could cause the paced connection to be overly
aggressive as compared to other connections. (Although some work
suggests congestion windows are stable over multi-minute timeframes
[BSSK97].) More recently, pacing has been suggested for use in
wireless networking scenarios [BPK97], and for satellite connections.

Experimental Comparisons

Packet traces of the current FreeBSD implementation of SSR (using the
receive timer), of a modified version of FreeBSD using a send timer,
and of CWM with HTTP/1.1 support the above observations. In all of
the traces, the response pattern for the first request is the same
with each method. This shows that CWM allows the congestion window
to grow normally. Because of the different actions taken by the
three algorithms, the response pattern for the second request differs
as would be expected. [We have graphs available upon request]

When the second request arrives at the server after the
retransmission timeout (RTO), normal FreeBSD allows the server to
respond with a burst of packets. FreeBSD using a send timer responds
by entering slow-start. CWM allows a 4 packet burst.
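
The following C sketch is a simplified model of this comparison, not
the code used to produce the traces. It assumes a connection with no
outstanding data and a slow-start initial window of 1 packet; the
names policy and first_burst are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three behaviours compared in the traces. */
enum policy { RECV_TIMER, SEND_TIMER, CWM };

/*
 * Illustrative model only: packets the server may send back-to-back when
 * a new request arrives on a connection with no outstanding data.
 *   cwnd      - congestion window left from the previous response (packets)
 *   send_idle - time since the server last sent data
 *   rto       - retransmission timeout, in the same units as send_idle
 * The request has just been received, so the receive timer reads zero and
 * its restart test cannot succeed.
 */
static uint32_t first_burst(enum policy p, uint32_t cwnd,
                            uint32_t send_idle, uint32_t rto)
{
    switch (p) {
    case RECV_TIMER:
        return cwnd;                         /* timer was just reset */
    case SEND_TIMER:
        return send_idle >= rto ? 1 : cwnd;  /* slow-start only after RTO */
    case CWM:
        return cwnd > 4 ? 4 : cwnd;          /* clamp to outstanding (0) + 4 */
    }
    assert(0 && "unknown policy");
    return 0;
}
```

With an assumed leftover window of 16 packets and an idle period of
three RTOs, this model allows 16 packets for the receive timer, 1 for
the send timer, and 4 for CWM. These numbers are illustrative and are
not taken from the traces.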
When the second request arrives at the server before the RTO, both
timer implementations allow a burst. CWM again limits the burst to 4
packets. Note that RTO is the common timer limit, but any value
would give the same results, depending on when the second request is
presented in relation to the timer.

Implementation of Congestion Window Monitoring

Congestion Window Monitoring requires a simple modification to
existing TCP output routines. The changes required replace the
current idle detection code. Replace the existing 3 lines of code:

   idle = (snd_max == snd_una);        /* nothing outstanding */
   if (idle && now - lastrcv >= rto)   /* no receipt for a full RTO */
           cwnd = 1;                   /* restart from one packet */

with the following 3 lines of code:

   maxwin = 4 + snd_nxt - snd_una;     /* outstanding data + 4 */
   if (cwnd > maxwin)                  /* unused window exceeds 4 */
           cwnd = maxwin;              /* clamp; never raises cwnd */

Packet Counting is implemented by line 1. Lines 2 and 3 implement
Window Limiting.

The choice of limiting the available congestion window to 4 packets
is based on the normal operation of TCP. An ACK received by the
sender may be in response to the receipt of 2 packets, allowing
another 2 to be sent. Further, normal window growth may require the
sending of a third packet. Lastly, in slow-start with delayed ACKs,
the receipt of an ACK can trigger the sending of 4 packets. Thus, 4
packets is a reasonable burst to send into the network.

Increasing the initial window in slow-start to 4 packets has already
been proposed [FAP97]. The effects of this change have been explored
in simulation in [PN98] and in practice in [AHO97]. Such a
modification to TCP would cause the same behavior as our solution in
the cases where the pause timer has expired. It does not address the
pre-timeout bursting situation we are concerned with.

Conclusions

At this time, we propose CWM as a simple, minimal and effective fix
to the 'bug' in current TCP implementations that is exploited by
HTTP/1.1.
Modifications can be made to TCP to solve the slow-start restart
problem that are consistent with the original congestion avoidance
specifications (i.e., a send timer). However, we feel that the
original intended behavior is not appropriate to some current
applications, specifically HTTP. Thus, we recommend Congestion Window
Monitoring to prevent bursts into the network. Not only does this
solution solve the current problem in a simple way, it will prevent
bursting in any other situation that might arise. The 4 packet bursts
which we allow are consistent with congestion window growth
algorithms and with Floyd's conclusion about increasing the initial
window size.

CWM and the other solutions listed need to be re-evaluated within
emerging TCP implementations, e.g., SACK [JB88]. In general, TCP has
no rate pacing and uses congestion control to avoid bursts in current
implementations. A more explicit mechanism, such as RBP or similar
proposals, may be desirable in the future.

Security Implications

CWM presents no security problems.

References

[AHO97] Mark Allman, Chris Hayes, and Shawn Ostermann. An Evaluation
    of TCP Slow Start Modifications, July 1997. (Submitted to CCR,
    draft available from http://jarok.cs.ohiou.edu/papers/)

[BPK97] Hari Balakrishnan, Venkata N. Padmanabhan, and Randy H. Katz.
    The Effects of Asymmetry on TCP Performance. In Proceedings of
    the ACM/IEEE Mobicom, Budapest, Hungary, ACM. September, 1997.

[BSSK97] Hari Balakrishnan, Srinivasan Seshan, Mark Stemm, and Randy
    H. Katz. Analyzing Stability in Wide-Area Network Performance.
    In Proceedings of the ACM SIGMETRICS, Seattle WA, USA, ACM.
    June, 1997.

[FGMFB97] R. Fielding, Jim Gettys, Jeffrey C. Mogul, H. Frystyk, and
    Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, January
    1997. RFC 2068.

[FAP97] Sally Floyd, Mark Allman, and Craig Partridge.
    Increasing TCP's Initial Window, July 1997. Internet Draft
    draft-floyd-incr-init-win-01.txt

[Hei97] John Heidemann. Performance Interactions Between P-HTTP and
    TCP Implementations. ACM Computer Communications Review, 27(2),
    65-73, April 1997.

[HOT97] John Heidemann, Katia Obraczka, and Joe Touch. Modeling the
    Performance of HTTP Over Several Transport Protocols. ACM/IEEE
    Transactions on Networking 5(5), 616-630, October, 1997.

[JB88] Van Jacobson and R.T. Braden. TCP extensions for long-delay
    paths, October 1988. RFC 1072.

[JK90] Van Jacobson and Michael J. Karels. Congestion Avoidance and
    Control. ACM Computer Communication Review, 18(4):314-329,
    August 1990. Revised version of his SIGCOMM '88 paper.

[NS97] ns Network Simulator. http://www-mash.cs.berkeley.edu/ns/,
    1997.

[PN98] K. Poduri and K. Nichols. Simulation Studies of Increased
    Initial TCP Window Size, February 1998. Internet Draft
    draft-ietf-tcpimpl-poduri-00.txt

[Poo97] Kacheong Poon, Sun Microsystems, tcp-implementors mailing
    list, August, 1997.

[Tou97] Joe Touch, ISI, tcp-implementors mailing list, August 12,
    1997.

[VH97] Vikram Visweswaraiah and John Heidemann. Improving Restart of
    Idle TCP Connections. Technical Report 97-661, University of
    Southern California, November 1997.

Authors' Addresses

Amy Hughes, Joe Touch, John Heidemann
University of Southern California/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
USA
Phone: +1 310-822-1511
Fax: +1 310-823-6714
URLs: http://www.isi.edu/~ahughes
      http://www.isi.edu/~touch
      http://www.isi.edu/~johnh
Email: ahughes@isi.edu
       touch@isi.edu
       johnh@isi.edu