Internet Engineering Task Force                          Mark Handley
INTERNET DRAFT                                        Jitendra Padhye
draft-handley-tcp-cwv-01.txt                              Sally Floyd
                                                                ACIRI
                                                        December 1999
Expires: June 2000

                    TCP Congestion Window Validation

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   TCP's congestion window controls the number of packets a TCP flow may
   have in the network at any time.  However, long periods when the
   sender is idle or application-limited can lead to the invalidation of
   the congestion window, in that the congestion window no longer
   reflects current information about the state of the network.
   This document describes a simple modification to TCP's congestion
   control algorithms to decay the congestion window cwnd after the
   transition from a sufficiently-long application-limited period, while
   using the slow-start threshold ssthresh to save information about the
   previous value of the congestion window.

   An invalid congestion window also results when the congestion window
   is increased (i.e., in TCP's slow-start or congestion avoidance
   phases) during application-limited periods, when the previous value
   of the congestion window might never have been fully utilized.  We
   propose that the TCP sender should not increase the congestion window
   when the TCP sender has been application-limited (and therefore has
   not fully used the current congestion window).  We have explored
   these algorithms both with simulations and with experiments from an
   implementation in FreeBSD.

1. Conventions and Acronyms

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [B97].

2. Introduction

   TCP's congestion window controls the number of packets a TCP flow may
   have in the network at any time.  The congestion window is set using
   an Additive-Increase, Multiplicative-Decrease (AIMD) mechanism that
   probes for available bandwidth, dynamically adapting to changing
   network conditions.  This AIMD mechanism works well when the sender
   continually has data to send, as is typically the case for TCP used
   for bulk-data transfer.  In contrast, for TCP used with telnet
   applications, the data sender often has little or no data to send,
   and the sending rate is often determined by the rate at which data is
   generated by the user.
   With the advent of the web, including developments such as TCP
   senders with dynamically-created data and HTTP 1.1 with
   persistent-connection TCP, the interaction between
   application-limited periods (when the sender sends less than is
   allowed by the congestion or receiver windows) and network-limited
   periods (when the sender is limited by the TCP window) becomes
   increasingly important.  More precisely, we define a network-limited
   period as any period when the sender is sending a full window of
   data.

   Long periods when the sender is application-limited can lead to the
   invalidation of the congestion window.  During periods when the TCP
   sender is network-limited, the value of the congestion window is
   repeatedly ``revalidated'' by the successful transmission of a window
   of data without loss.  When the TCP sender is network-limited, there
   is an incoming stream of acknowledgements that ``clocks out'' new
   data, giving concrete evidence of recent available bandwidth in the
   network.  In contrast, during periods when the TCP sender is
   application-limited, the estimate of available capacity represented
   by the congestion window may become steadily less accurate over time.
   In particular, capacity that had once been used by the
   network-limited connection might now be used by other traffic.

   Current TCP implementations have a range of behaviors for starting up
   after an idle period.  Some current TCP implementations slow-start
   after an idle period longer than the RTO estimate, as suggested in
   [RFC2581] and in the appendix of [J88], while other implementations
   do not reduce their congestion window after an idle period.
   [RFC2581] recommends the following: ``a TCP SHOULD set cwnd to no
   more than RW [the initial window] before beginning transmission if
   the TCP has not sent data in an interval exceeding the retransmission
   timeout.''  A proposal for TCP's slow-start after idle has also been
   discussed in [HTH98].  The issue of validation of congestion
   information during idle periods has also been addressed in contexts
   other than TCP and IP, for example in ``Use-it or Lose-it''
   mechanisms for ATM networks [JKBFL96, JKGFL95].

   To address the revalidation of the congestion window after an
   application-limited period, we propose a simple modification to TCP's
   congestion control algorithms to decay the congestion window cwnd
   after the transition from a sufficiently-long application-limited
   period (i.e., at least one roundtrip time) to a network-limited
   period.

   When the congestion window is reduced, the slow-start threshold
   ssthresh remains as ``memory'' of the recent congestion window.
   Specifically, ssthresh is never decreased when cwnd is reduced after
   an application-limited period; before cwnd is reduced, ssthresh is
   set to the maximum of its current value and the value half-way
   between the old and the new values of cwnd (3/4 of the old cwnd when
   cwnd is halved).  This use of ssthresh allows a TCP sender increasing
   its sending rate after an application-limited period to quickly
   slow-start to recover most of the previous value of the congestion
   window.

   To be more precise, if ssthresh is less than 3/4 cwnd when the
   congestion window is reduced after an application-limited period,
   then ssthresh is increased to 3/4 cwnd before the reduction of the
   congestion window.  The justification for this value of ``3/4 cwnd''
   is that 3/4 cwnd is a conservative estimate of the recent average
   value of the congestion window, and the TCP should safely be able to
   slow-start at least up to this point.
   For a TCP in steady state that has been reducing its congestion
   window each time the congestion window reached some maximum value
   `maxwin' (so that the window oscillates linearly between maxwin/2 and
   maxwin), the average congestion window has been 3/4 maxwin.  On
   average, when the connection becomes application-limited, cwnd will
   be 3/4 maxwin, and in this case cwnd itself represents the average
   value of the congestion window.  However, if the connection happens
   to become application-limited when cwnd equals maxwin, then the
   average value of the congestion window is given by 3/4 cwnd.

   An invalid congestion window also results when the congestion window
   is increased (i.e., in TCP's slow-start or congestion avoidance
   phases) during application-limited periods, when the previous value
   of the congestion window might never have been fully utilized.  As
   far as we know, all current TCP implementations increase the
   congestion window when an acknowledgement arrives, if allowed by the
   receiver's advertised window and the slow-start or congestion
   avoidance window increase algorithm, without checking to see if the
   previous value of the congestion window has in fact been used.  This
   draft proposes that the window increase algorithm not be invoked
   during application-limited periods [MSML99].  In particular, the TCP
   sender should not increase the congestion window when the TCP sender
   has been application-limited (and therefore has not fully used the
   current congestion window).  This restriction prevents the congestion
   window from growing arbitrarily large, in the absence of evidence
   that the congestion window can be supported by the network.
   From [MSML99, Section 5.2]: ``This restriction assures that [cwnd]
   only grows as long as TCP actually succeeds in injecting enough data
   into the network to test the path.''

   A somewhat-orthogonal problem associated with maintaining a large
   congestion window after an application-limited period is that the
   sender, with a sudden large amount of data to send after a quiescent
   period, might immediately send a full congestion window of
   back-to-back packets.  This problem of sending large bursts of
   back-to-back packets can be effectively handled using rate-based
   pacing (RBP, [VH97]), or using a maximum burst size control [FF96].
   We would contend that, even with mechanisms for limiting the sending
   of back-to-back packets or pacing packets out over the period of a
   roundtrip time, an old congestion window that has not been fully used
   for some time cannot be trusted as an indication of the bandwidth
   currently available for that flow.  We also contend that the
   mechanisms used to pace out the packets allowed by the congestion
   window are largely orthogonal to the algorithms used to determine the
   appropriate size of the congestion window.

3. Description

   When a TCP sender has sufficient data available to fill the available
   network capacity for that flow, cwnd and ssthresh get set to
   appropriate values for the network conditions.  When a TCP sender
   stops sending, the flow stops sampling the network conditions, and so
   the value of the congestion window may become inaccurate.  We believe
   the correct conservative behavior under these circumstances is to
   decay the congestion window by half for every RTT that the flow
   remains inactive.  The value of half is a very conservative figure
   based on how quickly multiplicative decrease would have decayed the
   window in the presence of loss.
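
   As an illustration only, the idle-period rule can be sketched in
   Python.  The sketch assumes windows counted in bytes and takes the
   number of idle roundtrip times as an input.  Before the decay it
   applies the ssthresh rule from Section 2 (ssthresh is raised to at
   least 3/4 of the old cwnd), it keeps the one-MSS floor used in the
   pseudo-code of Section 3.2, and it ignores the receiver's window for
   simplicity.  The function name decay_after_idle and its parameters
   are hypothetical.

```python
def decay_after_idle(cwnd: int, ssthresh: int, idle_rtts: int,
                     mss: int) -> tuple[int, int]:
    """Return (cwnd, ssthresh) in bytes after `idle_rtts` idle roundtrips.

    Hypothetical helper: ssthresh is raised to at least 3/4 of the old
    cwnd, then cwnd is halved once per idle RTT, never below one MSS.
    """
    if idle_rtts <= 0:
        return cwnd, ssthresh
    ssthresh = max(ssthresh, 3 * cwnd // 4)  # memory of the old window
    for _ in range(idle_rtts):
        cwnd = max(cwnd // 2, mss)           # halve once per idle RTT
    return cwnd, ssthresh


if __name__ == "__main__":
    # A 20-segment window (MSS of 1000 bytes) left idle for three RTTs.
    print(decay_after_idle(20_000, 8_000, 3, 1_000))  # (2500, 15000)
```

   In this example the window decays to 2500 bytes while ssthresh keeps
   15000 bytes, so a sender that resumes with plenty of data can
   slow-start quickly back to most of its old window.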

   Another possibility is that the sender may not stop sending, but may
   become application-limited rather than network-limited, and offer
   less data to the network than the congestion window allows to be
   sent.  In this case the TCP flow is still sampling network
   conditions, but is not offering sufficient traffic to be sure that
   there is still sufficient capacity in the network for that flow to
   send a full congestion window.  Under these circumstances we believe
   the correct conservative behavior is for the sender to keep track of
   the maximum amount of the congestion window used during each RTT, and
   to decay the congestion window each RTT to midway between the current
   cwnd value and the maximum value used.

   Before the congestion window is reduced, ssthresh is set to the
   maximum of its current value and 3/4 cwnd.  If the sender then has
   more data to send than the decayed cwnd allows, the TCP will
   slow-start (perform exponential increase) at least half-way back up
   to the old value of cwnd.

   An alternate possibility would be to set ssthresh to the maximum of
   the current value of ssthresh and the old value of cwnd, allowing
   TCP to slow-start all of the way back up to the old value of cwnd.
   Further experimentation can be used to evaluate these two options for
   setting ssthresh.

   For the separate issue of the increase of the congestion window in
   response to an acknowledgement, we believe the correct behavior is
   for the sender to increase the congestion window only if the window
   was full when the acknowledgement arrived.

   We term this set of modifications to TCP Congestion Window Validation
   (CWV) because they are related to ensuring the congestion window is
   always a valid reflection of the current network state as probed by
   the connection.

3.1. The basic algorithm for reducing the congestion window

   A key issue in the CWV algorithm is to determine how to apply the
   guideline of reducing the congestion window once for every roundtrip
   time that the flow is application-limited.  We use TCP's
   retransmission timeout (RTO) as a reasonable upper bound on the
   roundtrip time, and reduce the congestion window once per RTO.

   This basic algorithm could be implemented in TCP as follows.  After
   TCP sends a packet, it checks to see if that packet filled the
   congestion window.  If so, the sender is network-limited, and sets
   the variable T_prev to the current TCP clock time, and a variable
   W_used to zero.  T_prev will be used to determine the elapsed time
   since the sender last was network-limited.  When the sender is
   application-limited, W_used holds the maximum congestion window
   actually used since the sender was last network-limited.

   If the transmitted packet did not fill the congestion window and the
   TCP send queue is empty, then the sender is application-limited.  The
   sender checks to see if the amount of unacknowledged data is greater
   than W_used; if so, W_used is set to the amount of unacknowledged
   data.  In addition, TCP checks to see if the elapsed time since
   T_prev is at least RTO.  If so, then the TCP has been
   application-limited rather than network-limited for an entire RTO
   interval.  In this case, TCP sets ssthresh to the maximum of 3/4 cwnd
   and the current value of ssthresh, and reduces its congestion window
   to (cwnd+W_used)/2.  W_used is then set to zero and T_prev is set to
   the current time, so that a further reduction will not take place
   until another RTO period has elapsed.

   After TCP sends a packet, it also sets the variable T_last to the
   current time.  When TCP sends a new packet it also checks to see if
   at least RTO seconds have elapsed since the previous packet was sent.
   If RTO has elapsed, ssthresh is set to the maximum of 3/4 cwnd and
   the current value of ssthresh, and then the congestion window is
   halved for every RTO that elapsed since the previous packet was sent.
   In addition, T_prev is set to the current time, and W_used is reset
   to zero.  This last mechanism could also be implemented by using a
   timer that expires every RTO after the last packet was sent, instead
   of a check per packet; efficiency constraints on different operating
   systems may dictate which approach is more efficient to implement.

3.2. Pseudo-code for reducing the congestion window

   Initially:
       T_last = tcpnow, T_prev = tcpnow, W_used = 0

   After sending a data segment:
       If tcpnow - T_last >= RTO
           (The sender has been idle.)
           ssthresh = max(ssthresh, 3*cwnd/4)
           For i=1 To (tcpnow - T_last)/RTO
               win = min(cwnd, receiver's declared max window)
               cwnd = max(win/2, MSS)
           T_prev = tcpnow
           W_used = 0

       T_last = tcpnow

       If window is full
           T_prev = tcpnow
           W_used = 0
       Else
           If no more data is available to send
               W_used = max(W_used, amount of unacknowledged data)
               If tcpnow - T_prev >= RTO
                   (The sender has been application-limited.)
                   ssthresh = max(ssthresh, 3*cwnd/4)
                   win = min(cwnd, receiver's declared max window)
                   cwnd = (win + W_used)/2
                   T_prev = tcpnow
                   W_used = 0

4. Simulations

   The CWV proposal has been implemented as an option in the network
   simulator NS [NS].  The simulations in the validation test suite for
   CWV can be run with the command "./test-all-tcp" in the directory
   "tcl/test".  The simulations show the use of CWV to reduce the
   congestion window after a period when the TCP connection was
   application-limited, and to limit the increase in the congestion
   window when a transfer is application-limited.
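
   The NS implementation is not reproduced here.  As a much smaller,
   hypothetical illustration, the pseudo-code of Section 3.2 can be
   transcribed line by line into an executable Python model.  The class
   name CwvSender, its after_send() method, and the method's arguments
   (which stand in for ``window is full'', ``no more data is available
   to send'', and the amount of unacknowledged data) are assumptions
   made for this sketch.  Windows are in bytes, times are in seconds,
   and integer division stands in for the divisions in the pseudo-code.

```python
from dataclasses import dataclass


@dataclass
class CwvSender:
    """Hypothetical transcription of the Section 3.2 pseudo-code.

    The sender starts at time 0, so T_last and T_prev both start at 0,
    matching the "Initially" step of the pseudo-code.
    """
    cwnd: int
    ssthresh: int
    rto: float
    mss: int
    rwnd: int            # receiver's declared max window
    t_last: float = 0.0
    t_prev: float = 0.0
    w_used: int = 0

    def after_send(self, now: float, window_full: bool,
                   queue_empty: bool, unacked: int) -> None:
        """Update CWV state after a data segment is sent at time `now`."""
        if now - self.t_last >= self.rto:
            # The sender has been idle: remember 3/4 cwnd in ssthresh,
            # then halve cwnd once per elapsed RTO (never below one MSS).
            self.ssthresh = max(self.ssthresh, 3 * self.cwnd // 4)
            for _ in range(int((now - self.t_last) // self.rto)):
                win = min(self.cwnd, self.rwnd)
                self.cwnd = max(win // 2, self.mss)
            self.t_prev = now
            self.w_used = 0

        self.t_last = now

        if window_full:
            # Network-limited: the current cwnd has just been revalidated.
            self.t_prev = now
            self.w_used = 0
        elif queue_empty:
            # Application-limited: track the largest window actually used.
            self.w_used = max(self.w_used, unacked)
            if now - self.t_prev >= self.rto:
                self.ssthresh = max(self.ssthresh, 3 * self.cwnd // 4)
                win = min(self.cwnd, self.rwnd)
                self.cwnd = (win + self.w_used) // 2
                self.t_prev = now
                self.w_used = 0


if __name__ == "__main__":
    s = CwvSender(cwnd=20_000, ssthresh=8_000, rto=1.0, mss=1_000,
                  rwnd=64_000)
    # Application-limited for over one RTO with 4000 bytes in flight.
    s.after_send(0.5, window_full=False, queue_empty=True, unacked=4_000)
    s.after_send(1.2, window_full=False, queue_empty=True, unacked=4_000)
    print(s.cwnd, s.ssthresh)   # 12000 15000
    # Then idle for about 3.3 RTOs before the next segment.
    s.after_send(4.5, window_full=False, queue_empty=True, unacked=1_000)
    print(s.cwnd, s.ssthresh)   # 1500 15000
```

   In the first phase cwnd decays to (20000 + 4000)/2 = 12000 bytes and
   ssthresh remembers 3/4 of the old window; the idle gap that follows
   spans three whole RTOs, so cwnd is halved three times.  A real TCP
   would supply its own clock, RTO estimate, and send-queue state.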

   As the NS simulations illustrate, the use of ssthresh to maintain
   connection history is a critical part of the Congestion Window
   Validation algorithm.  [HPF99] discusses these simulations in more
   detail.

5. Experiments

   We have implemented the CWV mechanism in the TCP implementation in
   FreeBSD 3.2.  [HPF99] discusses these experiments in more detail.

   The first experiment examines the effects of the Congestion Window
   Validation mechanisms for limiting cwnd increases during
   application-limited periods.  The experiment used a real ssh
   connection through a modem link emulated using Dummynet [Dummynet].
   The link speed is 30 Kb/s and the link has five packet buffers
   available.  Today most modem banks have more buffering available than
   this, but the more buffer-limited situation sometimes occurs with
   older modems.  In the first half of the transfer, the user is typing
   away over the connection.  About half way through the transfer, the
   user lists a moderately large file, which causes a large burst of
   traffic to be transmitted.

   For the unmodified TCP, every returning ACK during the first part of
   the transfer results in an increase in cwnd.  As a result, the large
   burst of data arriving from the application to the transport layer is
   sent as many back-to-back packets, most of which get lost and
   subsequently retransmitted.

   For the modified TCP with Congestion Window Validation, the
   congestion window is not increased when the window is not full, and
   has been decreased during application-limited periods to a value
   closer to what the user actually used.  The burst of traffic is now
   constrained by the congestion window, resulting in a better-behaved
   flow with minimal loss.  The end result is that the transfer happens
   approximately 30% faster than the transfer without CWV, due to
   avoiding retransmission timeouts.
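
   Much of the difference between the two senders in this experiment
   comes from the window increase rule.  The toy model below is not a
   reproduction of the Dummynet experiment; it only contrasts the
   per-ACK increase of [RFC2581] (slow-start adds one MSS per ACK,
   congestion avoidance adds MSS*MSS/cwnd, rounded up to at least one
   byte) with the rule of Section 3 that increases cwnd only if the
   window was full when the acknowledgement arrived.  The functions
   on_ack and typing_session, and the initial window values, are
   hypothetical; windows are in bytes.

```python
def on_ack(cwnd: int, ssthresh: int, mss: int,
           window_was_full: bool, cwv: bool) -> int:
    """Return cwnd (bytes) after one ACK; hypothetical toy model.

    cwv=False applies the RFC 2581 increase rules unconditionally.
    cwv=True skips the increase unless the window was full.
    """
    if cwv and not window_was_full:
        return cwnd                         # no evidence cwnd is usable
    if cwnd < ssthresh:
        return cwnd + mss                   # slow-start
    return cwnd + max(mss * mss // cwnd, 1)  # congestion avoidance


def typing_session(acks: int, cwv: bool, mss: int = 1_000) -> int:
    """cwnd after `acks` ACKs for small, interactive segments.

    Each ACK covers a keystroke-sized segment, so the window is never
    full; the initial cwnd and ssthresh are arbitrary example values.
    """
    cwnd, ssthresh = 2 * mss, 8 * mss
    for _ in range(acks):
        cwnd = on_ack(cwnd, ssthresh, mss, window_was_full=False, cwv=cwv)
    return cwnd


if __name__ == "__main__":
    print(typing_session(10, cwv=False))  # 8488: grew on every ACK
    print(typing_session(10, cwv=True))   # 2000: unchanged
```

   Without the window-full check, cwnd more than quadruples while the
   user is only typing, so a later burst from the application can be
   sent back-to-back; with the check, cwnd stays at its initial value.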

   The second experiment uses a real ssh connection over a real dialup
   PPP connection, where the modem bank has much more buffering.  For
   the unmodified TCP, the initial burst from the large file does not
   cause loss, but does cause the RTT to increase to approximately 5
   seconds, at which point the connection becomes bounded by the
   receiver's window.

   For the modified TCP with Congestion Window Validation, the flow is
   much better behaved, and produces no large burst of traffic.  In this
   case the linear increase of cwnd results in a slow increase in the
   RTT as the buffer slowly fills.

   For the second experiment, both the modified and the unmodified TCP
   finish delivering the data at precisely the same time.  This is
   because the link has been fully utilized in both cases, due to the
   modem buffer being larger than the receiver window.  Clearly a modem
   buffer of this size is undesirable due to its effect on the RTT of
   competing flows, but it is necessary with current TCP implementations
   that produce bursts similar to those described above for the
   unmodified TCP.

6. Conclusions

   This document has presented several TCP algorithms for Congestion
   Window Validation, to be employed after an idle period or a period in
   which the sender was application-limited, and before an increase of
   the congestion window.  The goal of these algorithms is for TCP's
   congestion window to reflect recent knowledge of the TCP connection
   about the state of the network path, while at the same time keeping
   some memory (i.e., in ssthresh) about the earlier state of the path.
   We believe that these modifications will benefit both the network and
   the TCP flows themselves, by preventing unnecessary packet drops due
   to the TCP sender's failure to update its information (or lack of
   information) about current network conditions.
   Future work will document and investigate the benefit provided by
   these algorithms, using both simulations and experiments.  Additional
   future work will describe a more complex version of the CWV algorithm
   for TCP implementations where the sender does not have an accurate
   estimate of the TCP roundtrip time.

7. References

   [B97] S. Bradner, "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

   [FF96] K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe,
   Reno, and SACK TCP", Computer Communication Review, V. 26, N. 3,
   July 1996, pp. 5-21.  URL ``http://www.aciri.org/floyd/papers.html''.

   [HPF99] M. Handley, J. Padhye, and S. Floyd, "TCP Congestion Window
   Validation", UMass CMPSCI Technical Report 99-77, September 1999.
   URL ``ftp://www-net.cs.umass.edu/pub/Handley99-tcpq-tr-99-77.ps.gz''.

   [HTH98] A. Hughes, J. Touch, and J. Heidemann, "Issues in TCP
   Slow-Start Restart After Idle", work in progress, April 1998.  URL
   ``ftp://ftp.isi.edu/internet-drafts/draft-ietf-tcpimpl-restart-00.txt''.

   [J88] V. Jacobson, "Congestion Avoidance and Control", Proceedings of
   SIGCOMM '88 (Palo Alto, CA, Aug. 1988), revised in 1992.  URL
   ``http://www-nrg.ee.lbl.gov/nrg-papers.html''.

   [JKBFL96] R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and F. Lu,
   "Comments on `Use-it or Lose-it'", ATM Forum Document Number: ATM
   Forum/96-0178.  URL
   ``http://www.netlab.ohio-state.edu/~jain/atmf/af_rl5b2.htm''.

   [JKGFL95] R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and F. Lu,
   "A Fix for Source End System Rule 5", AF-TM 95-1660, December 1995.
   URL ``http://www.netlab.ohio-state.edu/~jain/atmf/af_rl52.htm''.

   [MSML99] M. Mathis, J. Semke, J. Mahdavi, and K. Lahey, "The
   Rate-Halving Algorithm for TCP Congestion Control", June 1999.  URL
   ``http://www.psc.edu/networking/ftp/papers/draft-ratehalving.txt''.

   [NS] NS, the UCB/LBNL/VINT Network Simulator.  URL
   ``http://www-mash.cs.berkeley.edu/ns/''.

   [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion
   Control", RFC 2581, Proposed Standard, April 1999.  URL
   ``ftp://ftp.isi.edu/in-notes/rfc2581.txt''.

   [VH97] V. Visweswaraiah and J. Heidemann, "Improving Restart of Idle
   TCP Connections", Technical Report 97-661, University of Southern
   California, November 1997.

   [Dummynet] L. Rizzo, "Dummynet and Forward Error Correction",
   Freenix 98, New Orleans, June 1998.  URL
   ``http://info.iet.unipi.it/~luigi/ip_dummynet/''.

AUTHORS' ADDRESSES

   Mark Handley
   AT&T Center for Internet Research at ICSI (ACIRI)
   Phone: +1 510 642 4274 x 146
   EMail: mjh@aciri.org
   URL: http://www.aciri.org/mjh/

   Jitendra Padhye
   University of Massachusetts at Amherst
   Phone: (413) 545 2447
   EMail: jitu@cs.umass.edu
   URL: http://www-net.cs.umass.edu/~jitu/

   Sally Floyd
   AT&T Center for Internet Research at ICSI (ACIRI)
   Phone: +1 510-642-4274 x189
   EMail: floyd@aciri.org
   URL: http://www.aciri.org/floyd/

This draft was created in December 1999.
It expires June 2000.