idnits 2.17.1 draft-allman-tcp-early-rexmt-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. 
(See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. ** There is 1 instance of lines with control characters in the document. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 160: '...ger Fast Retransmit MAY be reduced to:...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2026' is mentioned on line 19, but not defined == Unused Reference: 'AA02' is defined on line 238, but no explicit reference was found in the text == Unused Reference: 'LK98' is defined on line 265, but no explicit reference was found in the text == Unused Reference: 'Mor97' is defined on line 268, but no explicit reference was found in the text == Unused Reference: 'RFC3150' is defined on line 310, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'AA02' -- Possible downref: Non-RFC (?) normative reference: ref. 'All00' -- Possible downref: Non-RFC (?) normative reference: ref. 'Bal98' -- Possible downref: Non-RFC (?) normative reference: ref. 'FF96' -- Possible downref: Non-RFC (?) normative reference: ref. 'Flo94' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jac88' -- Possible downref: Non-RFC (?) normative reference: ref. 'LK98' -- Possible downref: Non-RFC (?) normative reference: ref. 'Mor97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Pax97' ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2481 (Obsoleted by RFC 3168) ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782) ** Downref: Normative reference to an Informational RFC: RFC 2884 ** Obsolete normative reference: RFC 2960 (Obsoleted by RFC 4960) ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298) ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675) ** Downref: Normative reference to an Experimental RFC: RFC 3522 Summary: 15 errors (**), 0 flaws (~~), 7 warnings (==), 11 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Mark Allman 3 INTERNET DRAFT NASA GRC/BBN 4 File: draft-allman-tcp-early-rexmt-01.txt Konstantin Avrachenkov 5 INRIA 6 Urtzi Ayesta 7 France Telecom R&D 8 Josh Blanton 9 Ohio University 10 June, 2003 11 Expires: December, 2003 13 Early Retransmit for TCP and SCTP 15 Status of this Memo 17 This document is an Internet-Draft and is in full conformance with 18 all provisions of Section 10 of [RFC2026]. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as 23 Internet-Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six 26 months and may be updated, replaced, or obsoleted by other documents 27 at any time. It is inappropriate to use Internet-Drafts as 28 reference material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 Abstract 38 This document proposes a new mechanism for TCP and SCTP that can be 39 used to more effectively recover lost segments when a connection's 40 congestion window is small. The "Early Retransmit" mechanism allows 41 the transport to reduce, in certain special circumstances, the 42 number of duplicate acknowledgments required to trigger a fast 43 retransmission. This allows the transport to use fast retransmit to 44 recover packet losses that would otherwise require a lengthy 45 retransmission timeout. 
47 1 Introduction 49 A number of researchers have pointed out that the loss recovery 50 strategies employed by TCP [RFC793] and SCTP [RFC2960] do not work 51 well when the congestion window at the sender is small. This can 52 happen in a number of situations, such as: 54 (1) The connection is "application limited" and has only a limited 55 amount of data to send. This can happen any time the 56 application does not produce enough data to fill the congestion 57 window. In particular, every connection becomes 58 application limited as it ends. 60 (2) The connection is limited by the receiver-advertised window. 62 (3) The connection is constrained by end-to-end congestion control 63 when the connection's share of the path is small, the path has a 64 small bandwidth-delay product, or the transport is ascertaining 65 the available bandwidth in the first few round-trip times of 66 slow start. 68 Many researchers have studied problems with TCP when the congestion 69 window is small and have outlined possible mechanisms to mitigate 70 these problems (e.g., [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02]). 71 SCTP's loss recovery and congestion control mechanisms are based on 72 TCP and therefore the same problems impact the performance of SCTP 73 connections. When the transport detects a missing segment, the 74 connection enters a loss recovery phase using one of two methods. 75 First, if an acknowledgment (ACK) for a given segment is not 76 received in a certain amount of time, a retransmission timer fires 77 and the segment is resent [RFC2988]. Second, the "Fast 78 Retransmit" algorithm resends a segment when three duplicate ACKs 79 arrive at the sender [Jac88,RFC2581]. However, because duplicate 80 ACKs from the receiver are also triggered by packet reordering in 81 the Internet, the sender waits for three duplicate ACKs in an 82 attempt to disambiguate segment loss from packet reordering. 
When 83 using small windows it may not be possible to generate the required 84 number of duplicate ACKs to trigger Fast Retransmit when a loss 85 occurs. 87 Once in a loss recovery phase, a number of techniques can be used to 88 retransmit lost segments. TCP can use slow-start-based recovery, 89 Fast Recovery [RFC2581], NewReno [RFC2582], or loss recovery based 90 on selective acknowledgments (SACKs) [RFC2018,FF96,RFC3517]. SCTP's 91 loss recovery is not as varied due to the built-in selective 92 acknowledgments. 94 The transport's retransmission timeout (RTO) is based on measured 95 round-trip times (RTT) between the sender and receiver, as specified 96 in [RFC2988] (for TCP) and [RFC2960] (for SCTP). To prevent 97 spurious retransmissions of segments that are only delayed and not 98 lost, the minimum RTO is conservatively chosen to be 1 second. 99 Therefore, it behooves TCP senders to detect and recover from as 100 many losses as possible without incurring a lengthy timeout during 101 which the connection remains idle. However, if not enough duplicate 102 ACKs arrive from the receiver, the Fast Retransmit algorithm is 103 never triggered. This situation occurs when the congestion window 104 is small, when a large number of segments in a window are lost, or at 105 the end of a transfer as data drains from the network. For 106 instance, consider a congestion window (cwnd) of three segments. If 107 one segment is dropped by the network, then at most two duplicate 108 ACKs will arrive at the sender, assuming no ACK loss. Since three 109 duplicate ACKs are required to trigger Fast Retransmit, a timeout 110 will be required to resend the dropped packet. 112 [BPS+98] shows that roughly 56% of retransmissions sent by a busy 113 web server are sent after the RTO timer expires, while only 44% are 114 handled by Fast Retransmit. 
In addition, only 4% of the RTO 115 timer-based retransmissions could have been avoided with SACK, which 116 has to continue to disambiguate reordering from genuine 117 loss. Furthermore, [All00] shows that for one particular web server 118 the median transfer size is less than four segments, indicating that 119 more than half of the connections will be forced to rely on the RTO 120 timer to recover from any losses that occur. Thus, loss recovery 121 without relying on the conservative RTO is beneficial for short TCP 122 transfers. 124 The Limited Transmit mechanism introduced in [RFC3042] allows a TCP 125 sender to send previously unsent data upon the reception of each of 126 the two duplicate ACKs that precede a fast retransmit. SCTP 127 [RFC2960] uses SACK information to calculate the number of 128 outstanding segments in the network. Hence, when the first two 129 duplicate ACKs arrive at the sender they will indicate that data has 130 left the network and allow the sender to transmit new data (if 131 available) similar to TCP's Limited Transmit algorithm. 133 By sending these two new segments the TCP sender is attempting to 134 induce additional duplicate ACKs (if appropriate) so that Fast 135 Retransmit will be triggered before the retransmission timeout 136 expires. The "Early Retransmit" mechanism outlined in this document 137 covers the case when previously unsent data is not available for 138 transmission. 140 The next section of this document outlines a small change to TCP and 141 SCTP senders that will decrease the reliance on the retransmission 142 timer, and thereby improve performance when Fast Retransmit cannot 143 otherwise be triggered. 145 2 Reduction of the Retransmission Threshold 147 The Early Retransmit algorithm calls for lowering the duplicate ACK 148 threshold when the amount of outstanding data is small and when no 149 unsent data segments are enqueued. 
In particular, if the following 150 two conditions hold the sender can use Early Retransmit. 152 (2.a) The amount of outstanding data (ownd) is less than 4*SMSS 153 bytes. 155 (2.b) There is either no unsent data ready for transmission at the 156 sender or the advertised window does not permit new segments to 157 be transmitted. 159 When the above two conditions hold the duplicate ACK threshold used 160 to trigger Fast Retransmit MAY be reduced to: 162 ER_thresh = ceiling (ownd/SMSS) - 1 (1) 164 duplicate ACKs, where ownd is in terms of bytes. In other words, 165 when ownd is small enough that losing one segment would not trigger 166 Fast Retransmit, the duplicate ACK threshold is reduced to the 167 number of duplicate ACKs expected if one segment is lost. This 168 mitigation is less robust in the face of reordered segments than the 169 standard Fast Retransmit threshold of three duplicate ACKs. 170 Research shows that a general reduction in the number of duplicate 171 ACKs required to trigger fast retransmission of a segment to two 172 (rather than three) leads to a reduction in the ratio of good to bad 173 retransmits by a factor of three [Pax97]. However, this analysis 174 did not include the additional conditioning on the event that the 175 ownd was smaller than 4 segments. 177 We note two "worst case" scenarios for Early Retransmit: 179 (1) Persistent reordering of segments, coupled with an application 180 that does not constantly send data, can result in large numbers 181 of needless retransmissions when using Early Retransmit. For 182 instance, consider an application that sends data two segments 183 at a time, followed by an idle period when no data is queued for 184 delivery by TCP. If the network consistently reorders the two 185 segments, the sender will needlessly retransmit one out of every 186 two unique segments transmitted (and one-third of all segments) 187 when using the above algorithm. 
However, this would only be a 188 problem for long-lived connections from applications that 189 transmit in spurts. 191 (2) Similar to the above, consider the case of two-segment transfers 192 that always experience reordering. Just as in (1) above, one 193 out of every two unique data segments will be retransmitted 194 needlessly, so one-third of the traffic will be spurious. 196 Currently this document offers no suggestion on how to mitigate the 197 above problems. Rather, the authors believe that the community's 198 consensus is that Early Retransmit is scoped narrowly enough that the 199 worst-case problems are pathological and do not need mitigation at this 200 time. However, Appendix A offers a survey of possible mitigations. 202 3 Related Work 204 Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC2481] 205 may benefit connections with small congestion window sizes 206 [RFC2884]. ECN provides a method for indicating congestion to the 207 end-host without dropping segments. While some segment drops may 208 still occur, ECN may allow TCP to perform better with small cwnd 209 sizes because the sender will be required to detect fewer segment 210 losses [RFC2884]. 212 [Bal98] outlines another solution to the problem of having no new 213 segments to transmit into the network when the first two duplicate 214 ACKs arrive. In response to these duplicate ACKs, a TCP sender 215 transmits zero-byte segments to induce additional duplicate ACKs. 216 This method preserves the robustness of the standard Fast Retransmit 217 algorithm at the cost of injecting segments into the network that do 218 not deliver any data (and therefore potentially waste network 219 resources). 221 4 Security Considerations 223 The security considerations found in [RFC2581] apply to this 224 document. No additional security problems have been identified with 225 Early Retransmit at this time. 
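For illustration, the threshold selection described in Section 2 could be sketched in Python as shown here. This is a minimal sketch, not a definitive implementation. The function name dupack_threshold, its boolean parameters, and the constant STD_DUPTHRESH are hypothetical names introduced only for this example. The lower bound of one duplicate ACK is likewise an assumption, since equation (1) evaluates to zero when a single segment is outstanding and this document does not say how to treat that case.

```python
STD_DUPTHRESH = 3  # standard Fast Retransmit threshold of three duplicate ACKs


def dupack_threshold(ownd, smss, unsent_data_ready, window_permits_new):
    """Return the duplicate ACK threshold a sender might use.

    ownd and smss are in bytes. The two boolean arguments are hypothetical
    inputs standing in for the sender state named in condition (2.b).
    """
    # Condition (2.a): less than 4*SMSS bytes are outstanding.
    small_ownd = ownd < 4 * smss
    # Condition (2.b): no unsent data is ready, or the advertised window
    # does not permit new segments to be transmitted.
    nothing_to_send = (not unsent_data_ready) or (not window_permits_new)
    if small_ownd and nothing_to_send:
        # Equation (1): ceiling(ownd/SMSS) - 1, using integer arithmetic.
        er_thresh = (ownd + smss - 1) // smss - 1
        # Assumption (not from this document): require at least one
        # duplicate ACK, since equation (1) yields zero for one segment.
        return max(er_thresh, 1)
    return STD_DUPTHRESH
```

With an SMSS of 1000 bytes and 3000 bytes outstanding, the sketch returns a threshold of 2, which matches the three-segment example in the introduction where at most two duplicate ACKs can arrive.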
227 Acknowledgments 229 We thank Sally Floyd for her feedback in discussions about Early 230 Retransmit. We also thank Sally Floyd and Hari Balakrishnan, who 231 helped with a large portion of the text of this document when it was 232 part of a separate document. Armando Caro and many members of the 233 tsvwg mailing list provided good discussions that helped shape this 234 document. 236 References 238 [AA02] Urtzi Ayesta, Konstantin Avrachenkov. The Effect of the 239 Initial Window Size and Limited Transmit Algorithm on the 240 Transient Behavior of TCP Transfers. Proc. of the 15th ITC 241 Internet Specialist Seminar, Wurzburg, July 2002. 243 [All00] Mark Allman. A Server-Side View of WWW Characteristics. 244 ACM Computer Communication Review, October 2000. 246 [Bal98] Hari Balakrishnan. Challenges to Reliable Data Transport 247 over Heterogeneous Wireless Networks. Ph.D. Thesis, University 248 of California at Berkeley, August 1998. 250 [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan, 251 Mark Stemm, and Randy Katz. TCP Behavior of a Busy Web Server: 252 Analysis and Improvements. Proc. IEEE INFOCOM Conf., San 253 Francisco, CA, March 1998. 255 [FF96] Kevin Fall, Sally Floyd. Simulation-based Comparisons of 256 Tahoe, Reno, and SACK TCP. ACM Computer Communication Review, 257 July 1996. 259 [Flo94] Sally Floyd. TCP and Explicit Congestion Notification. ACM 260 Computer Communication Review, October 1994. 262 [Jac88] Van Jacobson. Congestion Avoidance and Control. ACM 263 SIGCOMM 1988. 265 [LK98] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies: Analysis 266 and Improvements. Proceedings of InfoCom, March 1998. 268 [Mor97] Robert Morris. TCP Behavior with Many Flows. Proceedings 269 of the Fifth IEEE International Conference on Network Protocols. 270 October 1997. 272 [Pax97] Vern Paxson. End-to-End Internet Packet Dynamics. ACM 273 SIGCOMM, September 1997. 275 [RFC793] Jon Postel. Transmission Control Protocol. Std 7, RFC 276 793. 
September 1981. 278 [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow. 279 TCP Selective Acknowledgement Options. RFC 2018, October 1996. 281 [RFC2481] K. K. Ramakrishnan, Sally Floyd. A Proposal to Add 282 Explicit Congestion Notification (ECN) to IP. RFC 2481, January 283 1999. 285 [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP 286 Congestion Control. RFC 2581, April 1999. 288 [RFC2582] Sally Floyd, Tom Henderson. The NewReno Modification to 289 TCP's Fast Recovery Algorithm. RFC 2582, April 1999. 291 [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky. 292 An Extension to the Selective Acknowledgement (SACK) Option for 293 TCP. RFC 2883, July 2000. 295 [RFC2884] Jamal Hadi Salim, Uvaiz Ahmed. Performance Evaluation 296 of Explicit Congestion Notification (ECN) in IP Networks. RFC 297 2884, July 2000. 299 [RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. 300 Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, V. 301 Paxson. Stream Control Transmission Protocol. RFC 2960, October 2000. 303 [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission 304 Timer. RFC 2988, April 2000. 306 [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing 307 TCP's Loss Recovery Using Limited Transmit. RFC 3042, January 308 2001. 310 [RFC3150] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent 311 Magret. End-to-end Performance Implications of Slow Links. RFC 312 3150, July 2001. 314 [RFC3517] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang. A 315 Conservative Selective Acknowledgment (SACK)-based Loss Recovery 316 Algorithm for TCP. RFC 3517, April 2003. 318 [RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection 319 Algorithm for TCP. RFC 3522, April 2003. 321 Authors' Addresses: 323 Mark Allman 324 NASA Glenn Research Center/BBN Technologies 325 Lewis Field 326 21000 Brookpark Rd. 
MS 54-2 327 Cleveland, OH 44135 328 Phone: 216-433-6586 329 Fax: 216-433-8705 330 mallman@bbn.com 331 http://roland.grc.nasa.gov/~mallman 333 Konstantin Avrachenkov 334 INRIA 335 2004 route des Lucioles, B.P.93 336 06902, Sophia Antipolis 337 France 338 Phone: 00 33 492 38 7751 339 Email: k.avrachenkov@sophia.inria.fr 340 http://www.inria.fr/mistral/personnel/K.Avrachenkov/moi.html 342 Urtzi Ayesta 343 France Telecom R&D 344 905 rue Albert Einstein 345 06921 Sophia Antipolis 346 France 347 Email: Urtzi.Ayesta@francetelecom.com 348 http://www.inria.fr/mistral/personnel/Urtzi.Ayesta/me.html 350 Josh Blanton 351 Ohio University 352 301 Stocker Center 353 Athens, OH 45701 354 jblanton@irg.cs.ohiou.edu 356 Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold 358 Decreasing the number of duplicate ACKs required to trigger Fast 359 Retransmit, as suggested in section 2, has the drawback of making 360 Fast Retransmit less robust in the face of minor network reordering. 361 Two egregious examples of problems caused by reordering are given in 362 section 2. This appendix outlines several schemes that have been 363 suggested to mitigate the problems that reordering causes for Early 364 Retransmit. These methods need further research before they are 365 suggested for general use. 367 MITIGATION A.1: Allow a connection to use Early Retransmit as long 368 as the algorithm is not injecting "too much" spurious data into 369 the network. For instance, using the information provided by TCP's 370 DSACK option [RFC2883] or SCTP's Duplicate-TSN notification, a 371 sender can determine when segments sent via Early Retransmit are 372 needless. Likewise, using Eifel [RFC3522] the sender can detect 373 spurious Early Retransmits. Once spurious Early Retransmits are 374 detected the sender can either eliminate the use of Early Retransmit 375 or limit the use of the algorithm to ensure that only an acceptably small 376 fraction of the connection's transmissions are spurious. 
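The limiting approach of Mitigation A.1 might be sketched as follows in Python. This is an illustrative sketch under stated assumptions, not a mechanism defined by this document. The class name SpuriousBudget, its methods, and the default limit of 5% are hypothetical. Detecting that a retransmission was spurious (via DSACK, Duplicate-TSN, or Eifel) is assumed to happen elsewhere and is reported to the sketch through on_spurious_early_retransmit().

```python
class SpuriousBudget:
    """Hypothetical helper that gates Early Retransmit on observed waste."""

    def __init__(self, max_spurious_fraction=0.05):
        # Assumed policy knob: largest tolerated fraction of all
        # transmissions that were spurious Early Retransmits.
        self.max_spurious_fraction = max_spurious_fraction
        self.transmissions = 0
        self.spurious = 0

    def on_segment_sent(self):
        # Count every transmission, original or retransmitted.
        self.transmissions += 1

    def on_spurious_early_retransmit(self):
        # Called when DSACK, Duplicate-TSN, or Eifel information shows
        # that an Early Retransmit was needless.
        self.spurious += 1

    def early_retransmit_allowed(self):
        # With nothing sent yet there is no evidence against the algorithm.
        if self.transmissions == 0:
            return True
        fraction = self.spurious / self.transmissions
        return fraction <= self.max_spurious_fraction
```

A sender following this sketch would consult early_retransmit_allowed() before lowering the duplicate ACK threshold and fall back to the standard threshold of three otherwise.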
378 Alternatively, if a sender cannot reliably determine whether an Early 379 Retransmitted segment is spurious, the sender could simply 380 limit Early Retransmits either to some fixed number per connection 381 (e.g., Early Retransmit is allowed only once per connection) or to 382 some small percentage of the total traffic being transmitted. 384 MITIGATION A.2: Allow a connection to trigger Early Retransmit using 385 the number of duplicate ACKs defined in equation (1), in addition to 386 a "small" timeout [Pax97]. For instance, a sender may have to wait 387 for 2 duplicate ACKs and then T msec before Early Retransmitting a 388 segment. The added time gives reordered acknowledgments time to 389 arrive at the sender and avoid a needless retransmit. Designing a 390 method for choosing an appropriate timeout is part of the research 391 that this scheme would require.
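The delayed trigger of Mitigation A.2 could be sketched in Python as shown here. This is a sketch under assumptions, not a specified algorithm. The class name DelayedEarlyRetransmit, its methods, and the millisecond clock passed in as now_ms are all hypothetical, and the value of wait_ms (the T of the text) is left to the caller because choosing it is an open research question.

```python
class DelayedEarlyRetransmit:
    """Hypothetical sketch: wait wait_ms after reaching the ER threshold."""

    def __init__(self, er_thresh, wait_ms):
        self.er_thresh = er_thresh    # threshold from equation (1)
        self.wait_ms = wait_ms        # the "small" timeout T, chosen by caller
        self.dupacks = 0
        self.deadline_ms = None       # set when the threshold is first reached

    def on_dupack(self, now_ms):
        # Count the duplicate ACK and start the wait once the threshold is met.
        self.dupacks += 1
        if self.dupacks >= self.er_thresh and self.deadline_ms is None:
            self.deadline_ms = now_ms + self.wait_ms

    def on_new_ack(self):
        # The cumulative ACK advanced, so the segment was only reordered;
        # cancel the pending Early Retransmit.
        self.dupacks = 0
        self.deadline_ms = None

    def should_retransmit(self, now_ms):
        # Retransmit only after the threshold was reached and T has elapsed.
        return self.deadline_ms is not None and now_ms >= self.deadline_ms
```

If a reordered acknowledgment arrives during the wait, on_new_ack() clears the pending state and no needless retransmission is sent.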