idnits 2.17.1 draft-allman-tcp-early-rexmt-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 61 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. 
(See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 159: '...used to trigger Fast Retransmit MAY be...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2026' is mentioned on line 19, but not defined == Unused Reference: 'AA02' is defined on line 224, but no explicit reference was found in the text == Unused Reference: 'BPS99' is defined on line 249, but no explicit reference was found in the text == Unused Reference: 'LK98' is defined on line 263, but no explicit reference was found in the text == Unused Reference: 'Mor97' is defined on line 266, but no explicit reference was found in the text == Unused Reference: 'SCWA99' is defined on line 273, but no explicit reference was found in the text == Unused Reference: 'RFC3150' is defined on line 305, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'AA02' -- Possible downref: Non-RFC (?) normative reference: ref. 'All00' -- Possible downref: Non-RFC (?) normative reference: ref. 'AP99' -- Possible downref: Non-RFC (?) normative reference: ref. 'Bal98' -- Possible downref: Non-RFC (?) normative reference: ref. 'BPS99' -- Possible downref: Non-RFC (?) normative reference: ref. 'FF96' -- Possible downref: Non-RFC (?) normative reference: ref. 'Flo94' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jac88' -- Possible downref: Non-RFC (?) normative reference: ref. 'LK98' -- Possible downref: Non-RFC (?) normative reference: ref. 'Mor97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Pax97' -- Possible downref: Non-RFC (?) normative reference: ref. 
'SCWA99' ** Obsolete normative reference: RFC 2481 (Obsoleted by RFC 3168) ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782) ** Downref: Normative reference to an Informational RFC: RFC 2884 ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298) Summary: 9 errors (**), 0 flaws (~~), 9 warnings (==), 14 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Mark Allman 3 INTERNET DRAFT NASA GRC/BBN 4 File: draft-allman-tcp-early-rexmt-00.txt Konstantin Avrachenkov 5 INRIA 6 Urtzi Ayesta 7 France Telecom R&D 8 Josh Blanton 9 Ohio University 10 February, 2003 11 Expires: August, 2003 13 Early Retransmit for TCP 15 Status of this Memo 17 This document is an Internet-Draft and is in full conformance with 18 all provisions of Section 10 of [RFC2026]. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as 23 Internet-Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six 26 months and may be updated, replaced, or obsoleted by other documents 27 at any time. It is inappropriate to use Internet-Drafts as 28 reference material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 Abstract 38 This document proposes a new TCP mechanism that can be used to more 39 effectively recover lost segments when a connection's congestion 40 window is small. 
The "Early Retransmit" mechanism allows TCP to 41 reduce, in certain special circumstances, the number of duplicate 42 acknowledgments required to trigger a fast retransmission. This 43 allows TCP to use fast retransmit to recover packet losses that 44 would otherwise require a lengthy retransmission timeout. 46 1 Introduction 48 A number of researchers have pointed out that TCP's loss recovery 49 strategies do not work well when the congestion window at a TCP 50 sender is small. This can happen in a number of situations, such 51 as: 53 (1) The TCP connection is "application limited" and has only a 54 limited amount of data to send. 56 (2) The TCP connection is limited by the receiver-advertised window. 58 (3) The TCP connection is constrained by end-to-end congestion 59 control when the connection's share of the path is small, the 60 path has a small bandwidth-delay product or TCP is ascertaining 61 the available bandwidth in the first few round-trip times of 62 slow start. 64 (4) The TCP connection is "winding down" at the end of a transfer 65 such that data is draining from the network but no new data 66 (from the application) is available to transmit. 68 Many researchers have studied problems with TCP when the congestion 69 window is small and have outlined possible mechanisms to mitigate 70 these problems (e.g., [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02]). When 71 TCP detects a missing segment, the connection enters a loss recovery 72 phase using one of two methods. First, if an acknowledgment (ACK) 73 for a given segment is not received in a certain amount of time a 74 retransmission timeout occurs and the segment is resent [RFC2988]. 75 Second, the ``Fast Retransmit'' algorithm resends a segment when 76 three duplicate ACKs arrive at the sender [Jac88,RFC2581]. 
However, 77 because duplicate ACKs from the receiver are also triggered by 78 packet reordering in the Internet, the TCP sender waits for three 79 duplicate ACKs in an attempt to disambiguate segment loss from 80 packet reordering. Once in a loss recovery phase, a number of 81 techniques can be used to retransmit lost segments, including slow 82 start based recovery or Fast Recovery [RFC2581], NewReno [RFC2582], 83 and loss recovery based on selective acknowledgments (SACKs) 84 [RFC2018,FF96,BAFW02]. 86 TCP's retransmission timeout (RTO) is based on measured round-trip 87 times (RTT) between the sender and receiver, as specified in 88 [RFC2988]. To prevent spurious retransmissions of segments that are 89 only delayed and not lost, the minimum RTO is conservatively chosen 90 to be 1 second. Therefore, it behooves TCP senders to detect and 91 recover from as many losses as possible without incurring a lengthy 92 timeout during which the connection remains idle. However, if not 93 enough duplicate ACKs arrive from the receiver, the Fast Retransmit 94 algorithm is never triggered---this situation occurs when the 95 congestion window is small, if a large number of segments in a 96 window are lost or at the end of a transfer as data drains from the 97 network. For instance, consider a congestion window (cwnd) of three 98 segments. If one segment is dropped by the network, then at most 99 two duplicate ACKs will arrive at the sender, assuming no ACK loss. 100 Since three duplicate ACKs are required to trigger Fast Retransmit, 101 a timeout will be required to resend the dropped packet. 103 [BPS+98] shows that roughly 56% of retransmissions sent by a busy 104 web server are sent after the RTO expires, while only 44% are 105 handled by Fast Retransmit. In addition, only 4% of the RTO-based 106 retransmissions could have been avoided with SACK, which has to 107 continue to disambiguate reordering from genuine loss. 
Furthermore, 108 [All00] shows that for one particular web server the median transfer 109 size is less than four segments, indicating that more than half of 110 the connections will be forced to rely on the RTO to recover from 111 any losses that occur. Thus, loss recovery without relying on the 112 conservative RTO is beneficial for short TCP transfers. In 113 particular, as a consequence of points (3) and (4) above, a single 114 segment loss will require TCP to RTO when a loss occurs in small 115 transfers. 117 The Limited Transmit mechanism introduced in [RFC3042] allows a TCP 118 sender to send previously unsent data upon the reception of each of 119 the two duplicate ACKs that precede a fast retransmit. By sending 120 these two new segments the TCP sender is attempting to induce 121 additional duplicate ACKs (if appropriate) so that Fast Retransmit 122 will be triggered before the retransmission timeout expires. The 123 "Early Retransmit" mechanism outlined in this document covers the 124 case when previously unsent data is not available for transmission. 126 The next section of this document outlines a small change to TCP 127 senders that will decrease the reliance on the retransmission timer, 128 and thereby improve TCP performance when Fast Retransmit would not 129 otherwise be triggered. 131 2 Reduction of the Retransmission Threshold 133 Limited Transmit [RFC3042] allows the sender to attempt to induce 134 enough duplicate ACKs to trigger Fast Retransmit. However, in some 135 cases the TCP sender may not have new data queued and ready to be 136 transmitted or may be limited by the advertised window when the 137 first two duplicate ACKs arrive. In these cases, the Limited 138 Transmit algorithm cannot be utilized. If there is a large amount 139 of outstanding data in the network, not being able to transmit new 140 segments when the first two duplicate ACKs arrive is not a problem, 141 as Fast Retransmit will be triggered naturally. 
However, when the 142 amount of outstanding data is small the sender will have to rely on 143 the RTO to repair any lost segments. 145 As an example, consider the case when cwnd is three segments and one 146 of these segments is dropped by the network. If the other two 147 segments arrive at the receiver and the corresponding ACKs are not 148 dropped by the network the sender will receive two duplicate ACKs, 149 which is not enough to trigger the Fast Retransmit algorithm. The 150 loss can therefore be repaired only after an RTO. However, the 151 sender has enough information to infer that it cannot expect three 152 duplicate ACKs when one segment is dropped. 154 The first mitigation of the above problem involves lowering the 155 duplicate ACK threshold when the amount of outstanding data is small 156 and when no unsent data segments are enqueued. In particular, if 157 the amount of outstanding data (ownd) is less than 4 segments and 158 there are no unsent segments ready for transmission at the sender, 159 the duplicate ACK threshold used to trigger Fast Retransmit MAY be 160 reduced to ownd-1 duplicate ACKs (where ownd is in terms of 161 segments). In other words, when ownd is small enough that losing 162 one segment would not trigger Fast Retransmit, we reduce the 163 duplicate ACK threshold to the number of duplicate ACKs expected if 164 one segment is lost. This mitigation is less robust in the face of 165 reordered segments than the standard Fast Retransmit threshold of 166 three duplicate ACKs. Research shows that a general reduction in 167 the number of duplicate ACKs required to trigger fast retransmission 168 of a segment to two (rather than three) leads to a reduction in the 169 ratio of good to bad retransmits by a factor of three [Pax97]. 170 However, this analysis did not include the additional conditioning 171 on the event that the ownd was smaller than 4 segments. 
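The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `dupack_threshold`, `ownd`, and `unsent_segments` are hypothetical, and ownd is counted in segments as in the text. The text does not say what to do when fewer than two segments are outstanding (ownd-1 would be zero), so the sketch assumes the standard threshold applies in that case.

```python
STANDARD_DUPACK_THRESHOLD = 3  # standard Fast Retransmit threshold


def dupack_threshold(ownd: int, unsent_segments: int) -> int:
    """Return the number of duplicate ACKs needed to trigger Fast Retransmit.

    ownd: amount of outstanding data, in segments.
    unsent_segments: segments queued at the sender and ready for transmission.
    """
    if ownd < 0 or unsent_segments < 0:
        raise ValueError("segment counts must be non-negative")

    # The reduced threshold applies only when little data is outstanding
    # and no new data is available to send.
    small_window = ownd < 4
    nothing_to_send = unsent_segments == 0

    # Assumption (not in the text): with fewer than two outstanding
    # segments, ownd - 1 would be zero, so fall back to the standard value.
    if small_window and nothing_to_send and ownd >= 2:
        return ownd - 1

    return STANDARD_DUPACK_THRESHOLD
```

For the example in the text (three outstanding segments, one dropped, nothing left to send), `dupack_threshold(3, 0)` gives 2, which matches the two duplicate ACKs the sender can expect to receive.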
173 We note two "worst case" scenarios for Early Retransmit: 175 (1) Persistent reordering of segments, coupled with an application 176 that does not constantly send data, can result in large numbers 177 of needless retransmissions when using Early Retransmit. For 178 instance, consider an application that sends data two segments 179 at a time, followed by an idle period when no data is queued for 180 delivery by TCP. If the network consistently reorders the two 181 segments, the TCP sender will needlessly retransmit one out of 182 every two unique segments transmitted (and one-third of all 183 segments) when using the above algorithm. However, this would 184 only be a problem for long-lived connections from applications 185 that transmit in spurts. 187 (2) Similar to the above, consider the case of 2 segment transfers 188 that always experience reordering. Just as in (1) above, one 189 out of every two unique data segments will be retransmitted 190 needlessly, therefore one-third of the traffic will be spurious. 192 Currently this document offers no suggestion on how to mitigate the 193 above problems. Appendix A offers a survey of possible mitigations. 194 However, the authors would like further input before choosing one of 195 these options (or, deciding that the worst case scenarios listed 196 above are sufficiently rare that Early Retransmit can be used 197 without modification). 199 3 Related Work 201 Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC2481] 202 may benefit connections with small congestion window sizes 203 [RFC2884]. ECN provides a method for indicating congestion to the 204 end-host without dropping segments. While some segment drops may 205 still occur, ECN may allow TCP to perform better with small cwnd 206 sizes because the sender will be required to detect less segment 207 loss [RFC2884]. 209 4 Security Considerations 211 The security considerations found in [RFC2581] apply to this 212 document. 
No additional security problems have been identified with 213 Early Retransmit at this time. 215 Acknowledgments 217 We thank Sally Floyd for her feedback in discussions about Early 218 Retransmit. We also thank Sally Floyd and Hari Balakrishnan who 219 helped with a large portion of the text of this document when it was 220 part of a separate effort. 222 References 224 [AA02] Urtzi Ayesta, Konstantin Avrachenkov, "The Effect of the 225 Initial Window Size and Limited Transmit Algorithm on the 226 Transient Behavior of TCP Transfers", In Proc. of the 15th ITC 227 Internet Specialist Seminar, Wurzburg, July 2002. 229 [All00] Mark Allman. A Server-Side View of WWW Characteristics. 230 ACM Computer Communications Review, October 2000. 232 [AP99] Mark Allman, Vern Paxson. On Estimating End-to-End Network 233 Path Properties. ACM SIGCOMM, September 1999. 235 [BAFW02] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang. A 236 Conservative SACK-based Loss Recovery Algorithm for TCP, October 237 2002. Internet-Draft draft-allman-tcp-sack-13.txt (work in 238 progress). 240 [Bal98] Hari Balakrishnan. Challenges to Reliable Data Transport 241 over Heterogeneous Wireless Networks. Ph.D. Thesis, University 242 of California at Berkeley, August 1998. 244 [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan, 245 Mark Stemm, and Randy Katz. TCP Behavior of a Busy Web Server: 246 Analysis and Improvements. Proc. IEEE INFOCOM Conf., San 247 Francisco, CA, March 1998. 249 [BPS99] Jon Bennett, Craig Partridge, Nicholas Shectman. Packet 250 Reordering is Not Pathological Network Behavior. IEEE/ACM 251 Transactions on Networking, December 1999. 253 [FF96] Kevin Fall, Sally Floyd. Simulation-based Comparisons of 254 Tahoe, Reno, and SACK TCP. ACM Computer Communication Review, 255 July 1996. 257 [Flo94] Sally Floyd. TCP and Explicit Congestion Notification. ACM 258 Computer Communication Review, October 1994. 260 [Jac88] Van Jacobson. Congestion Avoidance and Control.
ACM 261 SIGCOMM 1988. 263 [LK98] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies: Analysis 264 and Improvements. Proceedings of InfoCom, March 1998. 266 [Mor97] Robert Morris. TCP Behavior with Many Flows. Proceedings 267 of the Fifth IEEE International Conference on Network Protocols. 268 October 1997. 270 [Pax97] Vern Paxson. End-to-End Internet Packet Dynamics. ACM 271 SIGCOMM, September 1997. 273 [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, Tom 274 Anderson. TCP Congestion Control with a Misbehaving Receiver. 275 ACM Computer Communications Review, October 1999. 277 [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow. 278 TCP Selective Acknowledgement Options. RFC 2018, October 1996. 280 [RFC2481] K. K. Ramakrishnan, Sally Floyd. A Proposal to Add 281 Explicit Congestion Notification (ECN) to IP. RFC 2481, January 282 1999. 284 [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP 285 Congestion Control. RFC 2581, April 1999. 287 [RFC2582] Sally Floyd, Tom Henderson. The NewReno Modification to 288 TCP's Fast Recovery Algorithm. RFC 2582, April 1999. 290 [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky. 291 An Extension to the Selective Acknowledgement (SACK) Option for 292 TCP. RFC 2883, July 2000. 294 [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed. Performance Evaluation 295 of Explicit Congestion Notification (ECN) in IP Networks. RFC 296 2884, July 2000. 298 [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission 299 Timer. RFC 2988, April 2000. 301 [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing 302 TCP's Loss Recovery Using Limited Transmit. RFC 3042, January 303 2001. 305 [RFC3150] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent 306 Magret. End-to-end Performance Implications of Slow Links. RFC 307 3150, July 2001. 309 Author's Addresses: 311 Mark Allman 312 NASA Glenn Research Center/BBN Technologies 313 Lewis Field 314 21000 Brookpark Rd. 
MS 54-2 315 Cleveland, OH 44135 316 Phone: 216-433-6586 317 Fax: 216-433-8705 318 mallman@bbn.com 319 http://roland.grc.nasa.gov/~mallman 321 Konstantin Avrachenkov 322 INRIA 323 2004 route des Lucioles, B.P.93 324 06902, Sophia Antipolis 325 France 326 Phone: 00 33 492 38 7751 327 Email: k.avrachenkov@inria.fr 329 Urtzi Ayesta 330 France Telecom R&D 331 905 rue Albert Einstein 332 06921 Sophia Antipolis 333 France 334 Email: Urtzi.Ayesta@francetelecom.com 336 Josh Blanton 337 Ohio University 338 301 Stocker Center 339 Athens, OH 45701 340 jblanton@irg.cs.ohiou.edu 342 Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold 344 Decreasing the number of duplicate ACKs required to trigger Fast 345 Retransmit, as suggested in section 2, has the drawback of making 346 Fast Retransmit less robust in the face of minor network reordering. 347 Two egregious examples of problems caused by reordering are given in 348 section 2. This appendix outlines several schemes that have been 349 suggested to mitigate the problems caused to Early Retransmit by 350 reordering. These methods need further research before they are 351 suggested for use in shared networks. 353 One possible mitigation for the damage of spurious retransmits is to 354 allow a TCP connection to only send one retransmission using a 355 duplicate ACK threshold of less than three. This allows for 356 enhanced recovery for short connections and protects the network 357 from longer connections that could possibly use this algorithm to 358 send many needless retransmissions. 360 Using information provided by the DSACK option [RFC2883], a TCP 361 sender can determine when its Fast Retransmit threshold is too low, 362 causing needless retransmissions due to reordering in the network. 363 Coupling the information provided by DSACKs with the algorithm 364 outlined in section 2 may provide a further enhancement.
365 Specifically, the proposed reduction in the duplicate ACK threshold 366 would not be taken if the network path is known to be reordering 367 segments. 369 The next method is to detect needless retransmits based on the time 370 between the retransmission and the next ACK received. As outlined 371 in [AP99], if this time is less than half of the minimum RTT observed 372 thus far, the retransmission was likely unnecessary. When using less 373 than three duplicate ACKs as the threshold to trigger Fast 374 Retransmit, a TCP sender could attempt to determine whether the 375 retransmission was needed or not. In the case when it was 376 unnecessary, the sender could refrain from further use of Fast 377 Retransmit with a threshold of less than three duplicate ACKs. This 378 method of detecting bad retransmits is not as robust as using 379 DSACKs. However, the advantage is that this mechanism only requires 380 sender-side implementation changes. 382 A TCP sender can take measures to avoid the case where a large 383 percentage of the unique segments transmitted are being needlessly 384 retransmitted due to the use of a low duplicate ACK threshold (such 385 as the one outlined in section 2). Specifically, the sender can 386 limit the percentage of retransmits based on a duplicate ACK 387 threshold of less than three. This allows the mechanism to be used 388 throughout a long-lived connection while at the same time protecting 389 the network from potentially wasteful needless retransmissions. 390 However, this solution does not attempt to address the underlying 391 problem, but rather limits the damage the algorithm can cause. 393 Finally, [Bal98] outlines another solution to the problem of having 394 no new segments to transmit into the network when the first two 395 duplicate ACKs arrive. In response to these duplicate ACKs, a TCP 396 sender transmits zero-byte segments to induce additional duplicate 397 ACKs [Bal98].
This method preserves the robustness of the standard 398 Fast Retransmit algorithm at the cost of injecting segments into the 399 network that do not deliver any data (and are therefore potentially 400 wasting network resources). 402 Even with the introduction of the Early Retransmit mechanism, the 403 loss of the last segment of a transfer will lead to a timeout. To 404 overcome this, TCP can send an extra segment at the end of the 405 session containing no data. One may expect this would introduce 406 less additional load than the proposal of [Bal98], but this requires more 407 research before such a mechanism can be recommended.
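The sender-side timing heuristic from [AP99] discussed earlier in this appendix can be sketched as follows. This is an illustrative Python sketch only, under stated assumptions: the class `EarlyRetransmitGuard` and all of its method and attribute names are hypothetical, times are in seconds, and the choice to disable the reduced threshold after a single suspect retransmission is one possible reading of "refrain from further use".

```python
from typing import Optional


class EarlyRetransmitGuard:
    """Tracks the minimum RTT and disables the reduced duplicate ACK
    threshold once a retransmission sent under it looks unnecessary."""

    def __init__(self) -> None:
        self.min_rtt: Optional[float] = None  # smallest RTT observed so far
        self.early_retransmit_allowed = True

    def on_rtt_sample(self, rtt: float) -> None:
        """Record a new RTT measurement, in seconds."""
        if rtt <= 0:
            raise ValueError("RTT sample must be positive")
        self.min_rtt = rtt if self.min_rtt is None else min(self.min_rtt, rtt)

    def on_ack_after_retransmit(self, retransmit_time: float,
                                ack_time: float,
                                used_reduced_threshold: bool) -> bool:
        """Return True if the retransmission was likely unnecessary.

        The retransmission is judged likely unnecessary when the next ACK
        arrives less than half of the minimum observed RTT after it.
        """
        if self.min_rtt is None:
            return False  # no RTT sample yet, so no judgment is possible
        elapsed = ack_time - retransmit_time
        likely_unnecessary = elapsed < self.min_rtt / 2
        if likely_unnecessary and used_reduced_threshold:
            # Assumption: one suspect retransmission is enough to stop
            # using a threshold of fewer than three duplicate ACKs.
            self.early_retransmit_allowed = False
        return likely_unnecessary
```

A sender using this sketch would consult `early_retransmit_allowed` before lowering the duplicate ACK threshold. As the appendix notes, this timing-based test is less robust than DSACK-based detection, but it needs only sender-side changes.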