Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                                      ICSI
File: draft-ietf-tcpm-early-rexmt-03.txt          Konstantin Avrachenkov
Intended Status: Experimental                                      INRIA
                                                            Urtzi Ayesta
                                                               LAAS-CNRS
                                                            Josh Blanton
                                                         Ohio University
                                                              Per Hurtig
                                                     Karlstad University
                                                           November 2009
Expires: May 2010

                   Early Retransmit for TCP and SCTP

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 18, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Abstract

   This document proposes a new mechanism for TCP and SCTP that can be
   used to recover lost segments when a connection's congestion window
   is small.  The "Early Retransmit" mechanism allows the transport to
   reduce, in certain special circumstances, the number of duplicate
   acknowledgments required to trigger a fast retransmission.  This
   allows the transport to use fast retransmit to recover segment
   losses that would otherwise require a lengthy retransmission
   timeout.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The reader is expected to be familiar with the definitions given in
   [RFC5681].

1 Introduction

   Many researchers have studied problems with TCP [RFC793,RFC5681]
   when the congestion window is small and have outlined possible
   mechanisms to mitigate these problems
   [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02].  SCTP's [RFC4960] loss
   recovery and congestion control mechanisms are based on TCP, and
   therefore the same problems impact the performance of SCTP
   connections.  When the transport detects a missing segment, the
   connection enters a loss recovery phase.  There are several variants
   of the loss recovery phase depending on the TCP implementation: TCP
   can use slow-start-based recovery, Fast Recovery [RFC5681], NewReno
   [RFC3782], or loss recovery based on selective acknowledgments
   (SACKs) [RFC2018,FF96,RFC3517].  SCTP's loss recovery is less
   varied because selective acknowledgments are built into the
   protocol.

   All of the above variants share two methods for invoking loss
   recovery.  First, if an acknowledgment (ACK) for a given segment is
   not received within a certain amount of time, a retransmission timer
   fires and the segment is resent [RFC2988,RFC4960].  Second, the
   "Fast Retransmit" algorithm resends a segment when three duplicate
   ACKs arrive at the sender [Jac88,RFC5681].  Duplicate ACKs are
   triggered by out-of-order arrivals at the receiver.  Because
   out-of-order arrivals are caused both by segment loss and by segment
   reordering in the network path, the sender waits for three duplicate
   ACKs in an attempt to distinguish segment loss from segment
   reordering.  When the congestion window is small, it may not be
   possible to generate the required number of duplicate ACKs to
   trigger Fast Retransmit when a loss does occur.
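
   The effect of window size on Fast Retransmit can be seen in a toy
   model of duplicate ACK generation.  The Python sketch below is
   illustrative only and assumes a receiver that ACKs every
   out-of-order arrival immediately, no ACK loss, and exactly one lost
   segment per window; the function names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # standard Fast Retransmit threshold [RFC5681]


def duplicate_acks(window: int, lost_index: int) -> int:
    """Duplicate ACKs the sender sees when one segment of a window is lost.

    Segments are numbered 0..window-1.  Every segment sent after the lost
    one arrives out of order and, under the immediate-ACK assumption,
    elicits exactly one duplicate ACK.
    """
    if not 0 <= lost_index < window:
        raise ValueError("lost_index must fall inside the window")
    return window - 1 - lost_index


def fast_retransmit_fires(window: int, lost_index: int) -> bool:
    """True if the standard threshold of three duplicate ACKs is reached."""
    return duplicate_acks(window, lost_index) >= DUPACK_THRESHOLD


if __name__ == "__main__":
    # Losing the first segment of the window yields the most duplicate ACKs.
    for window in range(1, 7):
        print(window, duplicate_acks(window, 0), fast_retransmit_fires(window, 0))
```

   Even in the best case (the first segment of the window is lost),
   windows of one to three segments yield at most two duplicate ACKs,
   so only windows of four or more segments can reach the standard
   threshold.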

   Small congestion windows can occur in a number of situations, such
   as:

   (1) The connection is constrained by end-to-end congestion control
       when the connection's share of the path is small, the path has a
       small bandwidth-delay product, or the transport is ascertaining
       the available bandwidth in the first few round-trip times of
       slow start.

   (2) The connection is "application limited" and has only a limited
       amount of data to send.  This can happen any time the
       application does not produce enough data to fill the congestion
       window.  One case in which every connection becomes application
       limited is at the end of the connection.

   (3) The connection is limited by the receiver's advertised window.

   The transport's retransmission timeout (RTO) is based on measured
   round-trip times (RTT) between the sender and receiver, as specified
   in [RFC2988] (for TCP) and [RFC4960] (for SCTP).  To prevent
   spurious retransmissions of segments that are only delayed and not
   lost, the minimum RTO is conservatively chosen to be 1 second.
   Therefore, it behooves TCP senders to detect and recover from as
   many losses as possible without incurring a lengthy timeout during
   which the connection remains idle.  However, if not enough duplicate
   ACKs arrive from the receiver, the Fast Retransmit algorithm is
   never triggered.  This situation occurs when the congestion window
   is small, when a large number of segments in a window are lost, or
   at the end of a transfer as data drains from the network.  For
   instance, consider a congestion window of three segments' worth of
   data.  If one segment is dropped by the network, then at most two
   duplicate ACKs will arrive at the sender.  Since three duplicate
   ACKs are required to trigger Fast Retransmit, a timeout will be
   required to resend the dropped segment.  Note that delayed ACKs
   [RFC5681] may further reduce the number of duplicate ACKs a receiver
   sends.
   However, we assume that receivers send immediate ACKs when there is
   a gap in the received sequence space, per [RFC5681].

   [BPS+98] shows that roughly 56% of retransmissions sent by a busy
   web server are sent after the RTO timer expires, while only 44% are
   handled by Fast Retransmit.  In addition, only 4% of the RTO
   timer-based retransmissions could have been avoided with SACK, which
   must still disambiguate reordering from genuine loss.  Furthermore,
   [All00] shows that for one particular web server, the median number
   of bytes carried by a connection is less than four segments,
   indicating that more than half of the connections will be forced to
   rely on the RTO timer to recover from any losses that occur.  Thus,
   loss recovery that does not rely on the conservative RTO is likely
   to be beneficial for short TCP transfers.

   The Limited Transmit mechanism introduced in [RFC3042] and currently
   codified in [RFC5681] allows a TCP sender to transmit previously
   unsent data upon the reception of each of the two duplicate ACKs
   that precede a Fast Retransmit.  SCTP [RFC4960] uses SACK
   information to calculate the number of outstanding segments in the
   network.  Hence, when the first two duplicate ACKs arrive at the
   sender, they indicate that data has left the network and allow the
   sender to transmit new data (if available), similar to TCP's Limited
   Transmit algorithm.  In the remainder of this document, we use
   "Limited Transmit" to include both the TCP and SCTP mechanisms for
   sending in response to the first two duplicate ACKs.  By sending
   these two new segments, the sender attempts to induce additional
   duplicate ACKs (if appropriate) so that Fast Retransmit will be
   triggered before the retransmission timeout expires.
   The sender-side "Early Retransmit" mechanism outlined in this
   document covers the case when previously unsent data is not
   available for transmission (case (2) above) or cannot be transmitted
   due to an advertised window limitation (case (3) above).

2 Early Retransmit Algorithm

   The Early Retransmit algorithm calls for lowering the threshold for
   triggering Fast Retransmit when the amount of outstanding data is
   small and when no previously unsent data can be transmitted (which
   would otherwise allow Limited Transmit to be used).  Each
   out-of-order segment arriving at the receiver triggers a duplicate
   ACK, so a window of N outstanding segments with a single loss can
   produce at most N - 1 duplicate ACKs.  Therefore, Fast Retransmit
   will not be invoked when fewer than four segments are outstanding
   (assuming only one segment loss in the window).  However, TCP and
   SCTP track the number of outstanding bytes or messages and are not
   required to track the number of outstanding segments.  (Note that
   SCTP's message boundaries do not necessarily correspond to segment
   boundaries.)  Therefore, applying the intuitive notion of a
   transport with fewer than four segments outstanding is more
   complicated than it first appears.  In section 2.1, we describe a
   "byte-based" variant of Early Retransmit that attempts to roughly
   map the number of outstanding bytes to a number of outstanding
   segments, which is then used when deciding whether to trigger Early
   Retransmit.  In section 2.2, we describe a "segment-based" variant
   that represents a more precise algorithm for triggering Early
   Retransmit.  The precision comes at the cost of requiring additional
   state to be kept by the TCP sender.  In both cases, we describe
   SACK-based and non-SACK-based versions of the scheme (of course, the
   non-SACK version does not apply to SCTP).  This document explicitly
   does not prefer one variant over the other, but leaves the choice to
   the implementer.
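
   The thresholds defined in sections 2.1 and 2.2, and the extra
   boundary state the segment-based variant needs, can be sketched in
   a few lines of Python.  This is a hypothetical illustration, not a
   reference implementation: all function and class names are
   invented, "have been SACKed" is read as "at least this many have
   been SACKed", and sequence number wraparound, SMSS changes, and the
   FIN are ignored.

```python
import math
from collections import deque
from typing import Deque, Optional

STANDARD_DUPTHRESH = 3  # classic Fast Retransmit threshold [RFC5681]


def byte_based_thresh(ownd: int, smss: int, can_send_new: bool) -> int:
    """Non-SACK duplicate ACK threshold, byte-based variant (section 2.1).

    ownd: outstanding bytes.  can_send_new: True when unsent data exists
    and the receive window allows it, i.e. condition (2.b) fails.
    """
    if ownd < 4 * smss and not can_send_new:     # conditions (2.a), (2.b)
        return math.ceil(ownd / smss) - 1         # equation (1)
    return STANDARD_DUPTHRESH


def segment_based_thresh(oseg: int, can_send_new: bool) -> int:
    """Non-SACK duplicate ACK threshold, segment-based variant (section 2.2)."""
    if oseg < 4 and not can_send_new:             # conditions (3.a), (3.b)
        return oseg - 1                           # equation (2)
    return STANDARD_DUPTHRESH


def byte_based_sack_allows(ownd: int, smss: int, sacked_bytes: int,
                           can_send_new: bool) -> bool:
    """SACK/SCTP byte-based rule: at least ownd - SMSS bytes SACKed."""
    return (ownd < 4 * smss and not can_send_new
            and sacked_bytes >= ownd - smss)


def segment_based_sack_allows(oseg: int, sacked_segs: int,
                              can_send_new: bool) -> bool:
    """SACK/SCTP segment-based rule: at least oseg - 1 segments SACKed."""
    return oseg < 4 and not can_send_new and sacked_segs >= oseg - 1


class BoundaryTracker:
    """Keeps the last four segment boundaries, i.e. the last three segments."""

    def __init__(self) -> None:
        # Oldest boundaries fall off the left once four are stored.
        self.bounds: Deque[int] = deque(maxlen=4)

    def on_send_new(self, start: int, end: int) -> None:
        """Record new data [start, end); not called for retransmissions."""
        if not self.bounds:
            self.bounds.append(start)   # very first segment: store its start
        # New data is contiguous, so each end is the next segment's start.
        self.bounds.append(end)

    def oseg(self, snd_una: int) -> Optional[int]:
        """Outstanding segments, or None when at least four are outstanding."""
        if self.bounds and snd_una < self.bounds[0]:
            return None   # cumulative ACK is left of the tracked region
        return sum(1 for b in self.bounds if b > snd_una)


if __name__ == "__main__":
    print(byte_based_thresh(3 * 400, 1460, False))    # example 1 of sec. 2.1
    print(segment_based_thresh(3, False))
    print(byte_based_thresh(10 * 400, 1460, False))   # example 2 of sec. 2.1
    print(segment_based_thresh(10, False))

    tracker = BoundaryTracker()
    for i in range(10):                               # ten 400-byte segments
        tracker.on_send_new(i * 400, (i + 1) * 400)
    print(tracker.oseg(0))       # nothing ACKed yet
    print(tracker.oseg(2800))    # first seven segments ACKed
```

   Run as a script, the first four printed values reproduce the two
   examples of section 2.1: the byte-based threshold drops to 0 for
   three 400-byte segments and to 2 for ten, while the segment-based
   threshold yields 2 and 3 (the standard value), respectively.  The
   tracker returns None whenever the cumulative ACK lies to the left of
   the three tracked segments, which is the "at least four segments
   outstanding" case in which Early Retransmit MUST NOT be used.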

2.1 Byte-based Early Retransmit

   A TCP or SCTP sender MAY use byte-based Early Retransmit.

   Upon the arrival of an ACK, a sender employing byte-based Early
   Retransmit MUST use the following two conditions to determine when
   an Early Retransmit is sent:

   (2.a) The amount of outstanding data (ownd), i.e., data sent but not
         yet acknowledged, is less than 4*SMSS bytes.

         Note that in the byte-based variant of Early Retransmit,
         'ownd' is equivalent to 'FlightSize' as defined in [RFC5681].
         We use different notation because 'ownd' is not consistent
         with FlightSize throughout this document.

         Also note that in SCTP, messages must be converted to bytes
         for this variant of Early Retransmit to work.

   (2.b) There is either no unsent data ready for transmission at the
         sender, or the advertised receive window does not permit new
         segments to be transmitted.

   When the above two conditions hold and a TCP connection does not
   support SACK, the duplicate ACK threshold used to trigger a
   retransmission MUST be reduced to:

      ER_thresh = ceiling(ownd/SMSS) - 1                             (1)

   duplicate ACKs, where ownd is expressed in bytes.  We call the use
   of this reduced duplicate ACK threshold "Early Retransmission".

   When conditions (2.a) and (2.b) hold and a TCP connection does
   support SACK or SCTP is in use, Early Retransmit MUST be used only
   when "ownd - SMSS" bytes have been SACKed.

   When conditions (2.a) and (2.b) do not hold, the transport MUST NOT
   use Early Retransmit, but rather prefer the standard mechanisms,
   including Fast Retransmit and Limited Transmit.

   As noted above, the drawback of this byte-based variant is its lack
   of precision [HB08].  We illustrate this with two examples:

   + Consider a non-SACK TCP sender that uses an SMSS of 1460 bytes
     and transmits three segments, each with 400 bytes of payload.
     This is a case where Early Retransmit could aid loss recovery if
     one segment is lost.
     However, in this case ER_thresh becomes zero, per
     equation (1) (ceiling(1200/1460) - 1 = 0), because the number of
     outstanding bytes is a poor estimate of the number of outstanding
     segments.  A similar problem occurs for senders that employ SACK,
     as the expression "ownd - SMSS" becomes negative
     (1200 - 1460 = -260 bytes).

   + Next, consider a non-SACK TCP sender that uses an SMSS of 1460
     bytes and transmits 10 segments, each with 400 bytes of payload.
     The 4000 outstanding bytes are less than 4*SMSS (5840 bytes), so
     condition (2.a) holds, and ER_thresh will be two, per equation (1)
     (ceiling(4000/1460) - 1 = 2).  Thus, even though there are enough
     segments outstanding to trigger Fast Retransmit with the standard
     duplicate ACK threshold, Early Retransmit will be triggered.  This
     could cause or exacerbate performance problems caused by segment
     reordering in the network.

2.2 Segment-based Early Retransmit

   A TCP or SCTP sender MAY use segment-based Early Retransmit.

   Upon the arrival of an ACK, a sender employing segment-based Early
   Retransmit MUST use the following two conditions to determine when
   an Early Retransmit is sent:

   (3.a) The number of outstanding segments (oseg), i.e., segments sent
         but not yet acknowledged, is less than four.

   (3.b) There is either no unsent data ready for transmission at the
         sender, or the advertised receive window does not permit new
         segments to be transmitted.

   When the above two conditions hold and a TCP connection does not
   support SACK, the duplicate ACK threshold used to trigger a
   retransmission MUST be reduced to:

      ER_thresh = oseg - 1                                           (2)

   duplicate ACKs, where oseg represents the number of outstanding
   segments.  (We discuss tracking the number of outstanding segments
   below.)  We call the use of this reduced duplicate ACK threshold
   "Early Retransmission".

   When conditions (3.a) and (3.b) hold and a TCP connection does
   support SACK or SCTP is in use, Early Retransmit MUST be used only
   when "oseg - 1" segments have been SACKed.
   A segment is considered to be SACKed when the receiver has
   indicated that all of its data bytes (TCP) or data chunks (SCTP)
   have arrived.

   When conditions (3.a) and (3.b) do not hold, the transport MUST NOT
   use Early Retransmit, but rather prefer the standard mechanisms,
   including Fast Retransmit and Limited Transmit.

   This version of Early Retransmit solves the precision issues
   discussed in the previous section.  As noted previously, the cost is
   that the implementation will have to track segment boundaries to
   determine how many segments have been transmitted but not yet
   acknowledged.  This can be done by the sender tracking the
   boundaries of the three segments on the right side of the current
   window (which involves tracking four sequence numbers in TCP), for
   instance by keeping a circular list of the segment boundaries.
   Cumulative ACKs that do not fall within this region indicate that at
   least four segments are outstanding, and therefore Early Retransmit
   MUST NOT be used.  When the outstanding window becomes small enough
   that Early Retransmit can be invoked, a full understanding of the
   number of outstanding segments will be available from the four
   sequence numbers retained.  (Note: the implicit sequence number
   consumed by the TCP FIN can also be included in the tracking of
   segment boundaries.)

3 Discussion

   In this section, we discuss a number of issues surrounding the Early
   Retransmit algorithm.

3.1 SACK vs. non-SACK

   The SACK variant of the Early Retransmit algorithm is preferred to
   the non-SACK variant in TCP due to its robustness in the face of ACK
   loss (since SACKs are sent redundantly) and due to interactions with
   the delayed ACK timer.  (SCTP does not have a non-SACK mode and
   therefore naturally supports SACK-based Early Retransmit.)
   Consider a flight of three segments, S1...S3, with S2 being dropped
   by the network.  When S1 arrives, it is in order, so the receiver
   may or may not delay the ACK, leading to two scenarios:

   (A) The ACK for S1 is delayed: In this case, the arrival of S3 will
       trigger an ACK to be transmitted covering segment S1 (which was
       previously unacknowledged).  Early Retransmit without SACK will
       not prevent an RTO in this case, because no duplicate ACKs will
       arrive.  However, with SACK, the ACK for S1 will also include
       SACK information indicating that S3 has arrived at the receiver.
       The sender can then invoke Early Retransmit on this ACK because
       only one segment (S2) remains outstanding and un-SACKed.

   (B) The ACK for S1 is not delayed: In this case, the arrival of S1
       triggers an ACK of previously unacknowledged data.  The arrival
       of S3 triggers a duplicate ACK (because it is out of order).
       Both ACKs will cover the same segment (S1).  Therefore,
       regardless of whether SACK is used, Early Retransmit can be
       performed by the sender (assuming no ACK loss).

3.2 Segment Reordering

   Early Retransmit is less robust in the face of reordered segments
   than Fast Retransmit with the standard duplicate ACK threshold.
   Research shows that a general reduction in the number of duplicate
   ACKs required to trigger Fast Retransmit to two (rather than three)
   reduces the ratio of good to bad retransmits by a factor of three
   [Pax97].  However, this analysis did not include the additional
   conditions that ownd is smaller than four segments and that no new
   data is available for transmission.

   Several measurement studies have shown that, while reordering is
   negligible along most network paths, it can be quite prevalent
   along certain paths [Pax97,BPS99,BS02,Pir05].
   Evaluating Early Retransmit in the face of real segment reordering is
   part of the experiment we hope to instigate with this document.

3.3 Worst Case

   Next, we note two "worst case" scenarios for Early Retransmit:

   (1) Persistent reordering of segments coupled with an application
       that does not constantly send data can result in large numbers
       of needless retransmissions when using Early Retransmit.  For
       instance, consider an application that sends data two segments
       at a time, followed by an idle period when no data is queued for
       delivery.  If the network consistently reorders the two
       segments, the sender will needlessly retransmit one out of every
       two unique segments transmitted when using the above algorithm.
       Each burst then consists of three transmissions, one of which is
       a needless retransmission, meaning that one-third of all
       segments sent are spurious.  However, this would only be a
       problem for long-lived connections from applications that
       transmit in spurts.

   (2) Similar to the above, consider the case of two-segment transfers
       that always experience reordering.  Just as in (1) above, one
       out of every two unique data segments will be retransmitted
       needlessly; therefore, one-third of the traffic will be spurious.

   This document currently offers no recommendation on how to mitigate
   the above problems.  However, the worst cases are likely
   pathological, and part of the experiments that this document hopes
   to trigger would involve better understanding whether such
   theoretical worst-case scenarios are prevalent in the network and,
   more generally, exploring the tradeoff between spurious fast
   retransmits and the delay imposed by the RTO.  Appendix A offers a
   survey of possible mitigations that call for curtailing the use of
   Early Retransmit when it is making poor retransmission decisions.

4 Related Work

   There are a number of similar proposals in the literature that
   attempt to mitigate the same problem Early Retransmit addresses.

   Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC3168]
   may benefit connections with small congestion window sizes
   [RFC2884].  ECN provides a method for indicating congestion to the
   end-host without dropping segments.  While some segment drops may
   still occur, ECN may allow a transport to perform better with small
   congestion window sizes because the sender will have to detect
   fewer segment losses [RFC2884].

   [Bal98] outlines another solution to the problem of having no new
   segments to transmit into the network when the first two duplicate
   ACKs arrive.  In response to these duplicate ACKs, a TCP sender
   transmits zero-byte segments to induce additional duplicate ACKs.
   This method preserves the robustness of the standard Fast Retransmit
   algorithm at the cost of injecting segments into the network that do
   not deliver any data and therefore potentially waste network
   resources (at a time when there is a reasonable chance that the
   resources are scarce).

   [RFC4653] also defines an orthogonal method for altering the
   duplicate ACK threshold.  The mechanisms proposed in this document
   decrease the duplicate ACK threshold when a small amount of data is
   outstanding.  Meanwhile, the mechanisms in [RFC4653] increase the
   duplicate ACK threshold (above the standard value of three) when the
   congestion window is large, in an effort to increase robustness to
   segment reordering.

5 Security Considerations

   The security considerations found in [RFC5681] apply to this
   document.  No additional security problems have been identified with
   Early Retransmit at this time.

6 IANA Considerations

   None.

Acknowledgments

   We thank Sally Floyd for her feedback in discussions about Early
   Retransmit.  The notion of Early Retransmit was originally sketched
   in an Internet-Draft co-authored by Sally Floyd and Hari
   Balakrishnan.
   Armando Caro, Joe Touch, Alexander Zimmermann, and many members of
   the TSVWG and TCPM working groups provided good discussions that
   helped shape this document.  Our thanks to all!

Normative References

   [RFC793] Jon Postel. Transmission Control Protocol. STD 7, RFC 793,
       September 1981.

   [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow.
       TCP Selective Acknowledgement Options. RFC 2018, October 1996.

   [RFC2119] Scott Bradner. Key words for use in RFCs to Indicate
       Requirement Levels. BCP 14, RFC 2119, March 1997.

   [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky.
       An Extension to the Selective Acknowledgement (SACK) Option for
       TCP. RFC 2883, July 2000.

   [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission
       Timer. RFC 2988, November 2000.

   [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing
       TCP's Loss Recovery Using Limited Transmit. RFC 3042, January
       2001.

   [RFC4960] R. Stewart. Stream Control Transmission Protocol. RFC
       4960, September 2007.

   [RFC5681] Mark Allman, Vern Paxson, Ethan Blanton. TCP Congestion
       Control. RFC 5681, September 2009.

Informative References

   [AA02] Urtzi Ayesta, Konstantin Avrachenkov. The Effect of the
       Initial Window Size and Limited Transmit Algorithm on the
       Transient Behavior of TCP Transfers. Proc. of the 15th ITC
       Internet Specialist Seminar, Wurzburg, July 2002.

   [All00] Mark Allman. A Web Server's View of the Transport Layer.
       ACM Computer Communication Review, October 2000.

   [Bal98] Hari Balakrishnan. Challenges to Reliable Data Transport
       over Heterogeneous Wireless Networks. Ph.D. Thesis, University
       of California at Berkeley, August 1998.

   [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan,
       Mark Stemm, Randy Katz. TCP Behavior of a Busy Web Server:
       Analysis and Improvements. Proc. IEEE INFOCOM, San Francisco,
       CA, March 1998.

   [BPS99] Jon Bennett, Craig Partridge, Nicholas Shectman. Packet
       Reordering is Not Pathological Network Behavior. IEEE/ACM
       Transactions on Networking, December 1999.

   [BS02] John Bellardo, Stefan Savage. Measuring Packet Reordering.
       ACM/USENIX Internet Measurement Workshop, November 2002.

   [FF96] Kevin Fall, Sally Floyd. Simulation-based Comparisons of
       Tahoe, Reno, and SACK TCP. ACM Computer Communication Review,
       July 1996.

   [Flo94] Sally Floyd. TCP and Explicit Congestion Notification. ACM
       Computer Communication Review, October 1994.

   [HB08] Per Hurtig, Anna Brunstrom. Enhancing SCTP Loss Recovery: An
       Experimental Evaluation of Early Retransmit. Elsevier Computer
       Communications, Vol. 31(16), October 2008, pp. 3778-3788.

   [Jac88] Van Jacobson. Congestion Avoidance and Control. ACM
       SIGCOMM, 1988.

   [LK98] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies: Analysis
       and Improvements. Proc. IEEE INFOCOM, San Francisco, CA, March
       1998.

   [Mor97] Robert Morris. TCP Behavior with Many Flows. Proc. Fifth
       IEEE International Conference on Network Protocols, October
       1997.

   [Pax97] Vern Paxson. End-to-End Internet Packet Dynamics. ACM
       SIGCOMM, September 1997.

   [Pir05] N. M. Piratla. A Theoretical Foundation, Metrics and
       Modeling of Packet Reordering and Methodology of Delay Modeling
       using Inter-packet Gaps. Ph.D. Dissertation, Department of
       Electrical and Computer Engineering, Colorado State University,
       Fort Collins, CO, Fall 2005.

   [RFC2884] Jamal Hadi Salim, Uvaiz Ahmed. Performance Evaluation of
       Explicit Congestion Notification (ECN) in IP Networks. RFC 2884,
       July 2000.

   [RFC3150] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent
       Magret. End-to-end Performance Implications of Slow Links. RFC
       3150, July 2001.

   [RFC3168] K. K. Ramakrishnan, Sally Floyd, David Black.
       The Addition of Explicit Congestion Notification (ECN) to IP.
       RFC 3168, September 2001.

   [RFC3517] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang. A
       Conservative Selective Acknowledgment (SACK)-based Loss Recovery
       Algorithm for TCP. RFC 3517, April 2003.

   [RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection
       Algorithm for TCP. RFC 3522, April 2003.

   [RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov. The NewReno
       Modification to TCP's Fast Recovery Algorithm. RFC 3782, April
       2004.

   [RFC4653] Sumitha Bhandarkar, A. L. Narasimha Reddy, Mark Allman,
       Ethan Blanton. Improving the Robustness of TCP to Non-Congestion
       Events. RFC 4653, August 2006.

Authors' Addresses

   Mark Allman
   International Computer Science Institute
   1947 Center Street, Suite 600
   Berkeley, CA 94704-1198
   Phone: 440-235-1792
   mallman@icir.org
   http://www.icir.org/mallman/

   Konstantin Avrachenkov
   INRIA
   2004 route des Lucioles, B.P.93
   06902, Sophia Antipolis
   France
   Phone: 00 33 492 38 7751
   k.avrachenkov@sophia.inria.fr
   http://www.inria.fr/mistral/personnel/K.Avrachenkov/moi.html

   Urtzi Ayesta
   LAAS-CNRS
   7 Avenue Colonel Roche
   31077 Toulouse
   France
   urtzi@laas.fr
   http://www.laas.fr/~urtzi

   Josh Blanton
   Ohio University
   301 Stocker Center
   Athens, OH 45701
   jblanton@irg.cs.ohiou.edu

   Per Hurtig
   Karlstad University
   Department of Computer Science
   Universitetsgatan 2
   651 88 Karlstad
   Sweden
   per.hurtig@kau.se

Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold

   Decreasing the number of duplicate ACKs required to trigger Fast
   Retransmit, as suggested in section 2, has the drawback of making
   Fast Retransmit less robust in the face of minor network reordering.
   Two egregious examples of problems caused by reordering are given in
   section 3.3.
   This appendix outlines several schemes that have been suggested to
   mitigate the problems caused by Early Retransmit in the face of
   segment reordering.  These methods need further research before
   they are suggested for general use (and the current consensus is
   that the cases that cause Early Retransmit to needlessly retransmit
   a large amount of data are pathological, and therefore these
   mitigations are not generally required).

   MITIGATION A.1: Allow a connection to use Early Retransmit as long
      as the algorithm is not injecting "too much" spurious data into
      the network.  For instance, using the information provided by
      TCP's DSACK option [RFC2883] or SCTP's Duplicate-TSN
      notification, a sender can determine when segments sent via
      Early Retransmit are needless.  Likewise, using Eifel [RFC3522],
      the sender can detect spurious Early Retransmits.  Once spurious
      Early Retransmits are detected, the sender can either eliminate
      the use of Early Retransmit or limit the use of the algorithm to
      ensure that only an acceptably small fraction of the connection's
      transmissions are spurious.  For example, a connection could stop
      using Early Retransmit after the first spurious retransmit is
      detected.

   MITIGATION A.2: If a sender cannot reliably determine whether an
      Early Retransmitted segment is spurious, the sender could simply
      limit Early Retransmits either to some fixed number per
      connection (e.g., Early Retransmit is allowed only once per
      connection) or to some small percentage of the total traffic
      being transmitted.

   MITIGATION A.3: Allow a connection to trigger Early Retransmit using
      the criteria given in section 2, in addition to a "small"
      timeout [Pax97].  For instance, a sender may have to wait for two
      duplicate ACKs and then T msec before Early Retransmit is
      invoked.
      The added delay gives reordered acknowledgments time to arrive
      at the sender, avoiding a needless retransmit.  Designing a
      method for choosing an appropriate timeout is part of the
      research this scheme would require.
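
   Mitigations A.1 and A.2 amount to a small per-connection policy.
   The Python sketch below is one hypothetical way to combine them; it
   assumes the transport calls on_spurious_detected() whenever DSACK,
   SCTP's Duplicate-TSN notification, or Eifel identifies an Early
   Retransmit as needless, and it does not model the delay timer of
   Mitigation A.3.  All names and parameters are illustrative.

```python
from typing import Optional


class EarlyRetransmitLimiter:
    """Hypothetical per-connection gate combining Mitigations A.1 and A.2."""

    def __init__(self, max_early_rexmits: int, max_spurious: int = 0,
                 max_fraction: Optional[float] = None) -> None:
        self.max_early_rexmits = max_early_rexmits  # A.2: fixed cap per connection
        self.max_spurious = max_spurious            # A.1: spurious ERs tolerated
        self.max_fraction = max_fraction            # A.2: cap as share of traffic
        self.segments_sent = 0
        self.early_rexmits = 0
        self.spurious = 0

    def on_segment_sent(self) -> None:
        """Count every transmission, whether new data or a retransmission."""
        self.segments_sent += 1

    def on_spurious_detected(self) -> None:
        """Called when DSACK, Duplicate-TSN, or Eifel flags an ER as needless."""
        self.spurious += 1

    def allow(self) -> bool:
        """Return True if one more Early Retransmit is permitted."""
        if self.spurious > self.max_spurious:
            return False
        if self.early_rexmits >= self.max_early_rexmits:
            return False
        if self.max_fraction is not None:
            # Share of all transmissions that would be Early Retransmits
            # if one more were sent now.
            share = (self.early_rexmits + 1) / (self.segments_sent + 1)
            if share > self.max_fraction:
                return False
        return True

    def record_early_retransmit(self) -> None:
        """Book-keeping after an Early Retransmit has been sent."""
        self.early_rexmits += 1
        self.on_segment_sent()


if __name__ == "__main__":
    gate = EarlyRetransmitLimiter(max_early_rexmits=2)
    for _ in range(3):
        gate.on_segment_sent()
    print(gate.allow())              # no spurious ERs seen yet
    gate.record_early_retransmit()
    gate.on_spurious_detected()      # e.g. a DSACK reports the ER was needless
    print(gate.allow())              # A.1: stop after the first spurious ER
```

   With max_spurious=0, the gate reproduces the example in A.1 of
   disabling Early Retransmit after the first detected spurious
   retransmit.  The optional fraction check counts the prospective
   retransmission itself, so a short connection that has sent only a
   few segments may be refused unless max_fraction is generous.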