Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                                      ICSI
File: draft-ietf-tcpm-early-rexmt-04.txt          Konstantin Avrachenkov
Intended Status: Experimental                                      INRIA
                                                            Urtzi Ayesta
                                           BCAM-IKERBASQUE and LAAS-CNRS
                                                            Josh Blanton
                                                         Ohio University
                                                              Per Hurtig
                                                     Karlstad University
                                                            January 2010
Expires: July 2010


                   Early Retransmit for TCP and SCTP

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 27, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Abstract

   This document proposes a new mechanism for TCP and SCTP that can be
   used to recover lost segments when a connection's congestion window
   is small. The "Early Retransmit" mechanism allows the transport to
   reduce, in certain special circumstances, the number of duplicate
   acknowledgments required to trigger a fast retransmission. This
   allows the transport to use fast retransmit to recover segment
   losses that would otherwise require a lengthy retransmission
   timeout.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The reader is expected to be familiar with the definitions given in
   [RFC5681].

1 Introduction

   Many researchers have studied problems with TCP's loss recovery
   [RFC793,RFC5681] when the congestion window is small and have
   outlined possible mechanisms to mitigate these problems
   [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02]. SCTP's [RFC4960] loss
   recovery and congestion control mechanisms are based on TCP, and
   therefore the same problems impact the performance of SCTP
   connections. When the transport detects a missing segment, the
   connection enters a loss recovery phase. There are several variants
   of the loss recovery phase, depending on the TCP implementation:
   TCP can use slow-start-based recovery, Fast Recovery [RFC5681],
   NewReno [RFC3782], or loss recovery based on selective
   acknowledgments (SACKs) [RFC2018,FF96,RFC3517]. SCTP's loss
   recovery is less varied because selective acknowledgments are
   built into the protocol.

   All of the above variants have two methods for invoking loss
   recovery. First, if an acknowledgment (ACK) for a given segment is
   not received within a certain amount of time, a retransmission
   timer fires and the segment is resent [RFC2988,RFC4960]. Second,
   the "Fast Retransmit" algorithm resends a segment when three
   duplicate ACKs arrive at the sender [Jac88,RFC5681]. Duplicate ACKs
   are triggered by out-of-order arrivals at the receiver. Because
   such out-of-order arrivals can be caused by either segment loss or
   segment reordering in the network path, the sender waits for three
   duplicate ACKs in an attempt to disambiguate segment loss from
   segment reordering. When the congestion window is small, it may not
   be possible to generate the required number of duplicate ACKs to
   trigger Fast Retransmit when a loss does happen.
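
   To make the last point concrete, the following Python sketch counts
   the duplicate ACKs a sender can expect when exactly one segment of a
   window is lost. It assumes, beyond what the text states, equal-sized
   segments, a receiver that ACKs every out-of-order arrival
   immediately, no ACK loss, and no new data sent during recovery. The
   function names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # standard Fast Retransmit threshold [RFC5681]


def max_dupacks(window_segments: int, lost_index: int) -> int:
    """Duplicate ACKs produced when segment `lost_index` (0-based) of a
    window of `window_segments` segments is dropped.

    Assumes each segment arriving after the hole triggers exactly one
    immediate duplicate ACK, no ACKs are lost, and nothing new is sent.
    """
    if not 0 <= lost_index < window_segments:
        raise ValueError("lost_index must fall inside the window")
    # Only the segments sent after the lost one arrive out of order.
    return window_segments - 1 - lost_index


def fast_retransmit_possible(window_segments: int, lost_index: int) -> bool:
    return max_dupacks(window_segments, lost_index) >= DUPACK_THRESHOLD


# Best case for the sender: the first segment of the window is lost.
for cwnd in range(1, 6):
    print(cwnd, max_dupacks(cwnd, 0), fast_retransmit_possible(cwnd, 0))
```

   Even in this best case, a window of fewer than four segments cannot
   produce the three duplicate ACKs that Fast Retransmit requires.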

   Small congestion windows can occur in a number of situations, such
   as:

   (1) The connection is constrained by end-to-end congestion control
       when the connection's share of the path is small, the path has a
       small bandwidth-delay product, or the transport is ascertaining
       the available bandwidth in the first few round-trip times of
       slow start.

   (2) The connection is "application limited" and has only a limited
       amount of data to send. This can happen any time the
       application does not produce enough data to fill the congestion
       window. One case in which every connection becomes application
       limited is at the end of the connection.

   (3) The connection is limited by the receiver's advertised window.

   The transport's retransmission timeout (RTO) is based on measured
   round-trip times (RTT) between the sender and receiver, as specified
   in [RFC2988] (for TCP) and [RFC4960] (for SCTP). To prevent
   spurious retransmissions of segments that are only delayed and not
   lost, the minimum RTO is conservatively chosen to be 1 second.
   Therefore, it behooves TCP senders to detect and recover from as
   many losses as possible without incurring a lengthy timeout during
   which the connection remains idle. However, if not enough duplicate
   ACKs arrive from the receiver, the Fast Retransmit algorithm is
   never triggered. This situation occurs when the congestion window
   is small, when a large number of segments in a window are lost, or
   at the end of a transfer as data drains from the network. For
   instance, consider a congestion window of three segments' worth of
   data. If one segment is dropped by the network, then at most two
   duplicate ACKs will arrive at the sender. Since three duplicate
   ACKs are required to trigger Fast Retransmit, a timeout will be
   required to resend the dropped segment. Note that delayed ACKs
   [RFC5681] may further reduce the number of duplicate ACKs a receiver
   sends.
   However, we assume that receivers send immediate ACKs when there
   is a gap in the received sequence space, as specified in [RFC5681].

   [BPS+98] shows that roughly 56% of retransmissions sent by a busy
   web server are sent after the RTO timer expires, while only 44% are
   handled by Fast Retransmit. In addition, only 4% of the RTO
   timer-based retransmissions could have been avoided with SACK, which
   must still disambiguate reordering from genuine loss. Furthermore,
   [All00] shows that for one particular web server the median amount
   of data carried by a connection is less than four segments' worth,
   indicating that more than half of the connections will be forced to
   rely on the RTO timer to recover from any losses that occur. Thus,
   loss recovery that does not rely on the conservative RTO is likely
   to be beneficial for short TCP transfers.

   The Limited Transmit mechanism introduced in [RFC3042] and currently
   codified in [RFC5681] allows a TCP sender to transmit previously
   unsent data upon the reception of each of the two duplicate ACKs
   that precede a Fast Retransmit. SCTP [RFC4960] uses SACK
   information to calculate the number of outstanding segments in the
   network. Hence, when the first two duplicate ACKs arrive at the
   sender, they indicate that data has left the network and allow the
   sender to transmit new data (if available), similar to TCP's
   Limited Transmit algorithm. In the remainder of this document we
   use "Limited Transmit" to include both the TCP and SCTP mechanisms
   for sending in response to the first two duplicate ACKs. By sending
   these two new segments, the sender attempts to induce additional
   duplicate ACKs (if appropriate) so that Fast Retransmit will be
   triggered before the retransmission timeout expires.
   The sender-side "Early Retransmit" mechanism outlined in this
   document covers the case when previously unsent data is not
   available for transmission (case (2) above) or cannot be
   transmitted due to an advertised window limitation (case (3)
   above).

   Note: This document is being published as an experimental RFC as
   part of the process for the TCPM WG and the IETF to assess whether
   the proposed change is useful and safe in heterogeneous
   environments, and which variants of the mechanism are the most
   effective. In the future, this specification may be updated and
   put on the standards track if its safety and efficacy can be
   demonstrated.

2 Early Retransmit Algorithm

   The Early Retransmit algorithm calls for lowering the threshold for
   triggering Fast Retransmit when the amount of outstanding data is
   small and when no previously unsent data can be transmitted (i.e.,
   when Limited Transmit cannot be used to induce additional duplicate
   ACKs). Duplicate ACKs are triggered by each arriving out-of-order
   segment. Therefore, Fast Retransmit will not be invoked when there
   are fewer than four outstanding segments (assuming only one segment
   loss in the window). However, TCP and SCTP are not required to
   track the number of outstanding segments; they track the number of
   outstanding bytes or messages instead. (Note that SCTP's message
   boundaries do not necessarily correspond to segment boundaries.)
   Therefore, applying the intuitive notion of a transport with fewer
   than four segments outstanding is more complicated than it first
   appears. In section 2.1 we describe a "byte-based" variant of
   Early Retransmit that attempts to roughly map the number of
   outstanding bytes to a number of outstanding segments that is then
   used when deciding whether to trigger Early Retransmit. In section
   2.2 we describe a "segment-based" variant that represents a more
   precise algorithm for triggering Early Retransmit.
   The precision comes at the cost of requiring additional state to be
   kept by the TCP sender. In both cases we describe SACK-based and
   non-SACK-based versions of the scheme (of course, the non-SACK
   version will not apply to SCTP). This document explicitly does not
   prefer one variant over the other, but leaves the choice to the
   implementer.

2.1 Byte-based Early Retransmit

   A TCP or SCTP sender MAY use byte-based Early Retransmit.

   Upon the arrival of an ACK, a sender employing byte-based Early
   Retransmit MUST use the following two conditions to determine when
   an Early Retransmit is sent:

   (2.a) The amount of outstanding data (ownd), i.e., data sent but not
         yet acknowledged, is less than 4*SMSS bytes.

         Note that in the byte-based variant of Early Retransmit
         'ownd' is equivalent to 'FlightSize' defined in [RFC5681]. We
         use different notation because 'ownd' is not consistent with
         FlightSize throughout this document.

         Also note that in SCTP, messages will have to be converted to
         bytes to make this variant of Early Retransmit work.

   (2.b) There is either no unsent data ready for transmission at the
         sender or the advertised receive window does not permit new
         segments to be transmitted.

   When the above two conditions hold and a TCP connection does not
   support SACK, the duplicate ACK threshold used to trigger a
   retransmission MUST be reduced to:

      ER_thresh = ceiling(ownd/SMSS) - 1                            (1)

   duplicate ACKs, where ownd is in terms of bytes. We refer to the use
   of this reduced ACK threshold as "Early Retransmission".

   When conditions (2.a) and (2.b) hold and a TCP connection does
   support SACK or SCTP is in use, Early Retransmit MUST be used only
   when "ownd - SMSS" bytes have been SACKed.

   If either condition (2.a) or (2.b) (or both) does not hold, the
   transport MUST NOT use Early Retransmit; instead, it uses the
   standard mechanisms, including Fast Retransmit and Limited Transmit.
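
   The rules of this section amount to a decision made on each arriving
   ACK. The following Python sketch is one possible reading, not a
   reference implementation. The ByteBasedState structure and its field
   names are hypothetical: can_send_new folds condition (2.b) into a
   single flag, and sacked_bytes is assumed to count outstanding bytes
   the receiver has reported via SACK. Equation (1) is applied
   literally, so the sketch inherits the imprecision of the byte-based
   variant.

```python
from dataclasses import dataclass


@dataclass
class ByteBasedState:
    """Hypothetical sender state consulted on each arriving ACK."""
    smss: int                # sender maximum segment size, in bytes
    ownd: int                # outstanding bytes (FlightSize, RFC 5681)
    can_send_new: bool       # True if unsent data exists AND rwnd allows it
    sack_enabled: bool       # TCP with SACK, or SCTP
    sacked_bytes: int = 0    # outstanding bytes the receiver has SACKed
    dupacks: int = 0         # duplicate ACK count (non-SACK TCP only)


def er_thresh(s: ByteBasedState) -> int:
    # Equation (1): ER_thresh = ceiling(ownd/SMSS) - 1, in integer math.
    return (s.ownd + s.smss - 1) // s.smss - 1


def early_retransmit_allowed(s: ByteBasedState) -> bool:
    """Apply conditions (2.a) and (2.b), then the matching trigger rule."""
    if s.ownd <= 0:
        return False                 # nothing outstanding to resend
    if s.ownd >= 4 * s.smss:         # (2.a) fails
        return False
    if s.can_send_new:               # (2.b) fails: new data can be sent
        return False
    if s.sack_enabled:
        # SACK TCP or SCTP: trigger once "ownd - SMSS" bytes are SACKed.
        return s.sacked_bytes >= s.ownd - s.smss
    # Non-SACK TCP: trigger at the reduced duplicate ACK threshold.
    return s.dupacks >= er_thresh(s)


# Three full-sized segments outstanding, no SACK, nothing new to send.
s = ByteBasedState(smss=1460, ownd=3 * 1460, can_send_new=False,
                   sack_enabled=False, dupacks=2)
print(er_thresh(s), early_retransmit_allowed(s))  # 2 True
```

   Feeding the sketch three 400-byte segments with an SMSS of 1460
   bytes (ownd = 1200) yields ER_thresh = 0, while ten such segments
   (ownd = 4000) yield ER_thresh = 2, even though ten segments are
   enough to trigger standard Fast Retransmit.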

   As noted above, the drawback of this byte-based variant is its lack
   of precision [HB08]. We illustrate this with two examples:

   + Consider a non-SACK TCP sender that uses an SMSS of 1460 bytes
     and transmits three segments, each with 400 bytes of payload.
     This is a case where Early Retransmit could aid loss recovery if
     one segment is lost. However, in this case ER_thresh will become
     zero per equation (1) (ceiling(1200/1460) - 1 = 0), because the
     number of outstanding bytes is a poor estimate of the number of
     outstanding segments. A similar problem occurs for senders that
     employ SACK, as the expression "ownd - SMSS" will become negative
     (1200 - 1460 = -260).

   + Next, consider a non-SACK TCP sender that uses an SMSS of 1460
     bytes and transmits 10 segments, each with 400 bytes of payload.
     Here ownd = 4000 bytes, which is less than 4*SMSS = 5840 bytes,
     and ER_thresh will be two per equation (1) (ceiling(4000/1460) - 1
     = 2). Thus, even though there are enough segments outstanding to
     trigger Fast Retransmit with the standard duplicate ACK threshold,
     Early Retransmit will be triggered. This could cause or exacerbate
     performance problems due to segment reordering in the network.

2.2 Segment-based Early Retransmit

   A TCP or SCTP sender MAY use segment-based Early Retransmit.

   Upon the arrival of an ACK, a sender employing segment-based Early
   Retransmit MUST use the following two conditions to determine when
   an Early Retransmit is sent:

   (3.a) The number of outstanding segments (oseg), i.e., segments sent
         but not yet acknowledged, is less than four.

   (3.b) There is either no unsent data ready for transmission at the
         sender or the advertised receive window does not permit new
         segments to be transmitted.

   When the above two conditions hold and a TCP connection does not
   support SACK, the duplicate ACK threshold used to trigger a
   retransmission MUST be reduced to:

      ER_thresh = oseg - 1                                          (2)

   duplicate ACKs, where oseg represents the number of outstanding
   segments.
   (We discuss tracking the number of outstanding segments below.) We
   refer to the use of this reduced ACK threshold as "Early
   Retransmission".

   When conditions (3.a) and (3.b) hold and a TCP connection does
   support SACK or SCTP is in use, Early Retransmit MUST be used only
   when "oseg - 1" segments have been SACKed. A segment is considered
   to be SACKed when all its data bytes (TCP) or data chunks (SCTP)
   have been indicated as arrived by the receiver.

   If either condition (3.a) or (3.b) (or both) does not hold, the
   transport MUST NOT use Early Retransmit; instead, it uses the
   standard mechanisms, including Fast Retransmit and Limited Transmit.

   This version of Early Retransmit solves the precision issues
   discussed in the previous section. As noted previously, the cost is
   that the implementation will have to track segment boundaries to
   determine how many actual segments have been transmitted but not
   yet acknowledged. The sender can do this by tracking the boundaries
   of the three segments on the right side of the current window
   (which involves tracking four sequence numbers in TCP), for
   instance by keeping a circular list of these segment boundaries.
   Cumulative ACKs that do not fall within this region indicate that
   at least four segments are outstanding, and therefore Early
   Retransmit MUST NOT be used. When the outstanding window becomes
   small enough that Early Retransmit can be invoked, a full
   understanding of the number of outstanding segments will be
   available from the four sequence numbers retained. (Note: the
   implicit sequence number consumed by the TCP FIN can also be
   included in the tracking of segment boundaries.)

3 Discussion

   In this section we discuss a number of issues surrounding the Early
   Retransmit algorithm.

3.1 SACK vs. non-SACK

   The SACK variant of the Early Retransmit algorithm is preferred to
   the non-SACK variant in TCP due to its robustness in the face of ACK
   loss (since SACKs are sent redundantly) and due to interactions with
   the delayed ACK timer (SCTP does not have a non-SACK mode and
   therefore naturally supports SACK-based Early Retransmit). Consider
   a flight of three segments, S1...S3, with S2 being dropped by the
   network. When S1 arrives, it is in order, so the receiver may or
   may not delay the ACK, leading to two scenarios:

   (A) The ACK for S1 is delayed: In this case the arrival of S3 will
       trigger an ACK to be transmitted covering segment S1 (which was
       previously unacknowledged). In this case Early Retransmit
       without SACK will not prevent an RTO because no duplicate ACKs
       will arrive. However, with SACK the ACK for S1 will also
       include SACK information indicating that S3 has arrived at the
       receiver. The sender can then invoke Early Retransmit on this
       ACK because only one outstanding segment (S2) has not been
       SACKed.

   (B) The ACK for S1 is not delayed: In this case the arrival of S1
       triggers an ACK of previously unacknowledged data. The arrival
       of S3 triggers a duplicate ACK (because it is out of order).
       Both ACKs will cover the same segment (S1). Therefore,
       regardless of whether SACK is used, Early Retransmit can be
       performed by the sender (assuming no ACK loss).

3.2 Segment Reordering

   Early Retransmit is less robust in the face of reordered segments
   than when using the standard Fast Retransmit threshold. Research
   shows that a general reduction in the number of duplicate ACKs
   required to trigger Fast Retransmit to two (rather than three) leads
   to a reduction in the ratio of good to bad retransmits by a factor
   of three [Pax97].
   However, this analysis did not condition on the additional
   requirements that ownd be smaller than four segments and that no
   new data be available for transmission.

   Network reordering is not a rare event across some network paths.
   Various measurement studies have shown that reordering along most
   paths is negligible, but that it can be quite prevalent along
   certain paths [Pax97,BPS99,BS02,Pir05]. Evaluating Early
   Retransmit in the face of real segment reordering is part of the
   experiment we hope to instigate with this document.

3.3 Worst Case

   Next, we note two "worst case" scenarios for Early Retransmit:

   (1) Persistent reordering of segments coupled with an application
       that does not constantly send data can result in large numbers
       of needless retransmissions when using Early Retransmit. For
       instance, consider an application that sends data two segments
       at a time, followed by an idle period when no data is queued for
       delivery. If the network consistently reorders the two
       segments, the sender will needlessly retransmit one out of every
       two unique segments transmitted when using the above algorithm.
       Each burst then puts three segments on the wire (two unique
       segments plus one retransmission), meaning that one-third of all
       segments sent are needless retransmissions. However, this would
       only be a problem for long-lived connections from applications
       that transmit in spurts.

   (2) Similar to the above, consider the case of two-segment transfers
       that always experience reordering. Just as in (1) above, one
       out of every two unique data segments will be retransmitted
       needlessly, and therefore one-third of the traffic will be
       spurious.

   Currently this document offers no suggestion on how to mitigate the
   above problems.
   However, the worst cases are likely pathological. Part of the
   experimentation this document hopes to trigger involves better
   understanding whether such theoretical worst-case scenarios are
   prevalent in the network and, more generally, exploring the
   tradeoff between spurious fast retransmits and the delay imposed by
   the RTO. Appendix A offers a survey of possible mitigations that
   call for curtailing the use of Early Retransmit when it is making
   poor retransmission decisions.

4 Related Work

   There are a number of similar proposals in the literature that
   attempt to mitigate the same problem Early Retransmit addresses.

   Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC3168]
   may benefit connections with small congestion window sizes
   [RFC2884]. ECN provides a method for indicating congestion to the
   end-host without dropping segments. While some segment drops may
   still occur, ECN may allow a transport to perform better with small
   congestion window sizes because the sender will need to detect
   fewer segment losses [RFC2884].

   [Bal98] outlines another solution to the problem of having no new
   segments to transmit into the network when the first two duplicate
   ACKs arrive. In response to these duplicate ACKs, a TCP sender
   transmits zero-byte segments to induce additional duplicate ACKs.
   This method preserves the robustness of the standard Fast Retransmit
   algorithm at the cost of injecting segments into the network that do
   not deliver any data, and that therefore potentially waste network
   resources (at a time when there is a reasonable chance that the
   resources are scarce).

   [RFC4653] also defines an orthogonal method for altering the
   duplicate ACK threshold. The mechanisms proposed in this document
   decrease the duplicate ACK threshold when a small amount of data is
   outstanding.
   Meanwhile, the mechanisms in [RFC4653] increase the duplicate ACK
   threshold (over the standard of three) when the congestion window
   is large, in an effort to increase robustness to segment
   reordering.

5 Security Considerations

   The security considerations found in [RFC5681] apply to this
   document. No additional security problems have been identified with
   Early Retransmit at this time.

6 IANA Considerations

   None

Acknowledgments

   We thank Sally Floyd for her feedback in discussions about Early
   Retransmit. The notion of Early Retransmit was originally sketched
   in an Internet-Draft co-authored by Sally Floyd and Hari
   Balakrishnan. Armando Caro, Joe Touch, Alexander Zimmermann, and
   many members of the TSVWG and TCPM working groups provided good
   discussions that helped shape this document. Our thanks to all!

Normative References

   [RFC793] Jon Postel. Transmission Control Protocol. STD 7, RFC 793,
       September 1981.

   [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow.
       TCP Selective Acknowledgement Options. RFC 2018, October 1996.

   [RFC2119] Scott Bradner. Key words for use in RFCs to Indicate
       Requirement Levels. BCP 14, RFC 2119, March 1997.

   [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky.
       An Extension to the Selective Acknowledgement (SACK) Option for
       TCP. RFC 2883, July 2000.

   [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission
       Timer. RFC 2988, November 2000.

   [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing
       TCP's Loss Recovery Using Limited Transmit. RFC 3042, January
       2001.

   [RFC4960] R. Stewart. Stream Control Transmission Protocol. RFC
       4960, September 2007.

   [RFC5681] Mark Allman, Vern Paxson, Ethan Blanton. TCP Congestion
       Control. RFC 5681, September 2009.

Informative References

   [AA02] Urtzi Ayesta, Konstantin Avrachenkov. The Effect of the
       Initial Window Size and Limited Transmit Algorithm on the
       Transient Behavior of TCP Transfers. In Proc. of the 15th ITC
       Internet Specialist Seminar, Wurzburg, July 2002.

   [All00] Mark Allman. A Web Server's View of the Transport Layer.
       ACM Computer Communication Review, October 2000.

   [Bal98] Hari Balakrishnan. Challenges to Reliable Data Transport
       over Heterogeneous Wireless Networks. Ph.D. Thesis, University
       of California at Berkeley, August 1998.

   [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan,
       Mark Stemm, Randy Katz. TCP Behavior of a Busy Web Server:
       Analysis and Improvements. Proc. IEEE INFOCOM Conf., San
       Francisco, CA, March 1998.

   [BPS99] Jon Bennett, Craig Partridge, Nicholas Shectman. Packet
       Reordering is Not Pathological Network Behavior. IEEE/ACM
       Transactions on Networking, December 1999.

   [BS02] John Bellardo, Stefan Savage. Measuring Packet Reordering.
       ACM/USENIX Internet Measurement Workshop, November 2002.

   [FF96] Kevin Fall, Sally Floyd. Simulation-based Comparisons of
       Tahoe, Reno, and SACK TCP. ACM Computer Communication Review,
       July 1996.

   [Flo94] Sally Floyd. TCP and Explicit Congestion Notification. ACM
       Computer Communication Review, October 1994.

   [HB08] Per Hurtig, Anna Brunstrom. Enhancing SCTP Loss Recovery: An
       Experimental Evaluation of Early Retransmit. Elsevier Computer
       Communications, Vol. 31(16), October 2008, pp. 3778-3788.

   [Jac88] Van Jacobson. Congestion Avoidance and Control. ACM
       SIGCOMM, 1988.

   [LK98] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies: Analysis
       and Improvements. Proceedings of InfoCom, San Francisco, CA,
       March 1998.

   [Mor97] Robert Morris. TCP Behavior with Many Flows. Proceedings of
       the Fifth IEEE International Conference on Network Protocols,
       October 1997.

   [Pax97] Vern Paxson. End-to-End Internet Packet Dynamics. ACM
       SIGCOMM, September 1997.

   [Pir05] N. M. Piratla. A Theoretical Foundation, Metrics and
       Modeling of Packet Reordering and Methodology of Delay Modeling
       using Inter-packet Gaps. Ph.D. Dissertation, Department of
       Electrical and Computer Engineering, Colorado State University,
       Fort Collins, CO, Fall 2005.

   [RFC2884] Jamal Hadi Salim, Uvaiz Ahmed. Performance Evaluation of
       Explicit Congestion Notification (ECN) in IP Networks. RFC 2884,
       July 2000.

   [RFC3150] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent
       Magret. End-to-end Performance Implications of Slow Links. RFC
       3150, July 2001.

   [RFC3168] K. K. Ramakrishnan, Sally Floyd, David Black. The Addition
       of Explicit Congestion Notification (ECN) to IP. RFC 3168,
       September 2001.

   [RFC3517] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang. A
       Conservative Selective Acknowledgment (SACK)-based Loss Recovery
       Algorithm for TCP. RFC 3517, April 2003.

   [RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection
       Algorithm for TCP. RFC 3522, April 2003.

   [RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov. The NewReno
       Modification to TCP's Fast Recovery Algorithm. RFC 3782, April
       2004.

   [RFC4653] Sumitha Bhandarkar, A. L. Narasimha Reddy, Mark Allman,
       Ethan Blanton. Improving the Robustness of TCP to Non-Congestion
       Events. RFC 4653, August 2006.

Authors' Addresses

   Mark Allman
   International Computer Science Institute
   1947 Center Street, Suite 600
   Berkeley, CA 94704-1198
   Phone: 440-235-1792
   mallman@icir.org
   http://www.icir.org/mallman/

   Konstantin Avrachenkov
   INRIA
   2004 route des Lucioles, B.P.93
   06902, Sophia Antipolis
   France
   Phone: 00 33 492 38 7751
   k.avrachenkov@sophia.inria.fr
   http://www.inria.fr/mistral/personnel/K.Avrachenkov/moi.html

   Urtzi Ayesta
   LAAS-CNRS
   7 Avenue Colonel Roche
   31077 Toulouse
   France
   urtzi@laas.fr
   http://www.laas.fr/~urtzi

   Josh Blanton
   Ohio University
   301 Stocker Center
   Athens, OH 45701
   jblanton@irg.cs.ohiou.edu

   Per Hurtig
   Karlstad University
   Department of Computer Science
   Universitetsgatan 2
   651 88 Karlstad
   Sweden
   per.hurtig@kau.se

Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold

   Decreasing the number of duplicate ACKs required to trigger Fast
   Retransmit, as suggested in section 2, has the drawback of making
   Fast Retransmit less robust in the face of minor network reordering.
   Two egregious examples of problems caused by reordering are given in
   section 3.3. This appendix outlines several schemes that have been
   suggested to mitigate the problems caused by Early Retransmit in the
   face of segment reordering. These methods need further research
   before they are suggested for general use (and current consensus is
   that the cases that make Early Retransmit unnecessarily retransmit a
   large amount of data are pathological, and therefore these
   mitigations are not generally required).

   MITIGATION A.1: Allow a connection to use Early Retransmit as long
       as the algorithm is not injecting "too much" spurious data into
       the network.
       For instance, using the information provided by TCP's
       DSACK option [RFC2883] or SCTP's Duplicate-TSN notification, a
       sender can determine when segments sent via Early Retransmit are
       needless. Likewise, using Eifel [RFC3522] the sender can detect
       spurious Early Retransmits. Once spurious Early Retransmits are
       detected, the sender can either eliminate the use of Early
       Retransmit or limit the use of the algorithm to ensure that only
       an acceptably small fraction of the connection's transmissions
       are spurious. For example, a connection could stop using Early
       Retransmit after the first spurious retransmit is detected.

   MITIGATION A.2: If a sender cannot reliably determine whether an
       Early Retransmitted segment is spurious, the sender could simply
       limit Early Retransmits either to some fixed number per
       connection (e.g., Early Retransmit is allowed only once per
       connection) or to some small percentage of the total traffic
       being transmitted.

   MITIGATION A.3: Allow a connection to trigger Early Retransmit using
       the criteria given in section 2, in addition to a "small"
       timeout [Pax97]. For instance, a sender may have to wait for 2
       duplicate ACKs and then T msec before Early Retransmit is
       invoked. The added time gives the acknowledgments for reordered
       segments time to arrive at the sender and avoid a needless
       retransmit. Designing a method for choosing an appropriate
       timeout is part of the research that would need to be involved
       in this scheme.
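
   This appendix deliberately leaves these policies open. Purely as an
   illustration, the following Python sketch shows one way a sender
   might combine mitigations A.1 through A.3 into a per-connection gate
   consulted before each Early Retransmit. The class, its fields, and
   all default values (including disabling Early Retransmit after a
   single spurious retransmission, the example given under A.1) are
   hypothetical. Detecting spurious retransmissions via DSACK,
   Duplicate-TSN, or Eifel is assumed to happen elsewhere and to call
   on_spurious(), and arming the A.3 timer is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyRetransmitGuard:
    """Hypothetical per-connection policy combining mitigations A.1-A.3.

    None of these knobs are specified by the draft; suitable values are
    exactly the open research question described in Appendix A.
    """
    spurious_limit: Optional[int] = 1     # A.1: disable after N spurious ERs
    max_fraction: Optional[float] = None  # A.2: cap ERs as share of sends
    max_count: Optional[int] = None       # A.2: cap ERs per connection
    delay: float = 0.0                    # A.3: extra wait T, in seconds
    segments_sent: int = 0
    early_rexmits: int = 0
    spurious: int = 0

    def on_send(self, is_early_rexmit: bool = False) -> None:
        self.segments_sent += 1
        if is_early_rexmit:
            self.early_rexmits += 1

    def on_spurious(self) -> None:
        # e.g., reported via DSACK, SCTP Duplicate-TSN, or Eifel detection
        self.spurious += 1

    def permits(self) -> bool:
        """May the sender use Early Retransmit for the next loss?"""
        if self.spurious_limit is not None and self.spurious >= self.spurious_limit:
            return False
        if self.max_count is not None and self.early_rexmits >= self.max_count:
            return False
        if self.max_fraction is not None:
            # Count the candidate retransmission itself before comparing.
            share = (self.early_rexmits + 1) / (self.segments_sent + 1)
            if share > self.max_fraction:
                return False
        return True

    def fire_time(self, threshold_met_at: float) -> float:
        """A.3: earliest time to send the ER once ER_thresh is reached.

        The caller should cancel the pending ER if a cumulative ACK that
        covers the suspected lost segment arrives before this time.
        """
        return threshold_met_at + self.delay


guard = EarlyRetransmitGuard(delay=0.05)
print(guard.permits())                # True: no spurious ER detected yet
guard.on_send(is_early_rexmit=True)
guard.on_spurious()                   # e.g., DSACK shows the ER was needless
print(guard.permits())                # False: A.1 with spurious_limit=1
```

   Choosing the limits and the delay T remains the open research
   question described above; the sketch only shows where such knobs
   would plug in.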