Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                                      ICSI
File: draft-allman-tcp-early-rexmt-07.txt         Konstantin Avrachenkov
                                                                   INRIA
                                                            Urtzi Ayesta
                                                               LAAS-CNRS
                                                            Josh Blanton
                                                         Ohio University
                                                              Per Hurtig
                                                     Karlstad University
                                                               June 2008
                                                  Expires: December 2008

                   Early Retransmit for TCP and SCTP

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document proposes a new mechanism for TCP and SCTP that can be
   used to recover lost segments when a connection's congestion window
   is small. The "Early Retransmit" mechanism allows the transport to
   reduce, in certain special circumstances, the number of duplicate
   acknowledgments required to trigger a fast retransmission. This
   allows the transport to use fast retransmit to recover packet losses
   that would otherwise require a lengthy retransmission timeout.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1  Introduction

   Many researchers have studied problems with TCP [RFC793,RFC2581]
   when the congestion window is small and have outlined possible
   mechanisms to mitigate these problems
   [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02]. SCTP's [RFC4960] loss
   recovery and congestion control mechanisms are based on TCP and
   therefore the same problems impact the performance of SCTP
   connections. When the transport detects a missing segment, the
   connection enters a loss recovery phase. There are several variants
   of the loss recovery phase, depending on the TCP variant in use. TCP
   can use slow start based recovery or Fast Recovery [RFC2581], NewReno
   [RFC2582], and loss recovery based on selective acknowledgments
   (SACKs) [RFC2018,FF96,RFC3517]. SCTP's loss recovery is not as
   varied due to the built-in selective acknowledgments.

   All the above variants have two methods for invoking loss recovery.
   First, if an acknowledgment (ACK) for a given segment is not
   received in a certain amount of time, a retransmission timer fires
   and the segment is resent [RFC2988,RFC4960]. Second, the "Fast
   Retransmit" algorithm resends a segment when three duplicate ACKs
   arrive at the sender [Jac88,RFC2581]. Duplicate ACKs are triggered
   by out-of-order arrivals at the receiver. However, because
   duplicate ACKs from the receiver are triggered by both packet loss
   and packet reordering in the network path, the sender waits for
   three duplicate ACKs in an attempt to disambiguate packet loss from
   packet reordering. When using small congestion windows it may not
   be possible to generate the required number of duplicate ACKs to
   trigger Fast Retransmit when a loss does happen.

   Small windows can occur in a number of situations, such as:

   (1) The connection is constrained by end-to-end congestion control
       when the connection's share of the path is small, the path has a
       small bandwidth-delay product, or the transport is ascertaining
       the available bandwidth in the first few round-trip times of
       slow start.

   (2) The connection is "application limited" and has only a limited
       amount of data to send. This can happen any time the
       application does not produce enough data to fill the congestion
       window. In particular, every connection becomes application
       limited as it ends.

   (3) The connection is limited by the receiver's advertised window.

   The transport's retransmission timeout (RTO) is based on measured
   round-trip times (RTT) between the sender and receiver, as specified
   in [RFC2988] (for TCP) and [RFC4960] (for SCTP). To prevent
   spurious retransmissions of segments that are only delayed and not
   lost, the minimum RTO is conservatively chosen to be 1 second.
   Therefore, it behooves TCP senders to detect and recover from as
   many losses as possible without incurring a lengthy timeout during
   which the connection remains idle. However, if not enough duplicate
   ACKs arrive from the receiver, the Fast Retransmit algorithm is
   never triggered. This situation occurs when the congestion window
   is small, when a large number of segments in a window are lost, or
   at the end of a transfer as data drains from the network. For
   instance, consider a congestion window (cwnd) of three segments. If
   one segment is dropped by the network, then at most two duplicate
   ACKs will arrive at the sender, assuming no ACK loss. Since three
   duplicate ACKs are required to trigger Fast Retransmit, a timeout
   will be required to resend the dropped packet.

   [BPS+98] shows that roughly 56% of retransmissions sent by a busy
   web server are sent after the RTO timer expires, while only 44% are
   handled by Fast Retransmit. In addition, only 4% of the RTO
   timer-based retransmissions could have been avoided with SACK, which
   has to continue to disambiguate reordering from genuine loss.
   Furthermore, [All00] shows that for one particular web server the
   median transfer size is less than four segments, indicating that
   more than half of the connections will be forced to rely on the RTO
   timer to recover from any losses that occur. Thus, loss recovery
   that does not rely on the conservative RTO is likely to be
   beneficial for short TCP transfers.

   The Limited Transmit mechanism introduced in [RFC3042] allows a TCP
   sender to transmit previously unsent data upon the reception of each
   of the two duplicate ACKs that precede a Fast Retransmit. SCTP
   [RFC4960] uses SACK information to calculate the number of
   outstanding segments in the network. Hence, when the first two
   duplicate ACKs arrive at the sender they will indicate that data has
   left the network and allow the sender to transmit new data (if
   available) similar to TCP's Limited Transmit algorithm. In the
   remainder of this document we use "Limited Transmit" to include both
   TCP and SCTP mechanisms for sending in response to the first two
   duplicate ACKs. By sending these two new segments the TCP sender is
   attempting to induce additional duplicate ACKs (if appropriate) so
   that Fast Retransmit will be triggered before the retransmission
   timeout expires. The "Early Retransmit" mechanism outlined in this
   document covers the case when previously unsent data is not
   available for transmission or cannot be transmitted due to an
   advertised window limitation.
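
   The interplay between the duplicate ACK threshold and Limited
   Transmit described above can be illustrated with a short Python
   sketch. It is not part of the proposal. The function names and
   parameters are hypothetical, and the model assumes that only the
   first segment of a flight is lost, that no ACKs are lost or delayed,
   that nothing is reordered, and that the advertised window always
   permits new data to be sent.

```python
DUPACK_THRESHOLD = 3  # standard Fast Retransmit threshold [RFC2581]


def dup_acks_received(flight_size: int, unsent_segments: int,
                      limited_transmit: bool = True) -> int:
    """Count the duplicate ACKs a sender sees after the first segment of
    a flight is lost (illustrative model; see the stated assumptions)."""
    if flight_size < 1 or unsent_segments < 0:
        raise ValueError("flight_size must be >= 1, unsent_segments >= 0")
    in_flight = flight_size - 1   # segments sent behind the lost one
    dup_acks = 0
    while in_flight > 0:
        in_flight -= 1            # one out-of-order segment arrives ...
        dup_acks += 1             # ... and triggers one duplicate ACK
        # Limited Transmit: one new segment for each of the first two
        # duplicate ACKs, if previously unsent data is available.
        if limited_transmit and dup_acks <= 2 and unsent_segments > 0:
            unsent_segments -= 1
            in_flight += 1
    return dup_acks


def fast_retransmit_triggered(flight_size: int, unsent_segments: int) -> bool:
    """True if the standard threshold is reached without a timeout."""
    return dup_acks_received(flight_size, unsent_segments) >= DUPACK_THRESHOLD
```

   Under these assumptions, a flight of three segments with no unsent
   data yields only two duplicate ACKs, so the sender must wait for the
   RTO. With two unsent segments available, Limited Transmit raises
   the count to four and Fast Retransmit is triggered. The first case
   is the one Early Retransmit is meant to address.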

2  Early Retransmit Algorithm

   The Early Retransmit algorithm calls for lowering the threshold for
   triggering Fast Retransmit when the amount of outstanding data is
   small and when no previously unsent data can be transmitted (that
   is, when Limited Transmit cannot be used). Duplicate ACKs are
   triggered by each arriving out-of-order segment. Therefore, Fast
   Retransmit will not be invoked when there are fewer than four
   outstanding segments (assuming only one segment loss in the window).
   However, TCP and SCTP are not required to track the number of
   outstanding segments, but rather the number of outstanding bytes or
   messages. Therefore, applying the intuitive notion of a transport
   with fewer than four segments outstanding is more complicated than
   it first appears. In section 2.1 we describe a "byte-based" variant
   of Early Retransmit that attempts to roughly map the number of
   outstanding bytes to a number of outstanding packets that is then
   used when deciding whether to trigger Early Retransmit. In section
   2.2 we describe a "packet-based" variant that represents a more
   precise algorithm for triggering Early Retransmit. The precision
   comes at the cost of requiring additional state to be kept by the
   TCP sender. In both cases we describe SACK-based and non-SACK-based
   versions of the scheme (of course, the non-SACK version will not
   apply to SCTP).

2.1  Byte-based Early Retransmit

   A TCP or SCTP sender MAY use byte-based Early Retransmit.

   A sender employing byte-based Early Retransmit MUST use the
   following two conditions to determine when an Early Retransmit is
   sent:

   (2.a) The amount of outstanding data (ownd), that is, data sent but
         not yet acknowledged, is less than 4*SMSS bytes.

   (2.b) There is either no unsent data ready for transmission at the
         sender or the advertised window does not permit new segments
         to be transmitted.

   When the above two conditions hold and the connection does not
   support SACK the duplicate ACK threshold used to trigger a
   retransmission MUST be reduced to:

      ER_thresh = ceiling (ownd/SMSS) - 1                            (1)

   duplicate ACKs, where ownd is in terms of bytes.

   When conditions (2.a) and (2.b) hold and the connection does support
   SACK, Early Retransmit MUST be used only when "ownd - SMSS" bytes
   have been SACKed.

   When conditions (2.a) and (2.b) do not both hold, the transport MUST
   NOT use Early Retransmit, but rather prefer the standard mechanisms,
   including Limited Transmit.

   As noted above, the drawback of this byte-based variant is
   precision [HB07]. We illustrate this with two examples:

   + Consider a non-SACK TCP sender that uses an SMSS of 1460 bytes
     and transmits three segments each with 400 bytes of payload.
     This is clearly a case where Early Retransmit could aid loss
     recovery if one segment is lost. However, in this case
     ER_thresh will become zero, per equation (1), because the number
     of outstanding bytes is a poor estimate of the number of
     outstanding packets. A similar problem occurs for senders that
     employ SACK as the expression "ownd - SMSS" will become
     negative.

   + Next, consider a non-SACK TCP sender that uses an SMSS of 1460
     bytes and transmits 10 segments each with 400 bytes of payload.
     In this case ER_thresh will be two, per equation (1). Thus,
     even though there are enough segments outstanding to trigger
     Fast Retransmit with the standard duplicate ACK threshold, Early
     Retransmit will be triggered. This could cause or exacerbate
     performance problems caused by packet reordering in the network.

2.2  Packet-based Early Retransmit

   A TCP or SCTP sender MAY use packet-based Early Retransmit.
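
   To make the precision problem of section 2.1 concrete, the following
   Python sketch (not part of the specification) evaluates equation (1)
   for the two examples given there and, for contrast, a threshold
   derived from a count of outstanding segments, which is the quantity
   the packet-based variant relies on. The function and parameter
   names are hypothetical, new_data_sendable stands for the negation of
   condition (2.b), and the non-SACK case is assumed throughout.

```python
from typing import Optional

SMSS = 1460  # bytes, as in the two examples of section 2.1


def byte_based_er_thresh(ownd: int, new_data_sendable: bool = False,
                         smss: int = SMSS) -> Optional[int]:
    """Equation (1), or None when conditions (2.a)/(2.b) do not both
    hold and the standard duplicate ACK threshold applies."""
    if ownd >= 4 * smss or new_data_sendable:
        return None
    return -(-ownd // smss) - 1   # ceiling(ownd/SMSS) - 1, integer math


def packet_based_er_thresh(oseg: int,
                           new_data_sendable: bool = False) -> Optional[int]:
    """Threshold from a segment count: oseg - 1 when fewer than four
    segments are outstanding and no new data can be sent, else None."""
    if oseg >= 4 or new_data_sendable:
        return None
    return oseg - 1


# The two examples from section 2.1: 400-byte segments, SMSS of 1460.
for segments in (3, 10):
    ownd = segments * 400
    print(segments, byte_based_er_thresh(ownd),
          packet_based_er_thresh(segments))
```

   For three 400-byte segments the byte-based threshold is 0 while the
   segment count gives 2. For ten such segments the byte-based
   threshold is 2 even though a segment count would leave the standard
   threshold in place (None in this sketch).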

   A sender employing packet-based Early Retransmit MUST use the
   following two conditions to determine when an Early Retransmit is
   sent:

   (3.a) The number of outstanding segments (oseg), that is, segments
         sent but not yet acknowledged, is less than four.

   (3.b) There is either no unsent data ready for transmission at the
         sender or the advertised window does not permit new segments
         to be transmitted.

   When the above two conditions hold and the connection does not
   support SACK the duplicate ACK threshold used to trigger a
   retransmission MUST be reduced to:

      ER_thresh = oseg - 1                                           (2)

   duplicate ACKs, where oseg represents the number of outstanding
   segments. (We discuss tracking the number of outstanding segments
   below.)

   When conditions (3.a) and (3.b) hold and the connection does support
   SACK, Early Retransmit MUST be used only when "oseg - 1" segments
   have been SACKed.

   When conditions (3.a) and (3.b) do not both hold, the transport MUST
   NOT use Early Retransmit, but rather prefer the standard mechanisms,
   including Limited Transmit.

   This version of Early Retransmit solves the precision issues
   discussed in the previous section. As noted previously, the cost is
   that the implementation will have to track packet boundaries to form
   an understanding as to how many actual segments have been
   transmitted, but not acknowledged. This can be done by tracking the
   boundaries of the three segments on the right side of the current
   window (which involves tracking four sequence numbers in TCP), for
   instance by keeping a circular list of the packet boundaries.
   Cumulative ACKs that do not fall within this region indicate that at
   least four segments are outstanding and therefore Early Retransmit
   MUST NOT be used.
   When the outstanding window becomes small enough that Early
   Retransmit can be invoked, a full understanding of the number of
   outstanding packets will be available.

3  Discussion

   The SACK variant of the Early Retransmit algorithm is preferred to
   the non-SACK variant due to its robustness in the face of ACK loss
   (since SACKs are sent redundantly) and due to interactions with the
   delayed ACK timer. Consider a flight of three segments, S1...S3,
   with S2 being dropped by the network. When S1 arrives it is
   in-order and so the receiver may or may not delay the ACK, leading
   to two scenarios:

   (A) The ACK for S1 is delayed: In this case the arrival of S3 will
       trigger an ACK to be transmitted covering segment S1 (which was
       previously unacknowledged). In this case Early Retransmit
       without SACK will not prevent an RTO because no duplicate ACKs
       will arrive. However, with SACK the ACK for S1 will also
       include SACK information indicating that S3 has arrived at the
       receiver. The sender can then invoke Fast Retransmit on this
       ACK because ownd - SMSS bytes have been SACKed when the ACK
       arrives.

   (B) The ACK for S1 is not delayed: In this case the arrival of S1
       triggers an ACK of previously unacknowledged data. The arrival
       of S3 triggers a duplicate ACK (because it is out-of-order).
       Both ACKs will cover the same segment (S1). Therefore,
       regardless of whether SACK is used Early Retransmit can be
       performed by the sender (assuming no ACK loss).

   Early Retransmit is less robust in the face of reordered segments
   than when using the standard Fast Retransmit threshold. Research
   shows that a general reduction in the number of duplicate ACKs
   required to trigger Fast Retransmit to two (rather than three) leads
   to a reduction in the ratio of good to bad retransmits by a factor
   of three [Pax97].
   However, this analysis did not include the additional conditioning
   on the event that the ownd was smaller than 4 segments and that no
   new data was available for transmission.

   Measurement studies have shown that reordering is negligible along
   most network paths, but can be quite prevalent along certain paths
   [Pax97,BPS99,BS02,Pir05]. Evaluating Early Retransmit in the face
   of real packet reordering is part of the experiment we hope to
   instigate with this document.

   Next, we note two "worst case" scenarios for Early Retransmit:

   (1) Persistent reordering of segments coupled with an application
       that does not constantly send data can result in large numbers
       of needless retransmissions when using Early Retransmit. For
       instance, consider an application that sends data two segments
       at a time, followed by an idle period when no data is queued for
       delivery. If the network consistently reorders the two
       segments, the sender will needlessly retransmit one out of every
       two unique segments transmitted when using the above algorithm
       (meaning that one-third of all segments sent are needless
       retransmissions). However, this would only be a problem for
       long-lived connections from applications that transmit in
       spurts.

   (2) Similar to the above, consider the case of 2 segment transfers
       that always experience reordering. Just as in (1) above, one
       out of every two unique data segments will be retransmitted
       needlessly, therefore one-third of the traffic will be spurious.

   Currently this document offers no suggestion on how to mitigate the
   above problems.
   However, the worst cases are likely pathological. Part of the
   experiments that this document hopes to trigger would involve better
   understanding of whether such theoretical worst case scenarios are
   prevalent in the network and, in general, exploring the tradeoff
   between spurious fast retransmits and the delay imposed by the RTO.
   Appendix A does offer a survey of possible mitigations that call for
   curtailing the use of Early Retransmit when it is making poor
   retransmission decisions.

4  Related Work

   Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC3168]
   may benefit connections with small congestion window sizes
   [RFC2884]. ECN provides a method for indicating congestion to the
   end-host without dropping segments. While some segment drops may
   still occur, ECN may allow a transport to perform better with small
   cwnd sizes because the sender will be required to detect less
   segment loss [RFC2884].

   [Bal98] outlines another solution to the problem of having no new
   segments to transmit into the network when the first two duplicate
   ACKs arrive. In response to these duplicate ACKs, a TCP sender
   transmits zero-byte segments to induce additional duplicate ACKs.
   This method preserves the robustness of the standard Fast Retransmit
   algorithm at the cost of injecting segments into the network that do
   not deliver any data (and therefore potentially waste network
   resources).

5  Security Considerations

   The security considerations found in [RFC2581] apply to this
   document. No additional security problems have been identified with
   Early Retransmit at this time.

Acknowledgments

   We thank Sally Floyd for her feedback in discussions about Early
   Retransmit. The notion of Early Retransmit was originally sketched
   in an Internet-Draft co-authored by Sally Floyd and Hari
   Balakrishnan.

   Armando Caro and many members of the TSVWG and TCPM working groups
   provided good discussions that helped shape this document. Our
   thanks to all!

Normative References

   [RFC793] Jon Postel. Transmission Control Protocol. Std 7, RFC
       793. September 1981.

   [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow.
       TCP Selective Acknowledgement Options. RFC 2018, October 1996.

   [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP
       Congestion Control. RFC 2581, April 1999.

   [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky.
       An Extension to the Selective Acknowledgement (SACK) Option for
       TCP. RFC 2883, July 2000.

   [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission
       Timer. RFC 2988, April 2000.

   [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd. Enhancing
       TCP's Loss Recovery Using Limited Transmit. RFC 3042, January
       2001.

   [RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection
       Algorithm for TCP. RFC 3522, April 2003.

   [RFC4960] R. Stewart. Stream Control Transmission Protocol. RFC
       4960, September 2007.

Informative References

   [AA02] Urtzi Ayesta, Konstantin Avrachenkov, "The Effect of the
       Initial Window Size and Limited Transmit Algorithm on the
       Transient Behavior of TCP Transfers", In Proc. of the 15th ITC
       Internet Specialist Seminar, Wurzburg, July 2002.

   [All00] Mark Allman. A Server-Side View of WWW Characteristics.
       ACM Computer Communications Review, October 2000.

   [Bal98] Hari Balakrishnan. Challenges to Reliable Data Transport
       over Heterogeneous Wireless Networks. Ph.D. Thesis, University
       of California at Berkeley, August 1998.

   [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan,
       Mark Stemm, and Randy Katz. TCP Behavior of a Busy Web Server:
       Analysis and Improvements. Proc. IEEE INFOCOM Conf., San
       Francisco, CA, March 1998.

   [BS02] John Bellardo, Stefan Savage.
       Measuring Packet Reordering, ACM/USENIX Internet Measurement
       Workshop, November 2002.

   [FF96] Kevin Fall, Sally Floyd. Simulation-based Comparisons of
       Tahoe, Reno, and SACK TCP. ACM Computer Communication Review,
       July 1996.

   [Flo94] Sally Floyd. TCP and Explicit Congestion Notification. ACM
       Computer Communication Review, October 1994.

   [HB07] Per Hurtig, Anna Brunstrom. Packet Loss Recovery of Signaling
       Traffic in SCTP. In Proc. of the International Symposium of
       Computer and Telecommunications Systems (SPECTS '07), San Diego,
       California, July, 2007.

   [Jac88] Van Jacobson. Congestion Avoidance and Control. ACM
       SIGCOMM 1988.

   [LK98] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies: Analysis
       and Improvements. Proceedings of InfoCom, March 1998.

   [Mor97] Robert Morris. TCP Behavior with Many Flows. Proceedings
       of the Fifth IEEE International Conference on Network Protocols.
       October 1997.

   [Pax97] Vern Paxson. End-to-End Internet Packet Dynamics. ACM
       SIGCOMM, September 1997.

   [Pir05] N. M. Piratla, "A Theoretical Foundation, Metrics and
       Modeling of Packet Reordering and Methodology of Delay Modeling
       using Inter-packet Gaps," Ph.D. Dissertation, Department of
       Electrical and Computer Engineering, Colorado State University,
       Fort Collins, CO, Fall 2005.

   [RFC2582] Sally Floyd, Tom Henderson. The NewReno Modification to
       TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

   [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed. Performance Evaluation
       of Explicit Congestion Notification (ECN) in IP Networks. RFC
       2884, July 2000.

   [RFC3150] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent
       Magret. End-to-end Performance Implications of Slow Links. RFC
       3150, July 2001.

   [RFC3168] K. K. Ramakrishnan, Sally Floyd, David Black. The
       Addition of Explicit Congestion Notification (ECN) to IP. RFC
       3168, September 2001.

   [RFC3517] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang. A
       Conservative Selective Acknowledgment (SACK)-based Loss Recovery
       Algorithm for TCP. RFC 3517, April 2003.

Authors' Addresses:

   Mark Allman
   International Computer Science Institute
   1947 Center Street, Suite 600
   Berkeley, CA 94704-1198
   Phone: 440-235-1792
   mallman@icir.org
   http://www.icir.org/mallman/

   Konstantin Avrachenkov
   INRIA
   2004 route des Lucioles, B.P.93
   06902, Sophia Antipolis
   France
   Phone: 00 33 492 38 7751
   k.avrachenkov@sophia.inria.fr
   http://www.inria.fr/mistral/personnel/K.Avrachenkov/moi.html

   Urtzi Ayesta
   LAAS-CNRS
   7 Avenue Colonel Roche
   31077 Toulouse
   France
   urtzi@laas.fr
   http://www.laas.fr/~urtzi

   Josh Blanton
   Ohio University
   301 Stocker Center
   Athens, OH 45701
   jblanton@irg.cs.ohiou.edu

   Per Hurtig
   Karlstad University
   Department of Computer Science
   Universitetsgatan 2 651 88
   Karlstad Sweden
   per.hurtig@kau.se

Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold

   Decreasing the number of duplicate ACKs required to trigger Fast
   Retransmit, as suggested in section 2, has the drawback of making
   Fast Retransmit less robust in the face of minor network reordering.
   Two egregious examples of problems caused by reordering are given in
   section 3. This appendix outlines several schemes that have been
   suggested to mitigate the problems caused by Early Retransmit in the
   face of packet reordering. These methods need further research
   before they are suggested for general use (and current consensus is
   that the cases that make Early Retransmit unnecessarily retransmit a
   large amount of data are pathological and therefore these
   mitigations are not generally required).

   MITIGATION A.1: Allow a connection to use Early Retransmit as long
       as the algorithm is not injecting "too much" spurious data into
       the network. For instance, using the information provided by
       TCP's DSACK option [RFC2883] or SCTP's Duplicate-TSN
       notification, a sender can determine when segments sent via
       Early Retransmit are needless. Likewise, using Eifel [RFC3522]
       the sender can detect spurious Early Retransmits. Once spurious
       Early Retransmits are detected the sender can either eliminate
       the use of Early Retransmit or limit the use of the algorithm to
       ensure that only an acceptably small fraction of the
       connection's transmissions are spurious. For example, a
       connection could stop using Early Retransmit after the first
       spurious retransmit is detected.

   MITIGATION A.2: If a sender cannot reliably determine whether an
       Early Retransmitted segment is spurious, the sender could simply
       limit Early Retransmits either to some fixed number per
       connection (e.g., Early Retransmit is allowed only once per
       connection) or to some small percentage of the total traffic
       being transmitted.

   MITIGATION A.3: Allow a connection to trigger Early Retransmit using
       the criteria given in section 2, in addition to a "small"
       timeout [Pax97]. For instance, a sender may have to wait for 2
       duplicate ACKs and then T msec before Early Retransmit is
       invoked. The added time gives reordered acknowledgments time to
       arrive at the sender and avoid a needless retransmit. Designing
       a method for choosing an appropriate timeout is part of the
       research that would need to be involved in this scheme.
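
   As a rough illustration of how mitigations A.1 and A.2 might be
   combined in a sender, the Python sketch below keeps two
   per-connection counters and refuses further Early Retransmits once a
   limit is hit. It is only a sketch. The class, its fields, and the
   default limits are hypothetical, and the detection of spurious
   retransmits (via DSACK, Duplicate-TSN, or Eifel) is assumed to
   happen elsewhere and to be reported through on_spurious_detected().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyRetransmitGovernor:
    """Hypothetical per-connection policy for mitigations A.1 and A.2."""
    max_spurious: int = 1             # A.1: stop after this many spurious ERs
    max_total: Optional[int] = None   # A.2: optional cap on ERs per connection
    sent: int = 0                     # Early Retransmits performed so far
    spurious: int = 0                 # of those, how many proved needless

    def allowed(self) -> bool:
        """May the sender use Early Retransmit for the next loss event?"""
        if self.spurious >= self.max_spurious:
            return False
        if self.max_total is not None and self.sent >= self.max_total:
            return False
        return True

    def on_early_retransmit(self) -> None:
        """Record that an Early Retransmit was sent."""
        self.sent += 1

    def on_spurious_detected(self) -> None:
        """Record that an earlier Early Retransmit was found needless."""
        self.spurious += 1
```

   With the defaults, the governor mirrors the example in A.1 (stop
   after the first spurious retransmit). Setting max_total to 1
   mirrors the "only once per connection" example in A.2 for senders
   that cannot detect spurious retransmits at all.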

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The IETF Trust (2008). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.