Internet Engineering Task Force                             P. Sarolahti
INTERNET DRAFT                                     Nokia Research Center
File: draft-ietf-tsvwg-tcp-frto-01.txt                           M. Kojo
                                                  University of Helsinki
                                                          February, 2004
Expires: August, 2004


                   F-RTO: An Algorithm for Detecting
          Spurious Retransmission Timeouts with TCP and SCTP

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

   Spurious retransmission timeouts (RTOs) cause suboptimal TCP performance, because they often result in unnecessary retransmission of the last window of data. This document describes the "Forward RTO-Recovery" (F-RTO) algorithm for detecting spurious TCP RTOs. F-RTO is a TCP sender-only algorithm that does not require any TCP options to operate. After an RTO has triggered the retransmission of the first unacknowledged segment, the F-RTO algorithm at the TCP sender monitors the incoming acknowledgements to determine whether the timeout was spurious and to decide whether to send new segments or retransmit unacknowledged segments. The algorithm effectively helps to avoid additional unnecessary retransmissions and thereby improves TCP performance in the case of a spurious timeout. The F-RTO algorithm can also be applied to the SCTP protocol.

Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119].

1. Introduction

   The TCP protocol [Pos81] has two methods for triggering retransmissions. Primarily, the TCP sender relies on incoming duplicate ACKs, which indicate that the receiver is missing some of the data. After a required number of successive duplicate ACKs have arrived at the sender, it retransmits the first unacknowledged segment [APS99]. Secondarily, the TCP sender maintains a retransmission timer that triggers retransmission of segments if they have not been acknowledged before the timer expires. When the retransmission timer expires, the TCP sender enters RTO recovery, in which the congestion window is initialized to one segment and unacknowledged segments are retransmitted using the slow-start algorithm. The retransmission timer is adjusted dynamically based on the measured round-trip times [PA00].

   It has been pointed out that the retransmission timer can expire spuriously and trigger unnecessary retransmissions when no segments have been lost [LK00, GL02, LM03]. After a spurious retransmission timeout, the late acknowledgements of the original segments arrive at the sender, usually triggering unnecessary retransmissions of a whole window of segments during RTO recovery. Furthermore, after a spurious retransmission timeout, a conventional TCP sender increases the congestion window on each late acknowledgement in slow start, injecting a large number of data segments into the network within one round-trip time.

   There are a number of potential reasons for spurious retransmission timeouts. First, some mobile networking technologies introduce sudden delay spikes in transmission because of actions taken during a hand-off.
   Second, the arrival of competing traffic, possibly with higher priority, on a low-bandwidth link, or some other change in available bandwidth, can cause a sudden increase in the round-trip time that may trigger a spurious retransmission timeout. A persistently reliable link layer can also cause a sudden delay when several data frames are lost for some reason. This document does not distinguish between the different causes of such a delay, but discusses spurious retransmission timeouts caused by a delay in general.

   This document describes an alternative RTO recovery algorithm called "Forward RTO-Recovery" (F-RTO) to be used for detecting spurious RTOs and thus avoiding unnecessary retransmissions following the RTO. When the RTO is not spurious, the F-RTO algorithm reverts to the conventional RTO recovery algorithm and should have similar behavior and performance. F-RTO does not require any TCP options in its operation, and it can be implemented by modifying only the TCP sender. This differs from the alternative algorithms (Eifel [LK00], [LM03] and DSACK-based algorithms [BA02]) that have been suggested for detecting unnecessary retransmissions. The Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious timeout upon arrival of the first acknowledgement after the retransmission. The DSACK-based algorithms require that the TCP Selective Acknowledgment Option [MMFR96] with the DSACK extension [FMMP00] is in use. With DSACK, the TCP receiver can report that it has received a duplicate segment, making it possible for the sender to detect afterwards whether it has retransmitted segments unnecessarily. In addition, the F-RTO algorithm only attempts to detect and avoid unnecessary retransmissions after an RTO. Eifel and DSACK can also be used for detecting unnecessary retransmissions caused by other events, for example packet reordering.

   When an RTO occurs, the F-RTO sender retransmits the first unacknowledged segment as usual. Deviating from normal operation after a timeout, it then transmits new, previously unsent data in response to the first acknowledgement that arrives after the timeout, provided that this acknowledgement advances the window. If the second acknowledgement that arrives after the timeout also advances the window, i.e., acknowledges data that was not retransmitted, the F-RTO sender declares the RTO spurious and exits RTO recovery. However, if either of the next two acknowledgements is a duplicate ACK, there is not sufficient evidence of a spurious RTO; therefore, the F-RTO sender retransmits the unacknowledged segments in slow start, similarly to the traditional algorithm. With a SACK-enhanced version of the F-RTO algorithm, spurious RTOs may be detected even if duplicate ACKs arrive after an RTO.

   The F-RTO algorithm can also be applied to the SCTP protocol [Ste00], because SCTP has acknowledgement and packet retransmission concepts similar to those of TCP. When an SCTP retransmission timeout occurs, the SCTP sender is required to retransmit the outstanding data similarly to TCP, and is thus prone to unnecessary retransmissions and congestion control adjustments if delay spikes occur in the network. The SACK-enhanced version of F-RTO should be directly applicable to SCTP, which has selective acknowledgements as a built-in feature. For simplicity, this document mostly refers to TCP, but the algorithms and other discussion should also be applicable to SCTP.

   This document is organized as follows. Section 2 describes the basic F-RTO algorithm. Section 3 outlines an optional enhancement to the F-RTO algorithm that leverages the TCP SACK option.
   Section 4 discusses the possible actions to be taken after detecting a spurious RTO, Section 5 discusses considerations specific to SCTP, and Section 6 discusses security considerations.

2. F-RTO Algorithm

   An RTO is spurious if there are segments outstanding in the network that would have prevented the RTO, had their acknowledgements arrived earlier at the sender. F-RTO affects the TCP sender behavior only after a retransmission timeout; otherwise, TCP behavior remains unmodified. When the RTO expires, the F-RTO algorithm monitors incoming acknowledgements and declares the RTO spurious if the TCP sender gets an acknowledgement for a segment that was not retransmitted due to the RTO. The actions taken in response to a spurious RTO are not specified in this document, but we discuss the different alternatives for congestion control in Section 4.

   Following the practice used with the Eifel Detection algorithm [LM03], we use the "SpuriousRecovery" variable to indicate whether the retransmission is declared spurious by the sender. This variable can be used as an input for a related response algorithm. With F-RTO, the value of SpuriousRecovery can be either SPUR_TO, indicating a spurious retransmission timeout, or FALSE, when the RTO is not declared spurious and the TCP sender should follow the conventional RTO recovery algorithm.

   A TCP sender MAY implement the basic F-RTO algorithm. If it chooses to apply the algorithm, the following steps MUST be taken after the retransmission timer expires.

   1) When the RTO expires, the TCP sender SHOULD retransmit the first unacknowledged segment and set SpuriousRecovery to FALSE. Store the highest sequence number transmitted so far in the variable "send_high".

   2) When the first acknowledgement after the RTO arrives at the sender, the sender chooses the following actions depending on whether the ACK advances the window or whether it is a duplicate ACK.

      a) If the acknowledgement is a duplicate ACK, OR it acknowledges a sequence number equal to (or above) the value of send_high, OR it does not acknowledge all of the data that was retransmitted in step 1, the TCP sender MUST revert to the conventional RTO recovery and continue by retransmitting unacknowledged data in slow start. The TCP sender MUST NOT enter step 3 of this algorithm, and the SpuriousRecovery variable remains FALSE.

      b) If the acknowledgement advances the window AND it is below the value of send_high, the TCP sender SHOULD transmit up to two new (previously unsent) segments and enter step 3 of this algorithm. If the TCP sender does not have enough unsent data, it SHOULD send only one segment. In addition, the TCP sender MAY override the Nagle algorithm and immediately send an undersized segment if needed. If the TCP sender does not have any new data to send, the TCP sender SHOULD transmit a segment from the retransmission queue. If the TCP sender retransmits the first unacknowledged segment, it MUST NOT enter step 3 of this algorithm but continue with the conventional RTO recovery algorithm. In this case, acknowledgement of the next segment would not unambiguously indicate that the original transmission arrived at the receiver.

   3) When the second acknowledgement after the RTO arrives at the sender, the sender either declares the RTO spurious or starts retransmitting the unacknowledged segments.

      a) If the acknowledgement is a duplicate ACK, the TCP sender MUST set the congestion window to no more than 3 * MSS and continue with the slow start algorithm, retransmitting unacknowledged segments. The sender leaves SpuriousRecovery set to FALSE.

      b) If the acknowledgement advances the window, i.e., it acknowledges data that was not retransmitted after the RTO, the TCP sender SHOULD declare the RTO spurious, set SpuriousRecovery to SPUR_TO, and set the value of the send_high variable to SND.UNA.

   The F-RTO sender takes cautious actions when it receives duplicate acknowledgements after an RTO. Since duplicate ACKs may indicate that segments have been lost, reliably detecting a spurious RTO is difficult in the absence of additional information. Therefore, the safest alternative is to follow the conventional TCP recovery in those cases.

   If the first acknowledgement after the RTO covers the send_high point at algorithm step (2a), there is not enough evidence that a non-retransmitted segment has arrived at the receiver after the RTO.

   This is a common case when a fast retransmission is lost and has been retransmitted again after an RTO, while the rest of the unacknowledged segments were successfully delivered to the TCP receiver before the RTO. Therefore, the RTO cannot be declared spurious in this case.

   If the first acknowledgement after the RTO does not acknowledge all of the data that was retransmitted in step 1, the TCP sender reverts to the conventional RTO recovery. Otherwise, a malicious receiver acknowledging partial segments could cause the sender to declare the RTO spurious in a case where data was lost.

   The TCP sender is allowed to send two new segments in algorithm branch (2b), because a conventional TCP sender would transmit two segments when the first new ACK arrives after the RTO. If sending new data is not possible in algorithm branch (2b), or the receiver window limits the transmission, the TCP sender has to send something in order to prevent the TCP transfer from stalling. If no segments were sent, the pipe between sender and receiver could run out of segments, and no further acknowledgements would arrive.
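   The steps of the basic algorithm can be condensed into a small state machine. The Python sketch below is illustrative only and rests on assumptions that are not part of this specification: sequence numbers count whole segments, an ACK value of n means that all segments below n have been received, send_high is taken to be one past the highest segment sent before the RTO (SND.NXT at the time of the timeout), and when branch (2b) finds no unsent data the sketch simply falls back to conventional recovery, which is the first of the options listed below. The class FrtoSender and its fields, such as the unsent counter, are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class SpuriousRecovery(Enum):
    FALSE = "FALSE"
    SPUR_TO = "SPUR_TO"


@dataclass
class FrtoSender:
    """Sketch of the basic F-RTO steps; segment-granular, not a real TCP."""
    snd_una: int            # first unacknowledged segment (SND.UNA)
    snd_nxt: int            # next previously unsent segment (SND.NXT)
    unsent: int             # hypothetical: segments queued but never sent
    cwnd: int = 1           # congestion window in segments
    step: int = 0           # 0 = F-RTO inactive, 2 or 3 = awaiting that step
    send_high: int = 0
    rto_seg: int = 0        # segment retransmitted in step 1
    spurious: SpuriousRecovery = SpuriousRecovery.FALSE
    sent: list = field(default_factory=list)  # ("rexmit" or "new", segment)

    def on_rto(self) -> None:
        # Step 1: retransmit the first unacknowledged segment and record
        # send_high (assumed here: one past the highest segment sent).
        self.rto_seg = self.snd_una
        self.sent.append(("rexmit", self.snd_una))
        self.spurious = SpuriousRecovery.FALSE
        self.send_high = self.snd_nxt
        self.cwnd = 1
        self.step = 2

    def _send_new(self, count: int) -> None:
        # Transmit up to `count` previously unsent segments.
        for _ in range(min(count, self.unsent)):
            self.sent.append(("new", self.snd_nxt))
            self.snd_nxt += 1
            self.unsent -= 1

    def _revert(self) -> str:
        # Leave F-RTO; the caller continues with conventional RTO recovery.
        self.step = 0
        return "conventional"

    def on_ack(self, ack_no: int) -> str:
        advances = ack_no > self.snd_una
        if advances:
            self.snd_una = ack_no
        if self.step == 2:
            # Step (2a): duplicate ACK, ACK covering send_high, or ACK that
            # does not cover the step-1 retransmission (the last test is
            # redundant with whole segments but mirrors the text).
            if not advances or ack_no >= self.send_high or ack_no <= self.rto_seg:
                return self._revert()
            if self.unsent == 0:
                # Simplification: no new data, so give up on detection.
                return self._revert()
            # Step (2b): send up to two new segments, then wait in step 3.
            self._send_new(2)
            self.step = 3
            return "step 3"
        if self.step == 3:
            if not advances:
                # Step (3a): duplicate ACK; cwnd may be at most 3 segments,
                # and this sketch picks that maximum.
                self.cwnd = 3
                return self._revert()
            # Step (3b): ACK for data not retransmitted after the RTO.
            self.spurious = SpuriousRecovery.SPUR_TO
            self.send_high = self.snd_una
            self.step = 0
            return "spurious"
        return "not in F-RTO"
```

   As a usage example, replaying the sudden-delay scenario of Appendix A (segments 6 to 11 outstanding, then ACK 7 followed by ACK 8) takes the sketch through branch (2b), where it sends segments 12 and 13, and then through branch (3b), where it sets SpuriousRecovery to SPUR_TO and send_high to 8. A second ACK 7 instead of ACK 8 would lead to branch (3a).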

   If transmitting previously unsent data is not possible, the following options are available for the sender:

   - Continue with the conventional RTO recovery algorithm and do not try to detect the spurious RTO. The disadvantage is that the sender may make unnecessary retransmissions due to a possible spurious RTO. On the other hand, we believe that the benefits of detecting spurious RTOs in application-limited or receiver-limited situations are modest.

   - Use additional information, if available, for detecting a spurious RTO, e.g., TCP timestamps with the Eifel Detection algorithm. However, Eifel detection may yield different results from F-RTO when ACK losses and an RTO occur within the same round-trip time [SKR03].

   - Retransmit data from the tail of the retransmission queue and continue with step 3 of the F-RTO algorithm. The retransmission may turn out to be unnecessary, hence this option is not encouraged, except for hosts that are known to operate in an environment that is highly likely to have spurious RTOs. On the other hand, this method makes it possible to avoid several unnecessary retransmissions due to a spurious RTO at the cost of only one retransmission that may be unnecessary.

   - Send a zero-sized segment below SND.UNA, similar to a TCP keep-alive probe, and continue with step 3 of the F-RTO algorithm. Since the receiver responds to such a probe with an acknowledgement, the sender is able to detect from that acknowledgement whether the RTO was spurious. While this method does not send data unnecessarily, it delays the recovery by one round-trip time in cases where the RTO was not spurious, and therefore is not encouraged.

   - In receiver-limited cases, send one octet of new data regardless of the advertised window limit, and continue with step 3 of the F-RTO algorithm.
     It is possible that the receiver has free buffer space to receive the data by the time the segment has propagated through the network, in which case no harm is done. If the receiver is not capable of receiving the segment, it rejects the segment and sends a duplicate ACK.

   If the RTO is declared spurious, the TCP sender sets the value of the send_high variable to SND.UNA in order to disable the NewReno "bugfix" [FH99]. The send_high variable was proposed for avoiding unnecessary multiple fast retransmits when an RTO expires during fast recovery with NewReno TCP. As the sender has not retransmitted any segment other than the one that triggered the RTO, the problem addressed by the bugfix cannot occur. Therefore, if three duplicate ACKs arrive at the sender after the RTO, they are likely to indicate a packet loss, and fast retransmit should be used to allow efficient recovery. If not enough duplicate ACKs arrive at the sender after a packet loss, the retransmission timer expires again and the sender enters step 1 of this algorithm.

   When the RTO is declared spurious, the TCP sender cannot detect whether the unnecessary RTO retransmission was lost. In principle, the loss of the RTO retransmission should be taken as a congestion signal, and thus there is a small possibility that the F-RTO sender violates the congestion control rules if it chooses to fully revert the congestion control parameters after detecting a spurious RTO. The Eifel detection algorithm has a similar property, while the DSACK option can be used to detect whether the retransmitted segment was successfully delivered to the receiver.

   The F-RTO algorithm has a side effect on the TCP round-trip time measurement.
   Because the TCP sender can avoid most of the unnecessary retransmissions after detecting a spurious RTO, the sender is able to take round-trip time samples on the delayed segments. If the regular RTO recovery were used without TCP timestamps, this would not be possible due to retransmission ambiguity. As a result, after a spurious RTO triggered by delayed segments, the RTO estimator is likely to be more accurate, and to have larger values, with F-RTO than with regular TCP. We believe this is an advantage in networks that are prone to delay spikes.

   The F-RTO algorithm may not always avoid unnecessary retransmissions after a spurious RTO. If packet reordering or packet duplication occurs on the segment that triggered the spurious RTO, the F-RTO algorithm may not detect the spurious RTO due to incoming duplicate ACKs. Additionally, if a spurious RTO occurs during fast recovery, the F-RTO algorithm often cannot detect the spurious RTO. However, we consider these cases relatively rare, and note that in cases where F-RTO fails to detect the spurious RTO, it performs similarly to the regular RTO recovery.

3. A SACK-enhanced version of the F-RTO algorithm

   This section describes an alternative version of the F-RTO algorithm that makes use of the TCP Selective Acknowledgement Option [MMFR96]. By using the SACK option, the TCP sender can detect spurious RTOs in most cases where packet reordering or packet duplication is present. The difference from the basic F-RTO algorithm is that the sender may declare the RTO spurious even when duplicate ACKs follow the RTO, if the SACK blocks acknowledge previously unacknowledged data that was not transmitted after the RTO. The principle of the algorithm presented in this section is also applicable to the SCTP protocol.
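   The step-3 decision of the SACK-enhanced algorithm specified below is where it differs most from the basic version. The following Python function is a minimal sketch of that decision under assumptions of its own: sequence numbers count whole segments, a SACK block is a half-open range [start, end), send_high is one past the highest segment sent before the RTO, and the caller tracks which segments were already acknowledged (sacked_before) and which segments, if any, were retransmitted in step (2b). The function name and its parameters are hypothetical and not part of this specification.

```python
from typing import AbstractSet, Iterable, Tuple

# A SACK block as a half-open range [start, end) of segment numbers.
Block = Tuple[int, int]


def sack_frto_step3(snd_una: int, cum_ack: int, sack_blocks: Iterable[Block],
                    sacked_before: AbstractSet[int], send_high: int,
                    rexmit_in_2b: AbstractSet[int] = frozenset()) -> str:
    """Classify the ACK that arrives in step 3 of SACK-enhanced F-RTO.

    Returns "SPUR_TO" for branch (3b) and "conventional" for branch (3a),
    after which the sender caps cwnd at 3 * MSS and retransmits.
    """
    # Segments acknowledged for the first time by this ACK, either by the
    # cumulative ACK field or by a SACK block.
    newly = set(range(snd_una, cum_ack))
    for start, end in sack_blocks:
        newly.update(range(start, end))
    newly -= set(sacked_before)

    # Branch (3a): data at or beyond send_high, i.e. the segments sent
    # after the RTO in step (2b), is acknowledged.
    if any(seg >= send_high for seg in newly):
        return "conventional"
    # Branch (3b): previously unacknowledged data below send_high is
    # acknowledged; SACKs of segments retransmitted in (2b) do not count.
    if newly - set(rexmit_in_2b):
        return "SPUR_TO"
    # Duplicate ACK without new SACK information below send_high: (3a).
    return "conventional"
```

   For instance, suppose segment 6 was retransmitted on the RTO, ACK 7 arrived in step 2, and segments 12 and 13 were then sent in step (2b), so send_high is 12 under the convention above. A duplicate ACK 7 that SACKs segment 9 leads to branch (3b), whereas one that SACKs only segment 12 leads to branch (3a).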

   Given that the TCP Selective Acknowledgement Option [MMFR96] is enabled for a TCP connection, a TCP sender MAY implement the SACK-enhanced F-RTO algorithm. If the sender applies the SACK-enhanced F-RTO algorithm, it MUST follow the steps below. This algorithm SHOULD NOT be applied if the TCP sender is already in loss recovery when the RTO occurs. However, it should be possible to apply the principle of F-RTO, within certain limitations, also when the RTO occurs during existing loss recovery. While this is a topic of further research, Appendix B briefly discusses the related issues.

   1) When the RTO expires, the TCP sender SHOULD retransmit the first unacknowledged segment and set SpuriousRecovery to FALSE. The variable "send_high" is set to indicate the highest segment transmitted so far.

   2) Wait until the acknowledgement for the segment retransmitted due to the RTO arrives at the sender. If duplicate ACKs arrive, store the incoming SACK information but stay in step 2. If the RTO expires again, restart the algorithm.

      a) If the cumulative ACK acknowledges all segments up to send_high, the TCP sender SHOULD revert to the conventional RTO recovery, and it MUST set the congestion window to no more than 2 * MSS. The sender does not enter step 3 of this algorithm.

      b) Otherwise, the TCP sender SHOULD transmit up to two new (previously unsent) segments, within the limitations of the congestion window. If the TCP sender is not able to transmit any previously unsent data, due to the receiver window limitation or because it does not have any new data to send, it MAY follow one of the options presented in Section 2. However, if the TCP sender chooses to retransmit a data segment here, a SACK of that segment MUST NOT be used for declaring a spurious RTO in step (3b).

   3) When the next acknowledgement arrives at the sender:

      a) If the ACK acknowledges data above send_high, either in SACK blocks or as a cumulative ACK, the sender MUST set the congestion window to no more than 3 * MSS and proceed with the conventional recovery, retransmitting unacknowledged segments. The sender SHOULD also take this branch when the acknowledgement is a duplicate ACK that does not contain any new SACK blocks for previously unacknowledged data below send_high.

      b) If the ACK does not acknowledge data above send_high AND it acknowledges some previously unacknowledged data below send_high, the TCP sender SHOULD declare the RTO spurious and set SpuriousRecovery to SPUR_TO.

   If there are unacknowledged holes between the received SACK blocks, those segments SHOULD be retransmitted similarly to the conventional SACK recovery algorithm [BAFW03]. If the algorithm exits with SpuriousRecovery set to SPUR_TO, send_high SHOULD be set to SND.UNA, thus allowing fast recovery on incoming duplicate acknowledgements.

4. Taking Actions after Detecting Spurious RTO

   Upon a retransmission timeout, a conventional TCP sender assumes that outstanding segments are lost and starts retransmitting the unacknowledged segments. When the RTO is detected to be spurious, the TCP sender should not continue retransmitting based on the RTO. For example, if the sender was in the congestion avoidance phase transmitting new, previously unsent segments, it should continue transmitting previously unsent segments after detecting the spurious RTO. In addition, it is suggested that the RTO estimation be reinitialized and the RTO timer be adjusted to a more conservative value in order to avoid subsequent spurious RTOs [LG03].

   Different approaches for adjusting the congestion control state after a spurious RTO have been discussed in various research papers [SKR03, GL03, Sar03] and Internet-Drafts [SL03, LG03].
   The suggested responses differ in whether the spurious retransmission timeout should be taken as a congestion signal, thus causing the congestion window or slow start threshold to be reduced at the sender, or whether the congestion control state should be fully reverted to the state valid prior to the retransmission timeout.

   This document does not recommend a particular response alternative, but considers the response to a spurious RTO a subject of further research.

5. SCTP Considerations

   The basic F-RTO or the SACK-enhanced F-RTO algorithm can be applied to the SCTP protocol. However, SCTP contains features not present in TCP that need to be discussed when applying the F-RTO algorithm.

   An SCTP association can be multi-homed. The current retransmission policy states that retransmissions should go to alternative addresses. If the retransmission was due to a spurious RTO caused by a delay spike, it is possible that the acknowledgement for the retransmission arrives back at the sender before the acknowledgements of the original transmissions arrive. If this happens, a possible loss of the original transmission of the data chunk that was retransmitted due to the spurious RTO may remain undetected when applying the F-RTO algorithm. Because the RTO was caused by the delay, and it was spurious in that respect, a suitable response is to continue by sending new data. However, if the original transmission was lost, fully reverting the congestion control parameters is too aggressive. Therefore, taking conservative actions on congestion control is recommended if the SCTP association is multi-homed and retransmissions go to an alternative address. The information in duplicate TSNs can then be used for reverting congestion control, if desired [BA02].

   Note that the forward transmissions made in F-RTO algorithm step (2b) should be destined to the primary address, since they are not retransmissions.

   When making a retransmission, an SCTP sender can bundle a number of unacknowledged data chunks and include them in the same packet. This needs to be considered when implementing F-RTO for SCTP. The basic principle of F-RTO still holds: in order to declare the RTO spurious, the sender must get an acknowledgement for a data chunk that was not retransmitted after the RTO. In other words, acknowledgements of data chunks that were bundled in the RTO retransmission must not be used for declaring the RTO spurious.

6. Security Considerations

   The main security threat regarding F-RTO is the possibility that a receiver misleads the sender into setting too large a congestion window after an RTO. There are two possible ways a malicious receiver could trigger a wrong output from the F-RTO algorithm. First, the receiver can acknowledge data that it has not received. Second, it can delay the acknowledgement of a segment it has received earlier, and acknowledge the segment after the TCP sender has been deluded into entering algorithm step 3.

   If the receiver acknowledges a segment it has not really received, the sender can be led to declare the RTO spurious in F-RTO algorithm step 3. However, because this leaves the sender with incorrect state, the sender will not retransmit the segment that never reached the receiver. Therefore, this attack is unlikely to be useful for a receiver trying to maliciously gain a larger congestion window.

   A common case of an RTO is that a fast retransmission of a segment is lost. If all other segments have been received, the RTO retransmission causes the whole window to be acknowledged at once. This case is recognized in F-RTO algorithm branch (2a).
   However, if the receiver acknowledges only one segment after receiving the RTO retransmission, and then the rest of the segments, it could cause the RTO to be declared spurious when it is not. Therefore, it is suggested that when an RTO expires during the fast recovery phase, the sender should not fully revert the congestion window even if the RTO was declared spurious, but should instead reduce the congestion window to 1. The sender can nevertheless take the normal actions to avoid unnecessary retransmissions. If a TCP sender implements a burst avoidance algorithm that limits the sending rate to be no higher than in slow start, this precaution is not needed, and the sender may apply F-RTO normally.

   If more than one segment is missing at the time when an RTO occurs, the receiver does not benefit from misleading the sender into declaring a spurious RTO, because the sender would then have to go through another recovery period to retransmit the missing segments, usually after an RTO.

Acknowledgements

   We are grateful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias Rodriguez, Sourabh Ladha, and Martin Duke for the discussion and feedback contributed to this text.

Normative References

   [APS99]  M. Allman, V. Paxson, and W. Stevens. TCP Congestion Control. RFC 2581, April 1999.

   [BAFW03] E. Blanton, M. Allman, K. Fall, and L. Wang. A Conservative Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP. RFC 3517, April 2003.

   [MMFR96] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP Selective Acknowledgement Options. RFC 2018, October 1996.

   [PA00]   V. Paxson and M. Allman. Computing TCP's Retransmission Timer. RFC 2988, November 2000.

   [Pos81]  J. Postel. Transmission Control Protocol. RFC 793, September 1981.

   [Ste00]  R. Stewart, et al. Stream Control Transmission Protocol.
            RFC 2960, October 2000.

Informative References

   [ABF01]  M. Allman, H. Balakrishnan, and S. Floyd. Enhancing TCP's
            Loss Recovery Using Limited Transmit. RFC 3042, January
            2001.

   [BA02]   E. Blanton and M. Allman. On Making TCP more Robust to
            Packet Reordering. ACM SIGCOMM Computer Communication
            Review, 32(1), January 2002.

   [BBJ92]  D. Borman, R. Braden, and V. Jacobson. TCP Extensions for
            High Performance. RFC 1323, May 1992.

   [FH99]   S. Floyd and T. Henderson. The NewReno Modification to
            TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

   [FMMP00] S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky. An
            Extension to the Selective Acknowledgement (SACK) Option
            for TCP. RFC 2883, July 2000.

   [GL02]   A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm for
            TCP in a GPRS Network. In Proceedings of European Wireless,
            Florence, Italy, February 2002.

   [GL03]   A. Gurtov and R. Ludwig. Responding to Spurious Timeouts in
            TCP. In Proceedings of IEEE INFOCOM 03, March 2003.

   [LG03]   R. Ludwig and A. Gurtov. The Eifel Response Algorithm for
            TCP. Internet draft
            "draft-ietf-tsvwg-tcp-eifel-response-04.txt". October 2003.
            Work in progress.

   [LK00]   R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP
            Robust Against Spurious Retransmissions. ACM SIGCOMM
            Computer Communication Review, 30(1), January 2000.

   [LM03]   R. Ludwig and M. Meyer. The Eifel Detection Algorithm for
            TCP. RFC 3522, April 2003.

   [SKR03]  P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An
            Enhanced Recovery Algorithm for TCP Retransmission
            Timeouts. ACM SIGCOMM Computer Communication Review, 33(2),
            April 2003.

   [Sar03]  P. Sarolahti. Congestion Control on Spurious TCP
            Retransmission Timeouts. In Proceedings of IEEE Globecom
            2003, December 2003.

   [SL03]   Y. Swami and K. Le. DCLOR: De-correlated Loss Recovery
            using SACK option for spurious timeouts.
            Internet draft "draft-swami-tsvwg-tcp-dclor-02.txt".
            September 2003. Work in progress.

Appendix A: Scenarios

   This section discusses different scenarios where RTOs occur and how
   the basic F-RTO algorithm performs in those scenarios. The
   interesting scenarios are a sudden delay triggering an RTO, loss of a
   retransmitted packet during fast recovery, a link outage causing the
   loss of several packets, and packet reordering. A performance
   evaluation with a more thorough analysis of a real implementation of
   F-RTO is given in [SKR03].

A.1. Sudden delay

   The main motivation of the F-RTO algorithm is to improve TCP
   performance when a delay spike triggers a spurious retransmission
   timeout. The example below illustrates the segments and
   acknowledgements transmitted by the TCP end hosts when a spurious RTO
   occurs but no packets are lost. For simplicity, delayed
   acknowledgements are not used in the example. In this example, the
   sender halves the congestion window and the slow start threshold
   after detecting a spurious RTO.

        ...
                                        (cwnd = 6,
                                         ssthresh < 6,
                                         FlightSize = 5)
    1.  SEND 10   ---------------------------->
    2.            <----------------------------   ACK 6
    3.  SEND 11   ---------------------------->
    4.     |
        [delay]
           |
         [RTO]
    5.  SEND 6    ---------------------------->
                                          --->
    6.            <----------------------------   ACK 7
        [F-RTO step (2b)]
    7.  SEND 12   ---------------------------->
    8.  SEND 13   ---------------------------->
                                          --->
    9.            <----------------------------   ACK 8
        [F-RTO step (3b)]
        [SpuriousRecovery <- SPUR_TO]
        [cwnd <- 3, ssthresh <- 3]
   10.            <----------------------------   ACK 9
   11.            <----------------------------   ACK 10
   12.            <----------------------------   ACK 11
   13.  SEND 14   ---------------------------->
        ...

   When a sudden delay long enough to trigger an RTO occurs at step 4,
   the TCP sender retransmits the first unacknowledged segment (step 5).
   Because the originally transmitted segment 6 arrives at the
   receiver, the next ACK covers the RTO retransmission, and the TCP
   sender continues by sending two new data segments (steps 7 and 8).
   Because the second acknowledgement arriving after the RTO
   acknowledges data that was not retransmitted due to the RTO (step 9),
   the TCP sender declares the RTO spurious and continues by sending new
   data. Because the TCP sender reduces cwnd when it detects the
   spurious RTO, it has to wait for some outstanding segments to leave
   the network before it can resume transmitting at step 13.

A.2. Loss of a retransmission

   If a retransmitted segment is lost, the only way to retransmit it
   again is to wait for the RTO to trigger the retransmission. Once the
   segment is successfully received, the receiver usually acknowledges
   several segments at once, because other segments in the same window
   have been successfully delivered before the retransmission arrives at
   the receiver. The example below shows a scenario where the
   retransmission of segment 6 is lost, as well as a later segment
   (segment 9) in the same window. Neither limited transmit [ABF01] nor
   SACK TCP [MMFR96] is used in this example.

        ...
                                        (cwnd = 6,
                                         ssthresh < 6,
                                         FlightSize = 5)

    1.  SEND 10   ---------------------------->
    2.            <----------------------------   ACK 6
    3.  SEND 11   ---------------------------->
    4.            <----------------------------   ACK 6
    5.            <----------------------------   ACK 6
    6.            <----------------------------   ACK 6
    7.  SEND 6    --------------X
        [ssthresh <- 3, cwnd <- ssthresh + 3 = 6]
    8.            <----------------------------   ACK 6
           |
           |
         [RTO]
        [ssthresh <- 2]
    9.  SEND 6    ---------------------------->
   10.            <----------------------------   ACK 9
        [F-RTO step (2b)]
   11.  SEND 12   ---------------------------->
   12.  SEND 13   ---------------------------->
   13.            <----------------------------   ACK 9
        [F-RTO step (3a)]
        [SpuriousRecovery <- FALSE]
        [cwnd <- 3]
   14.  SEND 9    ---------------------------->
   15.  SEND 10   ---------------------------->
   16.  SEND 11   ---------------------------->
   17.            <----------------------------   ACK 11
        ...

   In the example above, segment 6 is lost and the sender retransmits
   it after three duplicate ACKs in step 7. However, the retransmission
   is also lost, and the sender has to wait for the RTO to expire before
   retransmitting it again. Because the first ACK following the RTO
   acknowledges the RTO retransmission (step 10), the sender transmits
   two new segments. The second ACK in step 13 does not acknowledge any
   previously unacknowledged data. Therefore, the F-RTO sender enters
   slow start and sets cwnd to 3 * MSS. The congestion window can be set
   to three segments because two round-trips have elapsed since the
   RTO. After this, the receiver acknowledges all segments transmitted
   prior to entering recovery, and the sender can continue transmitting
   new data in congestion avoidance.

A.3. Link outage

   The example below illustrates the F-RTO behavior when four
   consecutive packets are lost in the network, causing the TCP sender
   to fall back to RTO recovery. Limited transmit and SACK are not used
   in this example.

        ...
                                        (cwnd = 6,
                                         ssthresh < 6,
                                         FlightSize = 5)

    1.  SEND 10   ---------------------------->
    2.            <----------------------------   ACK 6
    3.  SEND 11   ---------------------------->
    4.            <----------------------------   ACK 6
           |
           |
         [RTO]
        [ssthresh <- 3]
    5.  SEND 6    ---------------------------->
    6.            <----------------------------   ACK 7
        [F-RTO step (2b)]
    7.  SEND 12   ---------------------------->
    8.  SEND 13   ---------------------------->
    9.            <----------------------------   ACK 7
        [F-RTO step (3a)]
        [SpuriousRecovery <- FALSE]
        [cwnd <- 3]
   10.  SEND 7    ---------------------------->
   11.  SEND 8    ---------------------------->
   12.  SEND 9    ---------------------------->
   13.            <----------------------------   ACK 14

   Again, the F-RTO sender transmits two new segments (steps 7 and 8)
   after the RTO retransmission is acknowledged. Because the next ACK
   does not acknowledge any data that was not retransmitted after the
   RTO (step 9), the F-RTO sender proceeds with conventional recovery
   and slow start retransmissions.

A.4. Packet reordering

   Since F-RTO modifies the TCP sender behavior only after a
   retransmission timeout, and it is intended to avoid unnecessary
   retransmissions only after a spurious RTO, we limit the discussion of
   the effects of packet reordering on F-RTO behavior to cases where
   packet reordering occurs immediately after the RTO. When the TCP
   receiver gets an out-of-order segment, it generates a duplicate ACK.
   If the TCP sender implements the basic F-RTO algorithm, this may
   prevent the sender from detecting a spurious RTO.

   However, if the TCP sender applies the SACK-enhanced F-RTO, it is
   possible to detect a spurious RTO even when packet reordering occurs.
   We illustrate the behavior of SACK-enhanced F-RTO below when segment
   8 arrives before segments 6 and 7, and segments starting from segment
   6 are delayed in the network. In this example, the TCP sender reduces
   the congestion window and slow start threshold in response to the
   spurious RTO.

        ...
                                        (cwnd = 6,
                                         ssthresh < 6,
                                         FlightSize = 5)
    1.  SEND 10   ---------------------------->
    2.            <----------------------------   ACK 6
    3.  SEND 11   ---------------------------->
    4.     |
        [delay]
           |
         [RTO]
    5.  SEND 6    ---------------------------->
                                          --->
    6.            <----------------------------   ACK 6
                                                  [SACK 8]
        [SACK F-RTO stays in step 2]
    7.                                    --->
    8.            <----------------------------   ACK 7
                                                  [SACK 8]
        [SACK F-RTO step (2b)]
    9.  SEND 12   ---------------------------->
   10.  SEND 13   ---------------------------->
   11.                                    --->
   12.            <----------------------------   ACK 9
        [SACK F-RTO step (3b)]
        [SpuriousRecovery <- SPUR_TO]
        [ssthresh <- 3, cwnd <- 3]
   13.            <----------------------------   ACK 10
   14.            <----------------------------   ACK 11
   15.  SEND 14   ---------------------------->
        ...

   After the RTO expires and the sender retransmits segment 6 (step 5),
   the receiver gets segment 8 and generates a duplicate ACK with SACK
   for segment 8. In response to this acknowledgement, the TCP sender
   does not send anything but stays in F-RTO step 2. Because the next
   acknowledgement advances the cumulative ACK point (step 8), the
   sender can transmit two new segments according to SACK-enhanced
   F-RTO. The following acknowledgement (step 12) acknowledges new data
   between 7 and 11 that was not acknowledged earlier (segment 7), so
   the F-RTO sender declares the RTO spurious.

Appendix B: Applying SACK-enhanced F-RTO when RTO occurs during loss
            recovery

   We believe that a slightly modified SACK-enhanced F-RTO algorithm can
   be used to detect spurious RTOs also when an RTO occurs while an
   earlier loss recovery is underway. However, there are issues that
   need to be considered if F-RTO is applied in this case.

   The original SACK-based F-RTO requires in algorithm step 3 that an
   ACK acknowledges previously unacknowledged non-retransmitted data
   between SND.UNA and send_high. If the RTO takes place during an
   earlier (SACK-based) loss recovery, the F-RTO sender must only use
   acknowledgements for non-retransmitted segments transmitted before
   the SACK-based loss recovery started. This means that in order to
   declare the RTO spurious, the TCP sender must receive an
   acknowledgement for a non-retransmitted segment between SND.UNA and
   RecoveryPoint in algorithm step 3.
   RecoveryPoint is defined in the conservative SACK-based loss recovery
   algorithm [BAFW03], and it is set to indicate the highest segment
   transmitted so far when SACK-based loss recovery begins. In other
   words, if the TCP sender receives an acknowledgement for a segment
   that was transmitted more than one RTO ago, it can declare the RTO
   spurious. Defining an efficient algorithm for checking these
   conditions remains future work.

   When a spurious RTO is detected according to the rules given above,
   the response algorithm may need to handle this case separately, for
   example in terms of which segments to retransmit after the RTO, and
   whether it is safe to revert the congestion control parameters. This
   is left as a topic for future research.

Authors' Addresses

   Pasi Sarolahti
   Nokia Research Center
   P.O. Box 407
   FIN-00045 NOKIA GROUP
   Finland
   Phone: +358 50 4876607
   EMail: pasi.sarolahti@nokia.com
   http://www.cs.helsinki.fi/u/sarolaht/

   Markku Kojo
   University of Helsinki
   Department of Computer Science
   P.O. Box 26
   FIN-00014 UNIVERSITY OF HELSINKI
   Finland

   Phone: +358 9 1914 4179
   EMail: markku.kojo@cs.helsinki.fi