idnits 2.17.1 draft-ietf-tsvwg-tcp-frto-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. 
(See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 159: '... A TCP sender MAY implement the basi...' RFC 2119 keyword, line 160: '...thm, the following steps MUST be taken...' RFC 2119 keyword, line 163: '..., the TCP sender SHOULD retransmit the...' RFC 2119 keyword, line 175: '..., the TCP sender MUST revert to the co...' RFC 2119 keyword, line 177: '.... The TCP sender MUST NOT enter step 3...' (20 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2026' is mentioned on line 16, but not defined == Missing Reference: 'RTO' is mentioned on line 779, but not defined == Missing Reference: 'SACK 8' is mentioned on line 787, but not defined ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 3517 (ref. 'BAFW03') (Obsoleted by RFC 6675) ** Obsolete normative reference: RFC 2988 (ref. 
'PA00') (Obsoleted by RFC 6298) ** Obsolete normative reference: RFC 793 (ref. 'Pos81') (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2960 (ref. 'Ste00') (Obsoleted by RFC 4960) -- Obsolete informational reference (is this intentional?): RFC 1323 (ref. 'BBJ92') (Obsoleted by RFC 7323) -- Obsolete informational reference (is this intentional?): RFC 2582 (ref. 'FH99') (Obsoleted by RFC 3782) Summary: 9 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force P. Sarolahti 3 INTERNET DRAFT Nokia Research Center 4 File: draft-ietf-tsvwg-tcp-frto-00.txt M. Kojo 5 University of Helsinki 6 October, 2003 7 Expires: April, 2004 9 F-RTO: An Algorithm for Detecting 10 Spurious Retransmission Timeouts with TCP and SCTP 12 Status of this Memo 14 This document is an Internet-Draft and is in full conformance with 15 all provisions of Section 10 of [RFC2026]. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Abstract 35 Spurious retransmission timeouts (RTOs) cause suboptimal TCP 36 performance, because they often result in unnecessary retransmission 37 of the last window of data. 
This document describes the "Forward RTO 38 Recovery" (F-RTO) algorithm for detecting spurious TCP RTOs. F-RTO is 39 a TCP sender-only algorithm that does not require any TCP options to 40 operate. After retransmitting the first unacknowledged segment 41 triggered by an RTO, the F-RTO algorithm at a TCP sender monitors the 42 incoming acknowledgements to determine whether the timeout was 43 spurious and to decide whether to send new segments or retransmit 44 unacknowledged segments. The algorithm effectively helps to avoid 45 additional unnecessary retransmissions and thereby improves TCP 46 performance in case of a spurious timeout. The F-RTO algorithm can 47 also be applied with the SCTP protocol. 49 1. Introduction 51 The TCP protocol [Pos81] has two methods for triggering 52 retransmissions. Primarily, the TCP sender relies on incoming 53 duplicate ACKs, which indicate that the receiver is missing some of 54 the data. After the required number of successive duplicate ACKs has 55 arrived at the sender, it retransmits the first unacknowledged 56 segment [APS99]. Secondarily, the TCP sender maintains a 57 retransmission timer which triggers retransmission of segments, if 58 they have not been acknowledged within the retransmission timer 59 expiration period. When the retransmission timer expires, the TCP 60 sender enters the RTO recovery where the congestion window is initialized 61 to one segment and unacknowledged segments are retransmitted using 62 the slow-start algorithm. The retransmission timer is adjusted 63 dynamically based on the measured round-trip times [PA00]. 65 It has been pointed out that the retransmission timer can expire 66 spuriously and trigger unnecessary retransmissions when no segments 67 have been lost [GL02]. After a spurious RTO the late acknowledgements 68 of original segments arrive at the sender, usually triggering 69 unnecessary retransmissions of a whole window of segments during the 70 RTO recovery.
Furthermore, after a spurious RTO a conventional TCP 71 sender increases the congestion window on each late acknowledgement 72 in slow start, injecting a large number of data segments to the 73 network within one round-trip time. 75 There are a number of potential reasons for spurious RTOs. First, 76 some mobile networking technologies involve sudden delay peaks on 77 transmission because of actions taken during a hand-off. Second, 78 arrival of competing traffic, possibly with higher priority, on a 79 low-bandwidth link or some other change in available bandwidth 80 involves a sudden increase of round-trip time which may trigger a 81 spurious retransmission timeout. A persistently reliable link layer 82 can also cause a sudden delay when several data frames are lost for 83 some reason. This document does not distinguish the different causes 84 of such a delay, but discusses the spurious RTOs caused by a delay in 85 general. 87 This document describes an alternative RTO recovery algorithm called 88 "Forward RTO-Recovery" (F-RTO) to be used for detecting spurious RTOs 89 and thus avoiding unnecessary retransmissions following the RTO. When 90 the RTO is not spurious, the F-RTO algorithm reverts back to the 91 conventional RTO recovery algorithm and should have similar behavior 92 and performance. F-RTO does not require any TCP options in its 93 operation, and it can be implemented by modifying only the TCP 94 sender. This is different from alternative algorithms (Eifel [LK00], 95 [LM03] and DSACK-based algorithms [BA02]) that have been suggested 96 for detecting unnecessary retransmissions. The Eifel algorithm uses 97 TCP timestamps [BBJ92] for detecting a spurious timeout and the 98 DSACK-based algorithms require that the TCP Selective Acknowledgment 99 Option [MMFR96] with DSACK extension [FMMP00] is in use. 
With DSACK, 100 the TCP receiver can report if it has received a duplicate segment, 101 making it possible for the sender to detect afterwards whether it has 102 retransmitted segments unnecessarily. 104 When an RTO occurs, the F-RTO sender retransmits the first 105 unacknowledged segment normally, but tries to transmit new, 106 previously unsent data after that. If the next two acknowledgements 107 cover segments that were not retransmitted, the F-RTO sender can 108 declare the RTO spurious and exit the RTO recovery. However, if 109 either of the next two acknowledgements is a duplicate ACK, there is 110 not sufficient evidence of a spurious RTO; therefore the F-RTO sender 111 retransmits the unacknowledged segments in slow start similarly to 112 the traditional algorithm. With a SACK-enhanced version of the F-RTO 113 algorithm, spurious RTOs may be detected even if duplicate ACKs 114 arrive after an RTO. The F-RTO algorithm only attempts to detect and 115 avoid unnecessary retransmissions after an RTO. Eifel and DSACK can 116 also be used in detecting unnecessary retransmissions in other 117 events, for example due to packet reordering. 119 The F-RTO algorithm can also be applied with the SCTP protocol 120 [Ste00], because SCTP has acknowledgement and packet 121 retransmission concepts similar to those of TCP. When an SCTP retransmission timeout 122 occurs, the SCTP sender is required to retransmit the outstanding 123 data similarly to TCP, thus being prone to unnecessary 124 retransmissions and congestion control adjustments, if delay spikes 125 occur in the network. The SACK-enhanced version of F-RTO should be 126 directly applicable to SCTP, which has selective acknowledgements as 127 a built-in feature. For simplicity, this document mostly refers to 128 TCP, but the algorithms and other discussion should be applicable 129 also to SCTP. 131 This document is organized as follows. Section 2 describes the basic 132 F-RTO algorithm.
Section 3 outlines an optional enhancement to the 133 F-RTO algorithm that leverages the TCP SACK option. Section 4 134 discusses the possible actions to be taken after detecting a spurious 135 RTO, Section 5 discusses applying F-RTO with SCTP, and Section 6 discusses the security considerations. 137 2. F-RTO Algorithm 139 An RTO is spurious if there are segments outstanding in the network 140 that would have prevented the RTO, had their acknowledgements arrived 141 earlier at the sender. F-RTO affects the TCP sender behavior only 142 after a retransmission timeout; otherwise the TCP behavior remains 143 unmodified. When the RTO expires, the F-RTO algorithm monitors incoming 144 acknowledgements and declares an RTO spurious, if the TCP sender gets 145 an acknowledgement for a segment that was not retransmitted due to 146 RTO. The actions taken in response to a spurious RTO are not specified 147 in this document, but we discuss the different alternatives for 148 congestion control in Section 4. 150 Following the practice used with the Eifel Detection algorithm 151 [LM03], we use the "SpuriousRecovery" variable to indicate whether 152 the retransmission is declared spurious by the sender. This variable 153 can be used as an input for a related response algorithm. With F-RTO, 154 the outcome of SpuriousRecovery can either be SPUR_TO, indicating a 155 spurious retransmission timeout; or FALSE, when the RTO is not 156 declared spurious, and the TCP sender should follow the conventional 157 RTO recovery algorithm. 159 A TCP sender MAY implement the basic F-RTO algorithm, and if it 160 chooses to apply the algorithm, the following steps MUST be taken 161 after the retransmission timer expires. 163 1) When RTO expires, the TCP sender SHOULD retransmit the first 164 unacknowledged segment and set SpuriousRecovery to FALSE. Store 165 the highest sequence number transmitted so far in variable 166 "send_high".
168 2) When the first acknowledgement after the RTO arrives at the 169 sender, the sender chooses the following actions depending on 170 whether the ACK advances the window or whether it is a duplicate 171 ACK. 173 a) If the acknowledgement is a duplicate ACK OR it is 174 acknowledging a sequence number equal to (or above) the value 175 of send_high, the TCP sender MUST revert to the conventional 176 RTO recovery, and continue by retransmitting unacknowledged 177 data in slow start. The TCP sender MUST NOT enter step 3 of 178 this algorithm, and the SpuriousRecovery variable remains 179 FALSE. 181 b) If the acknowledgement advances the window AND it is below the 182 value of send_high, the TCP sender SHOULD transmit up to two 183 new (previously unsent) segments and enter step 3 of this 184 algorithm. If the TCP sender does not have enough unsent data, 185 it SHOULD send only one segment. In addition, the TCP sender 186 MAY override the Nagle algorithm and immediately send an 187 undersized segment if needed. If the TCP sender does not have 188 any new data to send, the TCP sender SHOULD transmit a segment 189 from the retransmission queue. If the TCP sender retransmits the 190 first unacknowledged segment, it MUST NOT enter step 3 of this 191 algorithm but continue with the conventional RTO recovery 192 algorithm. 194 3) When the second acknowledgement after the RTO arrives at the 195 sender, either declare the RTO spurious, or start retransmitting 196 the unacknowledged segments. 198 a) If the acknowledgement is a duplicate ACK, the TCP sender MUST 199 set the congestion window to no more than 3 * MSS, and continue 200 with the slow start algorithm retransmitting unacknowledged 201 segments. The sender leaves SpuriousRecovery set to FALSE. 203 b) If the acknowledgement advances the window, i.e.
it 204 acknowledges data that was not retransmitted after the RTO, the 205 TCP sender SHOULD declare the RTO spurious, set 206 SpuriousRecovery to SPUR_TO and set the value of send_high 207 variable to SND.UNA. 209 The F-RTO sender takes cautious actions when it receives duplicate 210 acknowledgements after an RTO. Since duplicate ACKs may indicate that 211 segments have been lost, reliably detecting a spurious RTO is 212 difficult in the lack of additional information. Therefore the safest 213 alternative is to follow the conventional TCP recovery in those 214 cases. 216 If the first acknowledgement after RTO covers the send_high point at 217 algorithm step (2a), there is not enough evidence that a non- 218 retransmitted segment has arrived at the receiver after the RTO. 219 This is a common case when a fast retransmission is lost and it has 220 been retransmitted again after an RTO, while the rest of the 221 unacknowledged segments have successfully been delivered to the TCP 222 receiver before the RTO. Therefore the RTO cannot be declared 223 spurious in this case. 225 If the first acknowledgement after RTO does not acknowledge all of 226 the data that was retransmitted in step 1, the TCP sender must not 227 enter step 3 of this algorithm, but revert to the conventional RTO 228 recovery. Otherwise, a malicious receiver acknowledging partial 229 segments could cause the sender to declare the RTO spurious in a case 230 where data was lost. 232 The TCP sender is allowed to send two new segments in algorithm 233 branch (2b), because the conventional TCP sender would retransmit two 234 segments after one round-trip has elapsed since the RTO. If sending 235 new data is not possible in algorithm branch (2b), or the receiver 236 window limits the transmission, it has to send something in order to 237 prevent the TCP transfer from stalling. In that case the following 238 options are available for the sender. 
240 - Continue with the conventional RTO recovery algorithm and do not 241 try to detect the spurious RTO. The disadvantage is that the sender 242 may make unnecessary retransmissions due to a possible spurious RTO. On 243 the other hand, we believe that the benefits of detecting a spurious 244 RTO in application-limited or receiver-limited situations are 245 not significant. 247 - Use additional information if available, e.g. TCP timestamps with 248 the Eifel Detection algorithm, for detecting a spurious RTO. 249 However, Eifel detection may yield different results from F-RTO 250 when ACK losses and an RTO occur within the same round-trip time 251 [SKR02]. 253 - Retransmit data from the tail of the retransmission queue and 254 continue with step 3 of the F-RTO algorithm. It is possible that 255 the retransmission is made unnecessarily, hence this option is not 256 encouraged, except for hosts that are known to operate in an 257 environment that is highly likely to have spurious RTOs. On the 258 other hand, with this method it is possible to avoid several 259 unnecessary retransmissions due to a spurious RTO by doing only one 260 retransmission that may be unnecessary. 262 - Send a zero-sized segment below SND.UNA similar to a TCP Keep-Alive 263 probe and continue with step 3 of the F-RTO algorithm. Since the 264 receiver replies with a duplicate ACK, the sender is able to detect 265 from the incoming acknowledgement whether the RTO was spurious. 266 While this method does not send data unnecessarily, it delays the 267 recovery by one round-trip time in cases where the RTO was not 268 spurious, and therefore is not encouraged. 270 - In receiver-limited cases, send one octet of new data regardless of 271 the advertised window limit, and continue with step 3 of the F-RTO 272 algorithm. It is possible that the receiver has free buffer space 273 to receive the data by the time the segment has propagated through 274 the network, in which case no harm is done.
If the receiver is not 275 capable of receiving the segment, it rejects the segment, and sends 276 a duplicate ACK. 278 If the RTO is declared spurious, the TCP sender sets the value of 279 send_high variable to SND.UNA in order to disable the NewReno 280 "bugfix" [FH99]. The send_high variable was proposed for avoiding 281 unnecessary multiple fast retransmits when RTO expires during fast 282 recovery with NewReno TCP. As the sender has not retransmitted other 283 segments but the one that triggered RTO, the problem addressed by the 284 bugfix cannot occur. Therefore, if there are three duplicate ACKs 285 arriving at the sender after the RTO, they are likely to indicate a 286 packet loss, hence fast retransmit should be used to allow efficient 287 recovery. If there are not enough duplicate ACKs arriving at the 288 sender after a packet loss, the retransmission timer expires another 289 time and the sender enters step 1 of this algorithm. 291 When the RTO is declared spurious, the TCP sender cannot detect 292 whether the unnecessary RTO retransmission was lost. In principle the 293 loss of the RTO retransmission should be taken as a congestion 294 signal, and thus there is a small possibility that the F-RTO sender 295 violates the congestion control rules, if it chooses to fully revert 296 congestion control parameters after detecting a spurious RTO. The 297 Eifel detection algorithm has a similar property, while the DSACK 298 option can be used to detect whether the retransmitted segment was 299 successfully delivered to the receiver. 301 The F-RTO algorithm has a side-effect on the TCP round-trip time 302 measurement. Because the TCP sender can avoid most of the unnecessary 303 retransmissions after detecting a spurious RTO, the sender is able to 304 take round-trip time samples on the delayed segments. If the regular 305 RTO recovery was used without TCP timestamps, this would not be 306 possible due to retransmission ambiguity. 
As a result, the RTO 307 estimator is likely to have more accurate and larger values with F-RTO 308 than with the regular TCP after a spurious RTO that was triggered due 309 to delayed segments. We believe this is an advantage in networks 310 that are prone to delay spikes. 312 It is possible that the F-RTO algorithm does not always avoid 313 unnecessary retransmissions after a spurious RTO. If packet 314 reordering or packet duplication occurs on the segment that triggered 315 the spurious RTO, the F-RTO algorithm may not detect the spurious RTO 316 due to incoming duplicate ACKs. Additionally, if a spurious RTO 317 occurs during fast recovery, the F-RTO algorithm often cannot detect 318 the spurious RTO. However, we consider these cases relatively rare, 319 and note that in cases where F-RTO fails to detect the spurious RTO, 320 it performs similarly to the regular RTO recovery. 322 3. A SACK-enhanced version of the F-RTO algorithm 324 This section describes an alternative version of the F-RTO algorithm 325 that makes use of the TCP Selective Acknowledgement Option [MMFR96]. By 326 using the SACK option the TCP sender can detect spurious RTOs in most 327 of the cases when packet reordering or packet duplication is present. 328 The difference from the basic F-RTO algorithm is that the sender may 329 declare the RTO spurious even when duplicate ACKs follow the RTO, if the 330 SACK blocks acknowledge new data that was not transmitted after RTO. 332 The algorithm principle presented in this section is also applicable 333 to the SCTP protocol. 335 Given that the TCP Selective Acknowledgement Option [MMFR96] is 336 enabled for a TCP connection, the TCP sender MAY implement the 337 SACK-enhanced F-RTO algorithm. If the sender applies the SACK-enhanced 338 F-RTO algorithm, it MUST follow the steps below. This algorithm SHOULD 339 NOT be applied, if the TCP sender is already in loss recovery when 340 RTO occurs.
However, it should be possible to apply the principle of 341 F-RTO within certain limitations also when an RTO occurs during an existing 342 loss recovery. While this is a topic of further research, Appendix B 343 briefly discusses the related issues. 345 1) When RTO expires, the TCP sender SHOULD retransmit the first 346 unacknowledged segment and set SpuriousRecovery to FALSE. Variable 347 "send_high" is set to indicate the highest segment transmitted so 348 far. 350 2) Wait until the acknowledgement for the segment retransmitted due 351 to RTO arrives at the sender. If duplicate ACKs arrive, store the 352 incoming SACK information but stay in step 2. If RTO expires, 353 restart the algorithm. 355 a) if the cumulative ACK acknowledges all segments up to 356 send_high, the TCP sender SHOULD revert to the conventional RTO 357 recovery and it MUST set the congestion window to no more than 2 * 358 MSS. The sender does not enter step 3 of this algorithm. 360 b) otherwise, the TCP sender SHOULD transmit up to two new 361 (previously unsent) segments, within the limitations of the 362 congestion window. If the TCP sender is not able to transmit 363 any previously unsent data due to receiver window limitation or 364 because it does not have any new data to send, it MAY follow 365 one of the options presented in Section 2. However, if the TCP 366 sender chooses to retransmit a data segment here, SACK of that 367 segment MUST NOT be used for declaring a spurious RTO in step 368 (3b). 370 3) When the next acknowledgement arrives at the sender, proceed as follows. 372 a) if the ACK acknowledges data above send_high, either in SACK 373 blocks or as a cumulative ACK, the sender MUST set the congestion 374 window to no more than 3 * MSS and proceed with conventional 375 recovery, retransmitting unacknowledged segments. The sender 376 SHOULD take this branch also when the acknowledgement is a 377 duplicate ACK and it does not contain any new SACK blocks for 378 previously unacknowledged data below send_high.
380 b) if the ACK does not acknowledge data above send_high AND it 381 acknowledges some previously unacknowledged data below 382 send_high, the TCP sender SHOULD declare the RTO spurious and 383 set SpuriousRecovery to SPUR_TO. 385 If there are unacknowledged holes between the received SACK blocks, 386 those segments SHOULD be retransmitted similarly to the conventional 387 SACK recovery algorithm [BAFW03]. If the algorithm exits with 388 SpuriousRecovery set to SPUR_TO, send_high SHOULD be set to SND.UNA, 389 thus allowing fast recovery on incoming duplicate acknowledgements. 391 4. Taking Actions after Detecting Spurious RTO 393 Upon retransmission timeout, a conventional TCP sender assumes that 394 outstanding segments are lost and starts retransmitting the 395 unacknowledged segments. When the RTO is detected to be spurious, the 396 TCP sender should not start retransmitting based on the RTO. For 397 example, if the sender was in congestion avoidance phase transmitting 398 new previously unsent segments, it should continue transmitting 399 previously unsent segments after detecting spurious RTO. In addition, 400 it is suggested that the RTO estimation is reinitialized and the RTO 401 timer is adjusted to a more conservative value in order to avoid 402 subsequent spurious RTOs [LG03]. 404 Different approaches have been suggested for adjusting the congestion 405 control state after a spurious RTO. This document does not 406 specifically recommend any of the alternatives below, but considers 407 the response to spurious RTO as a subject of further research. 409 1) Revert the congestion control parameters to the state before the 410 RTO [LG03]. This appears to be a justified decision, because it is 411 similar to the situation in which the RTO did not expire 412 spuriously. 
However, two concerns exist with this approach: 413 First, some detection mechanisms, such as F-RTO or the Eifel 414 Detection algorithm, do not notice the loss of the spurious 415 retransmission, thus introducing a small risk of violation of the 416 congestion control principles. Second, a spurious RTO indicates 417 that some part of the network was unable to deliver packets for a 418 while, which can be considered as a potential indication of 419 congestion. 421 2) Reduce the congestion window to half of its earlier value but revert 422 the slow start threshold to its earlier value [SL03]. This 423 alternative takes measures to validate the congestion window after 424 a period during which no data has been transmitted. This would be 425 a justified action to take if the spurious RTO is assumed to be 426 caused by changes in the network conditions, such as a change 427 in the available bandwidth or a wireless handoff to another point 428 of attachment in the network. 430 3) Reduce ssthresh and congestion window when detecting a spurious 431 RTO [SKR02]. For example, ssthresh and cwnd could be set to half 432 of their earlier values, as done with the other congestion 433 notification events. This alternative would be conservative enough 434 considering the possibility of not detecting a packet loss of the 435 RTO-triggered retransmission, but the TCP sender should avoid 436 reducing the congestion window more than once in a round-trip 437 time. Furthermore, if a spurious RTO occurs at the beginning of a 438 TCP connection, this alternative causes the slow start to be 439 canceled, which may sacrifice TCP performance. 441 5. SCTP Considerations 443 The SACK-enhanced F-RTO algorithm can be applied with the SCTP 444 protocol. However, SCTP contains features not present in TCP 445 that need to be discussed when applying the F-RTO algorithm. 447 An SCTP association can be multi-homed.
The current retransmission 448 policy states that retransmissions should go to alternative addresses. 449 If the retransmission was due to a spurious RTO caused by a delay 450 spike, it is possible that the acknowledgement for the retransmission 451 arrives back at the sender before the acknowledgements of the 452 original transmissions arrive. If this happens, a possible loss of the 453 original transmission of the data chunk that was retransmitted due to 454 RTO may remain undetected when the F-RTO algorithm is applied and 455 a delay spike triggered the RTO. Because the RTO was caused 456 by the delay, and it was spurious in that respect, a suitable 457 response is to continue by sending new data. However, if the original 458 transmission was lost, fully reverting the congestion control 459 parameters is too aggressive. Therefore, taking conservative actions on 460 congestion control is recommended, if the SCTP association is 461 multi-homed and retransmissions go to an alternative address. The information 462 in duplicate TSNs can then be used for reverting congestion control, 463 if desired [BA02]. 465 Note that the forward transmissions made in F-RTO algorithm step (2b) 466 should be destined to the primary address, since they are not 467 retransmissions. 469 When making a retransmission, an SCTP sender can bundle a number of 470 unacknowledged data chunks and include them in the same packet. This 471 needs to be considered when implementing F-RTO for SCTP. The basic 472 principle of F-RTO still holds: in order to declare the RTO spurious, 473 the sender must get an acknowledgement for a data chunk that was not 474 retransmitted after the RTO. In other words, acknowledgements of data 475 chunks that were bundled in the RTO retransmission must not be used for 476 declaring the RTO spurious. 478 6.
Security Considerations 480 The main security threat regarding F-RTO is the possibility of 481 a receiver misleading the sender to set too large a congestion window 482 after an RTO. There are two possible ways a malicious receiver could 483 trigger a wrong output from the F-RTO algorithm. First, the receiver 484 can acknowledge data that it has not received. Second, it can delay 485 acknowledgement of a segment it has received earlier, and acknowledge 486 the segment after the TCP sender has been deluded to enter algorithm 487 step 3. 489 If the receiver acknowledges a segment it has not really received, 490 the sender can be led to declare the RTO spurious in F-RTO algorithm 491 step 3. However, since this causes the sender to have incorrect 492 state, it cannot retransmit the segment that has never reached the 493 receiver. Therefore, this attack is unlikely to be useful for the 494 receiver to maliciously gain a larger congestion window. 496 A common case of an RTO is that a fast retransmission of a segment is 497 lost. If all other segments have been received, the RTO 498 retransmission causes the whole window to be acknowledged at once. This case is 499 recognized in F-RTO algorithm branch (2a). However, if the receiver 500 only acknowledges one segment after receiving the RTO retransmission, 501 and then the rest of the segments, it could cause the RTO to be 502 declared spurious when it is not. Therefore, it is suggested that 503 when an RTO expires during the fast recovery phase, the sender would not 504 fully revert the congestion window even if the RTO was declared 505 spurious, but reduce the congestion window to 1. However, the sender can 506 take actions to avoid unnecessary retransmissions normally. If a TCP 507 sender implements a burst avoidance algorithm that limits the sending 508 rate to be no higher than in slow start, this precaution is not 509 needed, and the sender may apply F-RTO normally.
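The detection steps of the basic algorithm in Section 2, together with the precaution suggested above for an RTO that expires during fast recovery, can be illustrated with a minimal sketch. It is not part of the specification. The names FrtoSender, Outcome, and cwnd_after_spurious_rto are hypothetical. The sketch assumes that sequence numbers count whole segments, that an acknowledgement number names the next segment expected, and that send_high is recorded as the next segment to be sent. Segment transmission is not modelled, and fully reverting the congestion window is used only as one of the alternatives discussed in Section 4.

```python
from dataclasses import dataclass
from enum import Enum

MSS = 1  # assumption: this sketch counts in whole segments


class Outcome(Enum):
    PENDING = "pending"            # detection still in progress
    CONVENTIONAL = "conventional"  # SpuriousRecovery stays FALSE
    SPUR_TO = "SPUR_TO"            # RTO declared spurious


@dataclass
class FrtoSender:
    snd_una: int                   # first unacknowledged segment
    snd_nxt: int                   # next segment number to be sent
    cwnd: int                      # congestion window, in segments
    in_fast_recovery: bool = False
    send_high: int = 0
    step: int = 0                  # 0 means F-RTO is not active
    rto_in_fast_recovery: bool = False

    def on_rto(self) -> None:
        """Step 1: snd_una is assumed to be retransmitted here."""
        self.send_high = self.snd_nxt
        self.rto_in_fast_recovery = self.in_fast_recovery
        self.step = 2

    def on_ack(self, ack_no: int) -> Outcome:
        duplicate = ack_no <= self.snd_una  # does not advance the window
        if self.step == 2:
            if duplicate or ack_no >= self.send_high:
                self.step = 0               # branch (2a)
                return Outcome.CONVENTIONAL
            self.snd_una = ack_no           # branch (2b): up to two new
            self.step = 3                   # segments would be sent here
            return Outcome.PENDING
        if self.step == 3:
            self.step = 0
            if duplicate:                   # branch (3a)
                self.cwnd = min(self.cwnd, 3 * MSS)
                return Outcome.CONVENTIONAL
            self.snd_una = ack_no           # branch (3b)
            self.send_high = self.snd_una
            return Outcome.SPUR_TO
        raise RuntimeError("F-RTO is not active")


def cwnd_after_spurious_rto(sender: FrtoSender, cwnd_before_rto: int) -> int:
    """Hypothetical response: full revert, unless the RTO hit fast recovery."""
    if sender.rto_in_fast_recovery:
        return 1 * MSS
    return cwnd_before_rto
```

For example, with segments 1 to 5 outstanding, an RTO followed by acknowledgements 2 and 3 ends in SPUR_TO, whereas a first acknowledgement of 6 (the whole window at once) falls into branch (2a) and conventional recovery.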

   If more than one segment is missing at the time when an RTO occurs,
   the receiver does not benefit from misleading the sender to declare a
   spurious RTO, because the sender would then have to go through
   another recovery period to retransmit the missing segments, usually
   after an RTO.

Acknowledgements

   We are grateful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, Mark
   Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias
   Rodriguez, Sourabh Ladha, and Martin Duke for the discussion and
   feedback contributed to this text.

Normative References

   [APS99]   M. Allman, V. Paxson, and W. Stevens. TCP Congestion
             Control. RFC 2581, April 1999.

   [BAFW03]  E. Blanton, M. Allman, K. Fall, and L. Wang. A Conservative
             Selective Acknowledgment (SACK)-based Loss Recovery
             Algorithm for TCP. RFC 3517, April 2003.

   [MMFR96]  M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP
             Selective Acknowledgement Options. RFC 2018, October 1996.

   [PA00]    V. Paxson and M. Allman. Computing TCP's Retransmission
             Timer. RFC 2988, November 2000.

   [Pos81]   J. Postel. Transmission Control Protocol. RFC 793,
             September 1981.

   [Ste00]   R. Stewart, et al. Stream Control Transmission Protocol.
             RFC 2960, October 2000.

Informative References

   [ABF01]   M. Allman, H. Balakrishnan, and S. Floyd. Enhancing TCP's
             Loss Recovery Using Limited Transmit. RFC 3042, January
             2001.

   [BA02]    E. Blanton and M. Allman. On Making TCP more Robust to
             Packet Reordering. ACM Computer Communication Review,
             32(1), January 2002.

   [BBJ92]   D. Borman, R. Braden, and V. Jacobson. TCP Extensions for
             High Performance. RFC 1323, May 1992.

   [FH99]    S. Floyd and T. Henderson. The NewReno Modification to
             TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

   [FMMP00]  S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky. An
             Extension to the Selective Acknowledgement (SACK) Option to
             TCP.
             RFC 2883, July 2000.

   [GL02]    A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm for
             TCP in a GPRS Network. In Proc. of European Wireless,
             Florence, Italy, February 2002.

   [LG03]    R. Ludwig and A. Gurtov. The Eifel Response Algorithm for
             TCP. Internet draft
             "draft-ietf-tsvwg-tcp-eifel-response-03.txt". March 2003.
             Work in progress.

   [LK00]    R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP
             Robust Against Spurious Retransmissions. ACM Computer
             Communication Review, 30(1), January 2000.

   [LM03]    R. Ludwig and M. Meyer. The Eifel Detection Algorithm for
             TCP. RFC 3522, April 2003.

   [SKR02]   P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: A New
             Recovery Algorithm for TCP Retransmission Timeouts.
             University of Helsinki, Dept. of Computer Science. Series
             of Publications C, No. C-2002-07. February 2002. Available
             at:
             http://www.cs.helsinki.fi/research/iwtcp/papers/f-rto.ps

   [SL03]    Y. Swami and K. Le. DCLOR: De-correlated Loss Recovery
             using SACK option for spurious timeouts. Internet draft
             "draft-swami-tsvwg-tcp-dclor-01.txt". April 2003. Work in
             progress.

Appendix A: Scenarios

   This section discusses different scenarios where RTOs occur and how
   the basic F-RTO algorithm performs in those scenarios. The
   interesting scenarios are a sudden delay triggering an RTO, loss of a
   retransmitted packet during fast recovery, a link outage causing the
   loss of several packets, and packet reordering. A performance
   evaluation with a more thorough analysis on a real implementation of
   F-RTO is given in [SKR02].

A.1. Sudden delay

   The main motivation of the F-RTO algorithm is to improve TCP
   performance when a delay spike triggers a spurious retransmission
   timeout. The example below illustrates the segments and
   acknowledgements transmitted by the TCP end hosts when a spurious RTO
   occurs, but no packets are lost.
   For simplicity, delayed acknowledgements are not used in the example.
   In the example, the sender halves the congestion window and slow
   start threshold after detecting a spurious RTO.

         ...
         (cwnd = 6,
          ssthresh < 6,
          FlightSize = 5)
      1.  SEND 10 ---------------------------->
      2.  <---------------------------- ACK 6
      3.  SEND 11 ---------------------------->
      4.       |
            [delay]
               |
             [RTO]
      5.  SEND 6  ---------------------------->
                                          --->
      6.  <---------------------------- ACK 7
          [F-RTO step (2b)]
      7.  SEND 12 ---------------------------->
      8.  SEND 13 ---------------------------->
                                          --->
      9.  <---------------------------- ACK 8
          [F-RTO step (3b)]
          [SpuriousRecovery <- SPUR_TO]
          [cwnd <- 3, ssthresh <- 3]
      10. <---------------------------- ACK 9
      11. <---------------------------- ACK 10
      12. <---------------------------- ACK 11
      13. SEND 14 ---------------------------->
         ...

   When a sudden delay long enough to trigger an RTO occurs at step 4,
   the TCP sender retransmits the first unacknowledged segment (step 5).
   Because the originally transmitted segment 6 arrives at the receiver,
   the next ACK covers the RTO retransmission, and the TCP sender
   continues by sending two new data segments (steps 7 and 8). Because
   the second acknowledgement arriving after the RTO acknowledges data
   that was not retransmitted due to the RTO (step 9), the TCP sender
   declares the RTO spurious and continues by sending new data. Because
   the TCP sender reduces cwnd when it detects the spurious RTO, it has
   to wait for some outstanding segments to leave the network before it
   can continue transmitting again at step 13.

A.2. Loss of a retransmission

   If a retransmitted segment is lost, the only way to retransmit it
   again is to wait for the RTO to trigger the retransmission.
   Once the segment is successfully received, the receiver usually
   acknowledges several segments at once, because other segments in the
   same window have been successfully delivered before the
   retransmission arrives at the receiver. The example below shows a
   scenario where the retransmission (of segment 6) is lost, as well as
   a later segment (segment 9) in the same window. The limited transmit
   [ABF01] or SACK TCP [MMFR96] enhancements are not in use in this
   example.

         ...
         (cwnd = 6,
          ssthresh < 6,
          FlightSize = 5)

      1.  SEND 10 ---------------------------->
      2.  <---------------------------- ACK 6
      3.  SEND 11 ---------------------------->
      4.  <---------------------------- ACK 6
      5.  <---------------------------- ACK 6
      6.  <---------------------------- ACK 6
      7.  SEND 6  --------------X

          [ssthresh <- 3, cwnd <- ssthresh + 3 = 6]
      8.  <---------------------------- ACK 6
               |
               |
             [RTO]
          [ssthresh <- 2]
      9.  SEND 6  ---------------------------->
      10. <---------------------------- ACK 9
          [F-RTO step (2b)]
      11. SEND 12 ---------------------------->
      12. SEND 13 ---------------------------->
      13. <---------------------------- ACK 9
          [F-RTO step (3a)]
          [SpuriousRecovery <- FALSE]
          [cwnd <- 3]
      14. SEND 9  ---------------------------->
      15. SEND 10 ---------------------------->
      16. SEND 11 ---------------------------->
      17. <---------------------------- ACK 11
         ...

   In the example above, segment 6 is lost and the sender retransmits it
   after three duplicate ACKs in step 7. However, the retransmission is
   also lost, and the sender has to wait for the RTO to expire before
   retransmitting it again. Because the first ACK following the RTO
   acknowledges the RTO retransmission (step 10), the sender transmits
   two new segments. The second ACK in step 13 does not acknowledge any
   previously unacknowledged data.
   Therefore, the F-RTO sender enters slow start and sets cwnd to
   3 * MSS. The congestion window can be set to three segments because
   two round-trips have elapsed after the RTO. After this, the receiver
   acknowledges all segments transmitted prior to entering recovery, and
   the sender can continue transmitting new data in congestion
   avoidance.

A.3. Link outage

   The example below illustrates the F-RTO behavior when 4 consecutive
   packets are lost in the network, causing the TCP sender to fall back
   to RTO recovery. Limited transmit and SACK are not used in this
   example.

         ...
         (cwnd = 6,
          ssthresh < 6,
          FlightSize = 5)

      1.  SEND 10 ---------------------------->
      2.  <---------------------------- ACK 6
      3.  SEND 11 ---------------------------->
      4.  <---------------------------- ACK 6
               |
               |
             [RTO]
          [ssthresh <- 3]
      5.  SEND 6  ---------------------------->
      6.  <---------------------------- ACK 7
          [F-RTO step (2b)]
      7.  SEND 12 ---------------------------->
      8.  SEND 13 ---------------------------->
      9.  <---------------------------- ACK 7
          [F-RTO step (3a)]
          [SpuriousRecovery <- FALSE]
          [cwnd <- 3]
      10. SEND 7  ---------------------------->
      11. SEND 8  ---------------------------->
      12. SEND 9  ---------------------------->
      13. <---------------------------- ACK 14

   Again, the F-RTO sender transmits two new segments (steps 7 and 8)
   after the RTO retransmission is acknowledged. Because the next ACK
   does not acknowledge any data that was not retransmitted after the
   RTO (step 9), the F-RTO sender proceeds with conventional recovery
   and slow start retransmissions.

A.4. Packet reordering

   Since F-RTO modifies the TCP sender behavior only after a
   retransmission timeout, and it is intended to avoid unnecessary
   retransmissions only after a spurious RTO, we limit the discussion of
   the effects of packet reordering on F-RTO behavior to the cases where
   packet reordering occurs immediately after the RTO. When the TCP
   receiver gets an out-of-order segment, it generates a duplicate ACK.
   If the TCP sender implements the basic F-RTO algorithm, this may
   prevent the sender from detecting a spurious RTO.

   However, if the TCP sender applies the SACK-enhanced F-RTO, it is
   possible to detect a spurious RTO also when packet reordering occurs.
   We illustrate the behavior of SACK-enhanced F-RTO below when segment
   8 arrives before segments 6 and 7, and segments starting from segment
   6 are delayed in the network. In this example, the TCP sender reduces
   the congestion window and slow start threshold in response to the
   spurious RTO.

         ...
         (cwnd = 6,
          ssthresh < 6,
          FlightSize = 5)
      1.  SEND 10 ---------------------------->
      2.  <---------------------------- ACK 6
      3.  SEND 11 ---------------------------->
      4.       |
            [delay]
               |
             [RTO]
      5.  SEND 6  ---------------------------->
                                          --->
      6.  <---------------------------- ACK 6
          [SACK 8]
          [SACK F-RTO stays in step 2]
      7.                                  --->
      8.  <---------------------------- ACK 7
          [SACK 8]
          [SACK F-RTO step (2b)]
      9.  SEND 12 ---------------------------->
      10. SEND 13 ---------------------------->
      11.                                 --->
      12. <---------------------------- ACK 9
          [SACK F-RTO step (3b)]
          [SpuriousRecovery <- SPUR_TO]
          [ssthresh <- 3, cwnd <- 3]
      13. <---------------------------- ACK 10
      14. <---------------------------- ACK 11
      15. SEND 14 ---------------------------->
         ...

   After the RTO expires and the sender retransmits segment 6 (step 5),
   the receiver gets segment 8 and generates a duplicate ACK with SACK
   for segment 8. In response to the acknowledgement, the TCP sender
   does not send anything but stays in F-RTO step 2. Because the next
   acknowledgement advances the cumulative ACK point (step 8), the
   sender can transmit two new segments according to SACK-enhanced
   F-RTO. The next acknowledgement (step 12) acknowledges new data
   between 7 and 11 that was not acknowledged earlier (segment 7), so
   the F-RTO sender declares the RTO spurious.

Appendix B: Applying SACK-enhanced F-RTO when RTO occurs during loss
recovery

   We believe that a slightly modified SACK-enhanced F-RTO algorithm can
   be used to detect spurious RTOs also when an RTO occurs while an
   earlier loss recovery is underway. However, there are issues that
   need to be considered if F-RTO is applied in this case.

   The original SACK-based F-RTO requires in algorithm step 3 that an
   ACK acknowledges previously unacknowledged non-retransmitted data
   between SND.UNA and send_high. If an RTO takes place during an
   earlier (SACK-based) loss recovery, the F-RTO sender must only use
   acknowledgements for non-retransmitted segments transmitted before
   the SACK-based loss recovery started. This means that in order to
   declare the RTO spurious, the TCP sender must receive an
   acknowledgement for a non-retransmitted segment between SND.UNA and
   RecoveryPoint in algorithm step 3. RecoveryPoint is defined in the
   conservative SACK-based loss recovery algorithm [BAFW03], and it is
   set to indicate the highest segment transmitted so far when
   SACK-based loss recovery begins. In other words, if the TCP sender
   receives an acknowledgement for a segment that was transmitted more
   than one RTO ago, it can declare the RTO spurious. Defining an
   efficient algorithm for checking these conditions remains a future
   work item.
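
   The condition above can be illustrated with a naive check. This is
   only a sketch and not the efficient algorithm left as future work:
   the function name and parameters are hypothetical, segments are
   identified by small integers as in the examples of Appendix A rather
   than by byte sequence numbers, sequence number wraparound is ignored,
   and both ends of the range from SND.UNA to RecoveryPoint are assumed
   to be inclusive.

```python
def ack_allows_spurious_rto(newly_acked, retransmitted, snd_una,
                            recovery_point):
    """Return True if an ACK received in algorithm step 3 covers a
    segment that was never retransmitted and lies between SND.UNA and
    RecoveryPoint.

    newly_acked    -- segments first acknowledged by this ACK, either
                      cumulatively or in a SACK block (hypothetical)
    retransmitted  -- set of segments retransmitted at any time
    snd_una        -- SND.UNA before this ACK was processed
    recovery_point -- RecoveryPoint recorded when the earlier
                      SACK-based loss recovery began
    """
    return any(
        snd_una <= seg <= recovery_point and seg not in retransmitted
        for seg in newly_acked
    )
```

   With SND.UNA at 6, RecoveryPoint at 11, and only segment 6
   retransmitted, an ACK that newly covers segment 7 satisfies the
   check, while an ACK that newly covers only segment 6 or only segment
   12 does not.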

   When a spurious RTO is detected according to the rules given above,
   the response algorithm may need to consider this case separately, for
   example in terms of which segments to retransmit after the RTO and
   whether it is safe to revert the congestion control parameters. This
   is considered a topic for future research.

Authors' Addresses

   Pasi Sarolahti
   Nokia Research Center
   P.O. Box 407
   FIN-00045 NOKIA GROUP
   Finland

   Phone: +358 50 4876607
   EMail: pasi.sarolahti@nokia.com
   http://www.cs.helsinki.fi/u/sarolaht/

   Markku Kojo
   University of Helsinki
   Department of Computer Science
   P.O. Box 26
   FIN-00014 UNIVERSITY OF HELSINKI
   Finland

   Phone: +358 9 1914 4179
   EMail: markku.kojo@cs.helsinki.fi