Internet Engineering Task Force                             P. Sarolahti
INTERNET DRAFT                                     Nokia Research Center
File: draft-sarolahti-tsvwg-tcp-frto-03.txt                      M. Kojo
                                                  University of Helsinki
                                                           January, 2003
Expires: July, 2003


                F-RTO: A TCP RTO Recovery Algorithm for
                  Avoiding Unnecessary Retransmissions

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Spurious retransmission timeouts (RTOs) cause suboptimal TCP
   performance, because they often result in unnecessary retransmission
   of the last window of data.  This document describes the "Forward RTO
   Recovery" (F-RTO) algorithm for detecting spurious TCP RTOs.  F-RTO is
   a TCP sender-only algorithm that does not require any TCP options to
   operate.
   After retransmitting the first unacknowledged segment triggered by an
   RTO, the F-RTO algorithm at a TCP sender monitors the incoming
   acknowledgements to determine whether the timeout was spurious and to
   decide whether to send new segments or retransmit unacknowledged
   segments.  The algorithm helps to avoid additional unnecessary
   retransmissions and thereby improves TCP performance in the case of a
   spurious timeout.

1. Introduction

   The TCP protocol [Pos81] has two methods for triggering
   retransmissions.  Primarily, the TCP sender relies on incoming
   duplicate ACKs, which indicate that the receiver is missing some of
   the data.  After a required number of successive duplicate ACKs have
   arrived at the sender, it retransmits the first unacknowledged
   segment [APS99].  Secondarily, the TCP sender maintains a
   retransmission timer, which triggers retransmission of segments if
   they have not been acknowledged before the retransmission timer
   expires.  When the retransmission timer expires, the congestion
   window is initialized to one segment and unacknowledged segments are
   retransmitted using the slow-start algorithm.  The retransmission
   timer is adjusted dynamically based on the measured round-trip times
   [PA00].

   It has been pointed out that the retransmission timer can expire
   spuriously and trigger unnecessary retransmissions when no segments
   have been lost [GL02].  After a spurious RTO, the acknowledgements of
   the original segments arrive at the sender, usually triggering
   unnecessary retransmissions of the whole window of segments during
   the RTO recovery.  Furthermore, after a spurious RTO a conventional
   TCP sender increases the congestion window in slow start, injecting a
   large number of data segments into the network within one round-trip
   time.

   There are a number of potential reasons for spurious RTOs.
   First, some mobile networking technologies involve sudden delay peaks
   in transmission because of actions taken during a hand-off.  Second,
   the arrival of competing traffic, possibly with higher priority, on a
   low-bandwidth link, or some other change in available bandwidth,
   causes a sudden increase in the round-trip time, which may trigger a
   spurious retransmission timeout.  A persistently reliable link layer
   can also cause a sudden delay when several data frames are lost for
   some reason.  This document does not distinguish between the
   different causes of such a delay, but discusses spurious RTOs caused
   by delay in general.

   This document describes an alternative RTO recovery algorithm called
   "Forward RTO-Recovery" (F-RTO), to be used for detecting spurious
   RTOs and thus avoiding unnecessary retransmissions following an RTO.
   When the RTO is not spurious, the F-RTO algorithm reverts to the
   conventional RTO recovery algorithm and should have similar
   performance.  F-RTO does not require any TCP options in its
   operation, and it can be implemented by modifying only the TCP
   sender.  This is different from alternative algorithms (Eifel [LK00]
   and DSACK-based algorithms [BA02]) that have been suggested for
   detecting unnecessary retransmissions.  The Eifel algorithm uses TCP
   timestamps for detecting a spurious timeout, and the DSACK-based
   algorithms require that the SACK option with the DSACK extension
   [FMMP00] is in use.  With DSACK, the TCP receiver can report if it
   has received a duplicate segment, making it possible for the sender
   to detect afterwards whether it has made unnecessary
   retransmissions.

   When an RTO occurs, the F-RTO sender retransmits the first
   unacknowledged segment normally.  If the next two acknowledgements
   advance the window, the F-RTO sender continues sending new data and
   exits the recovery.
   However, if either of the next two acknowledgements is a duplicate
   ACK, there is not sufficient evidence of a spurious RTO; therefore,
   the F-RTO sender retransmits the unacknowledged segments in slow
   start, similarly to the traditional algorithm.  The F-RTO algorithm
   only attempts to avoid unnecessary retransmissions after an RTO.
   Eifel can also be used to avoid unnecessary retransmissions in other
   events, for example, those due to packet reordering.

   This document is organized as follows.  Section 2 describes the basic
   F-RTO algorithm.  Section 3 outlines an optional enhancement to the
   F-RTO algorithm that leverages the TCP Selective Acknowledgment
   Option [MMFR96], and Section 4 presents an alternative version of
   F-RTO that uses the TCP timestamp option.  Section 5 discusses the
   possible actions to be taken after detecting a spurious RTO, and
   Section 6 discusses the security considerations.

2. F-RTO Algorithm

   The F-RTO algorithm affects the TCP sender behavior only after a
   retransmission timeout; otherwise, the TCP behavior remains
   unmodified.  This section describes a basic version of the F-RTO
   algorithm that does not require TCP options to work.  The actions
   taken in response to a spurious RTO are not described in this
   document, but we discuss the different alternatives for congestion
   control in Section 5.

   When the retransmission timer expires, the F-RTO algorithm takes the
   following steps at the TCP sender.

   1) When the RTO expires, the TCP sender SHOULD retransmit the first
      unacknowledged segment.

      The highest sequence number transmitted so far is stored in the
      variable "send_high".  The TCP sender MAY postpone adjusting the
      congestion control parameters for the next two incoming ACKs,
      until it has more information on whether the RTO was spurious or
      not.
      If the TCP sender adjusts the congestion control parameters at
      this point, it may store the earlier values of the parameters so
      that it can restore them if it detects that the RTO was spurious.

   2) When the first acknowledgement after the RTO arrives at the
      sender, the sender chooses between the following actions
      depending on whether the ACK advances the window or is a
      duplicate ACK.

      a) If the acknowledgement is a duplicate ACK OR it acknowledges a
         sequence number equal to or above the value of send_high, the
         TCP sender SHOULD revert to the conventional RTO recovery and
         not enter step 3 of this algorithm.

         The sender MUST set cwnd to 1 * MSS.  A duplicate ACK at this
         point is triggered by a segment that was sent before the RTO
         retransmission.  This is possible, for example, if the RTO
         expired during fast recovery while forward transmissions were
         triggering duplicate ACKs.  Furthermore, if a segment
         retransmitted during fast recovery is lost, it needs to be
         retransmitted again by the retransmission timer.  In this case
         it is also possible that the duplicate ACK is triggered by a
         new segment transmitted during the fast recovery before the
         RTO.

      b) If the acknowledgement advances the window AND it is below the
         value of send_high, the TCP sender MAY transmit two new
         (previously unsent) segments.

         Sending two new segments at this point is as aggressive as the
         conventional RTO recovery algorithm, which would have
         increased its cwnd to 2 * MSS when the first valid ACK arrives
         after the RTO.  It is possible that the sender can transmit
         only one new segment at this time, because the receiver window
         limits it or because the TCP sender does not have more data to
         send.  This does not prevent the algorithm from working.  In
         any case, the TCP sender SHOULD transmit at least one segment,
         either new data or from the retransmission queue.
         If the sender retransmits the next unacknowledged segment, it
         MUST NOT enter step 3 of this algorithm, but continue
         retransmitting similarly to the conventional RTO recovery
         algorithm.

         If the first acknowledgement after the RTO does not acknowledge
         all of the data that was retransmitted in step 1, the TCP
         sender MUST NOT enter step 3 of this algorithm.  Otherwise, a
         malicious receiver acknowledging partial segments could cause
         the sender to declare the RTO spurious in a case where data was
         lost.  When receiving an acknowledgement for a partial segment,
         the TCP sender SHOULD revert to the conventional RTO recovery.

   3) When the second acknowledgement after the RTO arrives at the
      sender, the sender either declares the RTO spurious or starts
      retransmitting the unacknowledged segments.

      a) If the acknowledgement is a duplicate ACK, the TCP sender MUST
         set the congestion window to no more than 3 * MSS and continue
         with the slow start algorithm, retransmitting unacknowledged
         segments.

         The duplicate ACK indicates that at least one segment other
         than the segment which triggered the RTO is lost in the last
         window of data.  There is not sufficient evidence that the RTO
         was spurious.  Therefore, the sender proceeds with
         retransmissions similarly to the conventional RTO recovery
         algorithm, using the value of send_high stored when the
         retransmission timer expired in order to avoid unnecessary
         fast retransmits.

      b) If the acknowledgement advances the window and acknowledges
         data beyond the highest sequence number that was retransmitted
         on RTO, the TCP sender SHOULD declare the RTO spurious.

         Because the TCP sender has retransmitted only one segment after
         the RTO, this acknowledgement indicates that an originally
         transmitted segment has arrived at the receiver.  This is
         regarded as a strong indication of a spurious RTO.
         The TCP sender should not assume that the unacknowledged
         segments are lost, and it should continue by sending new,
         previously unsent segments.

         If this algorithm branch is taken, the TCP sender SHOULD set
         the value of the send_high variable to SND.UNA in order to
         disable the NewReno "bugfix" [FH99].  The send_high variable
         was proposed for avoiding unnecessary multiple fast retransmits
         when the RTO expires during fast recovery with NewReno TCP.  As
         the sender has not retransmitted any segment other than the one
         that triggered the RTO, the problem addressed by the bugfix
         cannot occur.  Therefore, if duplicate ACKs arrive at the
         sender after the RTO, they are likely to indicate a packet
         loss, and fast retransmit should be used to allow efficient
         recovery.  If not enough duplicate ACKs arrive at the sender
         after a packet loss, the retransmission timer expires again
         and the sender enters step 1 of this algorithm.

   If the TCP sender does not have any new data to send in algorithm
   branch (2b), or the receiver window prevents the transmission, the
   sender SHOULD revert to retransmitting unacknowledged data similarly
   to the regular TCP.  The motivation for this is to ensure that the
   flow of segments into the network does not stop.  In the worst case,
   a stalled flow would result in an additional RTO, significantly
   degrading TCP performance.  Alternatively, the TCP sender could try
   to proceed with the F-RTO algorithm by transmitting one segment from
   the tail of the retransmission queue if it is not possible to
   transmit new data in algorithm step (2b).  Another option would be
   to transmit data beyond the advertised receiver window.  If the RTO
   was spurious, the receiver is likely to be able to store the segment
   at the time when it arrives.
   However, the current recommendation is to revert to the conventional
   RTO recovery if sending new data is not possible, because we believe
   the benefits of doing otherwise are small.

   After the RTO is declared spurious, the TCP sender cannot detect
   whether the unnecessary RTO retransmission was lost.  In principle,
   the loss of the RTO retransmission should be taken as a congestion
   signal, and thus there is a small possibility that the F-RTO sender
   violates the congestion control rules if it chooses to fully revert
   the congestion control parameters after detecting a spurious RTO.
   The Eifel detection algorithm has a similar property, but the DSACK
   option can be used to detect whether the retransmitted segment was
   successfully delivered to the receiver.

   The F-RTO algorithm has a side effect on the TCP round-trip time
   measurement.  Because the TCP sender avoids most of the unnecessary
   retransmissions after a spurious RTO, the sender is able to take
   round-trip time samples of the delayed segments.  Because of the
   retransmission ambiguity, this would not be possible if the regular
   RTO recovery were used without TCP timestamps.  As a result, the RTO
   estimator is likely to have larger values with F-RTO than with the
   regular TCP after the spurious RTO.  We believe this is an advantage
   in networks that are prone to delay spikes.

   It is possible that the F-RTO algorithm does not always avoid
   unnecessary retransmissions after a spurious RTO.  If packet
   reordering or packet duplication occurs on the segment that triggered
   the spurious RTO, the F-RTO algorithm may not detect the spurious
   RTO.  Additionally, if a spurious RTO occurs during fast recovery,
   the F-RTO algorithm often cannot detect the spurious RTO.
   However, we consider these cases relatively rare, and note that in
   cases where F-RTO fails to detect the spurious RTO, it performs
   similarly to the regular RTO recovery.

3. A SACK-enhanced version of the F-RTO algorithm

   This section describes an alternative version of the F-RTO algorithm
   that makes use of the TCP Selective Acknowledgement Option [MMFR96].
   By using the SACK option, the TCP sender can detect spurious RTOs in
   most cases when packet reordering or packet duplication is present,
   or when the TCP sender is in loss recovery.  The difference from the
   basic F-RTO algorithm is that the sender may declare the RTO spurious
   even when duplicate ACKs follow the RTO, if the SACK blocks
   acknowledge new data that was not transmitted after the RTO.

   DCLOR is a related TCP enhancement that uses the SACK option for
   avoiding unnecessary retransmissions after a spurious RTO [SL02].
   However, DCLOR differs from F-RTO in that it does not declare the RTO
   spurious before all segments outstanding when the RTO occurred have
   been acknowledged.

   The SACK-enhanced F-RTO algorithm takes the following steps:

   1) When the RTO expires, the TCP sender SHOULD retransmit the first
      unacknowledged segment.

      The TCP sender should also store the highest sequence number
      transmitted in the variable "send_high".

   2) The first acknowledgement after the RTO arrives at the sender.

      a) If the cumulative ACK acknowledges all segments up to send_high
         stored in algorithm step 1, the TCP sender SHOULD revert to the
         conventional RTO recovery and it MUST set the congestion window
         to no more than 2 * MSS.  The sender does not enter step 3 of
         this algorithm.

      b) Otherwise, the TCP sender MAY transmit two new segments.  If
         the TCP sender does not transmit any previously unsent data, it
         MUST NOT enter step 3 of this algorithm, but revert to the
         conventional RTO recovery.
   3) The second acknowledgement after the RTO arrives at the sender.

      a) If the ACK acknowledges data above send_high, either in SACK
         blocks or as a cumulative ACK, the sender MUST set the
         congestion window to no more than 3 * MSS and proceed with slow
         start, retransmitting unacknowledged segments.  The sender
         SHOULD also take this branch when the acknowledgement is a
         duplicate ACK that does not contain any new SACK blocks for
         previously unacknowledged data below send_high.

      b) If the ACK does not acknowledge data above send_high and some
         previously unacknowledged data below send_high is acknowledged,
         the TCP sender SHOULD declare the RTO spurious.

         If there are unacknowledged holes between the received SACK
         blocks, those segments SHOULD be retransmitted similarly to the
         conventional SACK recovery algorithm.  In addition, send_high
         should be set to its earlier value, since no loss recovery was
         needed due to the RTO.

   As with the basic version of the F-RTO algorithm, in step (2b) the
   sender may transmit only one segment if the receiver window does not
   allow more or if there is no more application data.

4. On using the TCP timestamps with F-RTO

   The basic F-RTO algorithm suggests applying the conventional RTO
   recovery if the receiver window or the application limits the
   transmission of new, previously unsent data, and in such a case it is
   possible that the F-RTO algorithm cannot be used to detect a spurious
   RTO.  The F-RTO sender can avoid the need to transmit new, previously
   unsent segments after the RTO if it has TCP timestamps [BBJ92]
   available.  The Eifel detection algorithm [LK00] describes how the
   TCP timestamps can be used to avoid unnecessary retransmissions after
   a spurious RTO.
   However, if the RTO is declared spurious based on the timestamp
   echoed with the first acceptable ACK following the RTO, the TCP
   sender may falsely declare the RTO spurious and continue by
   transmitting new data when the RTO was caused by the loss of
   acknowledgements.  The Eifel algorithm may falsely signal a spurious
   RTO if the first data segment retransmitted after the RTO was not
   lost, but the corresponding acknowledgement was, and the
   acknowledgement does not include the DSACK option [FMMP00].  If the
   sender and the receiver implement DSACK, this problem can be avoided.

   An alternative algorithm for detecting spurious RTOs by using TCP
   timestamps without DSACK is described below.  When TCP timestamps are
   available, the F-RTO sender MAY apply the following algorithm.

   1) When the RTO expires, retransmit the first unacknowledged segment
      and store the timestamp of the retransmitted segment in the
      variable "RetransmitTS".  Store the highest sequence number
      transmitted so far in the variable "send_high".

   2) Wait until the first ACK that acknowledges previously
      unacknowledged data arrives at the sender.  If duplicate ACKs
      arrive, they are processed normally while the sender stays in
      this step of the algorithm.

      a) If the timestamp echoed with the ACK is later than or equal to
         the value stored in "RetransmitTS", the TCP sender SHOULD
         revert to the conventional RTO recovery and it MUST NOT enter
         step 3 of this algorithm.  The sender should adjust the
         congestion window according to the standard congestion control
         rules.

      b) If the timestamp echoed with the first ACK is earlier than the
         value stored in "RetransmitTS", the TCP sender SHOULD transmit
         the first unacknowledged segment and enter step 3 of this
         algorithm.

   3) When the next acknowledgement arrives at the sender, the sender
      SHOULD apply one of the following branches of the algorithm.
      a) If the timestamp echoed with the ACK is later than or equal to
         the value stored in "RetransmitTS", or if the acknowledgement
         is a duplicate ACK, the TCP sender SHOULD revert to the
         conventional RTO recovery.  The TCP sender MUST set the
         congestion window to no more than 2 * MSS.

      b) If the timestamp echoed with the ACK is earlier than the value
         stored in "RetransmitTS", the TCP sender SHOULD declare the RTO
         spurious.  send_high SHOULD be set to the value of SND.UNA to
         cancel the NewReno bugfix, as described in Section 2.

   The drawback of this algorithm compared to the original Eifel
   detection is that it can make two unnecessary retransmissions
   instead of one.  In addition, packet reordering, packet duplication,
   or packet loss affecting the segment that follows the one that
   triggered the RTO may prevent the detection of a spurious RTO.
   Therefore, it may be desirable to apply the basic F-RTO algorithm or
   its SACK-enhanced version whenever the sender is able to transmit
   previously unsent data when the first ACK after the RTO arrives.
   However, we believe the algorithm above effectively avoids false
   spurious RTO signals.

5. Taking Actions after Detecting Spurious RTO

   Upon a retransmission timeout, a conventional TCP sender assumes that
   the outstanding segments are lost and starts retransmitting the
   unacknowledged segments.  When the RTO is detected to be spurious,
   the TCP sender should not start retransmitting based on the RTO.  For
   example, if the sender was in the congestion avoidance phase
   transmitting new, previously unsent segments, it should continue
   transmitting previously unsent segments after detecting the spurious
   RTO.  In addition, it is suggested that the RTO estimator be
   reinitialized and the RTO timer be adjusted to a more conservative
   value in order to avoid subsequent spurious RTOs [LG02].
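   The practical difference between the two outcomes is where
   transmission resumes.  The Python sketch below is an illustration
   under simplifying assumptions and is not part of the algorithm:

   o  the function name next_transmissions is hypothetical;
   o  cwnd and FlightSize are counted in whole segments;
   o  the receiver window is ignored;
   o  conventional RTO recovery is modelled as go-back-N from SND.UNA,
      with every outstanding segment presumed lost.

   For a given SND.UNA, SND.NXT and cwnd, and a detection verdict, it
   lists the segments that would be sent next.

```python
def next_transmissions(snd_una, snd_nxt, cwnd, rto_spurious):
    """Segment numbers a sender would transmit next (hypothetical model).

    snd_una: oldest unacknowledged segment; snd_nxt: next previously
    unsent segment; cwnd: congestion window, in whole segments.
    """
    if rto_spurious:
        # Outstanding segments are presumed delayed, not lost: they still
        # count toward FlightSize, and only previously unsent data is sent.
        flight_size = snd_nxt - snd_una
        room = max(0, cwnd - flight_size)
        return list(range(snd_nxt, snd_nxt + room))
    # Conventional RTO recovery (modelled as go-back-N): all outstanding
    # segments are presumed lost, so sending restarts at SND.UNA.
    return list(range(snd_una, snd_una + cwnd))


# State after step 12 of the A.1 example: SND.UNA = 11, SND.NXT = 14, cwnd = 4.
print(next_transmissions(11, 14, 4, rto_spurious=True))   # [14]
print(next_transmissions(11, 14, 4, rto_spurious=False))  # [11, 12, 13, 14]
```

   In the first call, the sender fills the single free slot with new
   segment 14, which matches step 13 of the A.1 trace.  In the second
   call, the same window would be spent resending segments 11 to 13,
   which are still in flight.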
   Different approaches have been suggested for adjusting the congestion
   control state after a spurious RTO.  This document does not recommend
   any of the alternatives below, but considers the response to a
   spurious RTO a subject for further research.

   1) Revert the congestion control parameters to the state before the
      RTO [LG02].  This appears to be a justified decision, because the
      result is similar to the situation in which the spurious RTO had
      not occurred.  However, we identified two concerns with this
      approach.  First, some detection mechanisms, such as F-RTO or the
      Eifel detection algorithm, do not notice the loss of the spurious
      retransmission, thus introducing a small risk of violating the
      congestion control principles.  Second, a spurious RTO indicates
      that some part of the network was unable to deliver packets for a
      while, which can be considered a potential indication of
      congestion.

   2) Reduce ssthresh and the congestion window when detecting a
      spurious RTO [SKR02].  For example, ssthresh and cwnd could be set
      to half of their earlier values, as is done with other congestion
      notification events.  This alternative would be conservative
      enough considering the possibility of not detecting a packet loss
      of the RTO-triggered retransmission, but the TCP sender should
      avoid reducing the congestion window more than once in a
      round-trip time.

   3) Reset the congestion window to one segment and proceed with slow
      start once the pipe is assumed to be empty of earlier packets
      [SL02].  This would be a justified action to take if the spurious
      RTO is assumed to be caused by changes in the network conditions,
      such as a change in the available bandwidth or a wireless handoff
      to another point in the network.
      A disadvantage of this alternative is that it is rather
      inefficient on network paths with high delay; in addition, it may
      result in slow-start overshoot.

6. Security Considerations

   No additional security threats to TCP caused by the F-RTO algorithm
   are known.

Acknowledgements

   We are grateful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, Mark
   Allman, Sally Floyd, Yogesh Swami, and Mika Liljeberg for the
   discussion and feedback contributed to this text.

Normative References

   [APS99]  M. Allman, V. Paxson, and W. Stevens. TCP Congestion
            Control. RFC 2581, April 1999.

   [MMFR96] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP
            Selective Acknowledgement Options. RFC 2018, October 1996.

   [PA00]   V. Paxson and M. Allman. Computing TCP's Retransmission
            Timer. RFC 2988, November 2000.

   [Pos81]  J. Postel. Transmission Control Protocol. RFC 793,
            September 1981.

Informative References

   [ABF01]  M. Allman, H. Balakrishnan, and S. Floyd. Enhancing TCP's
            Loss Recovery Using Limited Transmit. RFC 3042, January
            2001.

   [BA02]   E. Blanton and M. Allman. On Making TCP more Robust to
            Packet Reordering. ACM Computer Communication Review,
            32(1), January 2002.

   [BBJ92]  D. Borman, R. Braden, and V. Jacobson. TCP Extensions for
            High Performance. RFC 1323, May 1992.

   [FH99]   S. Floyd and T. Henderson. The NewReno Modification to
            TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

   [FMMP00] S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky. An
            Extension to the Selective Acknowledgement (SACK) Option to
            TCP. RFC 2883, July 2000.

   [GL02]   A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm for
            TCP in a GPRS Network. In Proc. of European Wireless,
            Florence, Italy, February 2002.

   [LG02]   R. Ludwig and A. Gurtov. The Eifel Response Algorithm for
            TCP. Internet draft
            "draft-ietf-tsvwg-tcp-eifel-response-02.txt". December 2002.
            Work in progress.

   [LK00]   R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP
            Robust Against Spurious Retransmissions. ACM Computer
            Communication Review, 30(1), January 2000.

   [SKR02]  P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: A New
            Recovery Algorithm for TCP Retransmission Timeouts.
            University of Helsinki, Dept. of Computer Science. Series
            of Publications C, No. C-2002-07. February 2002. Available
            at: http://www.cs.helsinki.fi/research/iwtcp/papers/f-rto.ps

   [SL02]   Y. Swami and K. Le. DCLOR: De-correlated Loss Recovery
            using SACK option for spurious timeouts. Internet draft
            "draft-swami-tsvwg-tcp-dclor-00.txt". November 2002. Work
            in progress.

Appendix A: Scenarios

   This section discusses different scenarios where RTOs occur and how
   the basic F-RTO algorithm performs in those scenarios. The scenarios
   of interest are a sudden delay triggering an RTO, loss of a
   retransmitted packet during fast recovery, a link outage causing the
   loss of several packets, and packet reordering. A more thorough
   performance evaluation and analysis of a real F-RTO implementation
   is given in [SKR02].

A.1. Sudden delay

   An unexpectedly long delay can trigger an RTO, whether it affects a
   single packet that blocks the following packets or appears as
   increased RTTs for several successive packets. The example below
   illustrates the sequence of packets and acknowledgements seen by a
   TCP sender that follows the F-RTO algorithm when a sudden delay
   triggers an RTO but no packets are lost. For simplicity, delayed
   acknowledgements are not used in the example.

      ...               (cwnd = 6, ssthresh < 6, FlightSize = 5)
       1. SEND(10)
       2. ACK(6)
       3. SEND(11)
       4.               (set ssthresh <- 3)
       5. SEND(6)
       6. ACK(7)
       7. SEND(12)
       8. SEND(13)
       9. ACK(8)        (set cwnd <- 3, FlightSize = 6)
      10. ACK(9)        (cwnd = 3, FlightSize = 5)
      11. ACK(10)       (cwnd = 3, FlightSize = 4)
      12. ACK(11)       (cwnd = 4, FlightSize = 3)
      13. SEND(14)
      ...

   When a sudden delay long enough to trigger an RTO occurs at step 4,
   the TCP sender retransmits the first unacknowledged segment (step
   5). Because the next ACK advances the cumulative ACK point, the TCP
   sender continues by sending two new data segments (steps 7 and 8)
   and adjusts cwnd to 3 MSS. Because the second acknowledgement
   arriving after the RTO also advances the cumulative ACK point, the
   TCP sender exits the recovery and continues in congestion avoidance.
   From this point on, retransmissions are triggered either by fast
   retransmit or by the retransmission timer. Because the TCP sender
   reduces cwnd when receiving the first ACK after the RTO and sends
   the two new data segments at steps 7 and 8, it has to wait until
   FlightSize falls below cwnd before it can transmit again at step 13.

A.2. Loss of a retransmission

   If a retransmitted segment is lost, the only way to retransmit it
   again is to wait for the RTO to expire. Once the segment is
   successfully received, the receiver usually acknowledges several
   segments cumulatively. The example below shows a scenario where the
   retransmission of segment 6 is lost, as well as a later segment
   (segment 9) in the same window. Neither the Limited Transmit
   [ABF01] nor the SACK [MMFR96] enhancement is in use in this example.

      ...               (cwnd = 6, ssthresh < 6, FlightSize = 5)
       1. SEND(10)
       2. ACK(6)
       3. SEND(11)
       4. ACK(6)
       5. ACK(6)
       6. ACK(6)
       7. SEND(6)       (set cwnd <- 6, set ssthresh <- 3)
       8. ACK(6)
       9.               (set ssthresh <- 2)
      10. SEND(6)
      11. ACK(9)
      12. SEND(12)
      13. SEND(13)
      14. ACK(9)        (set cwnd <- 3)
      15. SEND(9)
      16. SEND(10)
      17. SEND(11)
      18. ACK(11)
      ...
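   As a reading aid for the traces in A.1 and above, the sender-side
   decisions they illustrate (retransmit one segment on RTO, send two
   new segments if the first ACK advances the cumulative ACK point,
   then either declare the RTO spurious or fall back to slow start
   depending on the second ACK) can be modeled as a small state
   machine. This is a simplified sketch, not the normative algorithm of
   Section 2: the names Phase and FrtoSketch and the returned action
   strings are hypothetical, segment numbers stand in for sequence
   numbers, and all cwnd and ssthresh arithmetic is omitted.

```python
from enum import Enum, auto


class Phase(Enum):
    NORMAL = auto()           # no RTO recovery in progress
    WAIT_FIRST_ACK = auto()   # RTO fired, first unacked segment resent
    WAIT_SECOND_ACK = auto()  # first ACK advanced, two new segments sent


class FrtoSketch:
    """Hypothetical, simplified model of the basic F-RTO decisions shown
    in the Appendix A traces. ACK(n) means "next expected segment is n".
    cwnd/ssthresh updates from Section 2 are deliberately left out."""

    def __init__(self, snd_una: int) -> None:
        self.snd_una = snd_una    # current cumulative ACK point
        self.phase = Phase.NORMAL

    def on_rto(self) -> str:
        # On RTO, retransmit only the first unacknowledged segment.
        self.phase = Phase.WAIT_FIRST_ACK
        return f"retransmit {self.snd_una}"

    def on_ack(self, ack: int) -> str:
        advanced = ack > self.snd_una
        if advanced:
            self.snd_una = ack

        if self.phase is Phase.WAIT_FIRST_ACK:
            if advanced:
                # First ACK advances: probe with two new segments.
                self.phase = Phase.WAIT_SECOND_ACK
                return "send two new segments"
            # Duplicate ACK (e.g. reordering, A.4): behave like regular RTO.
            self.phase = Phase.NORMAL
            return "conventional RTO recovery: retransmit in slow start"

        if self.phase is Phase.WAIT_SECOND_ACK:
            self.phase = Phase.NORMAL
            if advanced:
                # Both ACKs advanced: the RTO is treated as spurious (A.1).
                return "spurious RTO: continue in congestion avoidance"
            # Second ACK is a duplicate: data was really lost (A.2, A.3).
            return "slow start: cwnd = 3 * MSS, retransmit unacked segments"

        return "normal processing"
```

   Replaying the A.1 trace (RTO with the cumulative ACK point at 6,
   then ACK(7) and ACK(8)) ends in the spurious-RTO branch, whereas the
   A.2 trace (ACK(9) twice after the RTO) ends in the slow-start branch.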
   In the example above, segment 6 is lost and the sender retransmits
   it after three duplicate ACKs in step 7. However, the retransmission
   is also lost, and the sender has to wait for the RTO to expire
   before retransmitting it again. Because the first ACK following the
   RTO advances the cumulative ACK point (step 11), the sender
   transmits two new segments. The second ACK, in step 14, does not
   advance the cumulative ACK point, so the sender enters slow start,
   sets cwnd to 3 * MSS, and retransmits the next three unacknowledged
   segments, as per the F-RTO algorithm description given in Section 2.
   After this, the receiver acknowledges all segments transmitted prior
   to entering recovery, and the sender can continue transmitting new
   data in congestion avoidance.

A.3. Link outage

   A performance study shows that F-RTO performs similarly to regular
   RTO recovery when consecutive packets are lost both upstream and
   downstream as a result of a link outage that triggers an RTO
   [SKR02]. If the RTO was not spurious but data was actually lost, one
   of the next two ACKs after the RTO does not advance the cumulative
   ACK point, because the basic F-RTO retransmits only one segment
   after the RTO. As a result, the F-RTO sender continues by
   retransmitting the unacknowledged segments, similarly to
   conventional RTO recovery.

A.4. Packet reordering

   Since F-RTO modifies the TCP sender behavior only after a
   retransmission timeout, and is intended to avoid unnecessary
   retransmissions only after a spurious RTO, we limit the discussion
   of the effects of packet reordering on F-RTO behavior to the cases
   where packet reordering occurs immediately after an RTO. We consider
   a retransmission timeout due to packet reordering to be a very rare
   case, since reordering often triggers fast retransmit through the
   duplicate ACKs caused by out-of-order segments.
   Should packet reordering occur after an RTO, duplicate ACKs arrive
   at the sender, causing the F-RTO algorithm to retransmit in slow
   start as regular RTO recovery would. Although this might not be the
   correct action, it is similar to the behavior of regular TCP, which
   makes F-RTO a safe modification also in the presence of reordering.

Authors' Addresses

   Pasi Sarolahti
   Nokia Research Center
   P.O. Box 407
   FIN-00045 NOKIA GROUP
   Finland

   Phone: +358 50 4876607
   EMail: pasi.sarolahti@nokia.com
   http://www.cs.helsinki.fi/u/sarolaht/

   Markku Kojo
   University of Helsinki
   Department of Computer Science
   P.O. Box 26
   FIN-00014 UNIVERSITY OF HELSINKI
   Finland

   Phone: +358 9 1914 4179
   EMail: markku.kojo@cs.helsinki.fi