Internet Engineering Task Force                             P. Sarolahti
INTERNET-DRAFT                                     Nokia Research Center
draft-ietf-tcpm-rfc4138bis-01.txt                                M. Kojo
Expires: May 2008                                 University of Helsinki
                                                             K. Yamamoto
                                                                 M. Hata
                                                              NTT Docomo

                                                        18 November 2007

        Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
               Spurious Retransmission Timeouts with TCP

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 2008.

Abstract

   Spurious retransmission timeouts cause suboptimal TCP performance
   because they often result in unnecessary retransmission of the last
   window of data. This document describes the F-RTO detection
   algorithm for detecting spurious TCP retransmission timeouts. F-RTO
   is a TCP sender-only algorithm that does not require any TCP options
   to operate. After retransmitting the first unacknowledged segment
   triggered by a timeout, the F-RTO algorithm of the TCP sender
   monitors the incoming acknowledgments to determine whether the
   timeout was spurious. It then decides whether to send new segments
   or retransmit unacknowledged segments.
   The algorithm effectively helps to avoid additional unnecessary
   retransmissions and thereby improves TCP performance in the case of
   a spurious timeout.

Table of Contents

   1. Introduction
      1.1. Conventions and Terminology
   2. Basic F-RTO Algorithm
      2.1. The Algorithm
      2.2. Discussion
   3. SACK-Enhanced Version of the F-RTO Algorithm
   4. Taking Actions after Detecting Spurious RTO
   5. Evaluation of RFC 4138 and Differences to this Document
   6. Security Considerations
   7. Acknowledgements
   Appendix
   A. Discussion of Window-Limited Cases
   B. List of Changes
   References
   Normative References
   Informative References
   Authors' Addresses
   Full Copyright Statement
   Intellectual Property

1. Introduction

   The Transmission Control Protocol (TCP) [Pos81] has two methods for
   triggering retransmissions. First, the TCP sender relies on
   incoming duplicate ACKs, which indicate that the receiver is missing
   some of the data.
   After a required number of successive duplicate ACKs have arrived at
   the sender, it retransmits the first unacknowledged segment [APS99]
   and continues with a loss recovery algorithm such as NewReno [FHG04]
   or SACK-based loss recovery [BAFW03]. Second, the TCP sender
   maintains a retransmission timer which triggers retransmission of
   segments, if they have not been acknowledged before the
   retransmission timeout (RTO) expires. When the retransmission
   timeout occurs, the TCP sender enters the RTO recovery where the
   congestion window is initialized to one segment and unacknowledged
   segments are retransmitted using the slow-start algorithm. The
   retransmission timer is adjusted dynamically, based on the measured
   round-trip times [PA00].

   It has been pointed out that the retransmission timer can expire
   spuriously and cause unnecessary retransmissions when no segments
   have been lost [LK00, GL02, LM03]. After a spurious retransmission
   timeout, the late acknowledgments of the original segments arrive at
   the sender, usually triggering unnecessary retransmissions of a
   whole window of segments during the RTO recovery. Furthermore,
   after a spurious retransmission timeout, a conventional TCP sender
   increases the congestion window on each late acknowledgment in slow
   start. This injects a large number of data segments into the
   network within one round-trip time, thus violating the packet
   conservation principle [Jac88].

   There are a number of potential reasons for spurious retransmission
   timeouts. First, some mobile networking technologies involve sudden
   delay spikes on transmission because of actions taken during a
   hand-off. Second, a hand-off may take place from a low latency path
   to a high latency path, suddenly increasing the round-trip time
   beyond the current RTO value.
   Third, on a low-bandwidth link the arrival of competing traffic
   (possibly with higher priority), or some other change in available
   bandwidth, can cause a sudden increase of the round-trip time. This
   may trigger a spurious retransmission timeout. A persistently
   reliable link layer can also cause a sudden delay when a data frame
   and several retransmissions of it are lost for some reason. This
   document does not distinguish between the different causes of such a
   delay spike. Rather, it discusses the spurious retransmission
   timeouts caused by a delay spike in general.

   This document describes the F-RTO detection algorithm. It is based
   on the detection mechanism of the "Forward RTO-Recovery" (F-RTO)
   algorithm [SKR03] that is used for detecting spurious retransmission
   timeouts and thus avoids unnecessary retransmissions following the
   retransmission timeout. When the timeout is not spurious, the F-RTO
   algorithm reverts to the conventional RTO recovery algorithm, and
   therefore has similar behavior and performance. In contrast to
   alternative algorithms proposed for detecting unnecessary
   retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms
   [BA04]), F-RTO does not require any TCP options for its operation,
   and it can be implemented by modifying only the TCP sender. The
   Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious
   timeout upon arrival of the first acknowledgment after the
   retransmission. The DSACK-based algorithms require that the TCP
   Selective Acknowledgment Option [MMFR96], with the DSACK extension
   [FMMP00], is in use. With DSACK, the TCP receiver can report if it
   has received a duplicate segment, enabling the sender to detect
   afterwards whether it has retransmitted segments unnecessarily. The
   F-RTO algorithm only attempts to detect and avoid unnecessary
   retransmissions after an RTO.
   Eifel and DSACK can also be used for detecting unnecessary
   retransmissions caused by other events, such as packet reordering.

   When an RTO expires, the F-RTO sender retransmits the first
   unacknowledged segment as usual [APS99]. Deviating from the normal
   operation after a timeout, it then tries to transmit new, previously
   unsent data for the first acknowledgment that arrives after the
   timeout, given that the acknowledgment advances the window. If the
   second acknowledgment that arrives after the timeout advances the
   window (i.e., acknowledges data that was not retransmitted), the
   F-RTO sender declares the timeout spurious and exits the RTO
   recovery. However, if either of these two acknowledgments is a
   duplicate ACK, there will not be sufficient evidence of a spurious
   timeout. Therefore, the F-RTO sender retransmits the unacknowledged
   segments in slow start similarly to the traditional algorithm.

   With a SACK-enhanced version of the F-RTO algorithm, spurious
   timeouts may be detected even if duplicate ACKs arrive after an RTO
   retransmission. Even though this document only specifies the F-RTO
   algorithm for TCP, the algorithm can also be applied to the Stream
   Control Transmission Protocol (SCTP) [Ste00] that has acknowledgment
   and packet retransmission concepts similar to TCP. Considerations on
   applying F-RTO to SCTP are discussed in RFC 4138 [SK05].

   This document is organized as follows. Section 2 describes the
   basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is
   given in Section 3. Section 4 discusses the possible actions to be
   taken after detecting a spurious RTO, Section 5 summarizes the
   experience gained with RFC 4138, and Section 6 discusses the
   security considerations.

1.1. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119] and indicate requirement levels for protocols.

2. Basic F-RTO Algorithm

   A timeout is considered spurious if it would have been avoided had
   the sender waited longer for an acknowledgment to arrive [LM03].
   F-RTO affects the TCP sender behavior only after a retransmission
   timeout. Otherwise, the TCP behavior remains the same. When the
   RTO expires, the F-RTO algorithm monitors incoming acknowledgments,
   and if the TCP sender gets an acknowledgment for a segment that was
   not retransmitted due to the timeout, the F-RTO algorithm declares
   the timeout spurious. The actions taken in response to a spurious
   timeout are not specified in this document, but we discuss some
   alternatives in Section 4. This section introduces the algorithm
   and then discusses the different steps of the algorithm in more
   detail.

   Following the practice used with the Eifel Detection algorithm
   [LM03], we use the "SpuriousRecovery" variable to indicate whether
   the retransmission is declared spurious by the sender. This
   variable can be used as an input for a corresponding response
   algorithm. With F-RTO, the value of SpuriousRecovery can be either
   SPUR_TO (indicating a spurious retransmission timeout) or FALSE
   (indicating that the timeout is not declared spurious and that the
   TCP sender should follow the conventional RTO recovery algorithm).

2.1. The Algorithm

   A TCP sender implementing the basic F-RTO algorithm MUST take the
   following steps after the retransmission timer expires. If the
   retransmission timer expires again during the execution of the F-RTO
   algorithm, the TCP sender MUST re-start the algorithm processing
   from step 1.
   If the sender implements some loss recovery algorithm other than
   Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered
   when earlier fast recovery is underway.

   The F-RTO algorithm takes different actions based on whether an
   incoming acknowledgement advances the cumulative acknowledgement
   point for a received in-order segment, or whether it is a duplicate
   acknowledgement to indicate an out-of-order segment. Duplicate
   acknowledgement is defined in [APB07]. The F-RTO algorithm does not
   specify actions for receiving a segment that does not acknowledge
   new data but is not a duplicate acknowledgement. The TCP sender
   SHOULD ignore such segments and wait for a segment that either
   acknowledges new data or is a duplicate acknowledgment.

   1) When the RTO expires, retransmit the first unacknowledged segment
      and set SpuriousRecovery to FALSE. Also, store the highest
      sequence number transmitted so far in variable "recover".

   2) When the first acknowledgment after the RTO retransmission
      arrives at the TCP sender, the TCP sender chooses one of the
      following actions, depending on whether the ACK advances the
      window or whether it is a duplicate ACK.

      a) If the acknowledgment is a duplicate ACK OR the
         Acknowledgement field covers "recover" but not more than
         "recover" OR the acknowledgment does not acknowledge all of
         the data that was retransmitted in step 1, revert to the
         conventional RTO recovery and continue by retransmitting
         unacknowledged data in slow start. Do not enter step 3 of
         this algorithm. The SpuriousRecovery variable remains as
         FALSE.

      b) Else, if the acknowledgment advances the window AND the
         Acknowledgement field does not cover "recover", transmit up to
         two new (previously unsent) segments and enter step 3 of this
         algorithm. If the TCP sender does not have enough unsent data,
         it can send only one segment.
         In addition, the TCP sender MAY override the Nagle algorithm
         [Nag84] and immediately send a segment if needed. Note that
         sending two segments in this step is allowed by TCP congestion
         control requirements [APS99]: An F-RTO TCP sender simply
         chooses different segments to transmit.

         If the TCP sender does not have any new data to send, or the
         advertised window prohibits new transmissions, the recommended
         action is to skip step 3 of this algorithm and continue with
         slow start retransmissions, following the conventional RTO
         recovery algorithm. However, alternative ways of handling the
         window-limited cases that could result in better performance
         are discussed in Appendix A.

   3) When the second acknowledgment after the RTO retransmission
      arrives at the TCP sender, the TCP sender either declares the
      timeout spurious, or starts retransmitting the unacknowledged
      segments.

      a) If the acknowledgment is a duplicate ACK, set the congestion
         window to no more than 3 * MSS, and continue with the slow
         start algorithm retransmitting unacknowledged segments. The
         congestion window can be set to 3 * MSS, because two
         round-trip times have elapsed since the RTO, and a
         conventional TCP sender would have increased cwnd to 3 * MSS
         during the same time. Leave SpuriousRecovery set to FALSE.

      b) If the acknowledgment advances the window (i.e., if it
         acknowledges data that was not retransmitted after the
         timeout), declare the timeout spurious, set SpuriousRecovery
         to SPUR_TO, and set the value of the "recover" variable to
         SND.UNA (the oldest unacknowledged sequence number [Pos81]).

2.2. Discussion

   The F-RTO sender takes cautious actions when it receives duplicate
   acknowledgments after a retransmission timeout. Because duplicate
   ACKs may indicate that segments have been lost, reliably detecting a
   spurious timeout is difficult due to the lack of additional
   information.
   Therefore, it is prudent to follow the conventional TCP recovery in
   those cases.

   If the first acknowledgment after the RTO retransmission covers the
   "recover" point at algorithm step (2a), there is not enough evidence
   that a non-retransmitted segment has arrived at the receiver after
   the timeout. This is a common case when a fast retransmission is
   lost and has been retransmitted again after an RTO, while the rest
   of the unacknowledged segments were successfully delivered to the
   TCP receiver before the retransmission timeout. Therefore, the
   timeout cannot be declared spurious in this case.

   If the first acknowledgment after the RTO retransmission does not
   acknowledge all of the data that was retransmitted in step 1, the
   TCP sender reverts to the conventional RTO recovery. Otherwise, a
   malicious receiver acknowledging partial segments could cause the
   sender to declare the timeout spurious in a case where data was
   lost.

   The TCP sender is allowed to send two new segments in algorithm
   branch (2b) because the conventional TCP sender would transmit two
   segments when the first new ACK arrives after the RTO
   retransmission. If sending new data is not possible in algorithm
   branch (2b), or if the receiver window limits the transmission, the
   TCP sender has to send something in order to prevent the TCP
   transfer from stalling. If no segments were sent, the pipe between
   sender and receiver might run out of segments, and no further
   acknowledgments would arrive. Therefore, in the window-limited
   case, the recommendation is to revert to the conventional RTO
   recovery with slow start retransmissions. Appendix A discusses some
   alternative solutions for window-limited situations.

   If the retransmission timeout is declared spurious, the TCP sender
   sets the value of the "recover" variable to SND.UNA in order to
   allow fast retransmit [FHG04].
   The "recover" variable was proposed for avoiding unnecessary,
   multiple fast retransmits when RTO expires during fast recovery with
   NewReno TCP. Because the F-RTO sender retransmits only the segment
   that triggered the timeout, the problem of unnecessary multiple fast
   retransmits [FHG04] cannot occur. Therefore, if three duplicate
   ACKs arrive at the sender after the timeout, they probably indicate
   a packet loss, and thus fast retransmit should be used to allow
   efficient recovery. If there are not enough duplicate ACKs arriving
   at the sender after a packet loss, the retransmission timer expires
   again and the sender enters step 1 of this algorithm.

   When the timeout is declared spurious, the TCP sender cannot detect
   whether the unnecessary RTO retransmission was lost. In principle,
   the loss of the RTO retransmission should be taken as a congestion
   signal. Thus, there is a small possibility that the F-RTO sender
   will violate the congestion control rules, if it chooses to fully
   revert congestion control parameters after detecting a spurious
   timeout. The Eifel detection algorithm has a similar property,
   while the DSACK option can be used to detect whether the
   retransmitted segment was successfully delivered to the receiver.

   The F-RTO algorithm has a side-effect on the TCP round-trip time
   measurement. Because the TCP sender can avoid most of the
   unnecessary retransmissions after detecting a spurious timeout, the
   sender is able to take round-trip time samples on the delayed
   segments. If the regular RTO recovery was used without TCP
   timestamps, this would not be possible due to the retransmission
   ambiguity. As a result, the RTO is likely to have more accurate and
   larger values with F-RTO than with the regular TCP after a spurious
   timeout that was triggered due to delayed segments. We believe this
   is an advantage in networks that are prone to delay spikes.
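
   As an informal illustration that is not part of the specification,
   the steps of the basic algorithm in Section 2.1 can be sketched in
   Python. This is a minimal sketch under stated assumptions: the
   class name BasicFrto, the Ack type, the can_send_new flag, and the
   returned action strings are hypothetical names introduced here;
   sequence numbers are plain integers without wraparound; the caller
   classifies each incoming ACK, drops ACKs that neither acknowledge
   new data nor are duplicates, and performs the actual transmissions.

```python
from dataclasses import dataclass
from enum import Enum

SPUR_TO = "SPUR_TO"  # value of SpuriousRecovery after a spurious timeout


class Ack(Enum):
    NEW = 1        # advances the cumulative acknowledgment point
    DUPLICATE = 2  # duplicate ACK as defined in [APB07]


@dataclass
class BasicFrto:
    """Hypothetical sender-side state for the basic F-RTO steps."""
    snd_una: int                 # oldest unacknowledged sequence number
    snd_nxt: int                 # next sequence number to be sent
    mss: int                     # segment size in bytes
    step: int = 0                # 0 means F-RTO is not running
    spurious_recovery: object = False
    recover: int = 0
    rexmit_end: int = 0          # first byte after the RTO retransmission

    def on_rto(self):
        # Step 1: retransmit the first unacknowledged segment.
        self.spurious_recovery = False
        self.recover = self.snd_nxt - 1   # highest sequence number sent
        self.rexmit_end = min(self.snd_una + self.mss, self.snd_nxt)
        self.step = 2
        return "retransmit_first_unacked"

    def on_ack(self, kind, ack_no, can_send_new=True):
        if kind is Ack.NEW:
            self.snd_una = ack_no
        if self.step == 2:
            self.step = 0
            if (kind is Ack.DUPLICATE              # possible loss
                    or ack_no > self.recover       # covers "recover"
                    or ack_no < self.rexmit_end):  # partial ACK
                return "conventional_recovery"     # step 2a
            if not can_send_new:
                return "conventional_recovery"     # window-limited case
            self.step = 3
            return "send_up_to_two_new_segments"   # step 2b
        if self.step == 3:
            self.step = 0
            if kind is Ack.DUPLICATE:
                return "conventional_recovery_cwnd_3mss"  # step 3a
            self.spurious_recovery = SPUR_TO              # step 3b
            self.recover = self.snd_una
            return "timeout_spurious"
        return "not_in_frto"
```

   In this sketch, a second expiry of the retransmission timer is
   handled by calling on_rto() again, which restarts the processing
   from step 1. The sketch deliberately leaves the response to a
   spurious timeout open, as this document does.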

   There are some situations where the F-RTO algorithm may not avoid
   unnecessary retransmissions after a spurious timeout. If packet
   reordering or packet duplication occurs on the segment that
   triggered the spurious timeout, the F-RTO algorithm may not detect
   the spurious timeout due to incoming duplicate ACKs. Additionally,
   if a spurious timeout occurs during fast recovery, the F-RTO
   algorithm often cannot detect the spurious timeout because the
   segments that were transmitted before the fast recovery trigger
   duplicate ACKs. However, we consider these cases rare, and note
   that in cases where F-RTO fails to detect the spurious timeout, it
   retransmits the unacknowledged segments in slow start, and thus
   performs similarly to the regular RTO recovery.

3. SACK-Enhanced Version of the F-RTO Algorithm

   This section describes an alternative version of the F-RTO algorithm
   that uses the TCP Selective Acknowledgment Option [MMFR96]. By
   using the SACK option, the TCP sender detects spurious timeouts in
   most of the cases when packet reordering or packet duplication is
   present. If the SACK blocks acknowledge new data that was not
   transmitted after the RTO retransmission, the sender may declare the
   timeout spurious, even when duplicate ACKs follow the RTO.

   Given that the TCP Selective Acknowledgment Option [MMFR96] is
   enabled for a TCP connection, a TCP sender MAY implement the
   SACK-enhanced F-RTO algorithm. If the sender applies the
   SACK-enhanced F-RTO algorithm, it MUST follow the steps below. This
   algorithm SHOULD NOT be applied if the TCP sender is already in SACK
   loss recovery when retransmission timeout occurs.

   The steps of the SACK-enhanced version of the F-RTO algorithm are as
   follows. If the retransmission timer expires again during the
   execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
   re-start the algorithm processing from step 1.

   1) When the RTO expires, retransmit the first unacknowledged segment
      and set SpuriousRecovery to FALSE. Set variable "RecoveryPoint"
      to indicate the highest segment transmitted so far. Following the
      recommendation in the SACK specification [MMFR96], reset the SACK
      scoreboard.

   2) Wait until the acknowledgment of the data retransmitted due to
      the timeout arrives at the sender. If duplicate ACKs arrive
      before the cumulative acknowledgment for retransmitted data,
      adjust the scoreboard according to the incoming SACK information.
      Stay in step 2 and wait for the next new acknowledgment. If RTO
      expires again, go to step 1 of the algorithm.

      a) If the Cumulative Acknowledgement field covers "RecoveryPoint"
         but not more than "RecoveryPoint", revert to the conventional
         RTO recovery and set the congestion window to no more than
         2 * MSS, like a regular TCP would do. Do not enter step 3 of
         this algorithm.

      b) Else, if the Cumulative Acknowledgement field does not cover
         "RecoveryPoint" but is larger than SND.UNA, transmit up to two
         new (previously unsent) segments and proceed to step 3. If
         the TCP sender is not able to transmit any previously unsent
         data, either due to receiver window limitation or because it
         does not have any new data to send, the recommended action is
         to refrain from entering step 3 of this algorithm. Rather,
         continue with slow start retransmissions following the
         conventional RTO recovery algorithm.

         It is also possible to apply some of the alternatives for
         handling window-limited cases discussed in Appendix A.

   3) The next acknowledgment arrives at the sender. Either a
      duplicate ACK or a new cumulative ACK (advancing the window)
      applies in this step. Other types of ACKs are ignored without any
      action.

      a) If the Cumulative Acknowledgement field or a SACK block covers
         more than "RecoveryPoint", set the congestion window to no
         more than 3 * MSS and proceed with the conventional RTO
         recovery, retransmitting unacknowledged segments. Take this
         branch also when the acknowledgment is a duplicate ACK and it
         does not acknowledge any new, previously unacknowledged data
         below "RecoveryPoint" in the SACK blocks. Leave
         SpuriousRecovery set to FALSE.

      b) If the Cumulative Acknowledgement field or a SACK block in the
         ACK does not cover more than "RecoveryPoint" AND it
         acknowledges data that was not acknowledged earlier (either
         with cumulative acknowledgment or using SACK blocks), declare
         the timeout spurious and set SpuriousRecovery to SPUR_TO. The
         retransmission timeout can be declared spurious, because the
         segment acknowledged with this ACK was transmitted before the
         timeout.

   If there are unacknowledged holes between the received SACK blocks,
   those segments are retransmitted similarly to the conventional SACK
   recovery algorithm [BAFW03]. If the algorithm exits with
   SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
   thus allowing fast recovery on incoming duplicate acknowledgments.

   The SACK-enhanced algorithm works on the same principle as the basic
   algorithm but utilizes the additional information from the SACK
   option. When a genuine retransmission timeout occurs during a steady
   state of a connection, it can be assumed that there are no segments
   left in the pipe. Otherwise, the acknowledgments triggered by these
   segments would have triggered the SACK loss recovery or transmission
   of new segments.
   Therefore, if the F-RTO sender receives acknowledgements for
   segments transmitted before the retransmission timeout in response
   to the two new segments sent in algorithm step 2, the normal
   operation of TCP has merely been delayed, and the retransmission
   timeout is considered spurious. Note that this reasoning works only
   when the TCP sender is not in SACK loss recovery at the time the
   retransmission timeout occurs.

4. Taking Actions after Detecting Spurious RTO

   Upon a retransmission timeout, a conventional TCP sender assumes
   that outstanding segments are lost and starts retransmitting the
   unacknowledged segments. When the retransmission timeout is
   detected to be spurious, the TCP sender should not continue
   retransmitting based on the timeout. For example, if the sender was
   in congestion avoidance phase transmitting new, previously unsent
   segments, it should continue transmitting previously unsent segments
   in congestion avoidance.

   There are currently two alternatives specified for a spurious
   timeout response algorithm: the Eifel Response Algorithm [LG04], and
   an algorithm for adapting the retransmission timeout after a
   spurious RTO [BBA06]. If no specific response algorithm is
   implemented, the TCP sender SHOULD respond to a spurious timeout
   conservatively, applying the TCP congestion control specification
   [APS99]. Different response algorithms for spurious retransmission
   timeouts have been analyzed in some research papers [GL03, Sar03]
   and IETF documents [SL03].

5. Evaluation of RFC 4138 and Differences to this Document

   F-RTO was first specified in the Experimental RFC 4138, which has
   been implemented in a number of operating systems since it was
   published. The experience gained has been documented separately
   [KYHS07], and can be summarized as follows.

   If the TCP sender employs F-RTO, it is able to detect spurious RTOs
   and avoid the unnecessary retransmission of the whole window of
   data. Because F-RTO avoids the unnecessary retransmissions after a
   spurious RTO, it is able to adhere to the packet conservation
   principle, unlike a regular TCP that enters the slow-start recovery
   unnecessarily and inappropriately restarts the ACK clock while there
   are segments outstanding in the network. When a spurious RTO has
   been detected, a sender can select an appropriate congestion control
   response instead of setting the congestion window to one segment.
   Because F-RTO avoids unnecessary retransmissions, it is able to take
   the RTT of the delayed segments into account when calculating the
   RTO estimate, which may help in avoiding further spurious
   retransmission timeouts.

   Experimental results with the basic F-RTO have been reported in an
   emulated network using a Linux implementation [SKR03]. Different
   congestion control responses, along with the SACK-enhanced version
   of F-RTO, were also tested in a similar environment [Sar03]. There
   are publications analyzing F-RTO performance over commercial W-CDMA
   networks, and in an emulated HSDPA network [Yam05, Hok05].
   Microsoft also reported positive experiences with their
   implementation of F-RTO at the IETF-68 meeting.

   It is known that some spurious RTOs may remain undetected by F-RTO
   if duplicate acknowledgements arrive at the sender immediately after
   the spurious RTO, for example due to packet reordering or packet
   loss.
There are rare corner cases where F-RTO could "hide" a packet 518 loss and therefore lead to inappropriate behavior with a non- 519 conservative congestion control response: first, if massive packet 520 reordering occurred so that the acknowledgement of the RTO 521 retransmission arrived at the sender before the acknowledgements of 522 the original transmissions, the sender might not detect the loss of the 523 segment that triggered the RTO. Second, a malicious receiver could 524 lead F-RTO to draw a wrong conclusion after an RTO by acknowledging 525 segments it has not received. Such a receiver would, however, risk 526 breaking the consistency of the TCP state between the sender and 527 receiver, causing the connection to become unusable, which cannot be 528 of any benefit to the receiver. Therefore we believe it is not 529 likely that receivers would start employing such tricks on a 530 significant scale. Finally, loss of the unnecessary RTO 531 retransmission cannot be detected without using some explicit 532 acknowledgement scheme such as DSACK. This is common to the other 533 mechanisms for detecting spurious RTO, as well as to regular TCP 534 that does not use DSACK. We note that if the congestion control 535 response to spurious RTO is conservative enough, the above corner 536 cases do not cause problems due to increased congestion. 538 6. Security Considerations 540 The main security threat regarding F-RTO is the possibility that a 541 receiver could mislead the sender into setting too large a 542 congestion window after an RTO. There are two possible ways a 543 malicious receiver could trigger a wrong output from the F-RTO 544 algorithm. First, the receiver can acknowledge data that it has not 545 received. Second, it can delay the acknowledgement of a segment it has 546 received earlier, and acknowledge the segment after the TCP sender 547 has been deluded into entering algorithm step 3.
549 If the receiver acknowledges a segment it has not really received, 550 the sender can be led to declare a spurious timeout in the F-RTO 551 algorithm, step 3. However, because the sender will have an 552 incorrect state, it cannot retransmit the segment that has never 553 reached the receiver. Therefore, this attack is unlikely to be 554 useful for the receiver to maliciously gain a larger congestion 555 window. 557 A common case for a retransmission timeout is that a fast 558 retransmission of a segment is lost. If all other segments have 559 been received, the RTO retransmission causes the whole window to be 560 acknowledged at once. This case is recognized in F-RTO algorithm 561 branch (2a). However, if the receiver only acknowledges one segment 562 after receiving the RTO retransmission, and then the rest of the 563 segments, it could cause the timeout to be declared spurious when it 564 is not. Therefore, it is suggested that, when an RTO expires during 565 the fast recovery phase, the sender would not fully revert the 566 congestion window even if the timeout was declared spurious. 567 Instead, the sender would reduce the congestion window to 1. 569 If there is more than one segment missing at the time of a 570 retransmission timeout, the receiver does not benefit from 571 misleading the sender to declare a spurious timeout because the 572 sender would have to go through another recovery period to 573 retransmit the missing segments, usually after an RTO has elapsed. 575 7. Acknowledgements 577 The authors would like to thank Alfred Hoenes and Ilpo Jarvinen for 578 their comments on this document. 580 We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, 581 Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias 582 Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber, 583 Samu Kontinen, and Kostas Pentikousis who gave valuable feedback 584 during the preparation of RFC 4138, the precursor of this document.
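The suggestion in Section 6 (Security Considerations) for an RTO that expires during fast recovery can be illustrated with a minimal sketch. It is not part of the specification. The names SenderState, cwnd_before_rto, in_fast_recovery, and cwnd_after_spurious_rto are hypothetical, the window is counted in segments (so "1" is read as one segment), and the full revert in the other branch is only one possible response, assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    """Hypothetical snapshot of sender variables taken when the RTO fired."""
    cwnd_before_rto: int    # congestion window, in segments, before the timeout
    in_fast_recovery: bool  # True if the RTO expired during fast recovery


def cwnd_after_spurious_rto(state: SenderState) -> int:
    """Return the congestion window (in segments) to use after F-RTO has
    declared the timeout spurious."""
    if state.in_fast_recovery:
        # A receiver may have split its acknowledgements to make a genuine
        # timeout look spurious, so the earlier window is not restored.
        return 1
    # Assumption for illustration only: outside fast recovery the response
    # algorithm restores the earlier window. A more conservative response
    # is equally permitted by the text.
    return state.cwnd_before_rto
```

For example, a sender that had a window of 10 segments and timed out during fast recovery would continue with a window of 1, while the same sender outside fast recovery would, under the assumed response, return to 10.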
586 Appendix 588 A. Discussion of Window-Limited Cases 590 When the advertised window limits the transmission of two new 591 previously unsent segments, or there are no new data to send, it is 592 recommended in F-RTO algorithm step (2b) that the TCP sender 593 continue with the conventional RTO recovery algorithm. The 594 disadvantage is that the sender may continue unnecessary 595 retransmissions due to a possible spurious timeout. This section 596 briefly discusses the options that can potentially improve 597 performance when transmitting previously unsent data is not 598 possible. 600 - The TCP sender could reserve unused space of one or 601 two segments in the advertised window to ensure the use of 602 algorithms such as F-RTO or Limited Transmit [ABF01] in receiver 603 window-limited situations. On the other hand, while doing this, 604 the TCP sender should ensure that the window of outstanding 605 segments is large enough for proper utilization of the available 606 pipe. 608 - Use additional information if available, e.g., TCP timestamps with 609 the Eifel Detection algorithm, for detecting a spurious timeout. 610 However, Eifel detection may yield different results from F-RTO 611 when ACK losses and an RTO occur within the same round-trip time 613 [SKR03]. 615 - Retransmit data from the tail of the retransmission queue and 616 continue with step 3 of the F-RTO algorithm. It is possible that 617 the retransmission will be made unnecessarily. Furthermore, the 618 operation of the SACK-based F-RTO algorithm would need to consider 619 this case separately, so as not to use the retransmitted segment to 620 indicate a spurious timeout. Given these considerations, this option 621 is not recommended. 623 - Send a zero-sized segment below SND.UNA, similar to a TCP Keep-Alive 624 probe, and continue with step 3 of the F-RTO algorithm.
Because 625 the receiver replies with a duplicate ACK, the sender is able to 626 detect whether the timeout was spurious from the incoming 627 acknowledgement. This method does not send data unnecessarily, but 628 it delays the recovery by one round-trip time in cases where the 629 timeout was not spurious. Therefore, this method is not 630 encouraged. 632 - In receiver-limited cases, send one octet of new data, regardless 633 of the advertised window limit, and continue with step 3 of the F- 634 RTO algorithm. It is possible that the receiver will have free 635 buffer space to receive the data by the time the segment has 636 propagated through the network, in which case no harm is done. If 637 the receiver is not capable of receiving the segment, it rejects 638 the segment and sends a duplicate ACK. 640 B. List of Changes 642 Changes between different document versions are summarized below, 643 apart from minor editing and language improvements. 645 Changes from draft-ietf-tcpm-rfc4138bis-00: 647 * Added back the original SACK-algorithm from RFC 4138 in response to 648 common feedback asking to have the SACK-algorithm in the document. 649 Clarified the algorithm slightly, and added one paragraph 650 describing the basic idea of the algorithm. 652 * Clarified behavior on multiple timeouts. 654 * Added a paragraph on acknowledgements that do not acknowledge new 655 data but are not duplicate acknowledgements. 657 Changes from RFC 4138: 659 * Removed description of the SACK-enhanced algorithm 661 * Removed SCTP considerations 663 * Removed earlier Appendix sections, except Appendix C from RFC 664 4138, which is now Appendix A 666 * Clarified text about the possible response algorithms 668 * Added section that summarizes the evaluation of RFC 4138 670 References 672 Normative References 674 [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion 675 Control", RFC 2581, April 1999. 677 [APB07] Allman, M., Paxson, V., and E.
Blanton, "TCP Congestion 678 Control", Internet-Draft "draft-ietf-tcpm- 679 rfc2581bis-03.txt", 680 September 2007. 682 [BAFW03] Blanton, E., Allman, M., Fall, K., and L. Wang, "A 683 Conservative Selective Acknowledgment (SACK)-based Loss 684 Recovery Algorithm for TCP", RFC 3517, April 2003. 686 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 687 Requirement Levels", BCP 14, RFC 2119, March 1997. 689 [FHG04] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno 690 Modification to TCP's Fast Recovery Algorithm", RFC 3782, 691 April 2004. 693 [MMFR96] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 694 Selective Acknowledgement Options", RFC 2018, 695 October 1996. 697 [PA00] Paxson, V. and M. Allman, "Computing TCP's Retransmission 698 Timer", RFC 2988, November 2000. 700 [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 701 793, September 1981. 703 Informative References 705 [ABF01] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing 706 TCP's Loss Recovery Using Limited Transmit", RFC 3042, 707 January 2001. 709 [BA04] Blanton, E. and M. Allman, "Using TCP Duplicate Selective 710 Acknowledgement (DSACKs) and Stream Control Transmission 711 Protocol (SCTP) Duplicate Transmission Sequence Numbers 712 (TSNs) to Detect Spurious Retransmissions", RFC 3708, 713 February 2004. 715 [BBA06] J. Blanton, E. Blanton, and M. Allman. Using Spurious 716 Retransmissions to Adapt the Retransmission Timeout, 717 Internet-Draft "draft-allman-rto-backoff-04.txt", December 718 2006. Work in progress. 720 [BBJ92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions 721 for High Performance", RFC 1323, May 1992. 723 [FMMP00] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 724 Extension to the Selective Acknowledgement (SACK) Option 725 for TCP", RFC 2883, July 2000. 727 [GL02] A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm 728 for TCP in a GPRS Network. In Proc. of European Wireless, 729 Florence, Italy, February 2002. 
731 [GL03] A. Gurtov and R. Ludwig. Responding to Spurious Timeouts 732 in TCP. In Proceedings of IEEE INFOCOM 03, San Francisco, 733 CA, USA, March 2003. 735 [Jac88] V. Jacobson. Congestion Avoidance and Control. In 736 Proceedings of ACM SIGCOMM 88. 738 [Hok05] A. Hokamura, et al. "Performance Evaluation of F-RTO and 739 Eifel Response Algorithms over W-CDMA packet network". 740 Wireless Personal Multimedia Communications (WPMC'05), 741 Sept. 2005. 743 [KYHS07] M. Kojo, K. Yamamoto, M. Hata, and P. Sarolahti. 744 Evaluation of RFC 4138. 745 Internet-draft "draft-kojo-tcpm-frto-eval-00.txt", 746 June 2007. Work in progress. 748 [LG04] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 749 for TCP", RFC 4015, February 2005. 751 [LK00] R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP 752 Robust Against Spurious Retransmissions. ACM SIGCOMM 753 Computer Communication Review, 30(1), January 2000. 755 [LM03] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm 756 for TCP", RFC 3522, April 2003. 758 [Nag84] Nagle, J., "Congestion Control in IP/TCP Internetworks", 759 RFC 896, January 1984. 761 [SK05] P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO): 762 An Algorithm for Detecting Spurious Retransmission 763 Timeouts with TCP and the Stream Control Transmission 764 Protocol (SCTP)", RFC 4138, August 2005. 766 [SKR03] P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An 767 Enhanced Recovery Algorithm for TCP Retransmission 768 Timeouts. ACM SIGCOMM Computer Communication Review, 769 33(2), April 2003. 771 [Sar03] P. Sarolahti. Congestion Control on Spurious TCP 772 Retransmission Timeouts. In Proceedings of IEEE Globecom 773 2003, San Francisco, CA, USA. December 2003. 775 [SL03] Y. Swami and K. Le, "DCLOR: De-correlated Loss Recovery 776 using SACK Option for Spurious Timeouts", Expired 777 Internet-Draft, September 2003. 779 [Ste00] R. Stewart, et al. Stream Control Transmission Protocol, 780 RFC 2960, October 2000. 782 [Yam05] K.
Yamamoto, et al. "Effects of F-RTO and Eifel Response 783 Algorithms for W-CDMA and HSDPA networks". Wireless 784 Personal Multimedia Communications (WPMC'05), 785 Sept. 2005. 787 AUTHORS' ADDRESSES 789 Pasi Sarolahti 790 Nokia Research Center 791 P.O. Box 407 792 FI-00045 NOKIA GROUP 793 Finland 794 Phone: +358 50 4876607 795 Email: pasi.sarolahti@nokia.com 796 Markku Kojo 797 University of Helsinki 798 P.O. Box 68 799 FI-00014 UNIVERSITY OF HELSINKI 800 Finland 801 Email: kojo@cs.helsinki.fi 803 Kazunori Yamamoto 804 NTT Docomo, Inc. 805 3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan 806 Phone: +81-46-840-3812 807 Email: yamamotokaz@nttdocomo.co.jp 809 Max Hata 810 NTT Docomo, Inc. 811 3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan 812 Phone: +81-46-840-3812 813 Email: hatama@s1.nttdocomo.co.jp 815 Full Copyright Statement 817 Copyright (C) The IETF Trust (2007). 819 This document is subject to the rights, licenses and restrictions 820 contained in BCP 78, and except as set forth therein, the authors 821 retain all their rights. 823 This document and the information contained herein are provided on 824 an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 825 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 826 IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL 827 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 828 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 829 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 830 FOR A PARTICULAR PURPOSE. 
832 Intellectual Property 834 The IETF takes no position regarding the validity or scope of any 835 Intellectual Property Rights or other rights that might be claimed 836 to pertain to the implementation or use of the technology described 837 in this document or the extent to which any license under such 838 rights might or might not be available; nor does it represent that 839 it has made any independent effort to identify any such rights. 840 Information on the procedures with respect to rights in RFC 841 documents can be found in BCP 78 and BCP 79. 843 Copies of IPR disclosures made to the IETF Secretariat and any 844 assurances of licenses to be made available, or the result of an 845 attempt made to obtain a general license or permission for the use 846 of such proprietary rights by implementers or users of this 847 specification can be obtained from the IETF on-line IPR repository 848 at http://www.ietf.org/ipr. 850 The IETF invites any interested party to bring to its attention any 851 copyrights, patents or patent applications, or other proprietary 852 rights that may cover technology that may be required to implement 853 this standard. Please address the information to the IETF at ietf- 854 ipr@ietf.org.