Internet Engineering Task Force                             P. Sarolahti
INTERNET-DRAFT                                     Nokia Research Center
draft-ietf-tcpm-rfc4138bis-02.txt                                M. Kojo
Expires: January 2009                             University of Helsinki
                                                             K. Yamamoto
                                                                 M. Hata
                                                              NTT Docomo

                                                            14 July 2008

        Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
               Spurious Retransmission Timeouts with TCP

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire in January 2009.

Abstract

   Spurious retransmission timeouts cause suboptimal TCP performance
   because they often result in unnecessary retransmission of the last
   window of data. This document describes the F-RTO detection
   algorithm for detecting spurious TCP retransmission timeouts. F-RTO
   is a TCP sender-only algorithm that does not require any TCP options
   to operate. After retransmitting the first unacknowledged segment
   triggered by a timeout, the F-RTO algorithm of the TCP sender
   monitors the incoming acknowledgments to determine whether the
   timeout was spurious. It then decides whether to send new segments
   or retransmit unacknowledged segments.
   The algorithm effectively
   helps to avoid additional unnecessary retransmissions and thereby
   improves TCP performance in the case of a spurious timeout.

Table of Contents

   1. Introduction
      1.1. Conventions and Terminology
   2. Basic F-RTO Algorithm
      2.1. The Algorithm
      2.2. Discussion
   3. SACK-Enhanced Version of the F-RTO Algorithm
   4. Taking Actions after Detecting Spurious RTO
   5. Evaluation of RFC 4138 and Differences to this Document
   6. Security Considerations
   7. Acknowledgements
   Appendix
   A. Discussion of Window-Limited Cases
   B. List of Changes
   References
   Normative References
   Informative References
   Authors' Addresses
   Full Copyright Statement
   Intellectual Property

1. Introduction

   The Transmission Control Protocol (TCP) [Pos81] has two methods for
   triggering retransmissions. First, the TCP sender relies on
   incoming duplicate ACKs, which indicate that the receiver is missing
   some of the data.
   After a required number of successive duplicate
   ACKs have arrived at the sender, it retransmits the first
   unacknowledged segment [APS99] and continues with a loss recovery
   algorithm such as NewReno [FHG04] or SACK-based loss recovery
   [BAFW03]. Second, the TCP sender maintains a retransmission timer
   that triggers retransmission of segments if they have not been
   acknowledged before the retransmission timeout (RTO) expires. When
   the retransmission timeout occurs, the TCP sender enters RTO
   recovery, where the congestion window is initialized to one segment
   and unacknowledged segments are retransmitted using the slow-start
   algorithm. The retransmission timer is adjusted dynamically, based
   on the measured round-trip times [PA00].

   It has been pointed out that the retransmission timer can expire
   spuriously and cause unnecessary retransmissions when no segments
   have been lost [LK00, GL02, LM03]. After a spurious retransmission
   timeout, the late acknowledgments of the original segments arrive at
   the sender, usually triggering unnecessary retransmissions of a
   whole window of segments during the RTO recovery. Furthermore,
   after a spurious retransmission timeout, a conventional TCP sender
   increases the congestion window on each late acknowledgment in slow
   start. This injects a large number of data segments into the
   network within one round-trip time, thus violating the packet
   conservation principle [Jac88].

   There are a number of potential reasons for spurious retransmission
   timeouts. First, some mobile networking technologies involve sudden
   delay spikes on transmission because of actions taken during a
   hand-off. Second, a hand-off may take place from a low latency path
   to a high latency path, suddenly increasing the round-trip time
   beyond the current RTO value.
   Third, on a low-bandwidth link the arrival
   of competing traffic (possibly with higher priority), or some other
   change in available bandwidth, can cause a sudden increase of the
   round-trip time. This may trigger a spurious retransmission
   timeout. A persistently reliable link layer can also cause a sudden
   delay when a data frame and several retransmissions of it are lost
   for some reason. This document does not distinguish between the
   different causes of such a delay spike. Rather, it discusses the
   spurious retransmission timeouts caused by a delay spike in general.

   This document describes the F-RTO detection algorithm. It is based
   on the detection mechanism of the "Forward RTO-Recovery" (F-RTO)
   algorithm [SKR03] that is used for detecting spurious retransmission
   timeouts and thus avoids unnecessary retransmissions following the
   retransmission timeout. When the timeout is not spurious, the F-RTO
   algorithm reverts to the conventional RTO recovery algorithm,
   and therefore has similar behavior and performance. In contrast to
   alternative algorithms proposed for detecting unnecessary
   retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms
   [BA04]), F-RTO does not require any TCP options for its operation,
   and it can be implemented by modifying only the TCP sender. The
   Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious
   timeout upon arrival of the first acknowledgment after the
   retransmission. The DSACK-based algorithms require that the TCP
   Selective Acknowledgment Option [MMFR96], with the DSACK extension
   [FMMP00], is in use. With DSACK, the TCP receiver can report if it
   has received a duplicate segment, enabling the sender to detect
   afterwards whether it has retransmitted segments unnecessarily. The
   F-RTO algorithm only attempts to detect and avoid unnecessary
   retransmissions after an RTO.
   Eifel and DSACK can also be used for
   detecting unnecessary retransmissions caused by other events, such
   as packet reordering.

   When an RTO expires, the F-RTO sender retransmits the first
   unacknowledged segment as usual [APS99]. Deviating from the normal
   operation after a timeout, it then tries to transmit new, previously
   unsent data for the first acknowledgment that arrives after the
   timeout, given that the acknowledgment advances the window. If the
   second acknowledgment that arrives after the timeout advances the
   window (i.e., acknowledges data that was not retransmitted), the
   F-RTO sender declares the timeout spurious and exits the RTO
   recovery. However, if either of these two acknowledgments is a
   duplicate ACK, there will not be sufficient evidence of a spurious
   timeout. Therefore, the F-RTO sender retransmits the unacknowledged
   segments in slow start similarly to the traditional algorithm.

   With a SACK-enhanced version of the F-RTO algorithm, spurious
   timeouts may be detected even if duplicate ACKs arrive after an RTO
   retransmission. Even though this document only specifies the F-RTO
   algorithm for TCP, the algorithm can also be applied to the Stream
   Control Transmission Protocol (SCTP) [Ste00] that has acknowledgment
   and packet retransmission concepts similar to TCP. Considerations on
   applying F-RTO for SCTP are discussed in RFC 4138 [SK05].

   This document is organized as follows. Section 2 describes the
   basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is
   given in Section 3. Section 4 discusses the possible actions to be
   taken after detecting a spurious RTO, Section 5 summarizes the
   experience gained with RFC 4138 and the differences to this
   document, and Section 6 discusses the security considerations.

1.1. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119] and indicate requirement levels for protocols.

2. Basic F-RTO Algorithm

   A timeout is considered spurious if it would have been avoided had
   the sender waited longer for an acknowledgment to arrive [LM03].
   F-RTO affects the TCP sender behavior only after a retransmission
   timeout. Otherwise, the TCP behavior remains the same. When the
   RTO expires, the F-RTO algorithm monitors incoming acknowledgments
   and if the TCP sender gets an acknowledgment for a segment that was
   not retransmitted due to timeout, the F-RTO algorithm declares the
   timeout spurious. The actions taken in response to a spurious
   timeout are not specified in this document, but we discuss some
   alternatives in Section 4. This section introduces the algorithm
   and then discusses the different steps of the algorithm in more
   detail.

   Following the practice used with the Eifel Detection algorithm
   [LM03], we use the "SpuriousRecovery" variable to indicate whether
   the retransmission is declared spurious by the sender. This variable
   can be used as an input for a corresponding response algorithm. With
   F-RTO, the value of SpuriousRecovery can be either SPUR_TO
   (indicating a spurious retransmission timeout) or FALSE (indicating
   that the timeout is not declared spurious and that the TCP sender
   should follow the conventional RTO recovery algorithm). In addition,
   we use the "recover" variable specified in the NewReno algorithm
   [FHG04].

2.1. The Algorithm

   A TCP sender implementing the basic F-RTO algorithm MUST take the
   following steps after the retransmission timer expires.
   If the
   retransmission timer expires again during the execution of the F-RTO
   algorithm, the TCP sender MUST re-start the algorithm processing
   from step 1. If the sender implements some loss recovery algorithm
   other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
   be entered when earlier fast recovery is underway.

   The F-RTO algorithm takes different actions based on whether an
   incoming acknowledgement advances the cumulative acknowledgement
   point for a received in-order segment, or whether it is a duplicate
   acknowledgement to indicate an out-of-order segment. Duplicate
   acknowledgement is defined in [APB07]. The F-RTO algorithm does not
   specify actions for receiving a segment that does not acknowledge
   new data but is not a duplicate acknowledgement. The TCP sender
   SHOULD ignore such segments and wait for a segment that either
   acknowledges new data or is a duplicate acknowledgment.

   1) When RTO expires, retransmit the first unacknowledged segment and
      set SpuriousRecovery to FALSE. If the TCP sender is already in
      RTO recovery AND "recover" is larger than SND.UNA (the oldest
      unacknowledged sequence number [Pos81]), do not enter step 2 of
      this algorithm. Instead, store the highest sequence number
      transmitted so far in variable "recover" and continue with slow
      start retransmissions following the conventional RTO recovery
      algorithm.

   2) When the first acknowledgment after the RTO retransmission
      arrives at the TCP sender, store the highest sequence number
      transmitted so far in variable "recover". The TCP sender chooses
      one of the following actions, depending on whether the ACK
      advances the window or whether it is a duplicate ACK.
      a) If the acknowledgment is a duplicate ACK OR the
         Acknowledgement field covers "recover" but not more than
         "recover" OR the acknowledgment does not acknowledge all of
         the data that was retransmitted in step 1, revert to the
         conventional RTO recovery and continue by retransmitting
         unacknowledged data in slow start. Do not enter step 3 of
         this algorithm. The SpuriousRecovery variable remains as
         FALSE.

      b) Else, if the acknowledgment advances the window AND the
         Acknowledgement field does not cover "recover", transmit up to
         two new (previously unsent) segments and enter step 3 of this
         algorithm. If the TCP sender does not have enough unsent data,
         it can send only one segment. In addition, the TCP sender MAY
         override the Nagle algorithm [Nag84] and immediately send a
         segment if needed. Note that sending two segments in this step
         is allowed by TCP congestion control requirements [APS99]: An
         F-RTO TCP sender simply chooses different segments to
         transmit.

         If the TCP sender does not have any new data to send, or the
         advertised window prohibits new transmissions, the recommended
         action is to skip step 3 of this algorithm and continue with
         slow start retransmissions, following the conventional RTO
         recovery algorithm. However, alternative ways of handling the
         window-limited cases that could result in better performance
         are discussed in Appendix A.

   3) When the second acknowledgment after the RTO retransmission
      arrives at the TCP sender, the TCP sender either declares the
      timeout spurious, or starts retransmitting the unacknowledged
      segments.

      a) If the acknowledgment is a duplicate ACK, set the congestion
         window to no more than 3 * MSS, and continue with the slow
         start algorithm retransmitting unacknowledged segments.
         The
         congestion window can be set to 3 * MSS, because two
         round-trip times have elapsed since the RTO, and a
         conventional TCP sender would have increased cwnd to 3 * MSS
         during the same time. Leave SpuriousRecovery set to FALSE.

      b) If the acknowledgment advances the window (i.e., if it
         acknowledges data that was not retransmitted after the
         timeout), declare the timeout spurious, set SpuriousRecovery
         to SPUR_TO, and set the value of the "recover" variable to
         SND.UNA (the oldest unacknowledged sequence number [Pos81]).

2.2. Discussion

   The F-RTO sender takes cautious actions when it receives duplicate
   acknowledgments after a retransmission timeout. Because duplicate
   ACKs may indicate that segments have been lost, reliably detecting a
   spurious timeout is difficult due to the lack of additional
   information. Therefore, it is prudent to follow the conventional
   TCP recovery in those cases.

   The condition in step 1 prevents the execution of the F-RTO
   algorithm in case a previous RTO recovery is underway when the
   retransmission timer expires, except in case the retransmission
   timer expires multiple times for the same segment. If RTO expires
   during an earlier RTO-based loss recovery, acknowledgements for
   retransmitted segments may falsely lead the TCP sender to declare
   the timeout spurious.

   If the first acknowledgment after the RTO retransmission covers the
   "recover" point at algorithm step (2a), there is not enough evidence
   that a non-retransmitted segment has arrived at the receiver after
   the timeout. This is a common case when a fast retransmission is
   lost and has been retransmitted again after an RTO, while the rest
   of the unacknowledged segments were successfully delivered to the
   TCP receiver before the retransmission timeout. Therefore, the
   timeout cannot be declared spurious in this case.
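
   The three steps of the basic algorithm in Section 2.1 can be read
   as a small state machine. The following Python sketch is only an
   illustration under stated assumptions and is not part of the
   specification: the class FrtoSender, its method and field names,
   the returned action strings, and the fixed MSS value are all
   hypothetical. Sequence numbers are plain integers without
   wraparound, "recover" is stored as SND.NXT, and the caller is
   assumed to classify each ACK as duplicate or not.

```python
from dataclasses import dataclass

MSS = 1000  # assumed segment size in bytes (illustrative value only)


@dataclass
class FrtoSender:
    """Hypothetical sender state for the basic F-RTO steps (not a TCP stack)."""
    snd_una: int                      # oldest unacknowledged sequence number
    snd_nxt: int                      # next previously unsent sequence number
    cwnd: int = 10 * MSS              # congestion window in bytes
    recover: int = 0                  # "recover" variable from NewReno
    step: int = 0                     # 0 = F-RTO inactive, otherwise 2 or 3
    in_rto_recovery: bool = False
    spurious_recovery: str = "FALSE"  # "FALSE" or "SPUR_TO"
    rtx_end: int = 0                  # end of the segment retransmitted in step 1

    def _conventional(self) -> str:
        """Leave F-RTO and fall back to slow-start retransmissions."""
        self.step = 0
        return "conventional-rto-recovery"

    def on_rto(self) -> str:
        """Step 1: the retransmission timer expired."""
        self.spurious_recovery = "FALSE"
        self.rtx_end = min(self.snd_una + MSS, self.snd_nxt)
        if self.in_rto_recovery and self.recover > self.snd_una:
            # An earlier RTO recovery is still underway: do not enter step 2.
            self.recover = self.snd_nxt
            return self._conventional()
        self.in_rto_recovery = True
        self.step = 2
        return "retransmit-first-unacked"

    def on_ack(self, ack: int, is_dup: bool, unsent_segments: int = 2) -> str:
        """Steps 2 and 3: handle one ACK arriving after the RTO retransmission."""
        if self.step == 0:
            return "not-in-frto"
        if not is_dup:
            if ack <= self.snd_una:
                return "ignore"       # neither new data nor a duplicate ACK
            self.snd_una = ack
        if self.step == 2:
            self.recover = self.snd_nxt
            # 2a: duplicate ACK, ACK covering "recover", or partial ACK of
            # the data retransmitted in step 1.
            if is_dup or ack >= self.recover or ack < self.rtx_end:
                return self._conventional()
            # 2b: window advanced without covering "recover".
            if unsent_segments == 0:
                return self._conventional()  # no new data or window-limited
            sent = min(2, unsent_segments)
            self.snd_nxt += sent * MSS
            self.step = 3
            return f"send-{sent}-new"
        if is_dup:
            # 3a: not enough evidence of a spurious timeout.
            self.cwnd = min(self.cwnd, 3 * MSS)
            return self._conventional()
        # 3b: a second ACK advanced the window, so the timeout was spurious.
        self.spurious_recovery = "SPUR_TO"
        self.recover = self.snd_una
        self.in_rto_recovery = False
        self.step = 0
        return "spurious-timeout"


sender = FrtoSender(snd_una=1000, snd_nxt=6000)
print(sender.on_rto())              # retransmit-first-unacked
print(sender.on_ack(2000, False))   # send-2-new
print(sender.on_ack(3000, False))   # spurious-timeout
```

   In this run, five segments are outstanding when the timer expires.
   The two late acknowledgments of original transmissions each advance
   the window, so the sketch takes branches (2b) and (3b). Replacing
   the second ACK with a duplicate ACK would take branch (3a) instead.
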

   If the first acknowledgment after the RTO retransmission does not
   acknowledge all of the data that was retransmitted in step 1, the
   TCP sender reverts to the conventional RTO recovery. Otherwise, a
   malicious receiver acknowledging partial segments could cause the
   sender to declare the timeout spurious in a case where data was
   lost.

   The TCP sender is allowed to send two new segments in algorithm
   branch (2b) because the conventional TCP sender would transmit two
   segments when the first new ACK arrives after the RTO
   retransmission. If sending new data is not possible in algorithm
   branch (2b), or if the receiver window limits the transmission, the
   TCP sender has to send something in order to prevent the TCP
   transfer from stalling. If no segments were sent, the pipe between
   sender and receiver might run out of segments, and no further
   acknowledgments would arrive. Therefore, in the window-limited
   case, the recommendation is to revert to the conventional RTO
   recovery with slow start retransmissions. Appendix A discusses some
   alternative solutions for window-limited situations.

   If the retransmission timeout is declared spurious, the TCP sender
   sets the value of the "recover" variable to SND.UNA in order to
   allow fast retransmit [FHG04]. The "recover" variable was proposed
   for avoiding unnecessary, multiple fast retransmits when RTO expires
   during fast recovery with NewReno TCP. Because the F-RTO sender
   retransmits only the segment that triggered the timeout, the problem
   of unnecessary multiple fast retransmits [FHG04] cannot occur.
   Therefore, if three duplicate ACKs arrive at the sender after the
   timeout, they probably indicate a packet loss, and thus fast
   retransmit should be used to allow efficient recovery.
   If there are
   not enough duplicate ACKs arriving at the sender after a packet
   loss, the retransmission timer expires again and the sender enters
   step 1 of this algorithm.

   When the timeout is declared spurious, the TCP sender cannot detect
   whether the unnecessary RTO retransmission was lost. In principle,
   the loss of the RTO retransmission should be taken as a congestion
   signal. Thus, there is a small possibility that the F-RTO sender
   will violate the congestion control rules, if it chooses to fully
   revert congestion control parameters after detecting a spurious
   timeout. The Eifel detection algorithm has a similar property,
   while the DSACK option can be used to detect whether the
   retransmitted segment was successfully delivered to the receiver.

   The F-RTO algorithm has a side-effect on the TCP round-trip time
   measurement. Because the TCP sender can avoid most of the
   unnecessary retransmissions after detecting a spurious timeout, the
   sender is able to take round-trip time samples on the delayed
   segments. If the regular RTO recovery were used without TCP
   timestamps, this would not be possible due to the retransmission
   ambiguity. As a result, the RTO is likely to have more accurate and
   larger values with F-RTO than with the regular TCP after a spurious
   timeout that was triggered by delayed segments. We believe this
   is an advantage in networks that are prone to delay spikes.

   There are some situations where the F-RTO algorithm may not avoid
   unnecessary retransmissions after a spurious timeout. If packet
   reordering or packet duplication occurs on the segment that
   triggered the spurious timeout, the F-RTO algorithm may not detect
   the spurious timeout due to incoming duplicate ACKs.
   Additionally,
   if a spurious timeout occurs during fast recovery, the F-RTO
   algorithm often cannot detect the spurious timeout because the
   segments that were transmitted before the fast recovery trigger
   duplicate ACKs. However, we consider these cases rare, and note
   that in cases where F-RTO fails to detect the spurious timeout, it
   retransmits the unacknowledged segments in slow start, and thus
   performs similarly to the regular RTO recovery.

3. SACK-Enhanced Version of the F-RTO Algorithm

   This section describes an alternative version of the F-RTO algorithm
   that uses the TCP Selective Acknowledgment Option [MMFR96]. By
   using the SACK option, the TCP sender detects spurious timeouts in
   most of the cases when packet reordering or packet duplication is
   present. If the SACK blocks acknowledge new data that was not
   transmitted after the RTO retransmission, the sender may declare the
   timeout spurious, even when duplicate ACKs follow the RTO.

   Given that the TCP Selective Acknowledgment Option [MMFR96] is
   enabled for a TCP connection, a TCP sender MAY implement the
   SACK-enhanced F-RTO algorithm. If the sender applies the
   SACK-enhanced F-RTO algorithm, it MUST follow the steps below. This
   algorithm SHOULD NOT be applied if the TCP sender is already in loss
   recovery when retransmission timeout occurs.

   The steps of the SACK-enhanced version of the F-RTO algorithm are as
   follows. If the retransmission timer expires again during the
   execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
   re-start the algorithm processing from step 1.

   1) When the RTO expires, retransmit the first unacknowledged segment
      and set SpuriousRecovery to FALSE. Following the recommendation
      in SACK specification [MMFR96], reset the SACK scoreboard. If
      "RecoveryPoint" is larger than SND.UNA, do not enter step 2 of
      this algorithm.
      Instead, set variable "RecoveryPoint" to
      indicate the highest sequence number transmitted so far and
      continue with slow start retransmissions following the
      conventional RTO recovery algorithm.

   2) Wait until the acknowledgment of the data retransmitted due to
      the timeout arrives at the sender. If duplicate ACKs arrive
      before the cumulative acknowledgment for retransmitted data,
      adjust the scoreboard according to the incoming SACK information.
      Stay in step 2 and wait for the next new acknowledgment. If RTO
      expires again, go to step 1 of the algorithm. When a new
      acknowledgment arrives, set variable "RecoveryPoint" to indicate
      the highest sequence number transmitted so far.

      a) If the Cumulative Acknowledgement field covers "RecoveryPoint"
         but not more than "RecoveryPoint", revert to the conventional
         RTO recovery and set the congestion window to no more than 2 *
         MSS, like a regular TCP would do. Do not enter step 3 of this
         algorithm.

      b) Else, if the Cumulative Acknowledgement field does not cover
         "RecoveryPoint" but is larger than SND.UNA, transmit up to two
         new (previously unsent) segments and proceed to step 3. If
         the TCP sender is not able to transmit any previously unsent
         data -- either due to receiver window limitation or because it
         does not have any new data to send -- the recommended action
         is to refrain from entering step 3 of this algorithm. Rather,
         continue with slow start retransmissions following the
         conventional RTO recovery algorithm.

         It is also possible to apply some of the alternatives for
         handling window-limited cases discussed in Appendix A.

   3) The next acknowledgment arrives at the sender. Either a
      duplicate ACK or a new cumulative ACK (advancing the window)
      applies in this step. Other types of ACKs are ignored without any
      action.
      a) If the Cumulative Acknowledgement field or a SACK block covers
         more than "RecoveryPoint", set the congestion window to no
         more than 3 * MSS and proceed with the conventional RTO
         recovery, retransmitting unacknowledged segments. Take this
         branch also when the acknowledgment is a duplicate ACK and it
         does not acknowledge any new, previously unacknowledged data
         below "RecoveryPoint" in the SACK blocks. Leave
         SpuriousRecovery set to FALSE.

      b) If the Cumulative Acknowledgement field or a SACK block in the
         ACK does not cover more than "RecoveryPoint" AND it
         acknowledges data that was not acknowledged earlier (either
         with cumulative acknowledgment or using SACK blocks), declare
         the timeout spurious and set SpuriousRecovery to SPUR_TO. The
         retransmission timeout can be declared spurious, because the
         segment acknowledged with this ACK was transmitted before the
         timeout.

   If there are unacknowledged holes between the received SACK blocks,
   those segments are retransmitted similarly to the conventional SACK
   recovery algorithm [BAFW03]. If the algorithm exits with
   SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
   thus allowing fast recovery on incoming duplicate acknowledgments.

   The SACK-enhanced algorithm works on the same principle as the basic
   algorithm, but utilizes the additional information from the SACK
   option. When a genuine retransmission timeout occurs during a steady
   state of a connection, it can be assumed that there are no segments
   left in the pipe. Otherwise, the acknowledgments triggered by these
   segments would have triggered the SACK loss recovery or transmission
   of new segments.
   Therefore, if the F-RTO sender receives
   acknowledgements for segments transmitted before the retransmission
   timeout in response to the two new segments sent in algorithm
   step 2, the normal operation of TCP has merely been delayed, and the
   retransmission timeout is considered spurious. Note that this
   reasoning works only when the TCP sender is not in loss recovery at
   the time the retransmission timeout occurs. The condition in step 1
   checking that "RecoveryPoint" is larger than SND.UNA prevents the
   execution of the F-RTO algorithm in case a previous loss recovery,
   either RTO recovery or SACK loss recovery, is underway when the
   retransmission timer expires. It does, however, allow the execution
   of the F-RTO algorithm if the retransmission timer expires multiple
   times for the same segment.

4. Taking Actions after Detecting Spurious RTO

   Upon a retransmission timeout, a conventional TCP sender assumes
   that outstanding segments are lost and starts retransmitting the
   unacknowledged segments. When the retransmission timeout is
   detected to be spurious, the TCP sender should not continue
   retransmitting based on the timeout. For example, if the sender was
   in congestion avoidance phase transmitting new, previously unsent
   segments, it should continue transmitting previously unsent segments
   in congestion avoidance.

   There are currently two alternatives specified for a spurious
   timeout response algorithm, the Eifel Response Algorithm [LG04], and
   an algorithm for adapting the retransmission timeout after a
   spurious RTO [BBA06]. If no specific response algorithm is
   implemented, the TCP SHOULD respond to spurious timeout
   conservatively, applying the TCP congestion control specification
   [APS99]. Different response algorithms for spurious retransmission
   timeouts have been analyzed in some research papers [GL03, Sar03]
   and IETF documents [SL03].

5. Evaluation of RFC 4138 and Differences to this Document

   F-RTO was first specified in RFC 4138, an Experimental RFC, and has
   been implemented in a number of operating systems since its
   publication. The experience gained has been documented in a separate
   document [KYHS07], and can be summarized as follows.

   If the TCP sender employs F-RTO, it is able to detect spurious RTOs
   and avoid the unnecessary retransmission of the whole window of
   data. Because F-RTO avoids the unnecessary retransmissions after a
   spurious RTO, it is able to adhere to the packet conservation
   principle, unlike a regular TCP that enters the slow-start recovery
   unnecessarily and inappropriately restarts the ACK clock while there
   are segments outstanding in the network. When a spurious RTO has
   been detected, a sender can select an appropriate congestion control
   response instead of setting the congestion window to one segment.
   Because F-RTO avoids unnecessary retransmissions, it is able to take
   the RTT of the delayed segments into account when calculating the
   RTO estimate, which may help in avoiding further spurious
   retransmission timeouts.

   Experimental results with the basic F-RTO have been reported in an
   emulated network using a Linux implementation [SKR03]. Different
   congestion control responses, along with the SACK-enhanced version
   of F-RTO, were also tested in a similar environment [Sar03]. There
   are publications analyzing F-RTO performance over commercial W-CDMA
   networks, and in an emulated HSDPA network [Yam05, Hok05].
   Microsoft also reported positive experiences with their
   implementation of F-RTO at the IETF-68 meeting.

   It is known that some spurious RTOs may remain undetected by F-RTO
   if duplicate acknowledgements arrive at the sender immediately after
   the spurious RTO, for example due to packet reordering or packet
   loss.
There are rare corner cases where F-RTO could "hide" a packet 545 loss and therefore lead to inappropriate behavior with a non- 546 conservative congestion control response. First, if massive packet 547 reordering occurred so that the acknowledgement of the RTO 548 retransmission arrived at the sender before the acknowledgements of the 549 original transmissions, the sender might not detect the loss of the 550 segment that triggered the RTO. Second, a malicious receiver could 551 lead F-RTO to draw a wrong conclusion after an RTO by acknowledging 552 segments it has not received. Such a receiver would, however, risk 553 breaking the consistency of the TCP state between the sender and 554 receiver, causing the connection to become unusable, which cannot be 555 of any benefit to the receiver. Therefore, we believe it is not 556 likely that receivers would start employing such tricks on a 557 significant scale. Finally, loss of the unnecessary RTO 558 retransmission cannot be detected without using some explicit 559 acknowledgement scheme such as DSACK. This is common to the other 560 mechanisms for detecting spurious RTO, as well as to regular TCP 561 that does not use DSACK. We note that if the congestion control 562 response to spurious RTO is conservative enough, the above corner 563 cases do not cause problems due to increased congestion. 565 6. Security Considerations 567 The main security threat regarding F-RTO is the possibility that a 568 receiver could mislead the sender into setting too large a 569 congestion window after an RTO. There are two possible ways a 570 malicious receiver could trigger a wrong output from the F-RTO 571 algorithm. First, the receiver can acknowledge data that it has not 572 received. Second, it can delay the acknowledgement of a segment it has 573 received earlier, and acknowledge the segment after the TCP sender 574 has been deluded into entering algorithm step 3.
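As a rough illustration of the second case, the following Python sketch models the sender's decision from the first two cumulative acknowledgements that arrive after the RTO retransmission. It is a simplified, non-normative model and not the algorithm text of this document: the function name frto_verdict and its parameters are hypothetical, sequence numbers are plain integers without wraparound, SACK is not considered, and the transmission of the two new segments between the two acknowledgements is not modelled. The parameter recovery_point is assumed to hold the highest sequence number sent before the timeout.

```python
def frto_verdict(snd_una, recovery_point, first_ack, second_ack):
    """Classify an RTO as "spurious" or "conventional" from two cumulative ACKs.

    Hypothetical, simplified model: wraparound, SACK, and the
    window-limited branch are ignored.
    """
    # First ACK after the RTO retransmission.
    if first_ack <= snd_una:
        return "conventional"  # duplicate ACK: no new data acknowledged
    if first_ack >= recovery_point:
        return "conventional"  # whole pre-RTO window acknowledged at once
    # Here the sender would transmit two new segments (not modelled).
    # Second ACK, arriving after the new segments were sent.
    if second_ack <= first_ack:
        return "conventional"  # duplicate ACK: a segment is really missing
    return "spurious"  # window advanced again without further retransmissions


# A receiver that already holds everything except the retransmitted segment
# normally acknowledges the whole window at once.
honest = frto_verdict(1000, 1500, first_ack=1500, second_ack=1500)
# The same receiver can instead acknowledge one segment first and the rest later.
delayed = frto_verdict(1000, 1500, first_ack=1100, second_ack=1500)
print(honest, delayed)
```

In this model the honest acknowledgement pattern leads to conventional recovery, while the split pattern makes the same genuine timeout look spurious, which is why a conservative congestion control response matters in such cases.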
576 If the receiver acknowledges a segment it has not really received, 577 the sender can be led to declare spurious timeout in the F-RTO 578 algorithm, step 3. However, because the sender will have an 579 incorrect state, it cannot retransmit the segment that has never 580 reached the receiver. Therefore, this attack is unlikely to be 581 useful for the receiver to maliciously gain a larger congestion 582 window. 584 A common case for a retransmission timeout is that a fast 585 retransmission of a segment is lost. If all other segments have 586 been received, the RTO retransmission causes the whole window to be 587 acknowledged at once. This case is recognized in F-RTO algorithm 588 branch (2a). However, if the receiver only acknowledges one segment 589 after receiving the RTO retransmission, and then the rest of the 590 segments, it could cause the timeout to be declared spurious when it 591 is not. Therefore, it is suggested that, when an RTO expires during 592 the fast recovery phase, the sender would not fully revert the 593 congestion window even if the timeout was declared spurious. 594 Instead, the sender would reduce the congestion window to 1. 596 If there is more than one segment missing at the time of a 597 retransmission timeout, the receiver does not benefit from 598 misleading the sender to declare a spurious timeout because the 599 sender would have to go through another recovery period to 600 retransmit the missing segments, usually after an RTO has elapsed. 602 7. Acknowledgements 604 The authors would like to thank Alfred Hoenes, Ilpo Jarvinen and 605 Murari Sridharan for the comments on this document. 
607 We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, 608 Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias 609 Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber, 610 Samu Kontinen, and Kostas Pentikousis who gave valuable feedback 611 during the preparation of RFC 4138, the precursor of this document. 613 Appendix 615 A. Discussion of Window-Limited Cases 617 When the advertised window limits the transmission of two new 618 previously unsent segments, or there are no new data to send, it is 619 recommended in F-RTO algorithm step (2b) that the TCP sender 620 continue with the conventional RTO recovery algorithm. The 621 disadvantage is that the sender may continue unnecessary 622 retransmissions due to possible spurious timeout. This section 623 briefly discusses the options that can potentially improve 624 performance when transmitting previously unsent data is not 625 possible. 627 - The TCP sender could reserve an unused space of a size of one or 628 two segments in the advertised window to ensure the use of 629 algorithms such as F-RTO or Limited Transmit [ABF01] in receiver 630 window-limited situations. On the other hand, while doing this, 631 the TCP sender should ensure that the window of outstanding 632 segments is large enough for proper utilization of the available 633 pipe. 635 - Use additional information if available, e.g., TCP timestamps with 636 the Eifel Detection algorithm, for detecting a spurious timeout. 637 However, Eifel detection may yield different results from F-RTO 638 when ACK losses and an RTO occur within the same round-trip time 639 [SKR03]. 641 - Retransmit data from the tail of the retransmission queue and 642 continue with step 3 of the F-RTO algorithm. It is possible that 643 the retransmission will be made unnecessarily. 
Furthermore, the 644 operation of the SACK-based F-RTO algorithm would need to consider 645 this case separately, to not use the retransmitted segment to 646 indicate spurious timeout. Given these considerations, this option 647 is not recommended. 649 - Send a zero-sized segment below SND.UNA, similar to a TCP Keep- 650 Alive probe, and continue with step 3 of the F-RTO algorithm. 651 Because the receiver replies with a duplicate ACK, the sender is 652 able to detect whether the timeout was spurious from the incoming 653 acknowledgment. This method does not send data unnecessarily, but 654 it delays the recovery by one round-trip time in cases where the 655 timeout was not spurious. Therefore, this method is not 656 encouraged. 658 - In receiver-limited cases, send one octet of new data, regardless 659 of the advertised window limit, and continue with step 3 of the F- 660 RTO algorithm. It is possible that the receiver will have free 661 buffer space to receive the data by the time the segment has 662 propagated through the network, in which case no harm is done. If 663 the receiver is not capable of receiving the segment, it rejects 664 the segment and sends a duplicate ACK. 666 B. List of Changes 668 Changes between different document versions are summarized below, 669 apart from minor editing and language improvements. 671 Changes from draft-ietf-tcpm-rfc4138bis-01: 673 * Modified the basic F-RTO algorithm and SACK-enhanced F-RTO 674 algorithm to prevent the TCP sender from applying F-RTO algorithm if 675 retransmission timer expires when an earlier RTO recovery is 676 underway, except when RTO expires multiple times for the same 677 segment. 679 Changes from draft-ietf-tcpm-rfc4138bis-00: 681 * Added back the original SACK-algorithm from RFC 4138 after the 682 common feedback to have the SACK-algorithm in the document. 683 Clarified the algorithm a bit, and added one paragraph of 684 description of the basic idea of the algorithm. 
686 * Clarified behavior on multiple timeouts. 688 * Added a paragraph on acknowledgements that do not acknowledge new 689 data but are not duplicate acknowledgements 691 Changes from RFC 4138: 693 * Removed description of the SACK-enhanced algorithm 695 * Removed SCTP considerations 697 * Removed earlier Appendix sections, except Appendix C from RFC 698 4138, which is now Appendix A 700 * Clarified text about the possible response algorithms 701 * Added section that summarizes the evaluation of RFC 4138 703 References 705 Normative References 707 [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion 708 Control", RFC 2581, April 1999. 710 [APB07] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 711 Control", Internet-Draft "draft-ietf-tcpm- 712 rfc2581bis-03.txt", 713 September 2007. 715 [BAFW03] Blanton, E., Allman, M., Fall, K., and L. Wang, "A 716 Conservative Selective Acknowledgment (SACK)-based Loss 717 Recovery Algorithm for TCP", RFC 3517, April 2003. 719 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 720 Requirement Levels", BCP 14, RFC 2119, March 1997. 722 [FHG04] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno 723 Modification to TCP's Fast Recovery Algorithm", RFC 3782, 724 April 2004. 726 [MMFR96] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 727 Selective Acknowledgement Options", RFC 2018, 728 October 1996. 730 [PA00] Paxson, V. and M. Allman, "Computing TCP's Retransmission 731 Timer", RFC 2988, November 2000. 733 [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 734 793, September 1981. 736 Informative References 738 [ABF01] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing 739 TCP's Loss Recovery Using Limited Transmit", RFC 3042, 740 January 2001. 742 [BA04] Blanton, E. and M. 
Allman, "Using TCP Duplicate Selective 743 Acknowledgement (DSACKs) and Stream Control Transmission 744 Protocol (SCTP) Duplicate Transmission Sequence Numbers 745 (TSNs) to Detect Spurious Retransmissions", RFC 3708, 746 February 2004. 748 [BBA06] J. Blanton, E. Blanton, and M. Allman. Using Spurious 749 Retransmissions to Adapt the Retransmission Timeout, 750 Internet-Draft "draft-allman-rto-backoff-04.txt", December 751 2006. Work in progress. 753 [BBJ92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions 754 for High Performance", RFC 1323, May 1992. 756 [FMMP00] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 757 Extension to the Selective Acknowledgement (SACK) Option 758 for TCP", RFC 2883, July 2000. 760 [GL02] A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm 761 for TCP in a GPRS Network. In Proc. of European Wireless, 762 Florence, Italy, February 2002. 764 [GL03] A. Gurtov and R. Ludwig, Responding to Spurious Timeouts 765 in TCP. In Proceedings of IEEE INFOCOM 03, San Francisco, 766 CA, USA, March 2003. 768 [Jac88] V. Jacobson. Congestion Avoidance and Control. In 769 Proceedings of ACM SIGCOMM 88. 771 [Hok05] A. Hokamura, et al. "Performance Evaluation of F-RTO and 772 Eifel Response Algorithms over W-CDMA packet network". 773 Wireless Personal Multimedia Communications (WPMC'05), 774 Sept. 2005. 776 [KYHS07] M. Kojo, K. Yamamoto, M. Hata, and P. Sarolahti. 777 Evaluation of RFC 4138. Internet-draft 778 "draft-kojo-tcpm-frto-eval-00.txt", June 2007. Work 779 in progress. 781 [LG04] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 782 for TCP", RFC 4015, February 2005. 784 [LK00] R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP 785 Robust Against Spurious Retransmissions. ACM SIGCOMM 786 Computer Communication Review, 30(1), January 2000. 788 [LM03] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm 789 for TCP", RFC 3522, April 2003. 
791 [Nag84] Nagle, J., "Congestion Control in IP/TCP Internetworks", 792 RFC 896, January 1984. 794 [SK05] P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO): 795 An Algorithm for Detecting Spurious Retransmission 796 Timeouts with TCP and the Stream Control Transmission 797 Protocol (SCTP)", RFC 4138, August 2005. 799 [SKR03] P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An 800 Enhanced Recovery Algorithm for TCP Retransmission 801 Timeouts. ACM SIGCOMM Computer Communication Review, 802 33(2), April 2003. 804 [Sar03] P. Sarolahti. Congestion Control on Spurious TCP 805 Retransmission Timeouts. In Proceedings of IEEE Globecom 806 2003, San Francisco, CA, USA. December 2003. 808 [SL03] Y. Swami and K. Le, "DCLOR: De-correlated Loss Recovery 809 using SACK Option for Spurious Timeouts", Expired 810 Internet-Draft, September 2003. 812 [Ste00] R. Stewart, et al. Stream Control Transmission Protocol, 813 RFC 2960, October 2000. 815 [Yam05] K. Yamamoto, et al. "Effects of F-RTO and Eifel Response 816 Algorithms for W-CDMA and HSDPA networks". Wireless 817 Personal Multimedia Communications (WPMC'05), 818 Sept. 2005. 820 AUTHORS' ADDRESSES 822 Pasi Sarolahti 823 Nokia Research Center 824 P.O. Box 407 825 FI-00045 NOKIA GROUP 826 Finland 827 Phone: +358 50 4876607 828 Email: pasi.sarolahti@nokia.com 830 Markku Kojo 831 University of Helsinki 832 P.O. Box 68 833 FI-00014 UNIVERSITY OF HELSINKI 834 Finland 835 Email: kojo@cs.helsinki.fi 837 Kazunori Yamamoto 838 NTT Docomo, Inc. 840 3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan 841 Phone: +81-46-840-3812 842 Email: yamamotokaz@nttdocomo.co.jp 844 Max Hata 845 NTT Docomo, Inc. 846 3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan 847 Phone: +81-46-840-3812 848 Email: hatama@s1.nttdocomo.co.jp 850 Full Copyright Statement 852 Copyright (C) The IETF Trust (2007).
854 This document is subject to the rights, licenses and restrictions 855 contained in BCP 78, and except as set forth therein, the authors 856 retain all their rights. 858 This document and the information contained herein are provided on 859 an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 860 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 861 IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL 862 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 863 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 864 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 865 FOR A PARTICULAR PURPOSE. 867 Intellectual Property 869 The IETF takes no position regarding the validity or scope of any 870 Intellectual Property Rights or other rights that might be claimed 871 to pertain to the implementation or use of the technology described 872 in this document or the extent to which any license under such 873 rights might or might not be available; nor does it represent that 874 it has made any independent effort to identify any such rights. 875 Information on the procedures with respect to rights in RFC 876 documents can be found in BCP 78 and BCP 79. 878 Copies of IPR disclosures made to the IETF Secretariat and any 879 assurances of licenses to be made available, or the result of an 880 attempt made to obtain a general license or permission for the use 881 of such proprietary rights by implementers or users of this 882 specification can be obtained from the IETF on-line IPR repository 883 at http://www.ietf.org/ipr. 885 The IETF invites any interested party to bring to its attention any 886 copyrights, patents or patent applications, or other proprietary 887 rights that may cover technology that may be required to implement 888 this standard. Please address the information to the IETF at ietf- 889 ipr@ietf.org.