idnits 2.17.1

draft-ietf-tcpm-rfc4138bis-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 19.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 877.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 888.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 895.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 901.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC4138, but the
     abstract doesn't seem to directly say this. It does mention RFC4138
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

     (Using the creation date from RFC4138, updated by this document, for
     RFC5378 checks: 2004-05-05)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (30 October 2008) is 5628 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC
     5681)

  -- Duplicate reference: RFC2581, mentioned in 'APB08', was also mentioned in
     'APS99'.

  ** Obsolete normative reference: RFC 2581 (ref. 'APB08') (Obsoleted by RFC
     5681)

  ** Obsolete normative reference: RFC 3517 (ref. 'BAFW03') (Obsoleted by RFC
     6675)

  ** Obsolete normative reference: RFC 3782 (ref. 'FHG04') (Obsoleted by RFC
     6582)

  ** Obsolete normative reference: RFC 2988 (ref. 'PA00') (Obsoleted by RFC
     6298)

  ** Obsolete normative reference: RFC 793 (ref. 'Pos81') (Obsoleted by RFC
     9293)

  -- Obsolete informational reference (is this intentional?): RFC 1323 (ref.
     'BBJ92') (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 896 (ref.
     'Nag84') (Obsoleted by RFC 7805)

  -- Duplicate reference: RFC4138, mentioned in 'SK05', was also mentioned in
     'KYHS07'.

  -- Obsolete informational reference (is this intentional?): RFC 4960 (ref.
     'Ste07') (Obsoleted by RFC 9260)

     Summary: 7 errors (**), 0 flaws (~~), 1 warning (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                             P. Sarolahti
INTERNET-DRAFT                                     Nokia Research Center
draft-ietf-tcpm-rfc4138bis-04.txt                                M. Kojo
Intended status: Proposed Standard                University of Helsinki
Updates: 4138                                                K. Yamamoto
Expires: April 2009                                              M. Hata
                                                              NTT Docomo

                                                         30 October 2008

        Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
               Spurious Retransmission Timeouts with TCP

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 2009.

Abstract

   The purpose of this document is to move the F-RTO functionality for
   TCP in RFC 4138 from Experimental to Standards Track status. The
   F-RTO support for SCTP in RFC 4138 remains with Experimental status.
   See Appendix B for the differences between this document and RFC
   4138.

   Spurious retransmission timeouts cause suboptimal TCP performance
   because they often result in unnecessary retransmission of the last
   window of data. This document describes the F-RTO detection
   algorithm for detecting spurious TCP retransmission timeouts. F-RTO
   is a TCP sender-only algorithm that does not require any TCP options
   to operate.
   After retransmitting the first unacknowledged segment
   triggered by a timeout, the F-RTO algorithm of the TCP sender
   monitors the incoming acknowledgments to determine whether the
   timeout was spurious. It then decides whether to send new segments
   or retransmit unacknowledged segments. The algorithm effectively
   helps to avoid additional unnecessary retransmissions and thereby
   improves TCP performance in the case of a spurious timeout.

Table of Contents

   1. Introduction
      1.1. Conventions and Terminology
   2. Basic F-RTO Algorithm
      2.1. The Algorithm
      2.2. Discussion
   3. SACK-Enhanced Version of the F-RTO Algorithm
      3.1. The Algorithm
      3.2. Discussion
   4. Taking Actions after Detecting Spurious RTO
   5. Evaluation of RFC 4138
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   Appendix
      A. Discussion of Window-Limited Cases
      B. Changes since RFC 4138
   References
      Normative References
      Informative References
   AUTHORS' ADDRESSES
   Full Copyright Statement
   Intellectual Property

1. Introduction

   The Transmission Control Protocol (TCP) [Pos81] has two methods for
   triggering retransmissions. First, the TCP sender relies on
   incoming duplicate ACKs, which indicate that the receiver is missing
   some of the data. After a required number of successive duplicate
   ACKs have arrived at the sender, it retransmits the first
   unacknowledged segment [APS99] and continues with a loss recovery
   algorithm such as NewReno [FHG04] or SACK-based loss recovery
   [BAFW03]. Second, the TCP sender maintains a retransmission timer
   which triggers retransmission of segments, if they have not been
   acknowledged before the retransmission timeout (RTO) occurs. When
   the retransmission timeout occurs, the TCP sender enters the RTO
   recovery where the congestion window is initialized to one segment
   and unacknowledged segments are retransmitted using the slow-start
   algorithm. The retransmission timer is adjusted dynamically, based
   on the measured round-trip times [PA00].

   It has been pointed out that the retransmission timer can expire
   spuriously and cause unnecessary retransmissions when no segments
   have been lost [LK00, GL02, LM03]. After a spurious retransmission
   timeout, the late acknowledgments of the original segments arrive at
   the sender, usually triggering unnecessary retransmissions of a
   whole window of segments during the RTO recovery. Furthermore,
   after a spurious retransmission timeout, a conventional TCP sender
   increases the congestion window on each late acknowledgment in slow
   start. This injects a large number of data segments into the
   network within one round-trip time, thus violating the packet
   conservation principle [Jac88].

   There are a number of potential reasons for spurious retransmission
   timeouts. First, some mobile networking technologies involve sudden
   delay spikes on transmission because of actions taken during a
   hand-off.
   Second, a hand-off may take place from a low-latency path to a
   high-latency path, suddenly increasing the round-trip time beyond
   the current RTO value. Third, on a low-bandwidth link the arrival
   of competing traffic (possibly with higher priority), or some other
   change in available bandwidth, can cause a sudden increase of the
   round-trip time. This may trigger a spurious retransmission
   timeout. A persistently reliable link layer can also cause a sudden
   delay when a data frame and several retransmissions of it are lost
   for some reason. This document does not distinguish between the
   different causes of such a delay spike. Rather, it discusses the
   spurious retransmission timeouts caused by a delay spike in general.

   This document describes the F-RTO detection algorithm for TCP. It is
   based on the detection mechanism of the "Forward RTO-Recovery"
   (F-RTO) algorithm [SKR03] that is used for detecting spurious
   retransmission timeouts and thus avoids unnecessary retransmissions
   following the retransmission timeout. When the timeout is not
   spurious, the F-RTO algorithm reverts to the conventional RTO
   recovery algorithm, and therefore has similar behavior and
   performance. In contrast to alternative algorithms proposed for
   detecting unnecessary retransmissions (Eifel [LK00], [LM03] and
   DSACK-based algorithms [BA04]), F-RTO does not require any TCP
   options for its operation, and it can be implemented by modifying
   only the TCP sender. The Eifel algorithm uses TCP timestamps
   [BBJ92] for detecting a spurious timeout upon arrival of the first
   acknowledgment after the retransmission. The DSACK-based algorithms
   require that the TCP Selective Acknowledgment Option [MMFR96], with
   the DSACK extension [FMMP00], is in use.
   With DSACK, the TCP
   receiver can report if it has received a duplicate segment, enabling
   the sender to detect afterwards whether it has retransmitted
   segments unnecessarily. The F-RTO algorithm only attempts to detect
   and avoid unnecessary retransmissions after an RTO. Eifel and DSACK
   can also be used for detecting unnecessary retransmissions caused by
   other events, such as packet reordering.

   When the retransmission timer expires, the F-RTO sender retransmits
   the first unacknowledged segment as usual [APS99]. Deviating from
   the normal operation after a timeout, it then tries to transmit new,
   previously unsent data for the first acknowledgment that arrives
   after the timeout, given that the acknowledgment advances the
   window. If the second acknowledgment that arrives after the timeout
   advances the window (i.e., acknowledges data that was not
   retransmitted), the F-RTO sender declares the timeout spurious and
   exits the RTO recovery. However, if either of these two
   acknowledgments is a duplicate ACK, there will not be sufficient
   evidence of a spurious timeout. Therefore, the F-RTO sender
   retransmits the unacknowledged segments in slow start similarly to
   the traditional algorithm. With a SACK-enhanced version of the F-RTO
   algorithm, spurious timeouts may be detected even if duplicate ACKs
   arrive after an RTO retransmission.

   This document specifies the F-RTO algorithm for TCP only, replacing
   the F-RTO functionality for TCP in RFC 4138 [SK05] and moving it
   from Experimental to Standards Track status. The algorithm can also
   be applied to the Stream Control Transmission Protocol (SCTP)
   [Ste07] that has acknowledgment and packet retransmission concepts
   similar to TCP. The considerations on applying F-RTO to SCTP are
   discussed in RFC 4138, but the F-RTO support for SCTP remains with
   Experimental status.

   This document is organized as follows.
   Section 2 describes the basic
   F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is given in
   Section 3. Section 4 discusses the possible actions to be taken
   after detecting a spurious RTO, Section 5 summarizes the experience
   gained with RFC 4138, and Section 6 discusses the security
   considerations.

1.1. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119] and indicate requirement levels for protocols.

2. Basic F-RTO Algorithm

   A timeout is considered spurious if it would have been avoided had
   the sender waited longer for an acknowledgment to arrive [LM03].
   F-RTO affects the TCP sender behavior only after a retransmission
   timeout. Otherwise, the TCP behavior remains the same. When the
   retransmission timer expires, the F-RTO algorithm monitors incoming
   acknowledgments and if the TCP sender gets an acknowledgment for a
   segment that was not retransmitted due to the timeout, the F-RTO
   algorithm declares the timeout spurious. The actions taken in
   response to a spurious timeout are not specified in this document,
   but we discuss some alternatives in Section 4. This section
   introduces the algorithm and then discusses the different steps of
   the algorithm in more detail.

   Following the practice used with the Eifel Detection algorithm
   [LM03], we use the "SpuriousRecovery" variable to indicate whether
   the retransmission is declared spurious by the sender. This variable
   can be used as an input for a corresponding response algorithm. With
   F-RTO, the value of SpuriousRecovery can be either SPUR_TO
   (indicating a spurious retransmission timeout) or FALSE (indicating
   that the timeout is not declared spurious and the TCP sender should
   follow the conventional RTO recovery algorithm).
   In addition, we use
   the "recover" variable specified in the NewReno algorithm [FHG04].

2.1. The Algorithm

   A TCP sender implementing the basic F-RTO algorithm MUST take the
   following steps after the retransmission timer expires. If the
   retransmission timer expires again during the execution of the F-RTO
   algorithm, the TCP sender MUST re-start the algorithm processing
   from step 1. If the sender implements some loss recovery algorithm
   other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
   be entered when earlier fast recovery is underway.

   The F-RTO algorithm takes different actions based on whether an
   incoming acknowledgement advances the cumulative acknowledgement
   point for a received in-order segment, or whether it is a duplicate
   acknowledgement to indicate an out-of-order segment. Duplicate
   acknowledgement is defined in [APB08]. The F-RTO algorithm does not
   specify actions for receiving a segment that neither acknowledges
   new data nor is a duplicate acknowledgement. The TCP sender SHOULD
   ignore such segments and wait for a segment that either acknowledges
   new data or is a duplicate acknowledgment.

   1) When the retransmission timer expires, retransmit the first
      unacknowledged segment and set SpuriousRecovery to FALSE. If the
      TCP sender is already in RTO recovery AND "recover" is larger
      than or equal to SND.UNA (the oldest unacknowledged sequence
      number [Pos81]), do not enter step 2 of this algorithm. Instead,
      store the highest sequence number transmitted so far in variable
      "recover" and continue with slow start retransmissions following
      the conventional RTO recovery algorithm.

   2) When the first acknowledgment after the RTO retransmission
      arrives at the TCP sender, store the highest sequence number
      transmitted so far in variable "recover".
      The TCP sender chooses
      one of the following actions, depending on whether the ACK
      advances the window or whether it is a duplicate ACK.

      a) If the acknowledgment is a duplicate ACK OR the
         Acknowledgement field covers "recover" but not more than
         "recover" OR the acknowledgment does not acknowledge all of
         the data that was retransmitted in step 1, revert to the
         conventional RTO recovery and continue by retransmitting
         unacknowledged data in slow start. Do not enter step 3 of
         this algorithm. The SpuriousRecovery variable remains as
         FALSE.

      b) Else, if the acknowledgment advances the window AND the
         Acknowledgement field does not cover "recover", transmit up to
         two new (previously unsent) segments and enter step 3 of this
         algorithm. If the TCP sender does not have enough unsent data,
         it can send only one segment. In addition, the TCP sender MAY
         override the Nagle algorithm [Nag84] and immediately send a
         segment if needed. Note that sending two segments in this step
         is allowed by TCP congestion control requirements [APS99]: An
         F-RTO TCP sender simply chooses different segments to
         transmit.

         If the TCP sender does not have any new data to send, or the
         advertised window prohibits new transmissions, the recommended
         action is to skip step 3 of this algorithm and continue with
         slow start retransmissions, following the conventional RTO
         recovery algorithm. However, alternative ways of handling the
         window-limited cases that could result in better performance
         are discussed in Appendix A.

   3) When the second acknowledgment after the RTO retransmission
      arrives at the TCP sender, the TCP sender either declares the
      timeout spurious, or starts retransmitting the unacknowledged
      segments.

      a) If the acknowledgment is a duplicate ACK, set the congestion
         window to no more than 3 * MSS, and continue with the slow
         start algorithm retransmitting unacknowledged segments. The
         congestion window can be set to 3 * MSS, because two
         round-trip times have elapsed since the RTO, and a
         conventional TCP sender would have increased cwnd to 3 * MSS
         during the same time. Leave SpuriousRecovery set to FALSE.

      b) If the acknowledgment advances the window (i.e., if it
         acknowledges data that was not retransmitted after the
         timeout), declare the timeout spurious, set SpuriousRecovery
         to SPUR_TO, and set the value of the "recover" variable to
         SND.UNA (the oldest unacknowledged sequence number [Pos81]).

2.2. Discussion

   The F-RTO sender takes cautious actions when it receives duplicate
   acknowledgments after a retransmission timeout. Because duplicate
   ACKs may indicate that segments have been lost, reliably detecting a
   spurious timeout is difficult due to the lack of additional
   information. Therefore, it is prudent to follow the conventional
   TCP recovery in those cases.

   The condition in step 1 prevents the execution of the F-RTO
   algorithm in case a previous RTO recovery is underway when the
   retransmission timer expires, except in case the retransmission
   timer expires multiple times for the same segment. If the
   retransmission timer expires during an earlier RTO-based loss
   recovery, acknowledgements for retransmitted segments may falsely
   lead the TCP sender to declare the timeout spurious.

   If the first acknowledgment after the RTO retransmission covers the
   "recover" point at algorithm step (2a), there is not enough evidence
   that a non-retransmitted segment has arrived at the receiver after
   the timeout.
   This is a common case when a fast retransmission is
   lost and has been retransmitted again after an RTO, while the rest
   of the unacknowledged segments were successfully delivered to the
   TCP receiver before the retransmission timeout. Therefore, the
   timeout cannot be declared spurious in this case.

   If the first acknowledgment after the RTO retransmission does not
   acknowledge all of the data that was retransmitted in step 1, the
   TCP sender reverts to the conventional RTO recovery. Otherwise, a
   malicious receiver acknowledging partial segments could cause the
   sender to declare the timeout spurious in a case where data was
   lost.

   The TCP sender is allowed to send two new segments in algorithm
   branch (2b) because the conventional TCP sender would transmit two
   segments when the first new ACK arrives after the RTO
   retransmission. If sending new data is not possible in algorithm
   branch (2b), or if the receiver window limits the transmission, the
   TCP sender has to send something in order to prevent the TCP
   transfer from stalling. If no segments were sent, the pipe between
   sender and receiver might run out of segments, and no further
   acknowledgments would arrive. Therefore, in the window-limited
   case, the recommendation is to revert to the conventional RTO
   recovery with slow start retransmissions. Appendix A discusses some
   alternative solutions for window-limited situations.

   If the retransmission timeout is declared spurious, the TCP sender
   sets the value of the "recover" variable to SND.UNA in order to
   allow fast retransmit [FHG04]. The "recover" variable was proposed
   for avoiding unnecessary, multiple fast retransmits when the
   retransmission timer expires during fast recovery with NewReno TCP.
   Because the F-RTO sender retransmits only the segment that triggered
   the timeout, the problem of unnecessary multiple fast retransmits
   [FHG04] cannot occur.
   Therefore, if three duplicate ACKs arrive at
   the sender after the timeout, they probably indicate a packet loss,
   and thus fast retransmit should be used to allow efficient recovery.
   If there are not enough duplicate ACKs arriving at the sender after
   a packet loss, the retransmission timer expires again and the sender
   enters step 1 of this algorithm.

   When the timeout is declared spurious, the TCP sender cannot detect
   whether the unnecessary RTO retransmission was lost. In principle,
   the loss of the RTO retransmission should be taken as a congestion
   signal. Thus, there is a small possibility that the F-RTO sender
   will violate the congestion control rules, if it chooses to fully
   revert congestion control parameters after detecting a spurious
   timeout. The Eifel detection algorithm has a similar property,
   while the DSACK option can be used to detect whether the
   retransmitted segment was successfully delivered to the receiver.

   The F-RTO algorithm has a side-effect on the TCP round-trip time
   measurement. Because the TCP sender can avoid most of the
   unnecessary retransmissions after detecting a spurious timeout, the
   sender is able to take round-trip time samples on the delayed
   segments. If the regular RTO recovery were used without TCP
   timestamps, this would not be possible due to the retransmission
   ambiguity. As a result, the RTO is likely to have more accurate and
   larger values with F-RTO than with the regular TCP after a spurious
   timeout that was triggered due to delayed segments. We believe this
   is an advantage in networks that are prone to delay spikes.

   There are some situations where the F-RTO algorithm may not avoid
   unnecessary retransmissions after a spurious timeout.
   If packet
   reordering or packet duplication occurs on the segment that
   triggered the spurious timeout, the F-RTO algorithm may not detect
   the spurious timeout due to incoming duplicate ACKs. Additionally,
   if a spurious timeout occurs during fast recovery, the F-RTO
   algorithm often cannot detect the spurious timeout because the
   segments that were transmitted before the fast recovery trigger
   duplicate ACKs. However, we consider these cases rare, and note
   that in cases where F-RTO fails to detect the spurious timeout, it
   retransmits the unacknowledged segments in slow start, and thus
   performs similarly to the regular RTO recovery.

3. SACK-Enhanced Version of the F-RTO Algorithm

   This section describes an alternative version of the F-RTO algorithm
   that uses the TCP Selective Acknowledgment Option [MMFR96]. By
   using the SACK option, the TCP sender detects spurious timeouts in
   most of the cases when packet reordering or packet duplication is
   present. If the SACK blocks acknowledge new data that was not
   transmitted after the RTO retransmission, the sender may declare the
   timeout spurious, even when duplicate ACKs follow the RTO.

3.1. The Algorithm

   Given that the TCP Selective Acknowledgment Option [MMFR96] is
   enabled for a TCP connection, a TCP sender MAY apply the
   SACK-enhanced F-RTO algorithm. If the sender applies the
   SACK-enhanced F-RTO algorithm, it MUST follow the steps below. This
   algorithm SHOULD NOT be applied if the TCP sender is already in loss
   recovery when a retransmission timeout occurs.

   The steps of the SACK-enhanced version of the F-RTO algorithm are as
   follows. If the retransmission timer expires again during the
   execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
   re-start the algorithm processing from step 1.
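
   As a point of comparison for the SACK-enhanced steps, the basic
   algorithm of Section 2.1 can be restated as a small state machine.
   The following is a minimal, non-normative Python sketch. The names
   FrtoSender, Ack, snd_max, retx_end, and the returned action strings
   are hypothetical and chosen for illustration only. The sketch also
   assumes plain integer sequence numbers without wraparound, treats
   snd_max as the ACK value that would cover the highest sequence
   number transmitted so far, and leaves the classification of each
   ACK (new or duplicate) to the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Ack(Enum):
    NEW = 1        # advances the cumulative acknowledgment point
    DUPLICATE = 2  # duplicate ACK as defined in [APB08]


@dataclass
class FrtoSender:
    """Illustrative model of the basic F-RTO steps (not normative)."""
    snd_una: int                  # oldest unacknowledged sequence number
    snd_max: int                  # assumption: ACK value covering all sent data
    mss: int = 1000               # hypothetical segment size
    recover: int = 0
    in_rto_recovery: bool = False
    step: int = 0                 # 0 means F-RTO is not running
    retx_end: int = 0             # end of the segment retransmitted in step 1
    spurious_recovery: str = "FALSE"

    def on_rto(self) -> str:
        """Step 1: the retransmission timer expired."""
        self.spurious_recovery = "FALSE"
        self.retx_end = min(self.snd_una + self.mss, self.snd_max)
        if self.in_rto_recovery and self.recover >= self.snd_una:
            self.recover = self.snd_max
            self.step = 0
            return "retransmit first segment; conventional RTO recovery"
        self.in_rto_recovery = True
        self.step = 2
        return "retransmit first segment"

    def on_ack(self, ack: int, kind: Ack, can_send_new: bool = True) -> str:
        """Steps 2 and 3: an ACK arrived after the RTO retransmission."""
        if self.step not in (2, 3):
            return "F-RTO not active"
        if kind is Ack.NEW:
            self.snd_una = ack
        if self.step == 2:
            self.recover = self.snd_max
            if (kind is Ack.DUPLICATE or ack >= self.recover
                    or ack < self.retx_end):
                self.step = 0      # branch 2a
                return "conventional RTO recovery"
            if not can_send_new:   # no new data, or window-limited
                self.step = 0
                return "conventional RTO recovery"
            self.step = 3          # branch 2b
            return "send up to two new segments"
        self.step = 0
        if kind is Ack.DUPLICATE:  # branch 3a
            return "cwnd <= 3*MSS; conventional RTO recovery"
        self.spurious_recovery = "SPUR_TO"   # branch 3b
        self.recover = self.snd_una
        self.in_rto_recovery = False
        return "timeout declared spurious"


# Usage with hypothetical sequence numbers: two window-advancing ACKs
# after the timeout lead to a spurious-timeout declaration.
sender = FrtoSender(snd_una=1000, snd_max=5000)
for action in (sender.on_rto(),
               sender.on_ack(2000, Ack.NEW),
               sender.on_ack(3000, Ack.NEW)):
    print(action)
```

   The sketch only returns a description of the action to take. An
   actual sender would transmit segments, update snd_max as new data
   is sent, and ignore segments that are neither new ACKs nor
   duplicate ACKs before calling on_ack.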

   1) When the retransmission timer expires, retransmit the first
      unacknowledged segment and set SpuriousRecovery to FALSE.
      Following the recommendation in the SACK specification [MMFR96],
      reset the SACK scoreboard. If "RecoveryPoint" is larger than or
      equal to SND.UNA, do not enter step 2 of this algorithm. Instead,
      set variable "RecoveryPoint" to indicate the highest sequence
      number transmitted so far and continue with slow start
      retransmissions following the conventional RTO recovery
      algorithm.

   2) Wait until the acknowledgment of the data retransmitted due to
      the timeout arrives at the sender. If duplicate ACKs arrive
      before the cumulative acknowledgment for retransmitted data,
      adjust the scoreboard according to the incoming SACK information.
      Stay in step 2 and wait for the next new acknowledgment. If the
      retransmission timeout expires again, go to step 1 of the
      algorithm. When a new acknowledgment arrives, set variable
      "RecoveryPoint" to indicate the highest sequence number
      transmitted so far.

      a) If the Cumulative Acknowledgement field covers "RecoveryPoint"
         but not more than "RecoveryPoint", revert to the conventional
         RTO recovery and set the congestion window to no more than 2 *
         MSS, like a regular TCP would do. Do not enter step 3 of this
         algorithm.

      b) Else, if the Cumulative Acknowledgement field does not cover
         "RecoveryPoint" but is larger than SND.UNA, transmit up to two
         new (previously unsent) segments and proceed to step 3. If
         the TCP sender is not able to transmit any previously unsent
         data -- either due to receiver window limitation or because it
         does not have any new data to send -- the recommended action
         is to refrain from entering step 3 of this algorithm. Rather,
         continue with slow start retransmissions following the
         conventional RTO recovery algorithm.

         It is also possible to apply some of the alternatives for
         handling window-limited cases discussed in Appendix A.

   3) The next acknowledgment arrives at the sender. Either a
      duplicate ACK or a new cumulative ACK (advancing the window)
      applies in this step. Other types of ACKs are ignored without any
      action.

      a) If the Cumulative Acknowledgement field or a SACK block covers
         more than "RecoveryPoint", set the congestion window to no
         more than 3 * MSS and proceed with the conventional RTO
         recovery, retransmitting unacknowledged segments. Take this
         branch also when the acknowledgment is a duplicate ACK and it
         does not acknowledge any new, previously unacknowledged data
         below "RecoveryPoint" in the SACK blocks. Leave
         SpuriousRecovery set to FALSE.

      b) If the Cumulative Acknowledgement field or a SACK block in the
         ACK does not cover more than "RecoveryPoint" AND it
         acknowledges data that was not acknowledged earlier (either
         with cumulative acknowledgment or using SACK blocks), declare
         the timeout spurious and set SpuriousRecovery to SPUR_TO. The
         retransmission timeout can be declared spurious, because the
         segment acknowledged with this ACK was transmitted before the
         timeout.

   If there are unacknowledged holes between the received SACK blocks,
   those segments are retransmitted similarly to the conventional SACK
   recovery algorithm [BAFW03]. If the algorithm exits with
   SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
   thus allowing fast recovery on incoming duplicate acknowledgments.

3.2. Discussion

   The SACK-enhanced algorithm works on the same principle as the basic
   algorithm, but utilizes the additional information from the SACK
   option. When a genuine retransmission timeout occurs during a steady
   state of a connection, it can be assumed that there are no segments
   left in the pipe.
   Otherwise, the acknowledgments triggered by these
   segments would have triggered the SACK loss recovery or transmission
   of new segments. Therefore, if the F-RTO sender receives
   acknowledgements for segments transmitted before the retransmission
   timeout in response to the two new segments sent at step 2 of the
   algorithm, the normal operation of TCP has merely been delayed, and
   the retransmission timeout is considered spurious. Note that this
   reasoning works only when the TCP sender is not in loss recovery at
   the time the retransmission timeout occurs. The condition in step 1
   checking that "RecoveryPoint" is larger than or equal to SND.UNA
   prevents the execution of the F-RTO algorithm in case a previous
   loss recovery, either RTO recovery or SACK loss recovery, is
   underway when the retransmission timer expires. It, however, allows
   the execution of the F-RTO algorithm, if the retransmission timer
   expires multiple times for the same segment.

4. Taking Actions after Detecting Spurious RTO

   Upon a retransmission timeout, a conventional TCP sender assumes
   that outstanding segments are lost and starts retransmitting the
   unacknowledged segments. When the retransmission timeout is
   detected to be spurious, the TCP sender should not continue
   retransmitting based on the timeout. For example, if the sender was
   in congestion avoidance phase transmitting new, previously unsent
   segments, it should continue transmitting previously unsent segments
   in congestion avoidance.

   There are currently two alternatives specified for a spurious
   timeout response algorithm, the Eifel Response Algorithm [LG04], and
   an algorithm for adapting the retransmission timeout after a
   spurious RTO [BBA06]. If no specific response algorithm is
   implemented, the TCP sender SHOULD respond to a spurious timeout
   conservatively, applying the TCP congestion control specification
   [APS99].
Different response algorithms for spurious retransmission 524 timeouts have been analyzed in some research papers [GL03, Sar03] 525 and IETF documents [SL03]. 527 5. Evaluation of RFC 4138 529 F-RTO was first specified in the Experimental RFC 4138, which has been 530 implemented in a number of operating systems since it was published. 531 The experience gained has been documented in a separate document 532 [KYHS07] and can be summarized as follows. 534 If the TCP sender employs F-RTO, it is able to detect spurious RTOs 535 and avoid the unnecessary retransmission of the whole window of 536 data. Because F-RTO avoids the unnecessary retransmissions after a 537 spurious RTO, it is able to adhere to the packet conservation 538 principle, unlike a regular TCP that enters the slow-start recovery 539 unnecessarily and inappropriately restarts the ACK clock while there 540 are segments outstanding in the network. When a spurious RTO has 541 been detected, a sender can select an appropriate congestion control 542 response instead of setting the congestion window to one segment. 543 Because F-RTO avoids unnecessary retransmissions, it is able to take 544 the RTT of the delayed segments into account when calculating the 545 RTO estimate, which may help in avoiding further spurious 546 retransmission timeouts. 548 Experimental results with the basic F-RTO have been reported in an 549 emulated network using a Linux implementation [SKR03]. Different 550 congestion control responses, along with the SACK-enhanced 551 version of F-RTO, were also tested in a similar environment [Sar03]. There 552 are publications analyzing F-RTO performance over commercial W-CDMA 553 networks, and in an emulated HSDPA network [Yam05, Hok05]. Microsoft 554 also reported positive experiences with its implementation of 555 F-RTO at the IETF-68 meeting.
557 It is known that some spurious RTOs may remain undetected by F-RTO 558 if duplicate acknowledgements arrive at the sender immediately after 559 the spurious RTO, for example due to packet reordering or packet 560 loss. There are rare corner cases where F-RTO could "hide" a packet 561 loss and therefore lead to inappropriate behavior with non- 562 conservative congestion control response: first, if massive packet 563 reordering occurred so that the acknowledgement of the RTO 564 retransmission arrived at the sender before the acknowledgments of 565 original transmissions, the sender might not detect the loss of the 566 segment that triggered the RTO. Second, a malicious receiver could 567 lead F-RTO to draw a wrong conclusion after an RTO by acknowledging 568 segments it has not received. Such a receiver would, however, risk 569 breaking the consistency of the TCP state between the sender and 570 receiver, causing the connection to become unusable, which cannot be 571 of any benefit to the receiver. Therefore, we believe it is not 572 likely that receivers would start employing such tricks on a 573 significant scale. Finally, loss of the unnecessary RTO 574 retransmission cannot be detected without using some explicit 575 acknowledgement scheme such as DSACK. This is common to the other 576 mechanisms for detecting spurious RTOs, as well as to regular TCP 577 that does not use DSACK. We note that if the congestion control 578 response to spurious RTO is conservative enough, the above corner 579 cases do not cause problems due to increased congestion. 581 6. Security Considerations 583 The main security threat regarding F-RTO is the possibility that a 584 receiver could mislead the sender into setting too large a 585 congestion window after an RTO. There are two possible ways a 586 malicious receiver could trigger a wrong output from the F-RTO 587 algorithm. First, the receiver can acknowledge data that it has not 588 received.
Second, it can delay acknowledgment of a segment it has 589 received earlier, and acknowledge the segment after the TCP sender 590 has been deluded into entering algorithm step 3. 592 If the receiver acknowledges a segment it has not really received, 593 the sender can be led to declare a spurious timeout in the F-RTO 594 algorithm, step 3. However, because the sender will have an 595 incorrect state, it cannot retransmit the segment that has never 596 reached the receiver. Therefore, this attack is unlikely to be 597 useful for the receiver to maliciously gain a larger congestion 598 window. 600 A common case for a retransmission timeout is that a fast 601 retransmission of a segment is lost. If all other segments have 602 been received, the RTO retransmission causes the whole window to be 603 acknowledged at once. This case is recognized in the F-RTO algorithm 604 branch (2a). However, if the receiver only acknowledges one segment 605 after receiving the RTO retransmission, and then the rest of the 606 segments, it could cause the timeout to be declared spurious when it 607 is not. Therefore, it is suggested that, when an RTO occurs during 608 the fast recovery phase, the sender would not fully revert the 609 congestion window even if the timeout was declared spurious. 610 Instead, the sender would reduce the congestion window to one segment. 612 If there is more than one segment missing at the time of a 613 retransmission timeout, the receiver does not benefit from 614 misleading the sender into declaring a spurious timeout because the 615 sender would have to go through another recovery period to 616 retransmit the missing segments, usually after an RTO has elapsed. 618 7. IANA Considerations 620 This document has no actions for IANA. 622 8. Acknowledgements 624 The authors would like to thank Alfred Hoenes, Ilpo Jarvinen and 625 Murari Sridharan for their comments on this document.
627 We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, 628 Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias 629 Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber, 630 Samu Kontinen, and Kostas Pentikousis, who gave valuable feedback 631 during the preparation of RFC 4138, the precursor of this document. 633 Appendix 635 A. Discussion of Window-Limited Cases 637 When the advertised window limits the transmission of two new 638 previously unsent segments, or there are no new data to send, it is 639 recommended in F-RTO algorithm step (2b) that the TCP sender 640 continue with the conventional RTO recovery algorithm. The 641 disadvantage is that the sender may continue unnecessary 642 retransmissions due to a possible spurious timeout. This section 643 briefly discusses the options that can potentially improve 644 performance when transmitting previously unsent data is not 645 possible. 647 - The TCP sender could reserve unused space of one or 648 two segments in the advertised window to ensure the use of 649 algorithms such as F-RTO or Limited Transmit [ABF01] in receiver 650 window-limited situations. On the other hand, while doing this, 651 the TCP sender should ensure that the window of outstanding 652 segments is large enough for proper utilization of the available 653 pipe. 655 - Use additional information if available, e.g., TCP timestamps with 656 the Eifel Detection algorithm, for detecting a spurious timeout. 657 However, Eifel detection may yield different results from F-RTO 658 when ACK losses and an RTO occur within the same round-trip time 659 [SKR03]. 661 - Retransmit data from the tail of the retransmission queue and 662 continue with step 3 of the F-RTO algorithm. It is possible that 663 the retransmission will be made unnecessarily.
Furthermore, the 664 operation of the SACK-based F-RTO algorithm would need to consider 665 this case separately, so as not to use the retransmitted segment to 666 indicate a spurious timeout. Given these considerations, this option 667 is not recommended. 669 - Send a zero-sized segment below SND.UNA, similar to a TCP Keep- 670 Alive probe, and continue with step 3 of the F-RTO algorithm. 671 Because the receiver replies with a duplicate ACK, the sender is 672 able to detect whether the timeout was spurious from the incoming 673 acknowledgment. This method does not send data unnecessarily, but 674 it delays the recovery by one round-trip time in cases where the 675 timeout was not spurious. Therefore, this method is not 676 encouraged. 678 - In receiver-limited cases, send one octet of new data, regardless 679 of the advertised window limit, and continue with step 3 of the F- 680 RTO algorithm. It is possible that the receiver will have free 681 buffer space to receive the data by the time the segment has 682 propagated through the network, in which case no harm is done. If 683 the receiver is not capable of receiving the segment, it rejects 684 the segment and sends a duplicate ACK. 686 B. Changes since RFC 4138 688 Changes from RFC 4138 are summarized below, apart from minor editing 689 and language improvements. 691 * Modified the basic F-RTO algorithm and the SACK-enhanced F-RTO 692 algorithm to prevent the TCP sender from applying the F-RTO 693 algorithm if the retransmission timer expires when an earlier RTO 694 recovery is underway, except when the retransmission timer expires 695 multiple times for the same segment. 697 * Clarified behavior on multiple timeouts. 699 * Added a paragraph on acknowledgements that do not acknowledge new 700 data but are not duplicate acknowledgements. 702 * Clarified the SACK-algorithm a bit, and added one paragraph of 703 description of the basic idea of the algorithm. 705 * Removed SCTP considerations.
707 * Removed earlier Appendix sections, except Appendix C from RFC 708 4138, which is now Appendix A. 710 * Clarified text about the possible response algorithms. 712 * Added section that summarizes the evaluation of RFC 4138. 714 References 716 Normative References 718 [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion 719 Control", RFC 2581, April 1999. 721 [APB08] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 722 Control", Internet-Draft "draft-ietf-tcpm- 723 rfc2581bis-04.txt", 724 April 2008. 726 [BAFW03] Blanton, E., Allman, M., Fall, K., and L. Wang, "A 727 Conservative Selective Acknowledgment (SACK)-based Loss 728 Recovery Algorithm for TCP", RFC 3517, April 2003. 730 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 731 Requirement Levels", BCP 14, RFC 2119, March 1997. 733 [FHG04] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno 734 Modification to TCP's Fast Recovery Algorithm", RFC 3782, 735 April 2004. 737 [MMFR96] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 738 Selective Acknowledgement Options", RFC 2018, 739 October 1996. 741 [PA00] Paxson, V. and M. Allman, "Computing TCP's Retransmission 742 Timer", RFC 2988, November 2000. 744 [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 745 793, September 1981. 747 Informative References 749 [ABF01] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing 750 TCP's Loss Recovery Using Limited Transmit", RFC 3042, 751 January 2001. 753 [BA04] Blanton, E. and M. Allman, "Using TCP Duplicate Selective 754 Acknowledgement (DSACKs) and Stream Control Transmission 755 Protocol (SCTP) Duplicate Transmission Sequence Numbers 756 (TSNs) to Detect Spurious Retransmissions", RFC 3708, 757 February 2004. 759 [BBA06] Blanton, J., Blanton, E., and M. Allman, "Using Spurious 760 Retransmissions to Adapt the Retransmission Timeout", 761 Internet-Draft "draft-allman-rto-backoff-04.txt", December 762 2006. Work in progress. 
764 [BBJ92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions 765 for High Performance", RFC 1323, May 1992. 767 [FMMP00] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 768 Extension to the Selective Acknowledgement (SACK) Option 769 for TCP", RFC 2883, July 2000. 771 [GL02] Gurtov, A. and R. Ludwig, "Evaluating the Eifel Algorithm 772 for TCP in a GPRS Network", In Proc. European Wireless, 773 Florence, Italy, February 2002. 775 [GL03] Gurtov, A. and R. Ludwig, "Responding to Spurious Timeouts 776 in TCP", In Proc. IEEE INFOCOM 03, San Francisco, CA, USA, 777 March 2003. 779 [Jac88] Jacobson, V., "Congestion Avoidance and Control", In 780 Proc. ACM SIGCOMM 88. 782 [Hok05] Hokamura, A., et al., "Performance Evaluation of F-RTO and 783 Eifel Response Algorithms over W-CDMA packet network", In 784 Proc. Wireless Personal Multimedia Communications 785 (WPMC'05), 786 September 2005. 788 [KYHS07] Kojo, M., Yamamoto, K., Hata, M., and P. Sarolahti, 789 "Evaluation of RFC 4138", Internet-Draft 790 "draft-kojo-tcpm-frto-eval-00.txt", June 2007. Work 791 in progress. 793 [LG04] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm 794 for TCP", RFC 4015, February 2005. 796 [LK00] Ludwig, R. and R.H. Katz, "The Eifel Algorithm: Making TCP 797 Robust Against Spurious Retransmissions", ACM SIGCOMM 798 Computer Communication Review, 30(1), January 2000. 800 [LM03] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm 801 for TCP", RFC 3522, April 2003. 803 [Nag84] Nagle, J., "Congestion Control in IP/TCP Internetworks", 804 RFC 896, January 1984. 806 [SK05] Sarolahti, P. and M. Kojo, "Forward RTO-Recovery (F-RTO): 807 An Algorithm for Detecting Spurious Retransmission 808 Timeouts with TCP and the Stream Control Transmission 809 Protocol (SCTP)", RFC 4138, August 2005. 811 [SKR03] Sarolahti, P., Kojo, M., and K.
Raatikainen, "F-RTO: An 812 Enhanced Recovery Algorithm for TCP Retransmission 813 Timeouts", ACM SIGCOMM Computer Communication Review, 814 33(2), April 2003. 816 [Sar03] Sarolahti, P., "Congestion Control on Spurious TCP 817 Retransmission Timeouts", In Proc. of IEEE Globecom 818 2003, San Francisco, CA, USA, December 2003. 820 [SL03] Swami, Y. and K. Le, "DCLOR: De-correlated Loss Recovery 821 using SACK Option for Spurious Timeouts", Expired 822 Internet-Draft, September 2003. 824 [Ste07] Stewart, R., Ed., "Stream Control Transmission Protocol", 825 RFC 4960, September 2007. 827 [Yam05] Yamamoto, K., et al., "Effects of F-RTO and Eifel Response 828 Algorithms for W-CDMA and HSDPA networks", In Proc. 829 Wireless 830 Personal Multimedia Communications (WPMC'05), September 831 2005. 833 AUTHORS' ADDRESSES 835 Pasi Sarolahti 836 Nokia Research Center 837 P.O. Box 407 838 FI-00045 NOKIA GROUP 839 Finland 840 Phone: +358 50 4876607 841 Email: pasi.sarolahti@nokia.com 843 Markku Kojo 844 University of Helsinki 845 P.O. Box 68 846 FI-00014 UNIVERSITY OF HELSINKI 847 Finland 848 Email: kojo@cs.helsinki.fi 850 Kazunori Yamamoto 851 NTT Docomo, Inc. 852 3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan 853 Phone: +81-46-840-3812 854 Email: yamamotokaz@nttdocomo.co.jp 856 Max Hata 857 NTT Docomo, Inc. 858 3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan 859 Phone: +81-46-840-3812 860 Email: hatama@s1.nttdocomo.co.jp 862 Full Copyright Statement 864 Copyright (C) The IETF Trust (2008). 866 This document is subject to the rights, licenses and restrictions 867 contained in BCP 78, and except as set forth therein, the authors 868 retain all their rights.
870 This document and the information contained herein are provided on 871 an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 872 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 873 IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL 874 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 875 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 876 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 877 FOR A PARTICULAR PURPOSE. 879 Intellectual Property 881 The IETF takes no position regarding the validity or scope of any 882 Intellectual Property Rights or other rights that might be claimed 883 to pertain to the implementation or use of the technology described 884 in this document or the extent to which any license under such 885 rights might or might not be available; nor does it represent that 886 it has made any independent effort to identify any such rights. 887 Information on the procedures with respect to rights in RFC 888 documents can be found in BCP 78 and BCP 79. 890 Copies of IPR disclosures made to the IETF Secretariat and any 891 assurances of licenses to be made available, or the result of an 892 attempt made to obtain a general license or permission for the use 893 of such proprietary rights by implementers or users of this 894 specification can be obtained from the IETF on-line IPR repository 895 at http://www.ietf.org/ipr. 897 The IETF invites any interested party to bring to its attention any 898 copyrights, patents or patent applications, or other proprietary 899 rights that may cover technology that may be required to implement 900 this standard. Please address the information to the IETF at ietf- 901 ipr@ietf.org.