idnits 2.17.1 draft-ietf-tsvwg-newreno-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 19 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 20 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([RFC2018], [RFC2581], [FF96], [RFC2582], [Hoe95]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 517: '... retransmission of three packets [Gur03]. The TCP sender MAY use such...' RFC 2119 keyword, line 593: '... [RFC2581] specifies that "Out-of-order data segments SHOULD be...' RFC 2119 keyword, line 748: '...two heuristics that the TCP sender MAY...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 2003) is 7529 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2988' is defined on line 793, but no explicit reference was found in the text == Unused Reference: 'Hen98' is defined on line 830, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782) ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298) -- Obsolete informational reference (is this intentional?): RFC 2001 (ref. 'F98') (Obsoleted by RFC 2581) -- Obsolete informational reference (is this intentional?): RFC 1323 (Obsoleted by RFC 7323) Summary: 9 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force S. Floyd 3 INTERNET DRAFT ICSI 4 draft-ietf-tsvwg-newreno-01.txt T. Henderson 5 Boeing 6 A. Gurtov 7 U. Helsinki 8 September 2003 10 The NewReno Modification to TCP's Fast Recovery Algorithm 12 Status of this Memo 14 This document is an Internet-Draft and is in full conformance with 15 all provisions of Section 10 of RFC2026. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Abstract 35 RFC 2581 [RFC2581] documents the following four intertwined TCP 36 congestion control algorithms: Slow Start, Congestion Avoidance, Fast 37 Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows 38 certain modifications of these algorithms, including modifications 39 that use the TCP Selective Acknowledgement (SACK) option [RFC2018], 40 and modifications that respond to "partial acknowledgments" (ACKs 41 which cover new data, but not all the data outstanding when loss was 42 detected) in the absence of SACK. The NewReno mechanism uses an 43 algorithm for responding to partial acknowledgments that was first 44 proposed by Janey Hoe in [Hoe95]. 46 RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental 47 in 1999. This document is a small revision of RFC 2582 intended to 48 advance the NewReno mechanisms to Proposed Standard. RFC 2581 notes 49 that the Fast Retransmit/Fast Recovery algorithm specified in that 50 document does not recover very efficiently from multiple losses in a 51 single flight of packets, and that RFC 2582 contains one set of 52 modifications to address this problem. 54 NOTE TO THE RFC EDITOR: PLEASE REMOVE THIS SECTION UPON PUBLICATION. 56 Changes from draft-ietf-tsvwg-newreno-00.txt: 58 * In Section 8, added a cautionary note about using the duplicate 59 acknowledgment counter as a flag for whether Fast Recovery is in 60 effect. 62 * In Section 8, added a note about pulling along "recover" with 63 "snd_una" when Fast Recovery is not in effect. 65 * Added a discussion in Section 6 about heuristics for distinguishing 66 between a retransmitted packet that was dropped, and three duplicate 67 acknowledgements simply from the unnecessary retransmission of three 68 packets. 70 * Added more text and examples for comparing the Impatient and the 71 Slow-but-Steady variants. 73 * In Section 8, added a cautionary note saying that when the sender 74 is not in Fast Retransmit, the sender should not use the Fast 75 Recovery response to multiple duplicate acknowledgements. 77 Changes from draft-floyd-newreno-00.txt: 79 * In Section 8 on "Implementation issues for the data sender", 80 mentioned alternate methods for limiting bursts when exiting Fast 81 Recovery. 83 * Changed draft from draft-floyd-newreno to draft-ietf-tsvwg-newreno 85 Changes from RFC 2582: 87 * Rephrasing and rearrangements of the text. 89 * RFC 2582 described the Careful and Less Careful variants of 90 NewReno, along with a default version that was neither Careful nor 91 Less Careful, and recommended the Careful variant. This document 92 only specifies the Careful version. 94 * RFC 2582 used two separate variables, "send_high" and "recover", 95 and this document has merged them into a single variable "recover". 97 * Added sections on "Comparisons between Reno and NewReno TCP", and 98 on "Changes relative to RFC 2582". The section on "Comparisons 99 between Reno and NewReno TCP" includes a discussion of the one area 100 where NewReno is known to perform worse than Reno or SACK, and that 101 is in the response to reordering. 103 * Moved all of the discussions of the Impatient and Slow-but-Steady 104 variants to one place, and recommended the Impatient variant (as in 105 the default version in RFC 2582). 107 * Added a section on Implementation issues for the data sender, 108 mentioning maxburst_. 110 * Added a paragraph about differences between RFC 2582 and [FF96]. 112 END OF NOTE TO RFC EDITOR 114 1. Introduction 116 For the typical implementation of the TCP Fast Recovery algorithm 117 described in [RFC2581] (first implemented in the 1990 BSD Reno 118 release, and referred to as the Reno algorithm in [FF96]), the TCP 119 data sender only retransmits a packet after a retransmit timeout has 120 occurred, or after three duplicate acknowledgements have arrived 121 triggering the Fast Retransmit algorithm. A single retransmit 122 timeout might result in the retransmission of several data packets, 123 but each invocation of the Fast Retransmit algorithm in RFC 2581 124 leads to the retransmission of only a single data packet. 126 Problems can arise, therefore, when multiple packets have been 127 dropped from a single window of data and the Fast Retransmit and Fast 128 Recovery algorithms are invoked. In this case, if the SACK option is 129 available, the TCP sender has the information to make intelligent 130 decisions about which packets to retransmit and which packets not to 131 retransmit during Fast Recovery. This document applies only for TCP 132 connections that are unable to use the TCP Selective Acknowledgement 133 (SACK) option, either because the option is not locally supported or 134 because the TCP peer did not indicate a willingness to use SACK. 136 In the absence of SACK, there is little information available to the 137 TCP sender in making retransmission decisions during Fast Recovery. 138 From the three duplicate acknowledgements, the sender infers a packet 139 loss, and retransmits the indicated packet. After this, the data 140 sender could receive additional duplicate acknowledgements, as the 141 data receiver acknowledges additional data packets that were already 142 in flight when the sender entered Fast Retransmit. 144 In the case of multiple packets dropped from a single window of data, 145 the first new information available to the sender comes when the 146 sender receives an acknowledgement for the retransmitted packet (that 147 is, the packet retransmitted when Fast Retransmit was first entered). 148 If there had been a single packet drop and no reordering, then the 149 acknowledgement for this packet will acknowledge all of the packets 150 transmitted before Fast Retransmit was entered. However, when there 151 were multiple packet drops, then the acknowledgement for the 152 retransmitted packet will acknowledge some but not all of the packets 153 transmitted before the Fast Retransmit. We call this acknowledgement 154 a partial acknowledgment. 156 Along with several other suggestions, [Hoe95] suggested that during 157 Fast Recovery the TCP data sender responds to a partial 158 acknowledgment by inferring that the next in-sequence packet has been 159 lost, and retransmitting that packet. This document describes a 160 modification to the Fast Recovery algorithm in RFC 2581 that 161 incorporates a response to partial acknowledgements received during 162 Fast Recovery. We call this modified Fast Recovery algorithm 163 NewReno, because it is a slight but significant variation of the 164 basic Reno algorithm in RFC 2581. This document does not discuss the 165 other suggestions in [Hoe95] and [Hoe96], such as a change to the 166 ssthresh parameter during Slow-Start, or the proposal to send a new 167 packet for every two duplicate acknowledgements during Fast Recovery. 168 The version of NewReno in this document also draws on other 169 discussions of NewReno in the literature [LM97]. 171 We do not claim that the NewReno version of Fast Recovery described 172 here is an optimal modification of Fast Recovery for responding to 173 partial acknowledgements, for TCP connections that are unable to use 174 SACK. Based on our experiences with the NewReno modification in the 175 NS simulator [NS] and with numerous implementations of NewReno, we 176 believe that this modification improves the performance of the Fast 177 Retransmit and Fast Recovery algorithms in a wide variety of 178 scenarios. 180 2. Terminology and Definitions 182 In this document, the key words "MUST", "MUST NOT", "REQUIRED", 183 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", 184 and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119 185 and indicate requirement levels for compliant TCP implementations 186 implementing the NewReno Fast Retransmit and Fast Recovery algorithms 187 described in this document. 189 This document assumes that the reader is familiar with the terms 190 SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and 191 FLIGHT SIZE (FlightSize) defined in [RFC2581]. FLIGHT SIZE is 192 defined as in [RFC2581] as follows: 194 FLIGHT SIZE: 195 The amount of data that has been sent but not yet acknowledged. 197 3. The Fast Retransmit and Fast Recovery Algorithms in NewReno 199 The standard implementation of the Fast Retransmit and Fast Recovery 200 algorithms is given in [RFC2581]. This section specifies the basic 201 NewReno algorithm. Sections 4 through 6 describe some optional 202 variants, and the motivations behind them, that an implementor may 203 want to consider when tuning performance for certain network 204 scenarios. Sections 7 and 8 provide some guidance to implementors 205 based on experience with NewReno implementations. 207 The NewReno modification concerns the Fast Recovery procedure that 208 begins when three duplicate ACKs are received and ends when either a 209 retransmission timeout occurs or an ACK arrives that acknowledges all 210 of the data up to and including the data that was outstanding when 211 the Fast Recovery procedure began. 213 The NewReno algorithm specified in this document differs from the 214 implementation in [RFC2581] in the introduction of the variable 215 "recover" in step 1, in the response to a partial or new 216 acknowledgement in step 5, and in modifications to step 1 and the 217 addition of step 6 for avoiding multiple Fast Retransmits caused by 218 the retransmission of packets already received by the receiver. 220 The algorithm specified in this document uses a variable "recover", 221 whose initial value is the initial send sequence number. 223 1) Three duplicate ACKs: 224 When the third duplicate ACK is received and the sender is not 225 already in the Fast Recovery procedure, check to see if the 226 Cumulative Acknowledgement field covers more than "recover". 227 If so, go to Step 1A. Otherwise, go to Step 1B. 229 1A) Invoking Fast Retransmit: 230 If so, then set ssthresh to no more than the value given in 231 equation 1 below. (This is equation 3 from [RFC2581]). 233 ssthresh = max (FlightSize / 2, 2*SMSS) (1) 235 In addition, record the highest sequence number transmitted in 236 the variable "recover", and go to Step 2. 238 1B) Not invoking Fast Retransmit: 239 Do not enter the Fast Retransmit and Fast Recovery procedure. 240 In particular, do not change ssthresh, do not go to Step 2 to 241 retransmit the "lost" segment, and do not execute Step 3 upon 242 subsequent duplicate ACKs. 244 2) Entering Fast Retransmit: 245 Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS. 246 This artificially "inflates" the congestion window by the number 247 of segments (three) that have left the network and which the 248 receiver has buffered. 250 3) Fast Recovery: 251 For each additional duplicate ACK received while in Fast 252 Recovery, increment cwnd by SMSS. This artificially inflates 253 the congestion window in order to reflect the additional segment 254 that has left the network. 256 4) Fast Recovery, continued: 257 Transmit a segment, if allowed by the new value of cwnd and the 258 receiver's advertised window. 260 5) When an ACK arrives that acknowledges new data, this ACK could be 261 the acknowledgment elicited by the retransmission from step 2, or 262 elicited by a later retransmission. 264 Full acknowledgements: 265 If this ACK acknowledges all of the data up to and including 266 "recover", then the ACK acknowledges all the intermediate 267 segments sent between the original transmission of the lost 268 segment and the receipt of the third duplicate ACK. Set cwnd to 269 either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh, 270 where ssthresh is the value set in step 1; this is termed 271 "deflating" the window. (We note that "FlightSize" in step 1 272 referred to the amount of data outstanding in step 1, when Fast 273 Recovery was entered, while "FlightSize" in step 5 refers to the 274 amount of data outstanding in step 5, when Fast Recovery is 275 exited.) If the second option is selected, the implementation 276 should take measures to avoid a possible burst of data, in case 277 the amount of data outstanding in the network was much less than 278 the new congestion window allows. A simple mechanism is to limit 279 the number of data packets that can be sent in response to a 280 single acknowledgement. (This is known as "maxburst_" in the NS 281 simulator). Exit the Fast Recovery procedure. 283 Partial acknowledgements: 284 If this ACK does *not* acknowledge all of the data up to and 285 including "recover", then this is a partial ACK. In this case, 286 retransmit the first unacknowledged segment. Deflate the 287 congestion window by the amount of new data acknowledged, then 288 add back one SMSS (if the partial ACK acknowledges at least one 289 SMSS of new data) and send a new segment if permitted by the new 290 value of cwnd. This "partial window deflation" attempts to 291 ensure that, when Fast Recovery eventually ends, approximately 292 ssthresh amount of data will be outstanding in the network. Do 293 not exit the Fast Recovery procedure (i.e., if any duplicate ACKs 294 subsequently arrive, execute Steps 3 and 4 above). 296 For the first partial ACK that arrives during Fast Recovery, also 297 reset the retransmit timer. 299 6) Retransmit timeouts: 300 After a retransmit timeout, record the highest sequence number 301 transmitted in the variable "recover" and exit the Fast 302 Recovery procedure if applicable. 304 Step 1 specifies a check that the Cumulative Acknowledgement field 305 covers more than "recover". Because the acknowledgement field 306 contains the sequence number that the sender next expects to receive, 307 the acknowledgement "ack_number" covers more than "recover" when: 309 ack_number - one > recover. 311 Note that in Step 5, the congestion window is deflated after a 312 partial acknowledgement is received. The congestion window was 313 likely to have been inflated considerably when the partial 314 acknowledgement was received. In addition, depending on the original 315 pattern of packet losses, the partial acknowledgement might 316 acknowledge nearly a window of data. In this case, if the congestion 317 window was not deflated, the data sender might be able to send nearly 318 a window of data back-to-back. 320 This document does not specify the sender's response to duplicate 321 ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked. 322 This is addressed in other documents, such as those describing the 323 Limited Transmit procedure [RFC3042]. This document also does not 324 address issues of adjusting the duplicate acknowledgement threshold, 325 but assumes the threshold of three duplicate acknowledgements 326 currently specified in RFC 2581. 328 As a final note, we would observe that in the absence of the SACK 329 option, the data sender is working from limited information. When 330 the issue of recovery from multiple dropped packets from a single 331 window of data is of particular importance, the best alternative 332 would be to use the SACK option. 334 4. Resetting the Retransmit Timer in Response to Partial 335 Acknowledgements. 337 One possible variant to the response to partial acknowledgements 338 specified in Section 3 concerns when to reset the retransmit timer 339 after a partial acknowledgement. The algorithm in Section 3, Step 5, 340 resets the retransmit timer only after the first partial ACK. In 341 this case, if a large number of packets were dropped from a window of 342 data, the TCP data sender's retransmit timer will ultimately expire, 343 and the TCP data sender will invoke Slow-Start. (This is illustrated 344 on page 12 of [F98].) We call this the Impatient variant of NewReno. 345 We note that the Impatient variant in Section 3 doesn't follow the 346 recommended algorithm in RFC 2988 of restarting the retransmit timer 347 after every packet transmission or retransmission [RFC2988, Step 348 5.1]. 350 In contrast, the NewReno simulations in [FF96] illustrate the 351 algorithm described above with the modification that the retransmit 352 timer is reset after each partial acknowledgement. We call this the 353 Slow-but-Steady variant of NewReno. In this case, for a window with 354 a large number of packet drops, the TCP data sender retransmits at 355 most one packet per roundtrip time. (This behavior is illustrated in 356 the New-Reno TCP simulation of Figure 5 in [FF96], and on page 11 of 357 [F98]. 359 When N packets have been dropped from a window of data for a large 360 value of N, the Slow-but-Steady variant can remain in Fast Recovery 361 for N round-trip times, retransmitting one more dropped packet each 362 round-trip time; for these scenarios, the Impatient variant gives a 363 faster recovery and better performance. The tests "ns test-suite- 364 newreno.tcl impatient1" and "ns test-suite-newreno.tcl slow1" in the 365 NS simulator illustrate such a scenario, where the Impatient variant 366 performs better than the Slow-but-Steady variant. The Impatient 367 variant can be particularly important for TCP connections with large 368 congestion windows, as illustrated by the tests "ns test-suite- 369 newreno.tcl impatient4" and "ns test-suite-newreno.tcl slow4" in the 370 NS simulator. 372 One can also construct scenarios where the Slow-but-Steady variant 373 gives better performance than the Impatient variant. As an example, 374 this occurs then only a small number of packets are dropped, the RTO 375 is sufficiently small that the retransmit timer expires, and 376 performance would have been better without a retransmit timeout. The 377 tests "ns test-suite-newreno.tcl impatient2" and "ns test-suite- 378 newreno.tcl slow2" in the NS simulator illustrate such a scenario. 380 The Slow-but-Steady variant can also achieve higher goodput than the 381 Impatient variant, by avoiding unnecessary retransmissions. This 382 could be of special interest for cellular links, where every 383 transmission costs battery power and money. The tests "ns test- 384 suite-newreno.tcl impatient3" and "ns test-suite-newreno.tcl slow3" 385 in the NS simulator illustrate such a scenario. The Slow-but-Steady 386 variant can also be more robust to delay variation in the network, 387 where a delay spike might force the Impatient variant into a timeout 388 and go-back-N recovery. 390 Neither of the two variants discussed above are optimal. Our 391 recommendation is for the Impatient variant, as specified in Section 392 3 of this document, because of the poor performance of the Slow-but- 393 Steady variant for TCP connections with large congestion windows. 395 One possibility for a more optimal algorithm would be one that 396 recovered from multiple packet drops as quickly as does slow-start, 397 while resetting the retransmit timers after each partial 398 acknowledgement, as described in the section below. We note, 399 however, that there is a limitation to the potential performance in 400 this case in the absence of the SACK option. 402 5. Retransmissions after a Partial Acknowledgement 404 One possible variant to the response to partial acknowledgements 405 specified in Section 3 would be to retransmit more than one packet 406 after each partial acknowledgement, and to reset the retransmit timer 407 after each retransmission. The algorithm specified in Section 3 408 retransmits a single packet after each partial acknowledgement. This 409 is the most conservative alternative, in that it is the least likely 410 to result in an unnecessarily-retransmitted packet. A variant that 411 would recover faster from a window with many packet drops would be to 412 effectively Slow-Start, retransmitting two packets after each partial 413 acknowledgement. Such an approach would take less than N roundtrip 414 times to recover from N losses [Hoe96]. However, in the absence of 415 SACK, recovering as quickly as slow-start introduces the likelihood 416 of unnecessarily retransmitting packets, and this could significantly 417 complicate the recovery mechanisms. 419 We note that the response to partial acknowledgements specified in 420 Section 3 of this document and in RFC 2582 differs from the response 421 in [FF96], even though both approaches only retransmit one packet in 422 response to a partial acknowledgement. Step 5 of Section 3 specifies 423 that the TCP sender responds to a partial ACK by deflating the 424 congestion window by the amount of new data acknowledged, then adding 425 back one SMSS if the partial ACK acknowledges at least one SMSS of 426 new data, and sending a new segment if permitted by the new value of 427 cwnd. Thus, only one previously-sent packet is retransmitted in 428 response to each partial acknowledgement, but additional new packets 429 might be transmitted as well, depending on the amount of new data 430 acknowledged by the partial acknowledgement. In contrast, the 431 variant of NewReno illustrated in [FF96] simply set the congestion 432 window to ssthresh when a partial acknowledgement was received. The 433 approach in [FF96] is more conservative, and does not attempt to 434 accurately track the actual number of outstanding packets after a 435 partial acknowledgement is received. While either of these 436 approaches gives acceptable performance, the variant specified in 437 Section 3 recovers more smoothly when multiple packets are dropped 438 from a window of data. (The [FF96] behavior can be seen in the NS 439 simulator by setting the variable "partial_window_deflation_" for 440 "Agent/TCP/Newreno" to 0, and the behavior specified in Section 3 is 441 achieved by setting "partial_window_deflation_" to 1.) 443 6. Avoiding Multiple Fast Retransmits 445 This section describes the motivation for the sender's state variable 446 "recover", and discusses possible heuristics for distinguishing 447 between a retransmitted packet that was dropped, and three duplicate 448 acknowledgements simply from the unnecessary retransmission of three 449 packets. 451 In the absence of the SACK option or timestamps, a duplicate 452 acknowledgement carries no information to identify the data packet or 453 packets at the TCP data receiver that triggered that duplicate 454 acknowledgement. In this case, the TCP data sender is unable to 455 distinguish between a duplicate acknowledgement that results from a 456 lost or delayed data packet, and a duplicate acknowledgement that 457 results from the sender's unnecessary retransmission of a data packet 458 that had already been received at the TCP data receiver. Because of 459 this, with the Retransmit and Fast Recovery algorithms in Reno TCP, 460 multiple segment losses from a single window of data can sometimes 461 result in unnecessary multiple Fast Retransmits (and multiple 462 reductions of the congestion window) [F94]. 464 With the Fast Retransmit and Fast Recovery algorithms in Reno TCP, 465 the performance problems caused by multiple Fast Retransmits are 466 relatively minor compared to the potential problems with Tahoe TCP, 467 which does not implement Fast Recovery. Nevertheless, unnecessary 468 Fast Retransmits can occur with Reno TCP unless some explicit 469 mechanism is added to avoid this, such as the use of the "recover" 470 variable. (This modification is called "bugfix" in [F98], and is 471 illustrated on pages 7 and 9 of that document. Unnecessary Fast 472 Retransmits for Reno without "bugfix" is illustrated on page 6 of 473 [F98].) 475 Section 3 of RFC 2582 defined a default variant of NewReno TCP that 476 did not use the variable "recover", and did not check if duplicate 477 ACKs cover the variable "recover" before invoking Fast Retransmit. 479 With this default variant from RFC 2582, the problem of multiple Fast 480 Retransmits from a single window of data can occur after a Retransmit 481 Timeout (as in page 8 of [F98]) or in scenarios with reordering (as 482 in the validation test "./test-all-newreno newreno5_noBF" in 483 directory "tcl/test" of the NS simulator. This gives performance 484 similar to that on page 8 of [F03].) RFC 2582 also defined Careful 485 and Less Careful variants of the NewReno algorithm, and recommended 486 the Careful variant. 488 The algorithm specified in Section 3 of this document corresponds to 489 the Careful variant of NewReno TCP from RFC 2582, and eliminates the 490 problem of multiple Fast Retransmits. This algorithm uses the 491 variable "recover", whose initial value is the initial send sequence 492 number. After each retransmit timeout, the highest sequence number 493 transmitted so far is recorded in the variable "recover". 495 If, after a retransmit timeout, the TCP data sender retransmits three 496 consecutive packets that have already been received by the data 497 receiver, then the TCP data sender will receive three duplicate 498 acknowledgements that do not cover more than "recover". In this 499 case, the duplicate acknowledgements are not an indication of a new 500 instance of congestion. They are simply an indication that the 501 sender has unnecessarily retransmitted at least three packets. 503 However, when a retransmitted packet is itself dropped, the sender 504 can also receive three duplicate acknowledgements that do not cover 505 more than "recover", and in this case the sender would have been 506 better off if it had initiated Fast Retransmit. For a TCP that 507 implements the algorithm specified in Section 3 of this document, the 508 sender does not infer a packet drop from duplicate acknowledgements 509 in this scenario. As always, the retransmit timer is the backup 510 mechanism for inferring packet loss in this case. 512 There are several heuristics, based on timestamps or on the amount of 513 advancement of the cumulative acknowledgement field, that allow the 514 sender to distinguish in some cases between three duplicate 515 acknowledgements following a retransmitted packet that was dropped, 516 and three duplicate acknowledgements simply from the unnecessary 517 retransmission of three packets [Gur03]. The TCP sender MAY use such 518 a heuristic to decide to invoke a Fast Retransmit in some cases even 519 when the three duplicate acknowledgements do not cover more than 520 "recover". 522 For example, when three duplicate acknowledgements are caused by the 523 unnecessary retransmission of three packets, this is likely to be 524 accompanied by the cumulative acknowledgement field advancing by at 525 least four segments. Similarly, a heuristic based on timestamps uses 526 the fact that when there is a hole in the sequence space, the 527 timestamp echoed in the duplicate acknowledgement is the timestamp of 528 the most recent data packet that advanced the cumulative 529 acknowledgement field [RFC1323]. If timestamps are used, and the 530 sender stores the timestamp of the last acknowledged segment, then 531 the timestamp echoed by duplicate acknowledgements can be used to 532 distinguish between a retransmitted packet that was dropped, and 533 three duplicate acknowledgements simply from the unnecessary 534 retransmission of three packets. The heuristics are illustrated in 535 the NS simulator in the validation test "./test-all-newreno". 537 6.1. ACK Heuristic. 539 If the ACK-based heuristic is used, then following the advancement of 540 the cumulative acknowledgement field, the sender stores the value of 541 previous cumulative acknowledgement as prev_highest_ack and stores 542 the latest cumulative ACK as highest_ack. In addition, the following 543 step is performed if Step 1 in Section 3 fails, before proceeding to 544 Step 1B. 546 1*) If the Cumulative Acknowledgement field didn't cover more than 547 "recover", check to see if the congestion window is greater 548 than one SMSS and the difference between highest_ack and 549 prev_highest_ack is at most four SMSS. If true, duplicate 550 ACKs indicate a lost segment (proceed to Step 1A in Section 551 3). Otherwise, duplicate ACKs likely result from unnecessary 552 retransmissions (proceed to Step 1B in Section 3). 554 The congestion window check serves to protect against fast retransmit 555 immediately after a retransmit timeout, similar to the 556 "exitFastRetrans_" variable in NS. Examples of applying the ACK 557 heuristic are in validation tests "./test-all-newreno 558 newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in 559 directory "tcl/test" of the NS simulator. 561 If several ACKs are lost, the sender can see a jump in the cumulative 562 ACK of more than three segments and the heuristic can fail. A 563 validation test for this scenario is "./test-all-newreno 564 newreno_rto_loss_ackf". The ACK heuristic is more likely to fail if 565 the receiver uses delayed ACKs, because then a smaller number of ACK 566 losses are needed to produce a sufficient jump in the cumulative ACK. 568 6.2. Timestamp Heuristic. 570 If this heuristic is used, the sender stores the timestamp of the 571 last acknowledged segment. In addition, the second paragraph of step 572 1 in Section 3 is replaced as follows: 574 1**) If the Cumulative Acknowledgement field didn't cover more than 575 "recover", check to see if the echoed timestamp equals the 576 stored timestamp. If true, duplicate ACKs indicate a lost 577 segment (proceed to Step 1A in Section 3). Otherwise, duplicate 578 ACKs likely result from unnecessary retransmissions (proceed 579 to Step 1B in Section 3). 581 Examples of applying the timestamp heuristic are in validation tests 582 "./test-all-newreno newreno_rto_loss_tsh" and "./test-all-newreno 583 newreno_rto_dup_tsh". The timestamp heuristic works correctly both 584 when the receiver echoes timestamps as specified by [RFC1323] or by 585 its revision attempts. However, if the receiver arbitrarily echos 586 timestamps, the heuristic can fail. The heuristic can also fail if a 587 timeout was spurious and returning ACKs are not from retransmitted 588 segments. This can be prevented by detection algorithms such as 589 [RFC3522]. 591 7. Implementation Issues for the Data Receiver 593 [RFC2581] specifies that "Out-of-order data segments SHOULD be 594 acknowledged immediately, in order to accelerate loss recovery." 595 Neal Cardwell has noted that some data receivers do not send an 596 immediate acknowledgement when they send a partial acknowledgment, 597 but instead wait first for their delayed acknowledgement timer to 598 expire [C98]. As [C98] notes, this severely limits the potential 599 benefit from NewReno by delaying the receipt of the partial 600 acknowledgement at the data sender. Our recommendation is that the 601 data receiver send an immediate acknowledgement for an out-of-order 602 segment, even when that out-of-order segment fills a hole in the 603 buffer. 605 8. Implementation Issues for the Data Sender 607 In Section 3, Step 5 above, it is noted that implementations should 608 take measures to avoid a possible burst of data when leaving Fast 609 Recovery, in case the amount of new data that the sender is eligible 610 to send due to the new value of the congestion window is large. This 611 can arise during NewReno when ACKs are lost or treated as pure window 612 updates, thereby causing the sender to underestimate the number of 613 new segments that can be sent during the recovery procedure. 614 Specifically, bursts can occur when the FlightSize is much less than 615 the new congestion window when exiting from Fast Recovery. One 616 simple mechanism to avoid a burst of data when leaving Fast Recovery 617 is to limit the number of data packets that can be sent in response 618 to a single acknowledgment. (This is known as "maxburst_" in the ns 619 simulator.) Other possible mechanisms for avoiding bursts include 620 rate-based pacing, or setting the slow-start threshold to the 621 resultant congestion window and then resetting the congestion window 622 to FlightSize. A recommendation on the general mechanism to avoid 623 excessively bursty sending patterns is outside the scope of this 624 document. 626 An implementation may want to use a separate flag to record whether 627 or not it is presently in the Fast Recovery procedure. The use of 628 the value of the duplicate acknowledgment counter as such a flag is 629 not reliable because it can be reset upon window updates and out-of- 630 order acknowledgments. 632 When not in Fast Recovery, the value of the state variable "recover" 633 should be pulled along with the value of the state variable for 634 acknowledgments (typically, "snd_una") so that, when large amounts of 635 data has been sent and acked, the sequence space does not wrap and 636 falsely indicate that Fast Recovery should not be entered (Section 3, 637 step 1, last paragraph). 639 It is important for the sender to respond correctly to duplicate ACKs 640 received when the sender is no longer in Fast Recovery (e.g., because 641 of a Retransmit Timeout). The Limited Transmit procedure [RFC3042] 642 describes possible responses to the first and second duplicate 643 acknowledgements. When three or more duplicate acknowledgements are 644 received, the Cumulative Acknowledgement field doesn't cover more 645 than "recover", and a new Fast Recovery is not invoked, it is 646 important that the sender not execute the Fast Recovery steps (3) and 647 (4) in Section 3. Otherwise, the sender could end up in a chain of 648 spurious timeouts. We mention this only because several NewReno 649 implementations had this bug, including the implementation in the NS 650 simulator. (This bug in the NS simulator was fixed in July 2003, 651 with the variable "exitFastRetrans_".) 653 9. Simulations 655 Simulations with NewReno are illustrated with the validation test 656 "tcl/test/test-all-newreno" in the NS simulator. The command 657 "../../ns test-suite-newreno.tcl reno" shows a simulation with Reno 658 TCP, illustrating the data sender's lack of response to a partial 659 acknowledgement. In contrast, the command "../../ns test-suite- 660 newreno.tcl newreno_B" shows a simulation with the same scenario 661 using the NewReno algorithms described in this paper. 663 10. Comparisons between Reno and NewReno TCP 665 As we stated in the introduction, we believe that the NewReno 666 modification described in this document improves the performance of 667 the Fast Retransmit and Fast Recovery algorithms of Reno TCP in a 668 wide variety of scenarios. This has been discussed in some depth in 669 [FF96], which illustrates Reno TCP's poor performance when multiple 670 packets are dropped from a window of data and also illustrates 671 NewReno TCP's good performance in that scenario. 673 We do, however, know of one scenario where Reno TCP gives better 674 performance than NewReno TCP, that we describe here for the sake of 675 completeness. Consider a scenario with no packet loss, but with 676 sufficient reordering that the TCP sender receives three duplicate 677 acknowledgements. This will trigger the Fast Retransmit and Fast 678 Recovery algorithms. With Reno TCP or with Sack TCP, this will 679 result in the unnecessary retransmission of a single packet, combined 680 with a halving of the congestion window (shown on pages 4 and 6 of 681 [F03]). With NewReno TCP, however, this reordering will also result 682 in the unnecessary retransmission of an entire window of data (shown 683 on page 5 of [F03]). 685 While Reno TCP performs better than NewReno TCP in the presence of 686 reordering, NewReno's superior performance in the presence of 687 multiple packet drops generally outweighs its less optimal 688 performance in the presence of reordering. (Sack TCP is the 689 preferred solution, with good performance in both scenarios.) This 690 document recommends the Fast Retransmit and Fast Recovery algorithms 691 of NewReno TCP instead of those of Reno TCP for those TCP connections 692 that do not support SACK. We would also note that NewReno's Fast 693 Retransmit and Fast Recovery mechanisms are widely deployed in TCP 694 implementations in the Internet today, as documented in [PF01]. For 695 example, tests of TCP implementations in several thousand web servers 696 in 2001 showed that for those TCP connections where the web browser 697 was not SACK-capable, more web servers used the Fast Retransmit and 698 Fast Recovery algorithms of NewReno than those of Reno or Tahoe TCP 699 [PF01]. 701 11. Changes Relative to RFC 2582 703 The purpose of this document is to advance the NewReno's Fast 704 Retransmit and Fast Recovery algorithms in RFC 2582 to Proposed 705 Standard. 707 The main change in this document relative to RFC 2582 is to specify 708 the Careful variant of NewReno's Fast Retransmit and Fast Recovery 709 algorithms. The base algorithm described in RFC 2582 did not attempt 710 to avoid unnecessary multiple Fast Retransmits that can occur after a 711 timeout (described in more detail in the section above). However, 712 RFC 2582 also defined "Careful" and "Less Careful" variants that 713 avoid these unnecessary Fast Retransmits, and recommended the Careful 714 variant. This document specifies the previously-named "Careful" 715 variant as the basic version of NewReno. As described below, this 716 algorithm uses a variable "recover", whose initial value is the send 717 sequence number. 719 The algorithm specified in Section 3 checks whether the 720 acknowledgement field of a partial acknowledgement covers *more* than 721 "recover". Another possible variant would be to require simply that 722 the acknowledgement field *cover* "recover" before initiating another 723 Fast Retransmit. We called this the Less Careful variant in RFC 724 2582. 726 There are two separate scenarios in which the TCP sender could 727 receive three duplicate acknowledgements acknowledging "recover" but 728 no more than "recover". One scenario would be that the data sender 729 transmitted four packets with sequence numbers higher than "recover", 730 that the first packet was dropped in the network, and the following 731 three packets triggered three duplicate acknowledgements 732 acknowledging "recover". The second scenario would be that the 733 sender unnecessarily retransmitted three packets below "recover", and 734 that these three packets triggered three duplicate acknowledgements 735 acknowledging "recover". In the absence of SACK, the TCP sender in 736 unable to distinguish between these two scenarios. 738 For the Careful variant of Fast Retransmit, the data sender would 739 have to wait for a retransmit timeout in the first scenario, but 740 would not have an unnecessary Fast Retransmit in the second scenario. 741 For the Less Careful variant to Fast Retransmit, the data sender 742 would Fast Retransmit as desired in the first scenario, and would 743 unnecessarily Fast Retransmit in the second scenario. This document 744 only specifies the Careful variant in Section 3. Unnecessary Fast 745 Retransmits with the Less Careful variant in scenarios with 746 reordering are illustrated in page 8 of [F03]. 748 The document also specifies two heuristics that the TCP sender MAY 749 use to decide to invoke Fast Retransmit even when the three duplicate 750 acknowledgements do not cover more than "recover". These heuristics, 751 an ACK-based heuristic and a timestamp heuristic, are described in 752 Sections 6.1 and 6.2 respectively. 754 12. Conclusions 756 This document specifies the NewReno Fast Retransmit and Fast Recovery 757 algorithms for TCP. This NewReno modification to TCP can be 758 important even for TCP implementations that support the SACK option, 759 because the SACK option can only be used for TCP connections when 760 both TCP end-nodes support the SACK option. NewReno performs better 761 than Reno (RFC 2581) in a number of scenarios discussed herein. 763 A number of options to the basic algorithm presented in Section 3 are 764 also described. These include the handling of the retransmission 765 timer (Section 4), the response to partial acknowledgments (Section 766 5), and the value of the congestion window when leaving Fast Recovery 767 (section 3, step 5). Our belief is that the differences between 768 these variants of NewReno are small compared to the differences 769 between Reno and NewReno. That is, the important thing is to 770 implement NewReno instead of Reno, for a TCP connection without SACK; 771 it is less important exactly which of the variants of NewReno is 772 implemented. 774 13. Acknowledgements 776 Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu, 777 Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed 778 feedback on this document or on its precursor RFC 2582. 780 14. References 782 Normative References 784 [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective 785 Acknowledgement Options", RFC 2018, October 1996. 787 [RFC2581] W. Stevens, M. Allman, and V. Paxson, "TCP Congestion 788 Control", RFC 2581, April 1999. 790 [RFC2582] S. Floyd and T. Henderson, The NewReno Modification to 791 TCP's Fast Recovery Algorithm, RFC 2582, April 1999. 793 [RFC2988] V. Paxson and M. Allman, Computing TCP's Retransmission 794 Timer, RFC 2988, November 2000. 796 [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, Enhancing TCP's 797 Loss Recovery Using Limited Transmit, RFC 3042, January 2001. 799 Informative References 801 [C98] N. Cardwell, "delayed ACKs for retransmitted packets: ouch!". 802 November 1998, Email to the tcpimpl mailing list, Message-ID 803 "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu", 804 archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". 806 [F98] S. Floyd, Revisions to RFC 2001, "Presentation to the TCPIMPL 807 Working Group", August 1998. URLs "ftp://ftp.ee.lbl.gov/talks/sf- 808 tcpimpl-aug98.ps" and "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl- 809 aug98.pdf". 811 [F03] S. Floyd, "Moving NewReno from Experimental to Proposed 812 Standard? Presentation to the TSVWG Working Group", March 2003. 813 URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and 814 "http://www.icir.org/floyd/talks/newreno-Mar03.pdf". 816 [FF96] K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe, 817 Reno and SACK TCP", Computer Communication Review, July 1996. URL 818 "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z". 820 [F94] S. Floyd, "TCP and Successive Fast Retransmits", Technical 821 report, October 1994. URL 822 "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps". 824 [Gur03] A. Gurtov, "[Tsvwg] resolving the problem of unnecessary fast 825 retransmits in go-back-N", email to the tsvwg mailing list, message 826 ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003. URL 827 "http://www1.ietf.org/mail-archive/working- 828 groups/tsvwg/current/msg04334.html". 830 [Hen98] T. Henderson, Re: NewReno and the 2001 Revision. September 831 1998. Email to the tcpimpl mailing list, Message ID 832 "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU", 833 archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". 835 [Hoe95] J. Hoe, "Startup Dynamics of TCP's Congestion Control and 836 Avoidance Schemes", Master's Thesis, MIT, 1995. URL "http://ana- 837 www.lcs.mit.edu/anaweb/ps-papers/hoe-thesis.ps". 839 [Hoe96] J. Hoe, "Improving the Start-up Behavior of a Congestion 840 Control Scheme for TCP", ACM SIGCOMM, August 1996. URL 841 "http://www.acm.org/sigcomm/sigcomm96/program.html". 843 [LM97] D. Lin and R. Morris, "Dynamics of Random Early Detection", 844 SIGCOMM 97, September 1997. URL 845 "http://www.acm.org/sigcomm/sigcomm97/program.html". 847 [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". 849 [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web 850 Servers", June 2001, SIGCOMM 2001. 852 [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for 853 High Performance,", RFC 1323, May 1992. 855 [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for 856 TCP, RFC 3522, April 2003. 858 15. Security Considerations 860 RFC 2581 discusses general security considerations concerning TCP 861 congestion control. This document describes a specific algorithm 862 that conforms with the congestion control requirements of RFC 2581, 863 and so those considerations apply to this algorithm, too. There are 864 no known additional security concerns for this specific algorithm. 866 AUTHORS' ADDRESSES 868 Sally Floyd 869 International Computer Science Institute 871 Phone: +1 (510) 666-2989 872 Email: floyd@acm.org 873 URL: http://www.icir.org/floyd/ 874 Tom Henderson 875 The Boeing Company 877 Email: thomas.r.henderson@boeing.com 879 Andrei Gurtov 880 U. Helsinki 882 Email: gurtov@cs.helsinki.fi