TCP Maintenance and Minor                                   T. Henderson
Extensions Working Group                                          Boeing
Internet-Draft                                                  S. Floyd
Obsoletes: 3782 (if approved)                                       ICSI
Intended status: Standards Track                               A. Gurtov
Expires: April 22, 2012                                             HIIT
                                                              Y. Nishida
                                                            WIDE Project
                                                        October 22, 2011

       The NewReno Modification to TCP's Fast Recovery Algorithm
                   draft-ietf-tcpm-rfc3782-bis-03.txt

Abstract

   RFC 5681 documents the following four intertwined TCP
   congestion control algorithms: slow start, congestion avoidance, fast
   retransmit, and fast recovery.
   RFC 5681 explicitly allows
   certain modifications of these algorithms, including modifications
   that use the TCP Selective Acknowledgement (SACK) option (RFC 2883),
   and modifications that respond to "partial acknowledgments" (ACKs
   which cover new data, but not all the data outstanding when loss was
   detected) in the absence of SACK. This document describes a specific
   algorithm for responding to partial acknowledgments, referred to as
   NewReno. This response to partial acknowledgments was first proposed
   by Janey Hoe. This document obsoletes RFC 3782.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 22, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as
   the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

1.  Introduction

   For the typical implementation of the TCP Fast Recovery algorithm
   described in [RFC5681] (first implemented in the 1990 BSD Reno
   release, and referred to as the Reno algorithm in [FF96]), the TCP
   data sender only retransmits a packet after a retransmit timeout has
   occurred, or after three duplicate acknowledgments have arrived
   triggering the Fast Retransmit algorithm. A single retransmit
   timeout might result in the retransmission of several data packets,
   but each invocation of the Fast Retransmit algorithm in RFC 5681
   leads to the retransmission of only a single data packet.

   Two problems arise with Reno TCP when multiple packet losses occur
   in a single window. First, Reno will often take a timeout, as
   has been documented in [Hoe95]. Second, even if a retransmission
   timeout is avoided, multiple fast retransmits and window reductions
   can occur, as documented in [F94]. When multiple packet losses
   occur, if the SACK option [RFC2883] is available, the TCP sender
   has the information to make intelligent decisions about which packets
   to retransmit and which packets not to retransmit during Fast
   Recovery.
   This document applies to TCP connections that are
   unable to use the TCP Selective Acknowledgement (SACK) option,
   either because the option is not locally supported or
   because the TCP peer did not indicate a willingness to use SACK.

   In the absence of SACK, there is little information available to the
   TCP sender in making retransmission decisions during Fast
   Recovery. From the three duplicate acknowledgments, the sender
   infers a packet loss, and retransmits the indicated packet. After
   this, the data sender could receive additional duplicate
   acknowledgments, as the data receiver acknowledges additional data
   packets that were already in flight when the sender entered Fast
   Retransmit.

   In the case of multiple packets dropped from a single window of data,
   the first new information available to the sender comes when the
   sender receives an acknowledgment for the retransmitted packet (that
   is, the packet retransmitted when Fast Retransmit was first
   entered). If there is a single packet drop and no reordering, then
   the acknowledgment for this packet will acknowledge all of the
   packets transmitted before Fast Retransmit was entered. However, if
   there are multiple packet drops, then the acknowledgment for the
   retransmitted packet will acknowledge some but not all of the packets
   transmitted before the Fast Retransmit. We call this acknowledgment
   a partial acknowledgment.

   Along with several other suggestions, [Hoe95] suggested that during
   Fast Recovery the TCP data sender respond to a partial
   acknowledgment by inferring that the next in-sequence packet has been
   lost, and retransmitting that packet. This document describes a
   modification to the Fast Recovery algorithm in RFC 5681 that
   incorporates a response to partial acknowledgments received during
   Fast Recovery.
   We call this modified Fast Recovery algorithm
   NewReno, because it is a slight but significant variation of the
   basic Reno algorithm in RFC 5681. This document does not discuss the
   other suggestions in [Hoe95] and [Hoe96], such as a change to the
   ssthresh parameter during Slow-Start, or the proposal to send a new
   packet for every two duplicate acknowledgments during Fast
   Recovery. The version of NewReno in this document also draws on
   other discussions of NewReno in the literature [LM97, Hen98].

   We do not claim that the NewReno version of Fast Recovery described
   here is an optimal modification of Fast Recovery for responding to
   partial acknowledgments, for TCP connections that are unable to use
   SACK. Based on our experiences with the NewReno modification in the
   NS simulator [NS] and with numerous implementations of NewReno, we
   believe that this modification improves the performance of the Fast
   Retransmit and Fast Recovery algorithms in a wide variety of
   scenarios. Previous versions of this RFC [RFC2582, RFC3782] provide
   simulation-based evidence of the possible performance gains.

2.  Terminology and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   RFC 2119 [RFC2119].

   This document assumes that the reader is familiar with the terms
   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
   FLIGHT SIZE (FlightSize) defined in [RFC5681]. FLIGHT SIZE is
   defined as in [RFC5681] as follows:

   FLIGHT SIZE:
      The amount of data that has been sent but not yet cumulatively
      acknowledged.
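
   As a rough illustration of this definition, and assuming the sender
   tracks the usual TCP variables SND.UNA (oldest unacknowledged
   sequence number) and SND.NXT (next sequence number to be sent),
   FlightSize can be approximated as their difference modulo 2**32.
   The function name below is hypothetical, and the sketch ignores
   subtleties such as SND.NXT being pulled back after a timeout.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def flight_size(snd_nxt, snd_una):
    """Bytes sent but not yet cumulatively acknowledged (illustrative)."""
    # The modulo keeps the result correct when snd_nxt has wrapped
    # past zero while snd_una has not.
    return (snd_nxt - snd_una) % SEQ_MOD
```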

   This document defines an additional sender-side state variable
   called RECOVER:

   RECOVER:
      When in Fast Recovery, this variable records the send sequence
      number that must be acknowledged before the Fast Recovery
      procedure is declared to be over.

3.  The Fast Retransmit and Fast Recovery Algorithms in NewReno

3.1.  Protocol Overview

   The basic idea of these extensions to the Fast Retransmit and
   Fast Recovery algorithms described in Section 3.2 of [RFC5681]
   is as follows. The TCP sender can infer, from the arrival of
   duplicate acknowledgments, whether multiple losses in the same
   window of data have most likely occurred, and avoid taking a
   retransmit timeout or making multiple congestion window reductions
   due to such an event.

   The NewReno modification applies to the Fast Recovery procedure that
   begins when three duplicate ACKs are received and ends when either a
   retransmission timeout occurs or an ACK arrives that acknowledges all
   of the data up to and including the data that was outstanding when
   the Fast Recovery procedure began.

3.2.  Specification

   The procedures specified in Section 3.2 of [RFC5681] are followed
   with the following modifications.

   1) Initialization of TCP protocol control block:
      When the TCP protocol control block is initialized, Recover is
      set to the initial send sequence number.

   2) Three duplicate ACKs:
      When the third duplicate ACK is received, the TCP sender first
      checks the value of Recover to see if the Cumulative
      Acknowledgment field covers more than Recover. If so, the value
      of Recover is incremented to the value of the highest sequence
      number transmitted by the TCP so far. The TCP then enters Fast
      Retransmit (step 2 of Section 3.2 of [RFC5681]). If not, the
      TCP does not enter Fast Retransmit and does not reset ssthresh.

   3) Response to newly acknowledged data:
      Step 6 of [RFC5681] specifies the response to the next ACK that
      acknowledges previously unacknowledged data. When an ACK
      arrives that acknowledges new data, this ACK could be the
      acknowledgment elicited by the retransmission from step 2, or
      elicited by a later retransmission. There are two cases.

      Full acknowledgments:
      If this ACK acknowledges all of the data up to and including
      Recover, then the ACK acknowledges all the intermediate
      segments sent between the original transmission of the lost
      segment and the receipt of the third duplicate ACK. Set cwnd to
      either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or
      (2) ssthresh, where ssthresh is the value set when Fast
      Retransmit was entered, and where FlightSize in (1) is the amount
      of data presently outstanding. This is termed "deflating" the
      window. If the second option is selected, the implementation
      is encouraged to take measures to avoid a possible burst of
      data, in case the amount of data outstanding in the network is
      much less than the new congestion window allows. A simple
      mechanism is to limit the number of data packets that can be sent
      in response to a single acknowledgment. Exit the Fast Recovery
      procedure.

      Partial acknowledgments:
      If this ACK does *not* acknowledge all of the data up to and
      including Recover, then this is a partial ACK. In this case,
      retransmit the first unacknowledged segment. Deflate the
      congestion window by the amount of new data acknowledged by the
      cumulative acknowledgment field. If the partial ACK
      acknowledges at least one SMSS of new data, then add back SMSS
      bytes to the congestion window. This artificially
      inflates the congestion window in order to reflect the additional
      segment that has left the network. Send a new segment if
      permitted by the new value of cwnd.
      This "partial window
      deflation" attempts to ensure that, when Fast Recovery eventually
      ends, approximately ssthresh amount of data will be outstanding
      in the network. Do not exit the Fast Recovery procedure (i.e.,
      if any duplicate ACKs subsequently arrive, execute Step 4 of
      Section 3.2 of [RFC5681]).

      For the first partial ACK that arrives during Fast Recovery, also
      reset the retransmit timer. Timer management is discussed in
      more detail in Section 4 of [RFC3782] (see Appendix A).

   4) Retransmit timeouts:
      After a retransmit timeout, record the highest sequence number
      transmitted in the variable Recover and exit the Fast
      Recovery procedure if applicable.

   Step 2 above specifies a check that the Cumulative Acknowledgment
   field covers more than Recover. Because the acknowledgment field
   contains the sequence number that the sender next expects to receive,
   the acknowledgment "ack_number" covers more than Recover when:

      ack_number - 1 > Recover;

   i.e., at least one byte more of data is acknowledged beyond the
   highest byte that was outstanding when Fast Retransmit was last
   entered.

   Note that in Step 3 above, the congestion window is deflated after
   a partial acknowledgment is received. The congestion window was
   likely to have been inflated considerably when the partial
   acknowledgment was received. In addition, depending on the original
   pattern of packet losses, the partial acknowledgment might
   acknowledge nearly a window of data. In this case, if the congestion
   window was not deflated, the data sender might be able to send nearly
   a window of data back-to-back.

   This document does not specify the sender's response to duplicate
   ACKs when the Fast Retransmit/Fast Recovery algorithm is not
   invoked. This is addressed in other documents, such as those
   describing the Limited Transmit procedure [RFC3042].
   This document
   also does not address issues of adjusting the duplicate
   acknowledgment threshold, but assumes the threshold specified in the
   IETF standards; the current standard is [RFC5681], which specifies
   a threshold of three duplicate acknowledgments.

   As a final note, we would observe that in the absence of the SACK
   option, the data sender is working from limited information. When
   the issue of recovery from multiple dropped packets from a single
   window of data is of particular importance, the best alternative
   would be to use the SACK option.

4.  Handling Duplicate Acknowledgments After A Timeout

   After each retransmit timeout, the highest sequence number
   transmitted so far is recorded in the variable "recover".
   If, after a retransmit timeout, the TCP data sender retransmits three
   consecutive packets that have already been received by the data
   receiver, then the TCP data sender will receive three duplicate
   acknowledgments that do not cover more than "recover". In this
   case, the duplicate acknowledgments are not an indication of a new
   instance of congestion. They are simply an indication that the
   sender has unnecessarily retransmitted at least three packets.

   However, when a retransmitted packet is itself dropped, the sender
   can also receive three duplicate acknowledgments that do not cover
   more than "recover". In this case, the sender would have been
   better off if it had initiated Fast Retransmit. For a TCP that
   implements the algorithm specified in Section 3 of this document, the
   sender does not infer a packet drop from duplicate acknowledgments
   in this scenario. As always, the retransmit timer is the backup
   mechanism for inferring packet loss in this case.
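
   The behavior described above follows from the "covers more than
   recover" check in the algorithm of Section 3. A minimal sketch of
   that check, of the resulting decisions on duplicate and new ACKs,
   and of the two window adjustments for full and partial
   acknowledgments is given below in Python. All class, function, and
   field names (NewRenoSender, seq_gt, snd_max, and so on) are
   illustrative assumptions and not part of this specification. The
   modulo-2**32 comparison is one common way to handle sequence number
   wraparound, not a requirement stated here, and retransmission and
   timer handling are deliberately left out.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def seq_gt(a, b):
    """Return True if sequence number a is later than b, modulo 2**32."""
    return a != b and ((a - b) % SEQ_MOD) < (SEQ_MOD >> 1)


def covers_more_than_recover(ack_number, recover):
    """The check 'ack_number - 1 > Recover', made wraparound-safe."""
    return seq_gt((ack_number - 1) % SEQ_MOD, recover)


def cwnd_after_full_ack(ssthresh, flight_size, smss):
    """Option (1) for a full acknowledgment: deflate the window."""
    return min(ssthresh, max(flight_size, smss) + smss)


def cwnd_after_partial_ack(cwnd, newly_acked, smss):
    """Partial ACK: deflate by the data acknowledged, then add back
    one SMSS if at least one SMSS of new data was acknowledged."""
    cwnd -= newly_acked
    if newly_acked >= smss:
        cwnd += smss
    return cwnd


class NewRenoSender:
    """Hypothetical sender state; all names are illustrative only."""

    def __init__(self, iss):
        self.recover = iss   # starts at the initial send sequence number
        self.snd_max = iss   # highest sequence number transmitted so far
        self.dupacks = 0
        self.in_fast_recovery = False

    def on_timeout(self):
        """Record the highest sequence number sent; leave Fast Recovery."""
        self.recover = self.snd_max
        self.in_fast_recovery = False
        self.dupacks = 0

    def on_duplicate_ack(self, ack_number):
        """Return True if this duplicate ACK triggers Fast Retransmit."""
        self.dupacks += 1
        if self.dupacks != 3 or self.in_fast_recovery:
            return False
        if not covers_more_than_recover(ack_number, self.recover):
            return False     # e.g. duplicates of data resent after a timeout
        self.recover = self.snd_max
        self.in_fast_recovery = True
        return True

    def on_new_ack(self, ack_number):
        """Classify an ACK that acknowledges new data."""
        self.dupacks = 0
        if not self.in_fast_recovery:
            return "not in recovery"
        if seq_gt(self.recover, (ack_number - 1) % SEQ_MOD):
            return "partial"  # stay in Fast Recovery
        self.in_fast_recovery = False
        return "full"
```

   In this sketch, three duplicate ACKs that arrive after a timeout
   and do not cover more than "recover" leave the sender outside Fast
   Retransmit, so loss detection falls back on the retransmit timer.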

   There are several heuristics, based on timestamps or on the amount of
   advancement of the cumulative acknowledgment field, that allow the
   sender to distinguish, in some cases, between three duplicate
   acknowledgments following a retransmitted packet that was dropped,
   and three duplicate acknowledgments from the unnecessary
   retransmission of three packets [Gur03, GF04]. The TCP sender MAY
   use such a heuristic to decide to invoke a Fast Retransmit in some
   cases, even when the three duplicate acknowledgments do not cover
   more than "recover".

   For example, when three duplicate acknowledgments are caused by the
   unnecessary retransmission of three packets, this is likely to be
   accompanied by the cumulative acknowledgment field advancing by at
   least four segments. Similarly, a heuristic based on timestamps uses
   the fact that when there is a hole in the sequence space, the
   timestamp echoed in the duplicate acknowledgment is the timestamp of
   the most recent data packet that advanced the cumulative
   acknowledgment field [RFC1323]. If timestamps are used, and the
   sender stores the timestamp of the last acknowledged segment, then
   the timestamp echoed by duplicate acknowledgments can be used to
   distinguish between a retransmitted packet that was dropped and
   three duplicate acknowledgments from the unnecessary
   retransmission of three packets.

4.1.  ACK Heuristic

   If the ACK-based heuristic is used, then following the advancement of
   the cumulative acknowledgment field, the sender stores the value of
   the previous cumulative acknowledgment as prev_highest_ack, and
   stores the latest cumulative ACK as highest_ack. In addition, the
   following check is performed if, in step 2 of Section 3.2, the
   Cumulative Acknowledgment field does not cover more than "recover".

   1*) If the Cumulative Acknowledgment field didn't cover more than
       "recover", check to see if the congestion window is greater
       than SMSS bytes and the difference between highest_ack and
       prev_highest_ack is at most 4*SMSS bytes. If true, duplicate
       ACKs indicate a lost segment (enter Fast Retransmit as in step
       2 of Section 3.2). Otherwise, duplicate ACKs likely result from
       unnecessary retransmissions (do not enter Fast Retransmit).

   The congestion window check serves to protect against fast retransmit
   immediately after a retransmit timeout.

   If several ACKs are lost, the sender can see a jump in the cumulative
   ACK of more than three segments, and the heuristic can fail.
   [RFC5681] recommends that a receiver should
   send duplicate ACKs for every out-of-order data packet, such as a
   data packet received during Fast Recovery. The ACK heuristic is more
   likely to fail if the receiver does not follow this advice, because
   then a smaller number of ACK losses are needed to produce a
   sufficient jump in the cumulative ACK.

4.2.  Timestamp Heuristic

   If this heuristic is used, the sender stores the timestamp of the
   last acknowledged segment. In addition, the handling in step 2 of
   Section 3.2 for the case where the Cumulative Acknowledgment field
   does not cover more than "recover" is replaced as follows:

   1**) If the Cumulative Acknowledgment field didn't cover more than
        "recover", check to see if the echoed timestamp in the last
        non-duplicate acknowledgment equals the
        stored timestamp. If true, duplicate ACKs indicate a lost
        segment (enter Fast Retransmit as in step 2 of Section 3.2).
        Otherwise, duplicate ACKs likely result from unnecessary
        retransmissions (do not enter Fast Retransmit).

   The timestamp heuristic works correctly both when the receiver
   echoes timestamps as specified by [RFC1323] and when it echoes them
   as specified by the proposed revisions of [RFC1323]. However, if the
   receiver arbitrarily echoes timestamps, the heuristic can fail.
   The heuristic can also fail if a timeout was
   spurious and returning ACKs are not from retransmitted segments.
   This can be prevented by detection algorithms such as [RFC3522].

5.  Implementation Issues for the Data Receiver

   [RFC5681] specifies that "Out-of-order data segments SHOULD be
   acknowledged immediately, in order to accelerate loss recovery."
   Neal Cardwell has noted that some data receivers do not send an
   immediate acknowledgment when they send a partial acknowledgment,
   but instead wait first for their delayed acknowledgment timer to
   expire [C98]. As [C98] notes, this severely limits the potential
   benefit of NewReno by delaying the receipt of the partial
   acknowledgment at the data sender. Echoing [RFC5681], our
   recommendation is that the data receiver send an immediate
   acknowledgment for an out-of-order segment, even when that
   out-of-order segment fills a hole in the buffer.

6.  Implementation Issues for the Data Sender

   In Section 3.2, step 3 above, it is noted that implementations should
   take measures to avoid a possible burst of data when leaving Fast
   Recovery, in case the amount of new data that the sender is eligible
   to send due to the new value of the congestion window is large. This
   can arise during NewReno when ACKs are lost or treated as pure window
   updates, thereby causing the sender to underestimate the number of
   new segments that can be sent during the recovery procedure.
   Specifically, bursts can occur when the FlightSize is much less than
   the new congestion window when exiting from Fast Recovery. One
   simple mechanism to avoid a burst of data when leaving Fast Recovery
   is to limit the number of data packets that can be sent in response
   to a single acknowledgment. (This is known as "maxburst_" in the ns
   simulator.)
   Other possible mechanisms for avoiding bursts include
   rate-based pacing, or setting the slow-start threshold to the
   resultant congestion window and then resetting the congestion window
   to FlightSize. A recommendation on the general mechanism to avoid
   excessively bursty sending patterns is outside the scope of this
   document.

   An implementation may want to use a separate flag to record whether
   or not it is presently in the Fast Recovery procedure. The use of
   the value of the duplicate acknowledgment counter for this purpose is
   not reliable because it can be reset upon window updates and
   out-of-order acknowledgments.

   When updating the Cumulative Acknowledgment field outside of
   Fast Recovery, the "recover" state variable may also need to be
   updated in order to continue to permit possible entry into Fast
   Recovery (Section 3.2, step 2). This issue arises when an update
   of the Cumulative Acknowledgment field results in a sequence
   wraparound that affects the ordering between the Cumulative
   Acknowledgment field and the "recover" state variable. Entry
   into Fast Recovery is only possible when the Cumulative
   Acknowledgment field covers more than the "recover" state variable.

   It is important for the sender to respond correctly to duplicate ACKs
   received when the sender is no longer in Fast Recovery (e.g., because
   of a Retransmit Timeout). The Limited Transmit procedure [RFC3042]
   describes possible responses to the first and second duplicate
   acknowledgments. When three or more duplicate acknowledgments are
   received, the Cumulative Acknowledgment field doesn't cover more
   than "recover", and a new Fast Recovery is not invoked, it is
   important that the sender not execute the Fast Recovery steps that
   inflate the congestion window and send new data for each additional
   duplicate ACK (steps 4 and 5 of Section 3.2 of [RFC5681]).
   Otherwise, the sender could end up in a chain of spurious timeouts.
   We mention this only because several NewReno
   implementations had this bug, including the implementation in the NS
   simulator.

   It has been observed that some TCP implementations enter a slow start
   or congestion avoidance window updating algorithm immediately after
   the cwnd is set by the equation found in (Section 3.2, step 3), even
   without a new external event generating the cwnd change. Note that
   after cwnd is set based on the procedure for exiting Fast Recovery
   (Section 3.2, step 3), cwnd SHOULD NOT be updated until a further
   event occurs (e.g., arrival of an ack, or timeout) after this
   adjustment.

7.  Security Considerations

   [RFC5681] discusses general security considerations concerning TCP
   congestion control. This document describes a specific algorithm
   that conforms with the congestion control requirements of [RFC5681],
   and so those considerations apply to this algorithm, too. There are
   no known additional security concerns for this specific algorithm.

8.  IANA Considerations

   This document has no actions for IANA.

9.  Conclusions

   This document specifies the NewReno Fast Retransmit and Fast Recovery
   algorithms for TCP. This NewReno modification to TCP can even be
   important for TCP implementations that support the SACK option,
   because the SACK option can only be used for TCP connections when
   both TCP end-nodes support the SACK option. NewReno performs better
   than Reno (RFC 5681) in a number of scenarios discussed in
   previous versions of this RFC ([RFC2582], [RFC3782]).

   A number of options to the basic algorithm presented in Section 3 are
   also referenced in Appendix A to this document. These include the
   handling of the retransmission timer, the response to partial
   acknowledgments, and whether or not the sender must maintain a state
   variable called Recover.
   Our belief is that the differences
   between these variants of NewReno are small compared to the
   differences between Reno and NewReno. That is, the important thing
   is to implement NewReno instead of Reno, for a TCP connection
   without SACK; it is less important exactly which of the variants of
   NewReno is implemented.

10.  Acknowledgments

   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu,
   Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed
   feedback on this document or on its precursor, RFC 2582. Jeffrey
   Hsu provided clarifications on the handling of the recover variable
   that were applied to RFC 3782 as errata, and now are in Section 6
   of this document. Yoshifumi Nishida contributed a modification
   to the fast recovery algorithm to account for the case in which
   FlightSize is 0 when the TCP sender leaves fast recovery, and the
   TCP receiver uses delayed acknowledgments. Alexander Zimmermann
   provided several suggestions to improve the clarity of the document.

11.  References

11.1.  Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
             Control", RFC 5681, September 2009.

   [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent,
             "Computing TCP's Retransmission Timer", RFC 6298,
             June 2011.

11.2.  Informative References

   [C98]     Cardwell, N., "delayed ACKs for retransmitted packets:
             ouch!". November 1998, Email to the tcpimpl mailing list,
             Message-ID "Pine.LNX.4.02A.9811021421340.26785-100000@
             sake.cs.washington.edu",
             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

   [F98]     Floyd, S., "Revisions to RFC 2001", Presentation to the
             TCPIMPL Working Group, August 1998.
             URLs
             "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and
             "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf".

   [F03]     Floyd, S., "Moving NewReno from Experimental to Proposed
             Standard?", Presentation to the TSVWG Working Group, March
             2003. URLs
             "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and
             "http://www.icir.org/floyd/talks/newreno-Mar03.pdf".

   [FF96]    Fall, K. and S. Floyd, "Simulation-based Comparisons of
             Tahoe, Reno and SACK TCP", Computer Communication Review,
             July 1996. URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

   [F94]     Floyd, S., "TCP and Successive Fast Retransmits", Technical
             report, October 1994. URL
             "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

   [GF04]    Gurtov, A. and S. Floyd, "Resolving Acknowledgment
             Ambiguity in non-SACK TCP", Next Generation Teletraffic and
             Wired/Wireless Advanced Networking (NEW2AN'04), February
             2004. URL "http://www.cs.helsinki.fi/u/gurtov/papers/
             heuristics.html".

   [Gur03]   Gurtov, A., "[Tsvwg] resolving the problem of unnecessary
             fast retransmits in go-back-N", email to the tsvwg mailing
             list, message ID <3F25B467.9020609@cs.helsinki.fi>, July
             28, 2003. URL "http://www1.ietf.org/mail-archive/
             working-groups/tsvwg/current/msg04334.html".

   [Hen98]   Henderson, T., Re: NewReno and the 2001 Revision. September
             1998. Email to the tcpimpl mailing list, Message ID
             "Pine.BSI.3.95.980923224136.26134A-100000@raptor.
             CS.Berkeley.EDU", archived at
             "http://tcp-impl.lerc.nasa.gov/tcp-impl".

   [Hoe95]   Hoe, J., "Startup Dynamics of TCP's Congestion Control and
             Avoidance Schemes", Master's Thesis, MIT, 1995.

   [Hoe96]   Hoe, J., "Improving the Start-up Behavior of a Congestion
             Control Scheme for TCP", ACM SIGCOMM, August 1996. URL
             "http://www.acm.org/sigcomm/sigcomm96/program.html".

   [LM97]    Lin, D. and R. Morris, "Dynamics of Random Early
             Detection", SIGCOMM 97, September 1997.
             URL
             "http://www.acm.org/sigcomm/sigcomm97/program.html".

   [NS]      The Network Simulator (NS).
             URL "http://www.isi.edu/nsnam/ns/".

   [PF01]    Padhye, J. and S. Floyd, "Identifying the TCP Behavior of
             Web Servers", June 2001, SIGCOMM 2001.

   [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
             High Performance", RFC 1323, May 1992.

   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
             TCP's Fast Recovery Algorithm", RFC 2582, April 1999.

   [RFC2883] Floyd, S., J. Mahdavi, M. Mathis, and M. Podolsky, "An
             Extension to the Selective Acknowledgement (SACK) Option
             for TCP", RFC 2883, July 2000.

   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's
             Loss Recovery Using Limited Transmit", RFC 3042, January
             2001.

   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
             TCP", RFC 3522, April 2003.

   [RFC3782] Floyd, S., T. Henderson, and A. Gurtov, "The NewReno
             Modification to TCP's Fast Recovery Algorithm", RFC 3782,
             April 2004.

Appendix A.  Additional Information

   Previous versions of this RFC ([RFC2582], [RFC3782]) contained
   additional informative material on the following subjects, and
   may be consulted by readers who may want more information about
   possible variants to the algorithm and who may want references
   to specific [NS] simulations that provide NewReno test cases.

   Section 4 of [RFC3782] discusses some alternative behaviors for
   resetting the retransmit timer after a partial acknowledgment.

   Section 5 of [RFC3782] discusses some alternative behaviors for
   performing retransmission after a partial acknowledgment.

   Section 6 of [RFC3782] describes more information about the
   motivation for the sender's state variable Recover.

   Section 9 of [RFC3782] introduces some NS simulation test
   suites for NewReno. In addition, references to simulation
   results can be found throughout [RFC3782].

   Section 10 of [RFC3782] provides a comparison of Reno and
   NewReno TCP.

   Section 11 of [RFC3782] lists changes relative to [RFC2582].

Appendix B.  Changes Relative to RFC 3782

   In [RFC3782], cwnd after Full ACK reception is set to either
   (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh.  However,
   the first option carries a risk of performance degradation.  If
   FlightSize is zero, the result is 1 SMSS.  TCP can then transmit
   only one segment at that moment, which can delay the ACK from the
   receiver because of the delayed ACK algorithm.

   FlightSize on Full ACK reception can be zero in some situations.
   A typical example is when the sending window during fast recovery
   is small.  In this case, the retransmitted packet and the new data
   packets can be transmitted within a short interval.  If all of these
   packets arrive successfully, the receiver may generate a Full ACK
   that acknowledges all outstanding data.  Even if the window is not
   small, loss of ACK packets or a receive buffer shortage during fast
   recovery can also make this situation more likely.

   The fix proposed in this document ensures that the sending TCP can
   transmit at least two segments on Full ACK reception.

   In addition, errata for RFC 3782 (an editorial clarification to
   Section 8 of RFC 2582, which is now Section 6 of this document)
   have been applied.

   The specification text (Section 3.2 herein) was rewritten to track
   Section 3.2 of [RFC5681] more closely.

   Sections 4, 5, and 9-11 of [RFC3782] were removed, and Appendix A
   of this document was added to back-reference this informative
   material.

Appendix C.  Document Revision History

   To be removed upon publication

   +----------+--------------------------------------------------+
   | Revision | Comments                                         |
   +----------+--------------------------------------------------+
   | draft-00 | RFC3782 errata applied, and changes applied from |
   |          | draft-nishida-newreno-modification-02            |
   +----------+--------------------------------------------------+
   | draft-01 | Non-normative sections moved to appendices,      |
   |          | editorial clarifications applied as suggested    |
   |          | by Alexander Zimmermann.                         |
   +----------+--------------------------------------------------+
   | draft-02 | Better align specification text with RFC5681.    |
   |          | Replace informative appendices by a new appendix |
   |          | that just provides back-references to earlier    |
   |          | NewReno RFCs.                                    |
   +----------+--------------------------------------------------+

Authors' Addresses

   Tom Henderson
   The Boeing Company

   EMail: thomas.r.henderson@boeing.com

   Sally Floyd
   International Computer Science Institute

   Phone: +1 (510) 666-2989
   EMail: floyd@acm.org
   URL: http://www.icir.org/floyd/

   Andrei Gurtov
   HIIT
   Helsinki Institute for Information Technology
   P.O. Box 19215
   00076 Aalto
   Finland

   EMail: gurtov@hiit.fi

   Yoshifumi Nishida
   WIDE Project
   Endo 5322
   Fujisawa, Kanagawa 252-8520
   Japan

   EMail: nishida@wide.ad.jp
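
   Illustration for Appendix B (non-normative).  The window
   computation discussed in Appendix B can be sketched as follows.
   This is a minimal sketch, not the specification text: the function
   names are hypothetical, all quantities are in bytes, and the
   revised formula min (ssthresh, max (FlightSize, SMSS) + SMSS) is
   an assumed reading of the "at least two segments" fix.  Section
   3.2 of this document remains the authoritative statement.

```python
def cwnd_option1_rfc3782(ssthresh: int, flight_size: int, smss: int) -> int:
    """Option (1) as described for RFC 3782: min(ssthresh, FlightSize + SMSS)."""
    return min(ssthresh, flight_size + smss)


def cwnd_option1_revised(ssthresh: int, flight_size: int, smss: int) -> int:
    """Assumed revised option (1): min(ssthresh, max(FlightSize, SMSS) + SMSS).

    Flooring FlightSize at one SMSS keeps the result at two segments or
    more whenever ssthresh itself is at least 2 * SMSS.
    """
    return min(ssthresh, max(flight_size, smss) + smss)


if __name__ == "__main__":
    SMSS = 1460            # hypothetical sender maximum segment size
    SSTHRESH = 8 * SMSS    # hypothetical slow start threshold

    # Full ACK arrives with nothing left in flight (FlightSize == 0).
    old = cwnd_option1_rfc3782(SSTHRESH, 0, SMSS)
    new = cwnd_option1_revised(SSTHRESH, 0, SMSS)
    print(f"RFC 3782 option (1): {old // SMSS} segment(s)")
    print(f"revised option (1):  {new // SMSS} segment(s)")
```

   With FlightSize at zero, the older formula leaves room for a
   single segment, which a delayed-ACK receiver may hold before
   acknowledging, while the revised formula leaves room for two.
   When FlightSize is already at least one SMSS, the two formulas
   give the same result.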