Network Working Group                                       T. Henderson
Internet-Draft                                                    Boeing
Obsoletes: 3782 (if approved)                                   S. Floyd
Intended status: Standards Track                                    ICSI
Expires: October 20, 2011                                      A. Gurtov
                                                                    HIIT
                                                              Y. Nishida
                                                            WIDE Project
                                                          April 20, 2011

       The NewReno Modification to TCP's Fast Recovery Algorithm
                   draft-ietf-tcpm-rfc3782-bis-02.txt

Abstract

   RFC 5681 documents the following four intertwined TCP
   congestion control algorithms: slow start, congestion avoidance, fast
   retransmit, and fast recovery.
   RFC 5681 explicitly allows certain modifications of these
   algorithms, including modifications that use the TCP Selective
   Acknowledgement (SACK) option (RFC 2883), and modifications that
   respond to "partial acknowledgments" (ACKs which cover new data, but
   not all the data outstanding when loss was detected) in the absence
   of SACK. This document describes a specific algorithm for responding
   to partial acknowledgments, referred to as NewReno. This response to
   partial acknowledgments was first proposed by Janey Hoe. This
   document obsoletes RFC 3782.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 15, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as
   the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

1. Introduction

   For the typical implementation of the TCP Fast Recovery algorithm
   described in [RFC5681] (first implemented in the 1990 BSD Reno
   release, and referred to as the Reno algorithm in [FF96]), the TCP
   data sender only retransmits a packet after a retransmit timeout has
   occurred, or after three duplicate acknowledgments have arrived
   triggering the Fast Retransmit algorithm. A single retransmit
   timeout might result in the retransmission of several data packets,
   but each invocation of the Fast Retransmit algorithm in RFC 5681
   leads to the retransmission of only a single data packet.

   Two problems arise with Reno TCP when multiple packet losses occur
   in a single window. First, Reno will often take a timeout, as
   has been documented in [Hoe95]. Second, even if a retransmission
   timeout is avoided, multiple fast retransmits and window reductions
   can occur, as documented in [F94]. When multiple packet losses
   occur, if the SACK option [RFC2883] is available, the TCP sender
   has the information to make intelligent decisions about which packets
   to retransmit and which packets not to retransmit during Fast
   Recovery.
   This document applies to TCP connections that are
   unable to use the TCP Selective Acknowledgement (SACK) option,
   either because the option is not locally supported or
   because the TCP peer did not indicate a willingness to use SACK.

   In the absence of SACK, there is little information available to the
   TCP sender in making retransmission decisions during Fast
   Recovery. From the three duplicate acknowledgments, the sender
   infers a packet loss, and retransmits the indicated packet. After
   this, the data sender could receive additional duplicate
   acknowledgments, as the data receiver acknowledges additional data
   packets that were already in flight when the sender entered Fast
   Retransmit.

   In the case of multiple packets dropped from a single window of data,
   the first new information available to the sender comes when the
   sender receives an acknowledgment for the retransmitted packet (that
   is, the packet retransmitted when Fast Retransmit was first
   entered). If there is a single packet drop and no reordering, then the
   acknowledgment for this packet will acknowledge all of the packets
   transmitted before Fast Retransmit was entered. However, if there
   are multiple packet drops, then the acknowledgment for the
   retransmitted packet will acknowledge some but not all of the packets
   transmitted before the Fast Retransmit. We call this acknowledgment
   a partial acknowledgment.

   Along with several other suggestions, [Hoe95] suggested that during
   Fast Recovery the TCP data sender respond to a partial
   acknowledgment by inferring that the next in-sequence packet has been
   lost, and retransmitting that packet. This document describes a
   modification to the Fast Recovery algorithm in RFC 5681 that
   incorporates a response to partial acknowledgments received during
   Fast Recovery.
   We call this modified Fast Recovery algorithm
   NewReno, because it is a slight but significant variation of the
   basic Reno algorithm in RFC 5681. This document does not discuss the
   other suggestions in [Hoe95] and [Hoe96], such as a change to the
   ssthresh parameter during Slow-Start, or the proposal to send a new
   packet for every two duplicate acknowledgments during Fast
   Recovery. The version of NewReno in this document also draws on other
   discussions of NewReno in the literature [LM97, Hen98].

   We do not claim that the NewReno version of Fast Recovery described
   here is an optimal modification of Fast Recovery for responding to
   partial acknowledgments, for TCP connections that are unable to use
   SACK. Based on our experiences with the NewReno modification in the
   NS simulator [NS] and with numerous implementations of NewReno, we
   believe that this modification improves the performance of the Fast
   Retransmit and Fast Recovery algorithms in a wide variety of
   scenarios. Previous versions of this RFC [RFC2582, RFC3782] provide
   simulation-based evidence of the possible performance gains.

2. Terminology and Definitions

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119]. This RFC indicates requirement levels for compliant TCP
   implementations implementing the NewReno Fast Retransmit and Fast
   Recovery algorithms described in this document.

   This document assumes that the reader is familiar with the terms
   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
   FLIGHT SIZE (FlightSize) defined in [RFC5681]. FLIGHT SIZE is
   defined as in [RFC5681] as follows:

   FLIGHT SIZE:
      The amount of data that has been sent but not yet cumulatively
      acknowledged.
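
   As a small illustration of the FLIGHT SIZE definition above, the
   following Python sketch computes it from two sender-side sequence
   numbers. The names snd_una and snd_nxt and the helper flight_size()
   are assumptions made for this example (they are not defined in this
   document), as is the use of 32-bit modular arithmetic to tolerate
   sequence number wraparound.

```python
SEQ_MOD = 2 ** 32  # assumption: 32-bit TCP sequence space that wraps around


def flight_size(snd_una: int, snd_nxt: int) -> int:
    """Bytes sent but not yet cumulatively acknowledged.

    snd_una: oldest unacknowledged sequence number (hypothetical name)
    snd_nxt: next sequence number to be sent (hypothetical name)
    """
    return (snd_nxt - snd_una) % SEQ_MOD
```

   The modulo keeps the result correct when snd_nxt has wrapped past
   zero while snd_una has not.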

   This document defines an additional sender-side state variable
   called RECOVER:

   RECOVER:
      When in Fast Recovery, this variable records the send sequence
      number that must be acknowledged before the Fast Recovery
      procedure is declared to be over.

3. The Fast Retransmit and Fast Recovery Algorithms in NewReno

3.1. Protocol Overview

   The basic idea of these extensions to the Fast Retransmit and
   Fast Recovery algorithms described in Section 3.2 of [RFC5681]
   is as follows. The TCP sender can infer, from the arrival of
   duplicate acknowledgments, whether multiple losses in the same
   window of data have most likely occurred, and avoid taking a
   retransmit timeout or making multiple congestion window reductions
   due to such an event.

   The NewReno modification applies to the Fast Recovery procedure that
   begins when three duplicate ACKs are received and ends when either a
   retransmission timeout occurs or an ACK arrives that acknowledges all
   of the data up to and including the data that was outstanding when
   the Fast Recovery procedure began.

3.2. Specification

   The procedures specified in Section 3.2 of [RFC5681] are followed
   with the following modifications.

   1) Initialization of TCP protocol control block:
      When the TCP protocol control block is initialized, Recover is
      set to the initial send sequence number.

   2) Three duplicate ACKs:
      When the third duplicate ACK is received, the TCP sender first
      checks the value of Recover to see if the Cumulative Acknowledgment
      field covers more than Recover. If so, the value of Recover is
      incremented to the value of the highest sequence number
      transmitted by the TCP so far. The TCP then enters Fast Retransmit
      (step 2 of Section 3.2 of [RFC5681]). If not, the TCP does not
      enter fast retransmit and does not reset ssthresh.

   3) Response to newly acknowledged data:
      Step 6 of [RFC5681] specifies the response to the next ACK that
      acknowledges previously unacknowledged data. When an ACK
      arrives that acknowledges new data, this ACK could be the
      acknowledgment elicited by the retransmission from step 2, or
      elicited by a later retransmission. There are two cases.

      Full acknowledgments:
      If this ACK acknowledges all of the data up to and including
      Recover, then the ACK acknowledges all the intermediate
      segments sent between the original transmission of the lost
      segment and the receipt of the third duplicate ACK. Set cwnd to
      either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or
      (2) ssthresh, where ssthresh is the value set when Fast Retransmit
      was entered, and where FlightSize in (1) is the amount of data
      presently outstanding. This is termed "deflating" the window.
      If the second option is selected, the implementation
      is encouraged to take measures to avoid a possible burst of
      data, in case the amount of data outstanding in the network is
      much less than the new congestion window allows. A simple mechanism
      is to limit the number of data packets that can be sent in response
      to a single acknowledgment. Exit the Fast Recovery procedure.

      Partial acknowledgments:
      If this ACK does *not* acknowledge all of the data up to and
      including Recover, then this is a partial ACK. In this case,
      retransmit the first unacknowledged segment. Deflate the
      congestion window by the amount of new data acknowledged by the
      cumulative acknowledgment field. If the partial ACK
      acknowledges at least one SMSS of new data, then add back SMSS
      bytes to the congestion window. This artificially
      inflates the congestion window in order to reflect the additional
      segment that has left the network. Send a new segment if
      permitted by the new value of cwnd.
      This "partial window
      deflation" attempts to ensure that, when Fast Recovery eventually
      ends, approximately ssthresh amount of data will be outstanding
      in the network. Do not exit the Fast Recovery procedure (i.e.,
      if any duplicate ACKs subsequently arrive, execute Step 4 of
      Section 3.2 of [RFC5681]).

      For the first partial ACK that arrives during Fast Recovery, also
      reset the retransmit timer. Timer management is discussed in
      more detail in Section 4.

   4) Retransmit timeouts:
      After a retransmit timeout, record the highest sequence number
      transmitted in the variable Recover and exit the Fast
      Recovery procedure if applicable.

   Step 2 above specifies a check that the Cumulative Acknowledgment
   field covers more than Recover. Because the acknowledgment field
   contains the sequence number that the sender of the ACK (the data
   receiver) next expects to receive, the acknowledgment "ack_number"
   covers more than Recover when:

      ack_number - 1 > Recover;

   i.e., at least one byte more of data is acknowledged beyond the
   highest byte that was outstanding when Fast Retransmit was last
   entered.

   Note that in Step 3 above, the congestion window is deflated after
   a partial acknowledgment is received. The congestion window was
   likely to have been inflated considerably when the partial
   acknowledgment was received. In addition, depending on the original
   pattern of packet losses, the partial acknowledgment might
   acknowledge nearly a window of data. In this case, if the congestion
   window was not deflated, the data sender might be able to send nearly
   a window of data back-to-back.

   This document does not specify the sender's response to duplicate
   ACKs when the Fast Retransmit/Fast Recovery algorithm is not
   invoked. This is addressed in other documents, such as those
   describing the Limited Transmit procedure [RFC3042].
   This document
   also does not address issues of adjusting the duplicate acknowledgment
   threshold, but assumes the threshold specified in the IETF standards;
   the current standard is [RFC5681], which specifies a threshold of three
   duplicate acknowledgments.

   As a final note, we would observe that in the absence of the SACK
   option, the data sender is working from limited information. When
   the issue of recovery from multiple dropped packets from a single
   window of data is of particular importance, the best alternative
   would be to use the SACK option.

4. Handling Duplicate Acknowledgments After A Timeout

   After each retransmit timeout, the highest sequence number
   transmitted so far is recorded in the variable "recover".
   If, after a retransmit timeout, the TCP data sender retransmits three
   consecutive packets that have already been received by the data
   receiver, then the TCP data sender will receive three duplicate
   acknowledgments that do not cover more than "recover". In this
   case, the duplicate acknowledgments are not an indication of a new
   instance of congestion. They are simply an indication that the
   sender has unnecessarily retransmitted at least three packets.

   However, when a retransmitted packet is itself dropped, the sender
   can also receive three duplicate acknowledgments that do not cover
   more than "recover". In this case, the sender would have been
   better off if it had initiated Fast Retransmit. For a TCP that
   implements the algorithm specified in Section 3 of this document, the
   sender does not infer a packet drop from duplicate acknowledgments
   in this scenario. As always, the retransmit timer is the backup
   mechanism for inferring packet loss in this case.
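
   As a rough illustration of Steps 1 through 4 of Section 3.2 and of
   the behavior after a timeout described above, the following Python
   sketch models only the sender-side bookkeeping. It is a minimal
   sketch under stated assumptions, not a reference implementation:
   the class NewRenoSender, its method names, the fields snd_una and
   snd_nxt, and the wraparound helper seq_gt() are hypothetical and not
   part of this specification. Retransmissions and new transmissions
   are reduced to returned strings, option (1) is used on a full
   acknowledgment, and Limited Transmit, timer handling, and normal
   slow start and congestion avoidance are not modeled.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # assumption: 32-bit sequence space with wraparound


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a is later than b (modulo 2**32)."""
    return a != b and (a - b) % SEQ_MOD < SEQ_MOD // 2


@dataclass
class NewRenoSender:
    smss: int
    cwnd: int
    ssthresh: int
    snd_una: int = 0   # hypothetical name: oldest unacknowledged byte
    snd_nxt: int = 0   # hypothetical name: next new byte to be sent
    recover: int = 0   # Step 1: starts at the initial send sequence number
    dupacks: int = 0
    in_fast_recovery: bool = False  # separate flag, as Section 6 suggests

    def flight_size(self) -> int:
        return (self.snd_nxt - self.snd_una) % SEQ_MOD

    def on_dup_ack(self, ack: int) -> str:
        self.dupacks += 1
        if self.in_fast_recovery:
            self.cwnd += self.smss  # Step 4 of Section 3.2 of RFC 5681
            return "inflate"
        if self.dupacks < 3:
            return "wait"
        # Step 2: enter Fast Retransmit only if ack_number - 1 > Recover.
        if not seq_gt((ack - 1) % SEQ_MOD, self.recover):
            return "ignore"  # e.g. duplicates that follow a timeout
        self.recover = (self.snd_nxt - 1) % SEQ_MOD
        self.ssthresh = max(self.flight_size() // 2, 2 * self.smss)
        self.cwnd = self.ssthresh + 3 * self.smss
        self.in_fast_recovery = True
        return "fast_retransmit"

    def on_new_ack(self, ack: int) -> str:
        acked = (ack - self.snd_una) % SEQ_MOD
        self.snd_una = ack
        self.dupacks = 0
        if not self.in_fast_recovery:
            return "normal"  # slow start / congestion avoidance not shown
        if seq_gt(ack, self.recover):
            # Step 3, full acknowledgment, option (1): deflate and exit.
            self.cwnd = min(self.ssthresh,
                            max(self.flight_size(), self.smss) + self.smss)
            self.in_fast_recovery = False
            return "full_ack"
        # Step 3, partial acknowledgment: the caller retransmits the first
        # unacknowledged segment; deflate by the newly acknowledged data and
        # add back one SMSS if at least one SMSS was acknowledged.
        self.cwnd -= acked
        if acked >= self.smss:
            self.cwnd += self.smss
        return "partial_ack"

    def on_timeout(self) -> None:
        # Step 4: record the highest sequence number sent, leave recovery.
        self.recover = (self.snd_nxt - 1) % SEQ_MOD
        self.in_fast_recovery = False
        self.dupacks = 0
        self.ssthresh = max(self.flight_size() // 2, 2 * self.smss)
        self.cwnd = self.smss  # loss window as in RFC 5681
```

   In this sketch, a sender with bytes 1000 through 10999 outstanding
   that takes a timeout sets recover to 10999. Three later duplicate
   ACKs carrying ack number 3000 then yield "ignore" rather than a new
   Fast Retransmit, which is the scenario discussed above. When a full
   acknowledgment arrives with FlightSize equal to zero, option (1)
   leaves cwnd at two SMSS (as long as ssthresh is at least that
   large).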

   There are several heuristics, based on timestamps or on the amount of
   advancement of the cumulative acknowledgment field, that allow the
   sender to distinguish, in some cases, between three duplicate
   acknowledgments following a retransmitted packet that was dropped,
   and three duplicate acknowledgments from the unnecessary
   retransmission of three packets [Gur03, GF04]. The TCP sender MAY use
   such a heuristic to decide to invoke a Fast Retransmit in some cases,
   even when the three duplicate acknowledgments do not cover more than
   "recover".

   For example, when three duplicate acknowledgments are caused by the
   unnecessary retransmission of three packets, this is likely to be
   accompanied by the cumulative acknowledgment field advancing by at
   least four segments. Similarly, a heuristic based on timestamps uses
   the fact that when there is a hole in the sequence space, the
   timestamp echoed in the duplicate acknowledgment is the timestamp of
   the most recent data packet that advanced the cumulative
   acknowledgment field [RFC1323]. If timestamps are used, and the
   sender stores the timestamp of the last acknowledged segment, then
   the timestamp echoed by duplicate acknowledgments can be used to
   distinguish between a retransmitted packet that was dropped and
   three duplicate acknowledgments from the unnecessary
   retransmission of three packets.

4.1. ACK Heuristic

   If the ACK-based heuristic is used, then following the advancement of
   the cumulative acknowledgment field, the sender stores the value of
   the previous cumulative acknowledgment as prev_highest_ack, and stores
   the latest cumulative ACK as highest_ack. In addition, the following
   step is performed if the check in Step 2 of Section 3.2 fails, before
   concluding that Fast Retransmit is not to be entered.

   1*)  If the Cumulative Acknowledgment field didn't cover more than
        "recover", check to see if the congestion window is greater
        than SMSS bytes and the difference between highest_ack and
        prev_highest_ack is at most 4*SMSS bytes. If true, duplicate
        ACKs indicate a lost segment (enter Fast Retransmit as in
        Step 2 of Section 3.2). Otherwise, duplicate ACKs likely
        result from unnecessary retransmissions (do not enter Fast
        Retransmit).

   The congestion window check serves to protect against fast retransmit
   immediately after a retransmit timeout.

   If several ACKs are lost, the sender can see a jump in the cumulative
   ACK of more than three segments, and the heuristic can fail.
   [RFC5681] recommends that a receiver should
   send duplicate ACKs for every out-of-order data packet, such as a
   data packet received during Fast Recovery. The ACK heuristic is more
   likely to fail if the receiver does not follow this advice, because
   then a smaller number of ACK losses are needed to produce a
   sufficient jump in the cumulative ACK.

4.2. Timestamp Heuristic

   If this heuristic is used, the sender stores the timestamp of the
   last acknowledged segment. In addition, the last sentence of Step 2
   in Section 3.2 is replaced as follows:

   1**) If the Cumulative Acknowledgment field didn't cover more than
        "recover", check to see if the echoed timestamp in the last
        non-duplicate acknowledgment equals the
        stored timestamp. If true, duplicate ACKs indicate a lost
        segment (enter Fast Retransmit as in Step 2 of Section 3.2).
        Otherwise, duplicate ACKs likely result from unnecessary
        retransmissions (do not enter Fast Retransmit).

   The timestamp heuristic works correctly both when the receiver echoes
   timestamps as specified by [RFC1323] and when it echoes them as
   described in the proposed revisions of that specification.
   However, if the receiver arbitrarily echoes timestamps, the heuristic
   can fail.
   The heuristic can also fail if a timeout was spurious and
   returning ACKs are not from retransmitted segments. This can be
   prevented by detection algorithms such as [RFC3522].

5. Implementation Issues for the Data Receiver

   [RFC5681] specifies that "Out-of-order data segments SHOULD be
   acknowledged immediately, in order to accelerate loss recovery."
   Neal Cardwell has noted that some data receivers do not send an
   immediate acknowledgment when they send a partial acknowledgment,
   but instead wait first for their delayed acknowledgment timer to
   expire [C98]. As [C98] notes, this severely limits the potential
   benefit of NewReno by delaying the receipt of the partial
   acknowledgment at the data sender. Echoing [RFC5681], our
   recommendation is that the data receiver send an immediate
   acknowledgment for an out-of-order segment, even when that
   out-of-order segment fills a hole in the buffer.

6. Implementation Issues for the Data Sender

   In Section 3.2, Step 3 above, it is noted that implementations should
   take measures to avoid a possible burst of data when leaving Fast
   Recovery, in case the amount of new data that the sender is eligible
   to send due to the new value of the congestion window is large. This
   can arise during NewReno when ACKs are lost or treated as pure window
   updates, thereby causing the sender to underestimate the number of
   new segments that can be sent during the recovery procedure.
   Specifically, bursts can occur when the FlightSize is much less than
   the new congestion window when exiting from Fast Recovery. One
   simple mechanism to avoid a burst of data when leaving Fast Recovery
   is to limit the number of data packets that can be sent in response
   to a single acknowledgment. (This is known as "maxburst_" in the ns
   simulator.)
   Other possible mechanisms for avoiding bursts include
   rate-based pacing, or setting the slow-start threshold to the
   resultant congestion window and then resetting the congestion window
   to FlightSize. A recommendation on the general mechanism to avoid
   excessively bursty sending patterns is outside the scope of this
   document.

   An implementation may want to use a separate flag to record whether
   or not it is presently in the Fast Recovery procedure. The use of
   the value of the duplicate acknowledgment counter for this purpose is
   not reliable because it can be reset upon window updates and
   out-of-order acknowledgments.

   When updating the Cumulative Acknowledgment field outside of
   Fast Recovery, the "recover" state variable may also need to be
   updated in order to continue to permit possible entry into Fast
   Recovery (Section 3.2, Step 2). This issue arises when an update
   of the Cumulative Acknowledgment field results in a sequence
   wraparound that affects the ordering between the Cumulative
   Acknowledgment field and the "recover" state variable. Entry
   into Fast Recovery is only possible when the Cumulative
   Acknowledgment field covers more than the "recover" state variable.

   It is important for the sender to respond correctly to duplicate ACKs
   received when the sender is no longer in Fast Recovery (e.g., because
   of a Retransmit Timeout). The Limited Transmit procedure [RFC3042]
   describes possible responses to the first and second duplicate
   acknowledgments. When three or more duplicate acknowledgments are
   received, the Cumulative Acknowledgment field doesn't cover more
   than "recover", and a new Fast Recovery is not invoked, it is
   important that the sender not execute the Fast Recovery steps (3) and
   (4) in Section 3. Otherwise, the sender could end up in a chain of
   spurious timeouts.
   We mention this only because several NewReno
   implementations had this bug, including the implementation in the NS
   simulator.

   It has been observed that some TCP implementations enter a slow start
   or congestion avoidance window updating algorithm immediately after
   the cwnd is set by the equation found in Section 3.2, Step 3, even
   without a new external event generating the cwnd change. Note that
   after cwnd is set based on the procedure for exiting Fast Recovery
   (Section 3.2, Step 3), cwnd SHOULD NOT be updated until a further
   event occurs (e.g., arrival of an ack, or timeout) after this
   adjustment.

7. Security Considerations

   [RFC5681] discusses general security considerations concerning TCP
   congestion control. This document describes a specific algorithm
   that conforms with the congestion control requirements of [RFC5681],
   and so those considerations apply to this algorithm, too. There are
   no known additional security concerns for this specific algorithm.

8. IANA Considerations

   This document has no actions for IANA.

9. Conclusions

   This document specifies the NewReno Fast Retransmit and Fast Recovery
   algorithms for TCP. This NewReno modification to TCP can even be
   important for TCP implementations that support the SACK option,
   because the SACK option can only be used for TCP connections when
   both TCP end-nodes support the SACK option. NewReno performs better
   than Reno (RFC5681) in a number of scenarios discussed in
   previous versions of this RFC ([RFC2582], [RFC3782]).

   A number of options to the basic algorithm presented in Section 3 are
   also referenced in Appendix A to this document. These include the
   handling of the retransmission timer, the response to partial
   acknowledgments, and whether or not the sender must maintain a state
   variable called Recover.
   Our belief is that the differences
   between these variants of NewReno are small compared to the
   differences between Reno and NewReno. That is, the important thing
   is to implement NewReno instead of Reno, for a TCP connection
   without SACK; it is less important exactly which of the variants of
   NewReno is implemented.

10. Acknowledgments

   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu,
   Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed
   feedback on this document or on its precursor, RFC 2582. Jeffrey
   Hsu provided clarifications on the handling of the recover variable
   that were applied to RFC 3782 as errata, and now are in Section 6
   of this document. Yoshifumi Nishida contributed a modification
   to the fast recovery algorithm to account for the case in which
   flightsize is 0 when the TCP sender leaves fast recovery, and the
   TCP receiver uses delayed acknowledgments. Alexander Zimmermann
   provided several suggestions to improve the clarity of the document.

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
             Timer", RFC 2988, November 2000.

   [RFC5681] Allman, M., Paxson, V. and E. Blanton, "TCP Congestion
             Control", RFC 5681, September 2009.

11.2. Informative References

   [C98]     Cardwell, N., "delayed ACKs for retransmitted packets:
             ouch!". November 1998, Email to the tcpimpl mailing list,
             Message-ID
             "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu",
             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

   [F98]     Floyd, S., Revisions to RFC 2001, "Presentation to the
             TCPIMPL Working Group", August 1998. URLs
             "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and
             "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf".

   [F03]     Floyd, S., "Moving NewReno from Experimental to Proposed
             Standard? Presentation to the TSVWG Working Group", March
             2003. URLs
             "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and
             "http://www.icir.org/floyd/talks/newreno-Mar03.pdf".

   [FF96]    Fall, K. and S. Floyd, "Simulation-based Comparisons of
             Tahoe, Reno and SACK TCP", Computer Communication Review,
             July 1996. URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

   [F94]     Floyd, S., "TCP and Successive Fast Retransmits", Technical
             report, October 1994. URL
             "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

   [GF04]    Gurtov, A. and S. Floyd, "Resolving Acknowledgment Ambiguity
             in non-SACK TCP", Next Generation Teletraffic and
             Wired/Wireless Advanced Networking (NEW2AN'04), February
             2004. URL "http://www.cs.helsinki.fi/u/gurtov/papers/
             heuristics.html".

   [Gur03]   Gurtov, A., "[Tsvwg] resolving the problem of unnecessary
             fast retransmits in go-back-N", email to the tsvwg mailing
             list, message ID <3F25B467.9020609@cs.helsinki.fi>, July 28,
             2003. URL
             "http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/msg04334.html".

   [Hen98]   Henderson, T., Re: NewReno and the 2001 Revision. September
             1998. Email to the tcpimpl mailing list, Message ID
             "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU",
             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

   [Hoe95]   Hoe, J., "Startup Dynamics of TCP's Congestion Control and
             Avoidance Schemes", Master's Thesis, MIT, 1995.

   [Hoe96]   Hoe, J., "Improving the Start-up Behavior of a Congestion
             Control Scheme for TCP", ACM SIGCOMM, August 1996. URL
             "http://www.acm.org/sigcomm/sigcomm96/program.html".

   [LM97]    Lin, D. and R. Morris, "Dynamics of Random Early Detection",
             SIGCOMM 97, September 1997. URL
             "http://www.acm.org/sigcomm/sigcomm97/program.html".

   [NS]      The Network Simulator (NS). URL
             "http://www.isi.edu/nsnam/ns/".

   [PF01]    Padhye, J. and S.
Floyd, "Identifying the TCP Behavior of Web
571   Servers", June 2001, SIGCOMM 2001.

573   [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
574   High Performance", RFC 1323, May 1992.

576   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
577   TCP's Fast Recovery Algorithm", RFC 2582, April 1999.

579   [RFC2883] Floyd, S., J. Mahdavi, M. Mathis, and M. Podolsky, "An
580   Extension to the Selective Acknowledgement (SACK) Option for TCP",
      RFC 2883, July 2000.

582   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's
583   Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

585   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
586   TCP", RFC 3522, April 2003.

588   [RFC3782] Floyd, S., T. Henderson, and A. Gurtov, "The NewReno
589   Modification to TCP's Fast Recovery Algorithm", RFC 3782, April 2004.

591   Appendix A. Additional Information

593   Previous versions of this RFC ([RFC2582], [RFC3782]) contained
594   additional informative material on the following subjects, and
595   may be consulted by readers who want more information about
596   possible variants to the algorithm or who want references
597   to specific [NS] simulations that provide NewReno test cases.

599   Section 4 of [RFC3782] discusses some alternative behaviors for
600   resetting the retransmit timer after a partial acknowledgment.

602   Section 5 of [RFC3782] discusses some alternative behaviors for
603   performing retransmission after a partial acknowledgment.

605   Section 6 of [RFC3782] provides more information about the
606   motivation for the sender's state variable Recover.

608   Section 9 of [RFC3782] introduces some NS simulation test
609   suites for NewReno. In addition, references to simulation
610   results can be found throughout [RFC3782].

612   Section 10 of [RFC3782] provides a comparison of Reno and
613   NewReno TCP.

615   Section 11 of [RFC3782] lists changes relative to [RFC2582].

617   Appendix B.
Changes Relative to RFC 3782

619   In [RFC3782], the cwnd after Full ACK reception is set to either
620   (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh. However,
621   the first option carries a risk of performance degradation.
622   With the first option, if FlightSize is zero, the result
623   is 1 SMSS. The TCP sender can then transmit only 1 segment at that
624   moment, and the receiver may delay its ACK for that segment because
625   of the delayed ACK algorithm.

627   The FlightSize on Full ACK reception can be zero in some situations.
628   A typical example is when the sending window during fast recovery is
629   small. In this case, the retransmitted packet and new data packets can
630   be transmitted within a short interval. If all these packets
631   successfully arrive, the receiver may generate a Full ACK that
632   acknowledges all outstanding data. Even if the window is not small,
633   loss of ACK packets or receive buffer shortage during fast recovery can
634   also make this situation more likely.

636   The proposed fix in this document ensures that the TCP sender transmits
637   at least two segments on Full ACK reception.

639   In addition, errata for RFC3782 (an editorial clarification to Section 8
640   of RFC2582, which is now Section 6 of this document) have been applied.

642   The specification text (Section 3.2 herein) was rewritten to more
643   closely track Section 3.2 of [RFC5681].

645   Sections 4, 5, and 9-11 of [RFC3782] were removed, and instead Appendix
646   A of this document was added to back-reference this informative
647   material.

649   Appendix C.
Document Revision History
650   To be removed upon publication

652   +----------+--------------------------------------------------+
653   | Revision | Comments                                         |
654   +----------+--------------------------------------------------+
655   | draft-00 | RFC3782 errata applied, and changes applied from |
656   |          | draft-nishida-newreno-modification-02            |
657   +----------+--------------------------------------------------+
658   | draft-01 | Non-normative sections moved to appendices,      |
659   |          | editorial clarifications applied as suggested    |
660   |          | by Alexander Zimmermann.                         |
661   +----------+--------------------------------------------------+
662   | draft-02 | Better align specification text with RFC5681.    |
663   |          | Replace informative appendices by a new appendix |
664   |          | that just provides back-references to earlier    |
665   |          | NewReno RFCs.                                    |
666   +----------+--------------------------------------------------+

668   Authors' Addresses

670   Tom Henderson
671   The Boeing Company

673   EMail: thomas.r.henderson@boeing.com

675   Sally Floyd
676   International Computer Science Institute

678   Phone: +1 (510) 666-2989
679   EMail: floyd@acm.org
680   URL: http://www.icir.org/floyd/

682   Andrei Gurtov
683   HIIT
684   Helsinki Institute for Information Technology
685   P.O. Box 19215
686   00076 Aalto
687   Finland

689   EMail: gurtov@hiit.fi

691   Yoshifumi Nishida
692   WIDE Project
693   Endo 5322
694   Fujisawa, Kanagawa 252-8520
695   Japan

697   EMail: nishida@wide.ad.jp