TCP Maintenance and Minor                                   T. Henderson
Extensions Working Group                                          Boeing
Internet-Draft                                                  S. Floyd
Obsoletes: 3782 (if approved)                                       ICSI
Intended status: Standards Track                               A. Gurtov
Expires: June 5, 2012                                               HIIT
                                                              Y. Nishida
                                                            WIDE Project
                                                        December 5, 2011

       The NewReno Modification to TCP's Fast Recovery Algorithm
                  draft-ietf-tcpm-rfc3782-bis-04.txt

Abstract

RFC 5681 documents the following four intertwined TCP congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. RFC 5681 explicitly allows certain modifications of these algorithms, including modifications that use the TCP Selective Acknowledgement (SACK) option (RFC 2883), and modifications that respond to "partial acknowledgments" (ACKs that cover new data, but not all the data outstanding when loss was detected) in the absence of SACK. This document describes a specific algorithm for responding to partial acknowledgments, referred to as NewReno. This response to partial acknowledgments was first proposed by Janey Hoe. This document obsoletes RFC 3782.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on June 5, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

1. Introduction

For the typical implementation of the TCP Fast Recovery algorithm described in [RFC5681] (first implemented in the 1990 BSD Reno release, and referred to as the Reno algorithm in [FF96]), the TCP data sender retransmits a packet only after a retransmit timeout has occurred, or after three duplicate acknowledgments have arrived and triggered the Fast Retransmit algorithm. A single retransmit timeout might result in the retransmission of several data packets, but each invocation of the Fast Retransmit algorithm in RFC 5681 leads to the retransmission of only a single data packet.

Two problems arise with Reno TCP when multiple packet losses occur in a single window. First, Reno will often take a timeout, as has been documented in [Hoe95]. Second, even if a retransmission timeout is avoided, multiple fast retransmits and window reductions can occur, as documented in [F94]. When multiple packet losses occur, if the SACK option [RFC2883] is available, the TCP sender has the information to make intelligent decisions about which packets to retransmit and which packets not to retransmit during Fast Recovery.
This document applies to TCP connections that are unable to use the TCP Selective Acknowledgement (SACK) option, either because the option is not locally supported or because the TCP peer did not indicate a willingness to use SACK.

In the absence of SACK, there is little information available to the TCP sender in making retransmission decisions during Fast Recovery. From the three duplicate acknowledgments, the sender infers a packet loss, and retransmits the indicated packet. After this, the data sender could receive additional duplicate acknowledgments, as the data receiver acknowledges additional data packets that were already in flight when the sender entered Fast Retransmit.

In the case of multiple packets dropped from a single window of data, the first new information available to the sender comes when the sender receives an acknowledgment for the retransmitted packet (that is, the packet retransmitted when Fast Retransmit was first entered). If there is a single packet drop and no reordering, then the acknowledgment for this packet will acknowledge all of the packets transmitted before Fast Retransmit was entered. However, if there are multiple packet drops, then the acknowledgment for the retransmitted packet will acknowledge some but not all of the packets transmitted before the Fast Retransmit. We call this acknowledgment a partial acknowledgment.

Along with several other suggestions, [Hoe95] suggested that during Fast Recovery the TCP data sender respond to a partial acknowledgment by inferring that the next in-sequence packet has been lost, and retransmitting that packet. This document describes a modification to the Fast Recovery algorithm in RFC 5681 that incorporates a response to partial acknowledgments received during Fast Recovery.
We call this modified Fast Recovery algorithm NewReno, because it is a slight but significant variation of the basic Reno algorithm in RFC 5681. This document does not discuss the other suggestions in [Hoe95] and [Hoe96], such as a change to the ssthresh parameter during Slow-Start, or the proposal to send a new packet for every two duplicate acknowledgments during Fast Recovery. The version of NewReno in this document also draws on other discussions of NewReno in the literature [LM97, Hen98].

We do not claim that the NewReno version of Fast Recovery described here is an optimal modification of Fast Recovery for responding to partial acknowledgments, for TCP connections that are unable to use SACK. Based on our experiences with the NewReno modification in the NS simulator [NS] and with numerous implementations of NewReno, we believe that this modification improves the performance of the Fast Retransmit and Fast Recovery algorithms in a wide variety of scenarios. Previous versions of this RFC [RFC2582, RFC3782] provide simulation-based evidence of the possible performance gains.

2. Terminology and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

This document assumes that the reader is familiar with the terms SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and FLIGHT SIZE (FlightSize) defined in [RFC5681]. FLIGHT SIZE is defined in [RFC5681] as follows:

   FLIGHT SIZE:
      The amount of data that has been sent but not yet cumulatively acknowledged.

This document defines an additional sender-side state variable called RECOVER:

   RECOVER:
      When in Fast Recovery, this variable records the send sequence number that must be acknowledged before the Fast Recovery procedure is declared to be over.

3. The Fast Retransmit and Fast Recovery Algorithms in NewReno

3.1. Protocol Overview

The basic idea of these extensions to the Fast Retransmit and Fast Recovery algorithms described in Section 3.2 of [RFC5681] is as follows. The TCP sender can infer, from the arrival of duplicate acknowledgments, whether multiple losses in the same window of data have most likely occurred, and avoid taking a retransmit timeout or making multiple congestion window reductions due to such an event.

The NewReno modification applies to the Fast Recovery procedure that begins when three duplicate ACKs are received and ends when either a retransmission timeout occurs or an ACK arrives that acknowledges all of the data up to and including the data that was outstanding when the Fast Recovery procedure began.

3.2. Specification

The procedures specified in Section 3.2 of [RFC5681] are followed, with the following modifications.

1) Initialization of TCP protocol control block:
   When the TCP protocol control block is initialized, Recover is set to the initial send sequence number.

2) Three duplicate ACKs:
   When the third duplicate ACK is received, the TCP sender first checks the value of Recover to see if the Cumulative Acknowledgment field covers more than Recover. If so, the value of Recover is incremented to the value of the highest sequence number transmitted by the TCP so far. The TCP then enters Fast Retransmit (step 2 of Section 3.2 of [RFC5681]). If not, the TCP does not enter Fast Retransmit and does not reset ssthresh.

3) Response to newly acknowledged data:
   Step 6 of Section 3.2 of [RFC5681] specifies the response to the next ACK that acknowledges previously unacknowledged data. When an ACK arrives that acknowledges new data, this ACK could be the acknowledgment elicited by the retransmission from step 2, or elicited by a later retransmission. There are two cases.

   Full acknowledgments:
      If this ACK acknowledges all of the data up to and including Recover, then the ACK acknowledges all the intermediate segments sent between the original transmission of the lost segment and the receipt of the third duplicate ACK. Set cwnd to either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or (2) ssthresh, where ssthresh is the value set when Fast Retransmit was entered, and where FlightSize in (1) is the amount of data presently outstanding. This is termed "deflating" the window. If the second option is selected, the implementation is encouraged to take measures to avoid a possible burst of data, in case the amount of data outstanding in the network is much less than the new congestion window allows. A simple mechanism is to limit the number of data packets that can be sent in response to a single acknowledgment. Exit the Fast Recovery procedure.

   Partial acknowledgments:
      If this ACK does *not* acknowledge all of the data up to and including Recover, then this is a partial ACK. In this case, retransmit the first unacknowledged segment. Deflate the congestion window by the amount of new data acknowledged by the cumulative acknowledgment field. If the partial ACK acknowledges at least one SMSS of new data, then add back SMSS bytes to the congestion window. This artificially inflates the congestion window in order to reflect the additional segment that has left the network. Send a new segment if permitted by the new value of cwnd.
      This "partial window deflation" attempts to ensure that, when Fast Recovery eventually ends, approximately ssthresh amount of data will be outstanding in the network. Do not exit the Fast Recovery procedure (i.e., if any duplicate ACKs subsequently arrive, execute step 4 of Section 3.2 of [RFC5681]).

      For the first partial ACK that arrives during Fast Recovery, also reset the retransmit timer. Alternative behaviors for resetting the retransmit timer after a partial acknowledgment are discussed in Section 4 of [RFC3782] (see Appendix A).

4) Retransmit timeouts:
   After a retransmit timeout, record the highest sequence number transmitted in the variable Recover, and exit the Fast Recovery procedure if applicable.

Step 2 above specifies a check that the Cumulative Acknowledgment field covers more than Recover. Because the acknowledgment field contains the sequence number that the sender next expects to receive, the acknowledgment "ack_number" covers more than Recover when:

   ack_number - 1 > Recover;

i.e., at least one byte more of data is acknowledged beyond the highest byte that was outstanding when Fast Retransmit was last entered.

Note that in step 3 above, the congestion window is deflated after a partial acknowledgment is received. The congestion window was likely to have been inflated considerably when the partial acknowledgment was received. In addition, depending on the original pattern of packet losses, the partial acknowledgment might acknowledge nearly a window of data. In this case, if the congestion window were not deflated, the data sender might be able to send nearly a window of data back-to-back.

This document does not specify the sender's response to duplicate ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked. This is addressed in other documents, such as those describing the Limited Transmit procedure [RFC3042].
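The four modified steps of Section 3.2 can be summarized as a small sender-side state machine. The following Python sketch is illustrative only and is not part of the specification. It assumes unbounded integer sequence numbers (no 32-bit wraparound), assumes the caller reports transmitted bytes through a hypothetical on_send() method, omits ordinary slow start and congestion avoidance outside Fast Recovery, and returns actions such as retransmissions as tuples instead of driving a real TCP stack. The class and method names are hypothetical. On entry to Fast Recovery it uses the RFC 5681 settings ssthresh = max(FlightSize / 2, 2*SMSS) and cwnd = ssthresh + 3*SMSS.

```python
from typing import List, Tuple

Action = Tuple[str, int]  # e.g. ("retransmit", seq) or ("exit_recovery", cwnd)


class NewRenoSender:
    """Hypothetical sketch of the NewReno steps of Section 3.2."""

    def __init__(self, smss: int, iss: int, cwnd: int, ssthresh: int,
                 full_ack_option: int = 1) -> None:
        self.smss = smss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.full_ack_option = full_ack_option   # option (1) or (2) in step 3
        self.recover = iss                       # step 1: Recover = ISS
        self.snd_una = iss + 1                   # first data byte follows the SYN
        self.snd_nxt = iss + 1                   # next byte to be sent
        self.dupacks = 0
        self.in_recovery = False                 # separate flag, as in Section 6
        self.seen_partial_ack = False

    def flight_size(self) -> int:
        return self.snd_nxt - self.snd_una

    def on_send(self, nbytes: int) -> None:
        self.snd_nxt += nbytes

    def on_dupack(self) -> List[Action]:
        self.dupacks += 1
        if self.in_recovery:
            self.cwnd += self.smss               # RFC 5681 step 4: inflate cwnd
            return []
        if self.dupacks != 3:
            return []
        # Step 2: a duplicate ACK carries ack_number == snd_una, so the
        # "covers more than Recover" test is snd_una - 1 > Recover.
        if not (self.snd_una - 1 > self.recover):
            return []                            # no Fast Retransmit, ssthresh kept
        self.recover = self.snd_nxt - 1          # highest sequence number sent
        self.ssthresh = max(self.flight_size() // 2, 2 * self.smss)
        self.cwnd = self.ssthresh + 3 * self.smss
        self.in_recovery = True
        self.seen_partial_ack = False
        return [("retransmit", self.snd_una)]

    def on_ack(self, ack: int) -> List[Action]:
        """Called only for ACKs that acknowledge new data (ack > snd_una)."""
        newly_acked = ack - self.snd_una
        self.snd_una = ack
        self.dupacks = 0
        if not self.in_recovery:
            return []                            # normal cwnd growth omitted
        if ack - 1 >= self.recover:              # full acknowledgment
            if self.full_ack_option == 1:
                self.cwnd = min(self.ssthresh,
                                max(self.flight_size(), self.smss) + self.smss)
            else:
                self.cwnd = self.ssthresh        # burst avoidance advised here
            self.in_recovery = False
            return [("exit_recovery", self.cwnd)]
        # Partial acknowledgment: retransmit the first unacknowledged segment,
        # apply partial window deflation, and stay in Fast Recovery.
        actions: List[Action] = [("retransmit", self.snd_una)]
        self.cwnd -= newly_acked
        if newly_acked >= self.smss:
            self.cwnd += self.smss
        if not self.seen_partial_ack:            # timer reset on first partial ACK
            self.seen_partial_ack = True
            actions.append(("reset_timer", 0))
        return actions

    def on_timeout(self) -> None:
        self.recover = self.snd_nxt - 1          # step 4
        self.in_recovery = False
        self.dupacks = 0
        # Loss-window response of RFC 5681 after a retransmit timeout.
        self.ssthresh = max(self.flight_size() // 2, 2 * self.smss)
        self.cwnd = self.smss
```

For example, with SMSS = 1000 and ISS = 0, after one acknowledged segment and ten more segments sent, the third duplicate ACK sets Recover to 11000, ssthresh to 5000, and cwnd to 8000; a later full ACK for 11001 with nothing outstanding leaves cwnd at min(5000, max(0, 1000) + 1000) = 2000.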
This document also does not address issues of adjusting the duplicate acknowledgment threshold, but assumes the threshold specified in the IETF standards; the current standard is [RFC5681], which specifies a threshold of three duplicate acknowledgments.

As a final note, we observe that in the absence of the SACK option, the data sender is working from limited information. When the issue of recovery from multiple dropped packets from a single window of data is of particular importance, the best alternative would be to use the SACK option.

4. Handling Duplicate Acknowledgments After A Timeout

After each retransmit timeout, the highest sequence number transmitted so far is recorded in the variable "recover". If, after a retransmit timeout, the TCP data sender retransmits three consecutive packets that have already been received by the data receiver, then the TCP data sender will receive three duplicate acknowledgments that do not cover more than "recover". In this case, the duplicate acknowledgments are not an indication of a new instance of congestion. They are simply an indication that the sender has unnecessarily retransmitted at least three packets.

However, when a retransmitted packet is itself dropped, the sender can also receive three duplicate acknowledgments that do not cover more than "recover". In this case, the sender would have been better off if it had initiated Fast Retransmit. For a TCP that implements the algorithm specified in Section 3.2 of this document, the sender does not infer a packet drop from duplicate acknowledgments in this scenario. As always, the retransmit timer is the backup mechanism for inferring packet loss in this case.
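Whether duplicate ACKs after a timeout may start a new Fast Recovery comes down to the comparison ack_number - 1 > Recover from Section 3.2. Real TCP sequence numbers are 32-bit values that wrap around (a concern also raised in Section 6), so an implementation would typically compare them modulo 2^32. The following Python sketch shows one such comparison; the function names are hypothetical, and the convention that a value less than half the sequence space ahead counts as "later" is an assumption borrowed from common TCP practice, not something this document specifies.

```python
SEQ_MOD = 1 << 32   # TCP sequence numbers are 32-bit values
HALF = 1 << 31


def seq_after(a: int, b: int) -> bool:
    """Return True if sequence number a is later than b, modulo 2**32.

    Assumed convention: a is later when it lies strictly between 0 and
    half the sequence space ahead of b.
    """
    diff = (a - b) % SEQ_MOD
    return 0 < diff < HALF


def covers_more_than_recover(ack_number: int, recover: int) -> bool:
    """Wraparound-safe form of the Section 3.2 check ack_number - 1 > Recover."""
    return seq_after((ack_number - 1) % SEQ_MOD, recover % SEQ_MOD)
```

For example, if a timeout occurs when the highest sequence number sent is 9999, "recover" becomes 9999, and duplicate ACKs carrying ack_number 10000 (the receiver already holds everything that was sent) do not cover more than "recover", so no Fast Retransmit is started. Near the wrap point, recover = 2^32 - 500 and ack_number = 200 still count as covering more than "recover".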

There are several heuristics, based on timestamps or on the amount of advancement of the cumulative acknowledgment field, that allow the sender to distinguish, in some cases, between three duplicate acknowledgments following a retransmitted packet that was dropped, and three duplicate acknowledgments from the unnecessary retransmission of three packets [Gur03, GF04]. The TCP sender MAY use such a heuristic to decide to invoke a Fast Retransmit in some cases, even when the three duplicate acknowledgments do not cover more than "recover".

For example, when three duplicate acknowledgments are caused by the unnecessary retransmission of three packets, this is likely to be accompanied by the cumulative acknowledgment field advancing by at least four segments. Similarly, a heuristic based on timestamps uses the fact that when there is a hole in the sequence space, the timestamp echoed in the duplicate acknowledgment is the timestamp of the most recent data packet that advanced the cumulative acknowledgment field [RFC1323]. If timestamps are used, and the sender stores the timestamp of the last acknowledged segment, then the timestamp echoed by duplicate acknowledgments can be used to distinguish between a retransmitted packet that was dropped and three duplicate acknowledgments from the unnecessary retransmission of three packets.

4.1. ACK Heuristic

If the ACK-based heuristic is used, then following the advancement of the cumulative acknowledgment field, the sender stores the value of the previous cumulative acknowledgment as prev_highest_ack, and stores the latest cumulative ACK as highest_ack. In addition, the following check is performed if, in step 2 of Section 3.2, the Cumulative Acknowledgment field does not cover more than "recover".

2*) If the Cumulative Acknowledgment field didn't cover more than "recover", check to see if the congestion window is greater than SMSS bytes and the difference between highest_ack and prev_highest_ack is at most 4*SMSS bytes. If true, duplicate ACKs indicate a lost segment (enter Fast Retransmit). Otherwise, duplicate ACKs likely result from unnecessary retransmissions (do not enter Fast Retransmit).

The congestion window check serves to protect against fast retransmit immediately after a retransmit timeout.

If several ACKs are lost, the sender can see a jump in the cumulative ACK of more than three segments, and the heuristic can fail. [RFC5681] recommends that a receiver should send duplicate ACKs for every out-of-order data packet, such as a data packet received during Fast Recovery. The ACK heuristic is more likely to fail if the receiver does not follow this advice, because then fewer ACK losses are needed to produce a sufficient jump in the cumulative ACK.

4.2. Timestamp Heuristic

If this heuristic is used, the sender stores the timestamp of the last acknowledged segment. In addition, the last sentence of step 2 in Section 3.2 is replaced as follows:

2**) If the Cumulative Acknowledgment field didn't cover more than "recover", check to see if the echoed timestamp in the last non-duplicate acknowledgment equals the stored timestamp. If true, duplicate ACKs indicate a lost segment (enter Fast Retransmit). Otherwise, duplicate ACKs likely result from unnecessary retransmissions (do not enter Fast Retransmit).

The timestamp heuristic works correctly both when the receiver echoes timestamps as specified by [RFC1323] and when it follows proposed revisions of that specification. However, if the receiver arbitrarily echoes timestamps, the heuristic can fail.
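Rules 2*) and 2**) reduce to simple predicates once the sender keeps the extra state the heuristics require. In the following Python sketch, which is illustrative only, the AckHistory class records highest_ack and prev_highest_ack whenever the cumulative ACK advances (as in Section 4.1), and the predicates would be consulted only when the three duplicate ACKs do not cover more than "recover". Sequence numbers are assumed to be unbounded integers and timestamps opaque integers; all names other than highest_ack and prev_highest_ack are hypothetical.

```python
class AckHistory:
    """Tracks the last two cumulative ACK values (Section 4.1)."""

    def __init__(self, initial_ack: int) -> None:
        self.highest_ack = initial_ack
        self.prev_highest_ack = initial_ack

    def on_cumulative_ack(self, ack: int) -> None:
        # Only an advancing cumulative ACK updates the stored values.
        if ack > self.highest_ack:
            self.prev_highest_ack = self.highest_ack
            self.highest_ack = ack


def ack_heuristic_indicates_loss(cwnd: int, smss: int, history: AckHistory) -> bool:
    """Rule 2*): True means the duplicate ACKs indicate a loss (Fast Retransmit)."""
    jump = history.highest_ack - history.prev_highest_ack
    return cwnd > smss and jump <= 4 * smss


def timestamp_heuristic_indicates_loss(echoed_ts_last_nondup_ack: int,
                                       stored_ts_last_acked_segment: int) -> bool:
    """Rule 2**): True means the duplicate ACKs indicate a loss (Fast Retransmit)."""
    return echoed_ts_last_nondup_ack == stored_ts_last_acked_segment
```

Note that a jump of exactly 4*SMSS still counts as a loss under rule 2*), matching its "at most 4*SMSS bytes" wording, and that the cwnd > SMSS test rejects the case right after a timeout, when cwnd has been reduced to one segment.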
The heuristic can also fail if a timeout was spurious and returning ACKs are not from retransmitted segments. This can be prevented by detection algorithms such as [RFC3522].

5. Implementation Issues for the Data Receiver

[RFC5681] specifies that "Out-of-order data segments SHOULD be acknowledged immediately, in order to accelerate loss recovery." Neal Cardwell has noted that some data receivers do not send an immediate acknowledgment when they send a partial acknowledgment, but instead wait first for their delayed acknowledgment timer to expire [C98]. As [C98] notes, this severely limits the potential benefit of NewReno by delaying the receipt of the partial acknowledgment at the data sender. Echoing [RFC5681], our recommendation is that the data receiver send an immediate acknowledgment for an out-of-order segment, even when that out-of-order segment fills a hole in the buffer.

6. Implementation Issues for the Data Sender

In Section 3.2, step 3 above, it is noted that implementations should take measures to avoid a possible burst of data when leaving Fast Recovery, in case the amount of new data that the sender is eligible to send due to the new value of the congestion window is large. This can arise during NewReno when ACKs are lost or treated as pure window updates, thereby causing the sender to underestimate the number of new segments that can be sent during the recovery procedure. Specifically, bursts can occur when the FlightSize is much less than the new congestion window when exiting from Fast Recovery. One simple mechanism to avoid a burst of data when leaving Fast Recovery is to limit the number of data packets that can be sent in response to a single acknowledgment. (This is known as "maxburst_" in the ns simulator.)
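A per-ACK burst limit of this kind takes only a few lines. The following Python sketch is illustrative: the max_burst parameter plays the role of ns's "maxburst_", but its name and the example value of 4 segments are hypothetical, and the sender is assumed to send only full-sized segments.

```python
def segments_allowed(cwnd: int, flight_size: int, smss: int, max_burst: int) -> int:
    """Number of full-sized segments that may be sent in response to one ACK.

    The congestion window bounds the total amount outstanding; max_burst
    additionally caps how many segments a single ACK may release.
    """
    room = max(0, cwnd - flight_size)   # unused part of the congestion window
    return min(room // smss, max_burst)
```

For example, on leaving Fast Recovery with option (2), cwnd = ssthresh = 20000 bytes and FlightSize = 0, the congestion window alone would allow 20 back-to-back segments of 1000 bytes, whereas a limit of 4 releases only four per ACK and lets later ACKs clock out the rest.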
Other possible mechanisms for avoiding bursts include rate-based pacing, or setting the slow-start threshold to the resultant congestion window and then resetting the congestion window to FlightSize. A recommendation on the general mechanism to avoid excessively bursty sending patterns is outside the scope of this document.

An implementation may want to use a separate flag to record whether or not it is presently in the Fast Recovery procedure. The use of the value of the duplicate acknowledgment counter for this purpose is not reliable because it can be reset upon window updates and out-of-order acknowledgments.

When updating the Cumulative Acknowledgment field outside of Fast Recovery, the "recover" state variable may also need to be updated in order to continue to permit possible entry into Fast Recovery (Section 3.2, step 2). This issue arises when an update of the Cumulative Acknowledgment field results in a sequence wraparound that affects the ordering between the Cumulative Acknowledgment field and the "recover" state variable. Entry into Fast Recovery is only possible when the Cumulative Acknowledgment field covers more than the "recover" state variable.

It is important for the sender to respond correctly to duplicate ACKs received when the sender is no longer in Fast Recovery (e.g., because of a Retransmit Timeout). The Limited Transmit procedure [RFC3042] describes possible responses to the first and second duplicate acknowledgments. When three or more duplicate acknowledgments are received, the Cumulative Acknowledgment field doesn't cover more than "recover", and a new Fast Recovery is not invoked, it is important that the sender not execute the Fast Recovery steps that inflate the congestion window for each additional duplicate ACK and transmit new segments (steps 4 and 5 of Section 3.2 of [RFC5681]). Otherwise, the sender could end up in a chain of spurious timeouts.
We mention this only because several NewReno implementations had this bug, including the implementation in the NS simulator.

It has been observed that some TCP implementations enter a slow start or congestion avoidance window updating algorithm immediately after the cwnd is set by the equation found in Section 3.2, step 3, even without a new external event generating the cwnd change. Note that after cwnd is set based on the procedure for exiting Fast Recovery (Section 3.2, step 3), cwnd SHOULD NOT be updated until a further event occurs (e.g., arrival of an ACK, or a timeout) after this adjustment.

7. Security Considerations

[RFC5681] discusses general security considerations concerning TCP congestion control. This document describes a specific algorithm that conforms with the congestion control requirements of [RFC5681], and so those considerations apply to this algorithm, too. There are no known additional security concerns for this specific algorithm.

8. IANA Considerations

This document has no actions for IANA.

9. Conclusions

This document specifies the NewReno Fast Retransmit and Fast Recovery algorithms for TCP. This NewReno modification to TCP can even be important for TCP implementations that support the SACK option, because the SACK option can only be used for TCP connections when both TCP end-nodes support the SACK option. NewReno performs better than Reno (RFC 5681) in a number of scenarios discussed in previous versions of this RFC ([RFC2582], [RFC3782]).

A number of options to the basic algorithm presented in Section 3 are also referenced in Appendix A to this document. These include the handling of the retransmission timer, the response to partial acknowledgments, and whether or not the sender must maintain a state variable called Recover.
Our belief is that the differences between these variants of NewReno are small compared to the differences between Reno and NewReno. That is, the important thing is to implement NewReno instead of Reno, for a TCP connection without SACK; it is less important exactly which of the variants of NewReno is implemented.

10. Acknowledgments

Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu, Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed feedback on this document or on its precursor, RFC 2582. Jeffrey Hsu provided clarifications on the handling of the recover variable that were applied to RFC 3782 as errata, and now are in Section 6 of this document. Yoshifumi Nishida contributed a modification to the fast recovery algorithm to account for the case in which FlightSize is 0 when the TCP sender leaves fast recovery and the TCP receiver uses delayed acknowledgments. Alexander Zimmermann provided several suggestions to improve the clarity of the document.

11. References

11.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

[RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, June 2011.

11.2. Informative References

[C98] Cardwell, N., "delayed ACKs for retransmitted packets: ouch!", Email to the tcpimpl mailing list, Message-ID "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu", November 1998, archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

[F98] Floyd, S., Revisions to RFC 2001, "Presentation to the TCPIMPL Working Group", August 1998. URLs "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf".
528 [F03] Floyd, S., "Moving NewReno from Experimental to Proposed 529 Standard? Presentation to the TSVWG Working Group", March 2003. 530 URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and 531 "http://www.icir.org/floyd/talks/newreno-Mar03.pdf". 533 [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of 534 Tahoe, Reno and SACK TCP", Computer Communication Review, July 1996. 535 URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z". 537 [F94] Floyd, S., "TCP and Successive Fast Retransmits", Technical 538 report, October 1994. URL 539 "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps". 541 [GF04] Gurtov, A. and S. Floyd, "Resolving Acknowledgment 542 Ambiguity in non-SACK TCP", Next Generation Teletraffic and 543 Wired/Wireless Advanced Networking (NEW2AN'04), February 544 2004. URL "http://www.cs.helsinki.fi/u/gurtov/papers/ 545 heuristics.html". 547 [Gur03] Gurtov, A., "[Tsvwg] resolving the problem of unnecessary 548 fast retransmits in go-back-N", email to the tsvwg mailing list, 549 message ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003. URL 550 "http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/ 551 msg04334.html". 553 [Hen98] Henderson, T., Re: NewReno and the 2001 Revision. September 554 1998. Email to the tcpimpl mailing list, Message ID 555 "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU", 556 archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl". 558 [Hoe95] Hoe, J., "Startup Dynamics of TCP's Congestion Control and 559 Avoidance Schemes", Master's Thesis, MIT, 1995. 561 [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion 562 Control Scheme for TCP", ACM SIGCOMM, August 1996. URL 563 "http://www.acm.org/sigcomm/sigcomm96/program.html". 565 [LM97] Lin, D. and R. Morris, "Dynamics of Random Early 566 Detection", SIGCOMM 97, September 1997. URL 567 "http://www.acm.org/sigcomm/sigcomm97/program.html". 569 [NS] The Network Simulator (NS). 570 URL "http://www.isi.edu/nsnam/ns/". 572 [PF01] Padhye, J. and S. 
              Floyd, "Identifying the TCP Behavior of Web Servers",
              SIGCOMM 2001, June 2001.

   [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC2582]  Floyd, S. and T. Henderson, "The NewReno Modification to
              TCP's Fast Recovery Algorithm", RFC 2582, April 1999.

   [RFC2883]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
              Extension to the Selective Acknowledgement (SACK) Option
              for TCP", RFC 2883, July 2000.

   [RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              January 2001.

   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
              for TCP", RFC 3522, April 2003.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

Appendix A.  Additional Information

   Previous versions of this RFC ([RFC2582], [RFC3782]) contained
   additional informative material on the following subjects.  Readers
   who want more information about possible variants of the algorithm,
   or references to specific [NS] simulations that provide NewReno test
   cases, may consult those documents.

   Section 4 of [RFC3782] discusses some alternative behaviors for
   resetting the retransmit timer after a partial acknowledgment.

   Section 5 of [RFC3782] discusses some alternative behaviors for
   performing retransmission after a partial acknowledgment.

   Section 6 of [RFC3782] provides more information about the
   motivation for the sender's state variable recover.

   Section 9 of [RFC3782] introduces some NS simulation test suites for
   NewReno.  In addition, references to simulation results can be found
   throughout [RFC3782].

   Section 10 of [RFC3782] provides a comparison of Reno and NewReno
   TCP.

   Section 11 of [RFC3782] lists changes relative to [RFC2582].

Appendix B.  Changes Relative to RFC 3782

   In [RFC3782], the cwnd after Full ACK reception is set to either (1)
   min (ssthresh, FlightSize + SMSS) or (2) ssthresh.  However, the
   first option carries a risk of performance degradation: if
   FlightSize is zero, the result is 1 SMSS.  This means that the TCP
   sender can transmit only one segment at that moment, which can delay
   the receiver's ACK transmission because of the delayed ACK
   algorithm.

   FlightSize on Full ACK reception can be zero in some situations.  A
   typical example is when the sending window during fast recovery is
   small.  In this case, the retransmitted packet and new data packets
   can be transmitted within a short interval.  If all of these packets
   arrive successfully, the receiver may generate a Full ACK that
   acknowledges all outstanding data.  Even if the window is not small,
   the loss of ACK packets or a receive buffer shortage during fast
   recovery can also make this situation more likely.

   The fix proposed in this document, which sets cwnd to at least
   2*SMSS if the implementation uses option 1 in the Full ACK case
   (Section 3.2, step 3, option 1), ensures that the sender TCP
   transmits at least two segments on Full ACK reception.

   In addition, the errata for RFC 3782 (an editorial clarification to
   Section 8 of RFC 2582, which is now Section 6 of this document) have
   been applied.

   The specification text (Section 3.2 herein) was rewritten to track
   Section 3.2 of [RFC5681] more closely.

   Sections 4, 5, and 9-11 of [RFC3782] were removed; instead, Appendix
   A of this document was added to back-reference this informative
   material.

Appendix C.  Document Revision History

   To be removed upon publication

   +----------+--------------------------------------------------+
   | Revision | Comments                                         |
   +----------+--------------------------------------------------+
   | draft-00 | RFC3782 errata applied, and changes applied from |
   |          | draft-nishida-newreno-modification-02            |
   +----------+--------------------------------------------------+
   | draft-01 | Non-normative sections moved to appendices,      |
   |          | editorial clarifications applied as suggested    |
   |          | by Alexander Zimmermann.                         |
   +----------+--------------------------------------------------+
   | draft-02 | Better align specification text with RFC5681.    |
   |          | Replace informative appendices by a new appendix |
   |          | that just provides back-references to earlier    |
   |          | NewReno RFCs.                                    |
   +----------+--------------------------------------------------+

Authors' Addresses

   Tom Henderson
   The Boeing Company

   EMail: thomas.r.henderson@boeing.com

   Sally Floyd
   International Computer Science Institute

   Phone: +1 (510) 666-2989
   EMail: floyd@acm.org
   URL:   http://www.icir.org/floyd/

   Andrei Gurtov
   HIIT
   Helsinki Institute for Information Technology
   P.O. Box 19215
   00076 Aalto
   Finland

   EMail: gurtov@hiit.fi

   Yoshifumi Nishida
   WIDE Project
   Endo 5322
   Fujisawa, Kanagawa  252-8520
   Japan

   EMail: nishida@wide.ad.jp