idnits 2.17.1 

draft-ietf-tcpm-rfc3782-bis-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 18, 2012) is 4472 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 2582
     (Obsoleted by RFC 3782)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	TCP Maintenance and Minor                                   T. Henderson
3	Extensions Working Group                                          Boeing
4	Internet-Draft                                                  S. Floyd
5	Obsoletes: 3782  (if approved)                                      ICSI
6	Intended status: Standards Track                               A. Gurtov
7	Expires:  July 18, 2012                               University of Oulu
8	                                                              Y. Nishida
9	                                                            WIDE Project
10	                                                        January 18, 2012

12	       The NewReno Modification to TCP's Fast Recovery Algorithm
13	                   draft-ietf-tcpm-rfc3782-bis-05.txt

15	Abstract

17	   RFC 5681 documents the following four intertwined TCP
18	   congestion control algorithms: slow start, congestion avoidance, fast
19	   retransmit, and fast recovery.  RFC 5681 explicitly allows
20	   certain modifications of these algorithms, including modifications
21	   that use the TCP Selective Acknowledgement (SACK) option (RFC 2883),
22	   and modifications that respond to "partial acknowledgments" (ACKs
23	   which cover new data, but not all the data outstanding when loss was
24	   detected) in the absence of SACK.  This document describes a specific
25	   algorithm for responding to partial acknowledgments, referred to as
26	   NewReno.  This response to partial acknowledgments was first proposed
27	   by Janey Hoe.  This document obsoletes RFC 3782.

29	Status of this Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at http://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on July 18, 2012.

46	Copyright Notice

48	   Copyright (c) 2012 IETF Trust and the persons identified as
49	   the document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	   This document may contain material from IETF Documents or IETF
62	   Contributions published or made publicly available before November
63	   10, 2008.  The person(s) controlling the copyright in some of this
64	   material may not have granted the IETF Trust the right to allow
65	   modifications of such material outside the IETF Standards Process.
66	   Without obtaining an adequate license from the person(s) controlling
67	   the copyright in such materials, this document may not be modified
68	   outside the IETF Standards Process, and derivative works of it may
69	   not be created outside the IETF Standards Process, except to format
70	   it for publication as an RFC or to translate it into languages other
71	   than English.

73	1.  Introduction

75	   For the typical implementation of the TCP Fast Recovery algorithm
76	   described in [RFC5681] (first implemented in the 1990 BSD Reno
77	   release, and referred to as the Reno algorithm in [FF96]), the TCP
78	   data sender only retransmits a packet after a retransmit timeout has
79	   occurred, or after three duplicate acknowledgments have arrived
80	   triggering the Fast Retransmit algorithm.  A single retransmit
81	   timeout might result in the retransmission of several data packets,
82	   but each invocation of the Fast Retransmit algorithm in RFC 5681
83	   leads to the retransmission of only a single data packet.

85	   Two problems arise with Reno TCP when multiple packet losses occur
86	   in a single window.  First, Reno will often take a timeout, as
87	   has been documented in [Hoe95].  Second, even if a retransmission
88	   timeout is avoided, multiple fast retransmits and window reductions
89	   can occur, as documented in [F94].  When multiple packet losses
90	   occur, if the SACK option [RFC2883] is available, the TCP sender
91	   has the information to make intelligent decisions about which packets
92	   to retransmit and which packets not to retransmit during Fast
93	   Recovery.  This document applies to TCP connections that are
94	   unable to use the TCP Selective Acknowledgement (SACK) option,
95	   either because the option is not locally supported or
96	   because the TCP peer did not indicate a willingness to use SACK.

98	   In the absence of SACK, there is little information available to the
99	   TCP sender in making retransmission decisions during Fast
100	   Recovery.  From the three duplicate acknowledgments, the sender
101	   infers a packet loss, and retransmits the indicated packet.  After
102	   this, the data sender could receive additional duplicate
103	   acknowledgments, as the data receiver acknowledges additional data
104	   packets that were already in flight when the sender entered Fast
105	   Retransmit.

107	   In the case of multiple packets dropped from a single window of data,
108	   the first new information available to the sender comes when the
109	   sender receives an acknowledgment for the retransmitted packet (that
110	   is, the packet retransmitted when Fast Retransmit was first
111	   entered).  If there is a single packet drop and no reordering, then
112	   the acknowledgment for this packet will acknowledge all of the
113	   packets transmitted before Fast Retransmit was entered.  However, if
114	   there are multiple packet drops, then the acknowledgment for the
115	   retransmitted packet will acknowledge some but not all of the packets
116	   transmitted before the Fast Retransmit.  We call this acknowledgment
117	   a partial acknowledgment.

119	   Along with several other suggestions, [Hoe95] suggested that during
120	   Fast Recovery the TCP data sender respond to a partial
121	   acknowledgment by inferring that the next in-sequence packet has been
122	   lost, and retransmitting that packet.  This document describes a
123	   modification to the Fast Recovery algorithm in RFC 5681 that
124	   incorporates a response to partial acknowledgments received during
125	   Fast Recovery.  We call this modified Fast Recovery algorithm
126	   NewReno, because it is a slight but significant variation of the
127	   basic Reno algorithm in RFC 5681.  This document does not discuss the
128	   other suggestions in [Hoe95] and [Hoe96], such as a change to the
129	   ssthresh parameter during Slow-Start, or the proposal to send a new
130	   packet for every two duplicate acknowledgments during Fast
131	   Recovery.  The version of NewReno in this document also draws on
132	   other discussions of NewReno in the literature [LM97, Hen98].

134	   We do not claim that the NewReno version of Fast Recovery described
135	   here is an optimal modification of Fast Recovery for responding to
136	   partial acknowledgments, for TCP connections that are unable to use
137	   SACK.  Based on our experiences with the NewReno modification in the
138	   NS simulator [NS] and with numerous implementations of NewReno, we
139	   believe that this modification improves the performance of the Fast
140	   Retransmit and Fast Recovery algorithms in a wide variety of
141	   scenarios.  Previous versions of this RFC [RFC2582, RFC3782] provide
142	   simulation-based evidence of the possible performance gains.

144	2.  Terminology and Definitions

146	   This document assumes that the reader is familiar with the terms
147	   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
148	   FLIGHT SIZE (FlightSize) defined in [RFC5681].

150	   This document defines an additional sender-side state variable
151	   called RECOVER:

153	      RECOVER:
154	         When in Fast Recovery, this variable records the send sequence
155	         number that must be acknowledged before the Fast Recovery
156	         procedure is declared to be over.

158	3.  The Fast Retransmit and Fast Recovery Algorithms in NewReno

160	3.1.  Protocol Overview

162	   The basic idea of these extensions to the Fast Retransmit and
163	   Fast Recovery algorithms described in Section 3.2 of [RFC5681]
164	   is as follows.  The TCP sender can infer, from the arrival of
165	   duplicate acknowledgments, whether multiple losses in the same
166	   window of data have most likely occurred, and avoid taking a
167	   retransmit timeout or making multiple congestion window reductions
168	   due to such an event.

170	   The NewReno modification applies to the Fast Recovery procedure that
171	   begins when three duplicate ACKs are received and ends when either a
172	   retransmission timeout occurs or an ACK arrives that acknowledges all
173	   of the data up to and including the data that was outstanding when
174	   the Fast Recovery procedure began.

176	3.2.  Specification

178	   The procedures specified in Section 3.2 of [RFC5681] are followed
179	   with the following modifications.  Note that this specification
180	   avoids the use of the key words defined in RFC 2119 [RFC2119] since
181	   it mainly provides sender-side implementation guidance for
182	   performance improvement, and does not affect interoperability.

184	   1)  Initialization of TCP protocol control block:
185	       When the TCP protocol control block is initialized, Recover is
186	       set to the initial send sequence number.

188	   2)  Three duplicate ACKs:
189	       When the third duplicate ACK is received, the TCP sender first
190	       checks the value of Recover to see if the Cumulative
191	       Acknowledgment field covers more than Recover.  If so, the value
192	       of Recover is incremented to the value of the highest sequence
193	       number transmitted by the TCP so far.  The TCP then enters Fast
194	       Retransmit (step 2 of Section 3.2 of [RFC5681]).  If not, the TCP
195	       does not enter fast retransmit and does not reset ssthresh.

197	   3)  Response to newly acknowledged data:
198	       Step 6 of [RFC5681] specifies the response to the next ACK that
199	       acknowledges previously unacknowledged data.  When an ACK
200	       arrives that acknowledges new data, this ACK could be the
201	       acknowledgment elicited by the retransmission from step 2, or
202	       elicited by a later retransmission.  There are two cases.

204	       Full acknowledgments:
205	       If this ACK acknowledges all of the data up to and including
206	       Recover, then the ACK acknowledges all the intermediate
207	       segments sent between the original transmission of the lost
208	       segment and the receipt of the third duplicate ACK.  Set cwnd to
209	       either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or
210	       (2) ssthresh, where ssthresh is the value set when Fast
211	       Retransmit was entered, and where FlightSize in (1) is the amount
212	       of data presently outstanding.  This is termed "deflating" the
213	       window.  If the second option is selected, the implementation
214	       is encouraged to take measures to avoid a possible burst of
215	       data, in case the amount of data outstanding in the network is
216	       much less than the new congestion window allows.  A simple
217	       mechanism is to limit the number of data packets that can be sent
218	       in response to a single acknowledgment.  Exit the Fast Recovery
219	       procedure.

221	       Partial acknowledgments:
222	       If this ACK does *not* acknowledge all of the data up to and
223	       including Recover, then this is a partial ACK.  In this case,
224	       retransmit the first unacknowledged segment.  Deflate the
225	       congestion window by the amount of new data acknowledged by the
226	       cumulative acknowledgment field.  If the partial ACK
227	       acknowledges at least one SMSS of new data, then add back SMSS
228	       bytes to the congestion window.  This artificially
229	       inflates the congestion window in order to reflect the additional
230	       segment that has left the network.  Send a new segment if
231	       permitted by the new value of cwnd.  This "partial window
232	       deflation" attempts to ensure that, when Fast Recovery eventually
233	       ends, approximately ssthresh amount of data will be outstanding
234	       in the network.  Do not exit the Fast Recovery procedure (i.e.,
235	       if any duplicate ACKs subsequently arrive, execute Step 4 of
236	       Section 3.2 of [RFC5681]).

238	       For the first partial ACK that arrives during Fast Recovery, also
239	       reset the retransmit timer.  Timer management is discussed in
240	       more detail in Section 4.

242	   4)  Retransmit timeouts:
243	       After a retransmit timeout, record the highest sequence number
244	       transmitted in the variable Recover and exit the Fast
245	       Recovery procedure if applicable.
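
   The four steps above can be sketched as a small piece of sender-side
   bookkeeping.  This is only an illustration under stated assumptions,
   not part of the specification: the class and method names are
   hypothetical, sequence numbers are plain integers (wraparound is
   ignored), and the cwnd adjustments and retransmissions of step 3 are
   left out so that only Recover and the Fast Recovery state are shown.

```python
class NewRenoState:
    """Hypothetical bookkeeping for Recover and the Fast Recovery flag.

    Assumption: sequence numbers are unbounded integers (no wraparound).
    """

    def __init__(self, iss):
        # Step 1: Recover starts at the initial send sequence number.
        self.recover = iss
        # A separate flag records whether Fast Recovery is in progress.
        self.in_fast_recovery = False

    def on_third_dupack(self, ack_number, highest_sent):
        """Step 2: return True if Fast Retransmit should be entered."""
        if ack_number - 1 > self.recover:
            # The ACK covers more than Recover: start a new recovery.
            self.recover = highest_sent
            self.in_fast_recovery = True
            return True
        # Otherwise do not enter Fast Retransmit or reset ssthresh.
        return False

    def on_new_ack(self, ack_number):
        """Step 3: classify an ACK that acknowledges new data."""
        if not self.in_fast_recovery:
            return "normal"
        if ack_number - 1 >= self.recover:
            # Everything up to and including Recover is acknowledged.
            self.in_fast_recovery = False
            return "full"
        # Fast Recovery continues; retransmit the first unacked segment.
        return "partial"

    def on_retransmit_timeout(self, highest_sent):
        """Step 4: record the highest sequence sent, leave Fast Recovery."""
        self.recover = highest_sent
        self.in_fast_recovery = False
```

   For instance, with bytes 1 through 10000 outstanding, a third
   duplicate ACK carrying ack_number 1001 starts a recovery with Recover
   set to 10000; a later ACK of 3001 is then classified as partial and
   an ACK of 10001 as full.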

247	   Step 2 above specifies a check that the Cumulative Acknowledgment
248	   field covers more than Recover.  Because the acknowledgment field
249	   contains the sequence number that the sender next expects to receive,
250	   the acknowledgment "ack_number" covers more than Recover when:

252	      ack_number - 1 > Recover;

254	   i.e., at least one byte more of data is acknowledged beyond the
255	   highest byte that was outstanding when Fast Retransmit was last
256	   entered.
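
   The inequality above is written as if sequence numbers were unbounded
   integers, whereas TCP sequence numbers are 32 bits wide and wrap.  A
   minimal sketch of the same check follows, assuming the common
   modulo-2^32 "serial number" comparison; that choice, and the function
   names, are assumptions of this example and are not specified by this
   document.

```python
SEQ_MOD = 1 << 32    # TCP sequence numbers are 32 bits wide
HALF = 1 << 31       # half of the sequence space


def seq_gt(a, b):
    """True if a is later than b in 32-bit sequence space (assumed rule)."""
    return a != b and ((a - b) % SEQ_MOD) < HALF


def covers_more_than_recover(ack_number, recover):
    """The check 'ack_number - 1 > Recover' using wrapped arithmetic."""
    return seq_gt((ack_number - 1) % SEQ_MOD, recover)
```

   With this comparison, an ack_number of 5 covers more than a Recover
   of 0xFFFFFFF0, because the sequence space has wrapped between the two
   values.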

258	   Note that in Step 3 above, the congestion window is deflated after
259	   a partial acknowledgment is received.  The congestion window was
260	   likely to have been inflated considerably when the partial
261	   acknowledgment was received.  In addition, depending on the original
262	   pattern of packet losses, the partial acknowledgment might
263	   acknowledge nearly a window of data.  In this case, if the congestion
264	   window was not deflated, the data sender might be able to send nearly
265	   a window of data back-to-back.
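
   The window arithmetic of Step 3 can be illustrated with a short
   sketch.  The function names and the byte counts used in the example
   are hypothetical; the sketch follows the text of Step 3 literally and
   does not add safeguards (such as a lower bound on cwnd) that a real
   implementation might want.

```python
def cwnd_after_partial_ack(cwnd, newly_acked, smss):
    """Partial window deflation for a partial ACK (all values in bytes)."""
    # Deflate by the amount of new data acknowledged.
    cwnd -= newly_acked
    # Add back one SMSS if at least one SMSS of new data was acknowledged.
    if newly_acked >= smss:
        cwnd += smss
    return cwnd


def cwnd_after_full_ack(ssthresh, flight_size, smss, option=1):
    """Window deflation for a full ACK, using option (1) or (2) of Step 3."""
    if option == 1:
        return min(ssthresh, max(flight_size, smss) + smss)
    return ssthresh
```

   As an example, with SMSS = 1000 bytes and an inflated cwnd of 12000
   bytes, a partial ACK for 3000 bytes of new data leaves cwnd at
   12000 - 3000 + 1000 = 10000 bytes.  On a full ACK with ssthresh =
   5000 and FlightSize = 2000, option (1) gives min(5000, 2000 + 1000) =
   3000 bytes, whereas option (2) gives 5000 bytes.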

267	   This document does not specify the sender's response to duplicate
268	   ACKs when the Fast Retransmit/Fast Recovery algorithm is not
269	   invoked.  This is addressed in other documents, such as those
270	   describing the Limited Transmit procedure [RFC3042].  This document
271	   also does not address issues of adjusting the duplicate
272	   acknowledgment threshold, but assumes the threshold specified in
273	   the IETF standards; the current standard is [RFC5681], which
274	   specifies a threshold of three duplicate acknowledgments.

276	   As a final note, we would observe that in the absence of the SACK
277	   option, the data sender is working from limited information.  When
278	   the issue of recovery from multiple dropped packets from a single
279	   window of data is of particular importance, the best alternative
280	   would be to use the SACK option.

282	4.  Handling Duplicate Acknowledgments After A Timeout

284	   After each retransmit timeout, the highest sequence number
285	   transmitted so far is recorded in the variable "recover".
286	   If, after a retransmit timeout, the TCP data sender retransmits three
287	   consecutive packets that have already been received by the data
288	   receiver, then the TCP data sender will receive three duplicate
289	   acknowledgments that do not cover more than "recover".  In this
290	   case, the duplicate acknowledgments are not an indication of a new
291	   instance of congestion.  They are simply an indication that the
292	   sender has unnecessarily retransmitted at least three packets.

294	   However, when a retransmitted packet is itself dropped, the sender
295	   can also receive three duplicate acknowledgments that do not cover
296	   more than "recover".  In this case, the sender would have been
297	   better off if it had initiated Fast Retransmit.  For a TCP sender
298	   that implements the algorithm specified in Section 3.2 of this
299	   document, the sender does not infer a packet drop from duplicate
300	   acknowledgments in this scenario.  As always, the retransmit timer
301	   is the backup mechanism for inferring packet loss in this case.

303	   There are several heuristics, based on timestamps or on the amount of
304	   advancement of the cumulative acknowledgment field, that allow the
305	   sender to distinguish, in some cases, between three duplicate
306	   acknowledgments following a retransmitted packet that was dropped,
307	   and three duplicate acknowledgments from the unnecessary
308	   retransmission of three packets [Gur03, GF04].  The TCP sender may
309	   use such a heuristic to decide to invoke a Fast Retransmit in some
310	   cases, even when the three duplicate acknowledgments do not cover
311	   more than "recover".

313	   For example, when three duplicate acknowledgments are caused by the
314	   unnecessary retransmission of three packets, this is likely to be
315	   accompanied by the cumulative acknowledgment field advancing by at
316	   least four segments.  Similarly, a heuristic based on timestamps uses
317	   the fact that when there is a hole in the sequence space, the
318	   timestamp echoed in the duplicate acknowledgment is the timestamp of
319	   the most recent data packet that advanced the cumulative
320	   acknowledgment field [RFC1323].  If timestamps are used, and the
321	   sender stores the timestamp of the last acknowledged segment, then
322	   the timestamp echoed by duplicate acknowledgments can be used to
323	   distinguish between a retransmitted packet that was dropped and
324	   three duplicate acknowledgments from the unnecessary
325	   retransmission of three packets.

327	4.1.  ACK Heuristic

329	   If the ACK-based heuristic is used, then following the advancement of
330	   the cumulative acknowledgment field, the sender stores the value of
331	   the previous cumulative acknowledgment as prev_highest_ack, and
332	   stores the latest cumulative ACK as highest_ack.  In addition, the
333	   following check is performed if, in Step 2 of Section 3.2, the
334	   Cumulative Acknowledgment field does not cover more than "recover".

336	   1*)  If the Cumulative Acknowledgment field didn't cover more than
337	        "recover", check to see if the congestion window is greater
338	        than SMSS bytes and the difference between highest_ack and
339	        prev_highest_ack is at most 4*SMSS bytes.  If true, duplicate
340	        ACKs indicate a lost segment (enter Fast Retransmit).
341	        Otherwise, duplicate ACKs likely result from unnecessary
342	        retransmissions (do not enter Fast Retransmit).

344	   The congestion window check serves to protect against fast retransmit
345	   immediately after a retransmit timeout.
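
   The check in 1*) can be written as a single predicate.  The sketch
   below is illustrative only: the function name is hypothetical,
   sequence numbers are treated as plain integers, and all quantities
   are in bytes.

```python
def ack_heuristic_indicates_loss(cwnd, smss, highest_ack, prev_highest_ack):
    """Return True if duplicate ACKs should trigger Fast Retransmit.

    Called only when the Cumulative Acknowledgment field does not cover
    more than "recover" (assumption: no sequence wraparound).
    """
    # Guard against Fast Retransmit immediately after a retransmit timeout.
    window_open = cwnd > smss
    # A small advance of the cumulative ACK suggests a lost segment; a
    # larger jump suggests unnecessary retransmissions.
    small_jump = (highest_ack - prev_highest_ack) <= 4 * smss
    return window_open and small_jump
```

   For example, with SMSS = 1000 and cwnd = 4000, an advance from 4000
   to 5000 indicates a lost segment, while an advance from 4000 to 10000
   is attributed to unnecessary retransmissions.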

347	   If several ACKs are lost, the sender can see a jump in the cumulative
348	   ACK of more than three segments, and the heuristic can fail.
349	   [RFC5681] recommends that a receiver
350	   send duplicate ACKs for every out-of-order data packet, such as a
351	   data packet received during Fast Recovery.  The ACK heuristic is more
352	   likely to fail if the receiver does not follow this advice, because
353	   then a smaller number of ACK losses are needed to produce a
354	   sufficient jump in the cumulative ACK.

356	4.2.  Timestamp Heuristic

358	   If this heuristic is used, the sender stores the timestamp of the
359	   last acknowledged segment.  In addition, the last sentence of step
360	   2 in Section 3.2 is replaced as follows:

362	   1**) If the Cumulative Acknowledgment field didn't cover more than
363	        "recover", check to see if the echoed timestamp in the last
364	        non-duplicate acknowledgment equals the
365	        stored timestamp.  If true, duplicate ACKs indicate a lost
366	        segment (enter Fast Retransmit).  Otherwise, duplicate
367	        ACKs likely result from unnecessary retransmissions (do not
368	        enter Fast Retransmit).
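
   The comparison in 1**) might be sketched as follows.  The class and
   method names are hypothetical, and the sketch assumes that the
   "stored timestamp" is the timestamp value the sender placed in its
   most recent transmission of the last acknowledged segment; this
   document does not spell out that detail.

```python
class TimestampHeuristic:
    """Hypothetical state for the timestamp-based check."""

    def __init__(self):
        self.stored_ts = None   # timestamp of the last acknowledged segment
        self.echoed_ts = None   # timestamp echoed by the last non-duplicate ACK

    def on_non_duplicate_ack(self, ts_of_last_acked_segment, echoed_ts):
        """Record both values whenever the cumulative ACK advances."""
        self.stored_ts = ts_of_last_acked_segment
        self.echoed_ts = echoed_ts

    def dupacks_indicate_loss(self):
        """True: enter Fast Retransmit.  False: do not enter it."""
        if self.stored_ts is None:
            return False        # nothing has been acknowledged yet
        return self.echoed_ts == self.stored_ts
```

   If the last non-duplicate ACK echoed the timestamp of the
   retransmission, the two values match and later duplicate ACKs are
   taken as a sign of loss; if it echoed the timestamp of the original
   transmission, they differ and the duplicate ACKs are attributed to
   unnecessary retransmissions.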

370	   The timestamp heuristic works correctly when the receiver echoes
371	   timestamps as specified by [RFC1323] or by the proposed revisions of
372	   that RFC.  However, if the receiver arbitrarily echoes timestamps,
373	   the heuristic can fail.  The heuristic can also fail if a timeout was
374	   spurious and returning ACKs are not from retransmitted segments.
375	   This can be prevented by detection algorithms such as [RFC3522].

377	5.  Implementation Issues for the Data Receiver

379	   [RFC5681] specifies that "Out-of-order data segments SHOULD be
380	   acknowledged immediately, in order to accelerate loss recovery."
381	   Neal Cardwell has noted that some data receivers do not send an
382	   immediate acknowledgment when they send a partial acknowledgment,
383	   but instead wait first for their delayed acknowledgment timer to
384	   expire [C98].  As [C98] notes, this severely limits the potential
385	   benefit of NewReno by delaying the receipt of the partial
386	   acknowledgment at the data sender.  Echoing [RFC5681], our
387	   recommendation is that the data receiver send an immediate
388	   acknowledgment for an out-of-order segment, even when that
389	   out-of-order segment fills a hole in the buffer.

391	6.  Implementation Issues for the Data Sender

393	   In Section 3.2, step 3 above, it is noted that implementations should
394	   take measures to avoid a possible burst of data when leaving Fast
395	   Recovery, in case the amount of new data that the sender is eligible
396	   to send due to the new value of the congestion window is large.  This
397	   can arise during NewReno when ACKs are lost or treated as pure window
398	   updates, thereby causing the sender to underestimate the number of
399	   new segments that can be sent during the recovery procedure.
400	   Specifically, bursts can occur when the FlightSize is much less than
401	   the new congestion window when exiting from Fast Recovery.  One
402	   simple mechanism to avoid a burst of data when leaving Fast Recovery
403	   is to limit the number of data packets that can be sent in response
404	   to a single acknowledgment.  (This is known as "maxburst_" in the ns
405	   simulator.)  Other possible mechanisms for avoiding bursts include
406	   rate-based pacing, or setting the slow-start threshold to the
407	   resultant congestion window and then resetting the congestion window
408	   to FlightSize.  A recommendation on the general mechanism to avoid
409	   excessively bursty sending patterns is outside the scope of this
410	   document.
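
   Two of the burst-avoidance mechanisms mentioned above can be sketched
   briefly.  The function names and the example maxburst value are
   hypothetical, all quantities other than maxburst are in bytes, and
   neither function is a recommendation of this document.

```python
def segments_allowed(cwnd, flight_size, smss, maxburst):
    """Cap the number of segments sent in response to one ACK."""
    # Full-sized segments that the congestion window would permit.
    window_room = max(cwnd - flight_size, 0) // smss
    # Never send more than maxburst segments for a single ACK.
    return min(window_room, maxburst)


def restart_from_flight_size(cwnd, flight_size):
    """Alternative: ssthresh takes the resultant cwnd, cwnd restarts at
    FlightSize.  Returns the pair (new_cwnd, new_ssthresh)."""
    return flight_size, cwnd
```

   With cwnd = 10000, FlightSize = 2000, and SMSS = 1000, the window
   alone would permit eight segments back-to-back; a maxburst of 3
   limits the response to three segments per ACK.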

412	   An implementation may want to use a separate flag to record whether
413	   or not it is presently in the Fast Recovery procedure.  The use of
414	   the value of the duplicate acknowledgment counter for this purpose is
415	   not reliable because it can be reset upon window updates and
416	   out-of-order acknowledgments.

418	   When updating the Cumulative Acknowledgment field outside of
419	   Fast Recovery, the "recover" state variable may also need to be
420	   updated in order to continue to permit possible entry into Fast
421	   Recovery (Section 3.2, step 2).  This issue arises when an update
422	   of the Cumulative Acknowledgment field results in a sequence
423	   wraparound that affects the ordering between the Cumulative
424	   Acknowledgment field and the "recover" state variable.  Entry
425	   into Fast Recovery is only possible when the Cumulative
426	   Acknowledgment field covers more than the "recover" state variable.

428	   It is important for the sender to respond correctly to duplicate ACKs
429	   received when the sender is no longer in Fast Recovery (e.g., because
430	   of a Retransmit Timeout).  The Limited Transmit procedure [RFC3042]
431	   describes possible responses to the first and second duplicate
432	   acknowledgments.  When three or more duplicate acknowledgments are
433	   received, the Cumulative Acknowledgment field doesn't cover more
434	   than "recover", and a new Fast Recovery is not invoked, it is
435	   important that the sender not execute the Fast Recovery steps (3) and
436	   (4) in Section 3.  Otherwise, the sender could end up in a chain of
437	   spurious timeouts.  We mention this only because several NewReno
438	   implementations had this bug, including the implementation in the NS
439	   simulator.

441	   It has been observed that some TCP implementations enter a slow start
442	   or congestion avoidance window updating algorithm immediately after
443	   the cwnd is set by the equation found in Section 3.2, step 3, even
444	   without a new external event generating the cwnd change.  Note that
445	   after cwnd is set based on the procedure for exiting Fast Recovery
446	   (Section 3.2, step 3), cwnd should not be updated until a further
447	   event occurs (e.g., arrival of an ACK, or timeout) after this
448	   adjustment.

450	7.  Security Considerations

452	   [RFC5681] discusses general security considerations concerning TCP
453	   congestion control.  This document describes a specific algorithm
454	   that conforms with the congestion control requirements of [RFC5681],
455	   and so those considerations apply to this algorithm, too.  There are
456	   no known additional security concerns for this specific algorithm.

458	8.  IANA Considerations

459	   This document has no actions for IANA.

461	9.  Conclusions

463	   This document specifies the NewReno Fast Retransmit and Fast Recovery
464	   algorithms for TCP.  This NewReno modification to TCP can be
465	   important even for TCP implementations that support the SACK option,
466	   because the SACK option can only be used for TCP connections when
467	   both TCP end-nodes support the SACK option.  NewReno performs better
468	   than Reno [RFC5681] in a number of scenarios discussed in
469	   previous versions of this RFC ([RFC2582], [RFC3782]).

471	   A number of options to the basic algorithm presented in Section 3 are
472	   also referenced in Appendix A to this document.  These include the
473	   handling of the retransmission timer, the response to partial
474	   acknowledgments, and whether or not the sender must maintain a state
475	   variable called Recover.  Our belief is that the differences
476	   between these variants of NewReno are small compared to the
477	   differences between Reno and NewReno.  That is, the important thing
478	   is to implement NewReno instead of Reno, for a TCP connection
479	   without SACK; it is less important exactly which of the variants of
480	   NewReno is implemented.

482	10.  Acknowledgments

484	   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu,
485	   Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed
486	   feedback on this document or on its precursor, RFC 2582.  Jeffrey
487	   Hsu provided clarifications on the handling of the recover variable
488	   that were applied to RFC 3782 as errata, and now are in Section 6
489	   of this document.  Yoshifumi Nishida contributed a modification
490	   to the fast recovery algorithm to account for the case in which
491	   FlightSize is 0 when the TCP sender leaves fast recovery, and the
492	   TCP receiver uses delayed acknowledgments.  Alexander Zimmermann
493	   provided several suggestions to improve the clarity of the document.

495	11.  References

497	11.1.  Normative References

499	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
500	             Requirement Levels", BCP 14, RFC 2119, March 1997.

502	   [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
503	             Control", RFC 5681, September 2009.

505	11.2.  Informative References

507	   [C98]     Cardwell, N., "delayed ACKs for retransmitted packets:
508	             ouch!".  November 1998,  Email to the tcpimpl mailing list,
509	             Message-ID
510	             "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.
511	             washington.edu",
512	             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

514	   [FF96]    Fall, K. and S. Floyd, "Simulation-based Comparisons of
515	             Tahoe, Reno and SACK TCP", Computer Communication Review,
516	             July 1996.
517	             URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

519	   [F94]     Floyd, S., "TCP and Successive Fast Retransmits", Technical
520	             report, October 1994.  URL
521	             "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

523	   [GF04]    Gurtov, A. and S. Floyd, "Resolving Acknowledgment
524	             Ambiguity in non-SACK TCP", Next Generation Teletraffic and
525	             Wired/Wireless Advanced Networking (NEW2AN'04), February
526	             2004.  URL "http://www.cs.helsinki.fi/u/gurtov/papers/
527	             heuristics.html".

529	   [Gur03]   Gurtov, A., "[Tsvwg] resolving the problem of unnecessary
530	             fast retransmits in go-back-N", email to the tsvwg mailing
531	             list, message ID <3F25B467.9020609@cs.helsinki.fi>,
532	             July 28, 2003.  URL "http://www1.ietf.org/mail-archive/
533	             working-groups/tsvwg/current/msg04334.html".

535	   [Hen98]   Henderson, T., "Re: NewReno and the 2001 Revision",
536	             September 1998.  Email to the tcpimpl mailing list,
537	             Message ID "Pine.BSI.3.95.980923224136.26134A-100000@
538	             raptor.CS.Berkeley.EDU",
539	             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

541	   [Hoe95]   Hoe, J., "Startup Dynamics of TCP's Congestion Control and
542	             Avoidance Schemes", Master's Thesis, MIT, 1995.

544	   [Hoe96]   Hoe, J., "Improving the Start-up Behavior of a Congestion
545	             Control Scheme for TCP", ACM SIGCOMM, August 1996.  URL
546	             "http://www.acm.org/sigcomm/sigcomm96/program.html".

548	   [LM97]    Lin, D. and R. Morris, "Dynamics of Random Early
549	             Detection", SIGCOMM 97, September 1997.  URL
550	             "http://www.acm.org/sigcomm/sigcomm97/program.html".

552	   [NS]      The Network Simulator (NS).
553	             URL "http://www.isi.edu/nsnam/ns/".

555	   [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
556	             High Performance", RFC 1323, May 1992.

558	   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
559	             TCP's Fast Recovery Algorithm", RFC 2582, April 1999.

561	   [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
562	             Extension to the Selective Acknowledgement (SACK) Option
563	             for TCP", RFC 2883, July 2000.

565	   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's
566	             Loss Recovery Using Limited Transmit", RFC 3042,
567	             January 2001.

569	   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
570	             TCP", RFC 3522, April 2003.

572	   [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
573	             Modification to TCP's Fast Recovery Algorithm", RFC 3782,
574	             April 2004.

576	Appendix A.  Additional Information

578	   Previous versions of this RFC ([RFC2582], [RFC3782]) contained
579	   additional informative material on the subjects listed below.
580	   Readers who want more information about possible variants of
581	   the algorithm, or references to specific [NS] simulations that
582	   provide NewReno test cases, may consult those documents.

584	   Section 4 of [RFC3782] discusses some alternative behaviors for
585	   resetting the retransmit timer after a partial acknowledgment.

587	   Section 5 of [RFC3782] discusses some alternative behaviors for
588	   performing retransmission after a partial acknowledgment.

590	   Section 6 of [RFC3782] provides more information about the
591	   motivation for the sender's state variable Recover.

593	   Section 9 of [RFC3782] introduces some NS simulation test
594	   suites for NewReno.  In addition, references to simulation
595	   results can be found throughout [RFC3782].

597	   Section 10 of [RFC3782] provides a comparison of Reno and
598	   NewReno TCP.

600	   Section 11 of [RFC3782] lists changes relative to [RFC2582].

602	Appendix B.  Changes Relative to RFC 3782

604	   In [RFC3782], the cwnd after Full ACK reception is set to either
605	   (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh.  However,
606	   the first option carries a risk of performance degradation: if
607	   FlightSize is zero, the result is 1 SMSS.  The TCP sender can
608	   then transmit only one segment, and the receiver's delayed ACK
609	   algorithm may delay the acknowledgment of that single segment,
610	   which can stall the sender until the delayed ACK is sent.

612	   The FlightSize on Full ACK reception can be zero in some situations.
613	   A typical example is when the sending window during fast recovery
614	   is small.  In this case, the retransmitted packet and the new data
615	   packets can be transmitted within a short interval.  If all of these
616	   packets arrive successfully, the receiver may generate a Full ACK
617	   that acknowledges all outstanding data.  Even if the window is not
618	   small, loss of ACK packets or a shortage of receive buffer space
619	   during fast recovery can also make this situation more likely.

621	   The proposed fix in this document, which sets cwnd to at least 2*SMSS
622	   if the implementation uses option 1 in the Full ACK case (Section 3.2
623	   step 3, option 1), ensures that the TCP sender transmits at least two
624	   segments on Full ACK reception.
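
   The effect of the fix can be illustrated with a minimal sketch.  It
   assumes that the revised option 1 takes the form
   min (ssthresh, max (FlightSize, SMSS) + SMSS), which is one way to
   obtain the 2*SMSS floor described above, and that ssthresh is at
   least 2*SMSS as required by [RFC5681].  The function names and the
   numeric values are illustrative only and are not part of this
   specification.

```python
def cwnd_option1_rfc3782(ssthresh, flight_size, smss):
    """Option 1 as described for RFC 3782: min(ssthresh, FlightSize + SMSS)."""
    return min(ssthresh, flight_size + smss)


def cwnd_option1_fixed(ssthresh, flight_size, smss):
    """Assumed form of the fix: FlightSize is floored at one SMSS first."""
    return min(ssthresh, max(flight_size, smss) + smss)


SMSS = 1460          # bytes; illustrative value
SSTHRESH = 4 * SMSS  # illustrative; RFC 5681 keeps ssthresh >= 2*SMSS

# Print FlightSize, old cwnd, and new cwnd, all in units of SMSS.
for flight_size in (0, SMSS, 6 * SMSS):
    old = cwnd_option1_rfc3782(SSTHRESH, flight_size, SMSS)
    new = cwnd_option1_fixed(SSTHRESH, flight_size, SMSS)
    print(flight_size // SMSS, old // SMSS, new // SMSS)
```

   With FlightSize equal to zero, the RFC 3782 form yields a cwnd of
   one segment, while the assumed revised form yields two segments.
   For the larger FlightSize values shown, the two forms agree.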

626	   In addition, the errata for RFC3782 (an editorial clarification to
627	   Section 8 of RFC2582, which is now Section 6 of this document) have
628	   been applied.

630	   The specification text (Section 3.2 herein) was rewritten to more
631	   closely track Section 3.2 of [RFC5681].

633	   Sections 4, 5, and 9-11 of [RFC3782] were removed; Appendix A of
634	   this document was added instead to back-reference this informative
635	   material.  A few references that have no citation in the main body
636	   of the draft have been removed.

638	Appendix C.  Document Revision History

640	   To be removed upon publication

642	   +----------+--------------------------------------------------+
643	   | Revision | Comments                                         |
644	   +----------+--------------------------------------------------+
645	   | draft-00 | RFC3782 errata applied, and changes applied from |
646	   |          | draft-nishida-newreno-modification-02            |
647	   +----------+--------------------------------------------------+
648	   | draft-01 | Non-normative sections moved to appendices,      |
649	   |          | editorial clarifications applied as suggested    |
650	   |          | by Alexander Zimmermann.                         |
651	   +----------+--------------------------------------------------+
652	   | draft-02 | Better align specification text with RFC5681.    |
653	   |          | Replace informative appendices by a new appendix |
654	   |          | that just provides back-references to earlier    |
655	   |          | NewReno RFCs.                                    |
656	   +----------+--------------------------------------------------+
657	   | draft-03 | Document refresh and fix id-nits                 |
658	   +----------+--------------------------------------------------+
659	   | draft-04 | Address editorial comments received from secdir  |
660	   |          | review (provided by Tom Yu).                     |
661	   +----------+--------------------------------------------------+
662	   | draft-05 | Address IESG review comments from David          |
663	   |          | Harrington, and Gen-ART review comments from     |
664	   |          | Ben Campbell.                                    |
665	   +----------+--------------------------------------------------+

667	Authors' Addresses

669	   Tom Henderson
670	   The Boeing Company

672	   EMail: thomas.r.henderson@boeing.com

674	   Sally Floyd
675	   International Computer Science Institute

677	   Phone: +1 (510) 666-2989
678	   EMail: floyd@acm.org
679	   URL: http://www.icir.org/floyd/

681	   Andrei Gurtov
682	   University of Oulu
683	   Centre for Wireless Communications CWC
684	   P.O. Box 4500
685	   FI-90014 University of Oulu
686	   Finland

688	   EMail: gurtov@ee.oulu.fi

690	   Yoshifumi Nishida
691	   WIDE Project
692	   Endo 5322
693	   Fujisawa, Kanagawa  252-8520
694	   Japan

696	   EMail: nishida@wide.ad.jp