idnits 2.17.1 

draft-ietf-tsvwg-newreno-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 19
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 20 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2018,RFC3517], [RFC2581],
     [FF96], [RFC2582], [Hoe95]), which it shouldn't.  Please replace those
     with straight textual mentions of the documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 528: '...   retransmission of three packets [Gur03].  The TCP sender MAY use such...'
     RFC 2119 keyword, line 607: '...   [RFC2581] specifies that "Out-of-order data segments SHOULD be...'
     RFC 2119 keyword, line 762: '...two heuristics that the TCP sender MAY...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 2003) is 7461 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2988' is defined on line 807, but no explicit
     reference was found in the text

  == Unused Reference: 'Hen98' is defined on line 844, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  -- Obsolete informational reference (is this intentional?): RFC 2001 (ref.
     'F98') (Obsoleted by RFC 2581)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 3517
     (Obsoleted by RFC 6675)


     Summary: 9 errors (**), 0 flaws (~~), 5 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                 S. Floyd
3	INTERNET DRAFT                                                      ICSI
4	draft-ietf-tsvwg-newreno-02.txt                             T. Henderson
5	                                                                  Boeing
6	                                                               A. Gurtov
7	                                                             U. Helsinki
8	                                                           November 2003

10	       The NewReno Modification to TCP's Fast Recovery Algorithm

12	                          Status of this Memo

14	   This document is an Internet-Draft and is in full conformance with
15	   all provisions of Section 10 of RFC2026.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	Abstract

35	   RFC 2581 [RFC2581] documents the following four intertwined TCP
36	   congestion control algorithms: Slow Start, Congestion Avoidance, Fast
37	   Retransmit, and Fast Recovery.  RFC 2581 [RFC2581] explicitly allows
38	   certain modifications of these algorithms, including modifications
39	   that use the TCP Selective Acknowledgement (SACK) option
40	   [RFC2018,RFC3517], and modifications that respond to "partial
41	   acknowledgments" (ACKs which cover new data, but not all the data
42	   outstanding when loss was detected) in the absence of SACK.  The
43	   NewReno mechanism uses an algorithm for responding to partial
44	   acknowledgments that was first proposed by Janey Hoe in [Hoe95].

46	   RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental
47	   in 1999.  This document is a small revision of RFC 2582 intended to
48	   advance the NewReno mechanisms to Proposed Standard.  RFC 2581 notes
49	   that the Fast Retransmit/Fast Recovery algorithm specified in that
50	   document does not recover very efficiently from multiple losses in a
51	   single flight of packets, and that RFC 2582 contains one set of
52	   modifications to address this problem.

54	   NOTE TO THE RFC EDITOR:  PLEASE REMOVE THIS SECTION UPON PUBLICATION.

56	   Changes from draft-ietf-tsvwg-newreno-01.txt:

58	   * Some improvements in phrasing suggested by Mark Allman.

60	   Changes from draft-ietf-tsvwg-newreno-00.txt:

62	   * In Section 8, added a cautionary note about using the duplicate
63	   acknowledgment counter as a flag for whether Fast Recovery is in
64	   effect.

66	   * In Section 8, added a note about pulling along "recover" with
67	   "snd_una" when Fast Recovery is not in effect.

69	   * Added a discussion in Section 6 about heuristics for distinguishing
70	   between a retransmitted packet that was dropped, and three duplicate
71	   acknowledgements simply from the unnecessary retransmission of three
72	   packets.

74	   * Added more text and examples for comparing the Impatient and the
75	   Slow-but-Steady variants.

77	   * In Section 8, added a cautionary note saying that when the sender
78	   is not in Fast Retransmit, the sender should not use the Fast
79	   Recovery response to multiple duplicate acknowledgements.

81	   Changes from draft-floyd-newreno-00.txt:

83	   * In Section 8 on "Implementation issues for the data sender",
84	   mentioned alternate methods for limiting bursts when exiting Fast
85	   Recovery.

87	   * Changed draft from draft-floyd-newreno to draft-ietf-tsvwg-newreno

89	   Changes from RFC 2582:

91	   * Rephrasing and rearrangements of the text.

93	   * RFC 2582 described the Careful and Less Careful variants of
94	   NewReno, along with a default version that was neither Careful nor
95	   Less Careful, and recommended the Careful variant.  This document
96	   only specifies the Careful version.

98	   * RFC 2582 used two separate variables, "send_high" and "recover",
99	   and this document has merged them into a single variable "recover".

101	   * Added sections on "Comparisons between Reno and NewReno TCP", and
102	   on "Changes relative to RFC 2582".  The section on "Comparisons
103	   between Reno and NewReno TCP" includes a discussion of the one area
104	   where NewReno is known to perform worse than Reno or SACK, and that
105	   is in the response to reordering.

107	   * Moved all of the discussions of the Impatient and Slow-but-Steady
108	   variants to one place, and recommended the Impatient variant (as in
109	   the default version in RFC 2582).

111	   * Added a section on Implementation issues for the data sender,
112	   mentioning maxburst_.

114	   * Added a paragraph about differences between RFC 2582 and [FF96].

116	   END OF NOTE TO RFC EDITOR

118	1. Introduction

120	   For the typical implementation of the TCP Fast Recovery algorithm
121	   described in [RFC2581] (first implemented in the 1990 BSD Reno
122	   release, and referred to as the Reno algorithm in [FF96]), the TCP
123	   data sender only retransmits a packet after a retransmit timeout has
124	   occurred, or after three duplicate acknowledgements have arrived
125	   triggering the Fast Retransmit algorithm.  A single retransmit
126	   timeout might result in the retransmission of several data packets,
127	   but each invocation of the Fast Retransmit algorithm in RFC 2581
128	   leads to the retransmission of only a single data packet.

130	   Problems can arise, therefore, when multiple packets have been
131	   dropped from a single window of data and the Fast Retransmit and Fast
132	   Recovery algorithms are invoked.  In this case, if the SACK option is
133	   available, the TCP sender has the information to make intelligent
134	   decisions about which packets to retransmit and which packets not to
135	   retransmit during Fast Recovery.  This document applies only for TCP
136	   connections that are unable to use the TCP Selective Acknowledgement
137	   (SACK) option, either because the option is not locally supported or
138	   because the TCP peer did not indicate a willingness to use SACK.

140	   In the absence of SACK, there is little information available to the
141	   TCP sender in making retransmission decisions during Fast Recovery.

143	   From the three duplicate acknowledgements, the sender infers a packet
144	   loss, and retransmits the indicated packet.  After this, the data
145	   sender could receive additional duplicate acknowledgements, as the
146	   data receiver acknowledges additional data packets that were already
147	   in flight when the sender entered Fast Retransmit.

149	   In the case of multiple packets dropped from a single window of data,
150	   the first new information available to the sender comes when the
151	   sender receives an acknowledgement for the retransmitted packet (that
152	   is, the packet retransmitted when Fast Retransmit was first entered).
153	   If there had been a single packet drop and no reordering, then the
154	   acknowledgement for this packet will acknowledge all of the packets
155	   transmitted before Fast Retransmit was entered.  However, when there
156	   were multiple packet drops, then the acknowledgement for the
157	   retransmitted packet will acknowledge some but not all of the packets
158	   transmitted before the Fast Retransmit.  We call this acknowledgement
159	   a partial acknowledgment.

161	   Along with several other suggestions, [Hoe95] suggested that during
162	   Fast Recovery the TCP data sender responds to a partial
163	   acknowledgment by inferring that the next in-sequence packet has been
164	   lost, and retransmitting that packet.  This document describes a
165	   modification to the Fast Recovery algorithm in RFC 2581 that
166	   incorporates a response to partial acknowledgements received during
167	   Fast Recovery.  We call this modified Fast Recovery algorithm
168	   NewReno, because it is a slight but significant variation of the
169	   basic Reno algorithm in RFC 2581.  This document does not discuss the
170	   other suggestions in [Hoe95] and [Hoe96], such as a change to the
171	   ssthresh parameter during Slow-Start, or the proposal to send a new
172	   packet for every two duplicate acknowledgements during Fast Recovery.
173	   The version of NewReno in this document also draws on other
174	   discussions of NewReno in the literature [LM97].

176	   We do not claim that the NewReno version of Fast Recovery described
177	   here is an optimal modification of Fast Recovery for responding to
178	   partial acknowledgements, for TCP connections that are unable to use
179	   SACK.  Based on our experiences with the NewReno modification in the
180	   NS simulator [NS] and with numerous implementations of NewReno, we
181	   believe that this modification improves the performance of the Fast
182	   Retransmit and Fast Recovery algorithms in a wide variety of
183	   scenarios.

185	2. Terminology and Definitions

187	   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
188	   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
189	   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
190	   and indicate requirement levels for compliant TCP implementations
191	   implementing the NewReno Fast Retransmit and Fast Recovery algorithms
192	   described in this document.

194	   This document assumes that the reader is familiar with the terms
195	   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
196	   FLIGHT SIZE (FlightSize) defined in [RFC2581].  FLIGHT SIZE is
197	   defined as in [RFC2581] as follows:

199	      FLIGHT SIZE:
200	         The amount of data that has been sent but not yet acknowledged.

202	3. The Fast Retransmit and Fast Recovery Algorithms in NewReno

204	   The standard implementation of the Fast Retransmit and Fast Recovery
205	   algorithms is given in [RFC2581].  This section specifies the basic
206	   NewReno algorithm.  Sections 4 through 6 describe some optional
207	   variants, and the motivations behind them, that an implementor may
208	   want to consider when tuning performance for certain network
209	   scenarios.  Sections 7 and 8 provide some guidance to implementors
210	   based on experience with NewReno implementations.

212	   The NewReno modification concerns the Fast Recovery procedure that
213	   begins when three duplicate ACKs are received and ends when either a
214	   retransmission timeout occurs or an ACK arrives that acknowledges all
215	   of the data up to and including the data that was outstanding when
216	   the Fast Recovery procedure began.

218	   The NewReno algorithm specified in this document differs from the
219	   implementation in [RFC2581] in the introduction of the variable
220	   "recover" in step 1, in the response to a partial or new
221	   acknowledgement in step 5, and in modifications to step 1 and the
222	   addition of step 6 for avoiding multiple Fast Retransmits caused by
223	   the retransmission of packets already received by the receiver.

225	   The algorithm specified in this document uses a variable "recover",
226	   whose initial value is the initial send sequence number.

228	   1)  Three duplicate ACKs:
229	       When the third duplicate ACK is received and the sender is not
230	       already in the Fast Recovery procedure, check to see if the
231	       Cumulative Acknowledgement field covers more than "recover".
232	       If so, go to Step 1A.  Otherwise, go to Step 1B.

234	   1A) Invoking Fast Retransmit:
235	       If so, then set ssthresh to no more than the value given in
236	       equation 1 below.  (This is equation 3 from [RFC2581]).

238	         ssthresh = max (FlightSize / 2, 2*SMSS)           (1)

240	       In addition, record the highest sequence number transmitted in
241	       the variable "recover", and go to Step 2.

243	   1B) Not invoking Fast Retransmit:
244	       Do not enter the Fast Retransmit and Fast Recovery procedure.
245	       In particular, do not change ssthresh, do not go to Step 2 to
246	       retransmit the "lost" segment, and do not execute Step 3 upon
247	       subsequent duplicate ACKs.

249	   2)  Entering Fast Retransmit:
250	       Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
251	       This artificially "inflates" the congestion window by the number
252	       of segments (three) that have left the network and which the
253	       receiver has buffered.

255	   3)  Fast Recovery:
256	       For each additional duplicate ACK received while in Fast
257	       Recovery, increment cwnd by SMSS.  This artificially inflates
258	       the congestion window in order to reflect the additional segment
259	       that has left the network.

261	   4)  Fast Recovery, continued:
262	       Transmit a segment, if allowed by the new value of cwnd and the
263	       receiver's advertised window.

265	   5)  When an ACK arrives that acknowledges new data, this ACK could be
266	       the acknowledgment elicited by the retransmission from step 2, or
267	       elicited by a later retransmission.

269	       Full acknowledgements:
270	       If this ACK acknowledges all of the data up to and including
271	       "recover", then the ACK acknowledges all the intermediate
272	       segments sent between the original transmission of the lost
273	       segment and the receipt of the third duplicate ACK.  Set cwnd to
274	       either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh,
275	       where ssthresh is the value set in step 1; this is termed
276	       "deflating" the window.  (We note that "FlightSize" in step 1
277	       referred to the amount of data outstanding in step 1, when Fast
278	       Recovery was entered, while "FlightSize" in step 5 refers to the
279	       amount of data outstanding in step 5, when Fast Recovery is
280	       exited.) If the second option is selected, the implementation
281	       is encouraged to take measures to avoid a possible burst of
282	       data, in case the amount of data outstanding in the network was
283	       much less than the new congestion window allows.  A simple
284	       mechanism is to limit the number of data packets that can be
285	       sent in response to a single acknowledgement.  (This is known
286	       as "maxburst_" in the NS simulator).  Exit the Fast Recovery
287	       procedure.

289	       Partial acknowledgements:
290	       If this ACK does *not* acknowledge all of the data up to and
291	       including "recover", then this is a partial ACK.  In this case,
292	       retransmit the first unacknowledged segment.  Deflate the
293	       congestion window by the amount of new data acknowledged by the
294	       cumulative acknowledgement field.  If the partial ACK
295	       acknowledges at least one SMSS of new data, then add back SMSS
296	       bytes to the congestion window.  As in Step 3, this artificially
297	       inflates the congestion window in order to reflect the additional
298	       segment that has left the network.  Send a new segment if
299	       permitted by the new value of cwnd.  This "partial window
300	       deflation" attempts to ensure that, when Fast Recovery eventually
301	       ends, approximately ssthresh amount of data will be outstanding
302	       in the network.  Do not exit the Fast Recovery procedure (i.e.,
303	       if any duplicate ACKs subsequently arrive, execute Steps 3 and
304	       4 above).

306	       For the first partial ACK that arrives during Fast Recovery, also
307	       reset the retransmit timer.  Timer management is discussed in
308	       more detail in Section 4.

310	   6)  Retransmit timeouts:
311	       After a retransmit timeout, record the highest sequence number
312	       transmitted in the variable "recover" and exit the Fast
313	       Recovery procedure if applicable.

315	   Step 1 specifies a check that the Cumulative Acknowledgement field
316	   covers more than "recover".  Because the acknowledgement field
317	   contains the sequence number that the sender next expects to receive,
318	   the acknowledgement "ack_number" covers more than "recover" when:

320	     ack_number - 1 > recover.

322	   Note that in Step 5, the congestion window is deflated after a
323	   partial acknowledgement is received.  The congestion window was
324	   likely to have been inflated considerably when the partial
325	   acknowledgement was received.  In addition, depending on the original
326	   pattern of packet losses, the partial acknowledgement might
327	   acknowledge nearly a window of data.  In this case, if the congestion
328	   window was not deflated, the data sender might be able to send nearly
329	   a window of data back-to-back.

331	   This document does not specify the sender's response to duplicate
332	   ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked.
333	   This is addressed in other documents, such as those describing the
334	   Limited Transmit procedure [RFC3042].  This document also does not
335	   address issues of adjusting the duplicate acknowledgement threshold,
336	   but assumes the threshold specified in the IETF standards; the
337	   current standard is RFC 2581, which specifies a threshold of three
338	   duplicate acknowledgements.

340	   As a final note, we would observe that in the absence of the SACK
341	   option, the data sender is working from limited information.  When
342	   the issue of recovery from multiple dropped packets from a single
343	   window of data is of particular importance, the best alternative
344	   would be to use the SACK option.

346	4. Resetting the Retransmit Timer in Response to Partial
347	Acknowledgements.

349	   One possible variant to the response to partial acknowledgements
350	   specified in Section 3 concerns when to reset the retransmit timer
351	   after a partial acknowledgement.  The algorithm in Section 3, Step 5,
352	   resets the retransmit timer only after the first partial ACK.  In
353	   this case, if a large number of packets were dropped from a window of
354	   data, the TCP data sender's retransmit timer will ultimately expire,
355	   and the TCP data sender will invoke Slow-Start.  (This is illustrated
356	   on page 12 of [F98].)  We call this the Impatient variant of NewReno.
357	   We note that the Impatient variant in Section 3 doesn't follow the
358	   recommended algorithm in RFC 2988 of restarting the retransmit timer
359	   after every packet transmission or retransmission [RFC2988, Step
360	   5.1].

362	   In contrast, the NewReno simulations in [FF96] illustrate the
363	   algorithm described above with the modification that the retransmit
364	   timer is reset after each partial acknowledgement.  We call this the
365	   Slow-but-Steady variant of NewReno.  In this case, for a window with
366	   a large number of packet drops, the TCP data sender retransmits at
367	   most one packet per roundtrip time.  (This behavior is illustrated in
368	   the New-Reno TCP simulation of Figure 5 in [FF96], and on page 11 of
369	   [F98].

371	   When N packets have been dropped from a window of data for a large
372	   value of N, the Slow-but-Steady variant can remain in Fast Recovery
373	   for N round-trip times, retransmitting one more dropped packet each
374	   round-trip time; for these scenarios, the Impatient variant gives a
375	   faster recovery and better performance.  The tests "ns test-suite-
376	   newreno.tcl impatient1" and "ns test-suite-newreno.tcl slow1" in the
377	   NS simulator illustrate such a scenario, where the Impatient variant
378	   performs better than the Slow-but-Steady variant.  The Impatient
379	   variant can be particularly important for TCP connections with large
380	   congestion windows, as illustrated by the tests "ns test-suite-
381	   newreno.tcl impatient4" and "ns test-suite-newreno.tcl slow4" in the
382	   NS simulator.

384	   One can also construct scenarios where the Slow-but-Steady variant
385	   gives better performance than the Impatient variant.  As an example,
386	   this occurs then only a small number of packets are dropped, the RTO
387	   is sufficiently small that the retransmit timer expires, and
388	   performance would have been better without a retransmit timeout.  The
389	   tests "ns test-suite-newreno.tcl impatient2" and "ns test-suite-
390	   newreno.tcl slow2" in the NS simulator illustrate such a scenario.

392	   The Slow-but-Steady variant can also achieve higher goodput than the
393	   Impatient variant, by avoiding unnecessary retransmissions.  This
394	   could be of special interest for cellular links, where every
395	   transmission costs battery power and money.  The tests "ns test-
396	   suite-newreno.tcl impatient3" and "ns test-suite-newreno.tcl slow3"
397	   in the NS simulator illustrate such a scenario.  The Slow-but-Steady
398	   variant can also be more robust to delay variation in the network,
399	   where a delay spike might force the Impatient variant into a timeout
400	   and go-back-N recovery.

402	   Neither of the two variants discussed above are optimal.  Our
403	   recommendation is for the Impatient variant, as specified in Section
404	   3 of this document, because of the poor performance of the Slow-but-
405	   Steady variant for TCP connections with large congestion windows.

407	   One possibility for a more optimal algorithm would be one that
408	   recovered from multiple packet drops as quickly as does slow-start,
409	   while resetting the retransmit timers after each partial
410	   acknowledgement, as described in the section below.  We note,
411	   however, that there is a limitation to the potential performance in
412	   this case in the absence of the SACK option.

414	5. Retransmissions after a Partial Acknowledgement

416	   One possible variant to the response to partial acknowledgements
417	   specified in Section 3 would be to retransmit more than one packet
418	   after each partial acknowledgement, and to reset the retransmit timer
419	   after each retransmission.  The algorithm specified in Section 3
420	   retransmits a single packet after each partial acknowledgement.  This
421	   is the most conservative alternative, in that it is the least likely
422	   to result in an unnecessarily-retransmitted packet.  A variant that
423	   would recover faster from a window with many packet drops would be to
424	   effectively Slow-Start, retransmitting two packets after each partial
425	   acknowledgement.  Such an approach would take less than N roundtrip
426	   times to recover from N losses [Hoe96].  However, in the absence of
427	   SACK, recovering as quickly as slow-start introduces the likelihood
428	   of unnecessarily retransmitting packets, and this could significantly
429	   complicate the recovery mechanisms.

431	   We note that the response to partial acknowledgements specified in
432	   Section 3 of this document and in RFC 2582 differs from the response
433	   in [FF96], even though both approaches only retransmit one packet in
434	   response to a partial acknowledgement.  Step 5 of Section 3 specifies
435	   that the TCP sender responds to a partial ACK by deflating the
436	   congestion window by the amount of new data acknowledged, then adding
437	   back SMSS bytes if the partial ACK acknowledges at least SMSS bytes
438	   of new data, and sending a new segment if permitted by the new value
439	   of cwnd.  Thus, only one previously-sent packet is retransmitted in
440	   response to each partial acknowledgement, but additional new packets
441	   might be transmitted as well, depending on the amount of new data
442	   acknowledged by the partial acknowledgement.  In contrast, the
443	   variant of NewReno illustrated in [FF96] simply set the congestion
444	   window to ssthresh when a partial acknowledgement was received.  The
445	   approach in [FF96] is more conservative, and does not attempt to
446	   accurately track the actual number of outstanding packets after a
447	   partial acknowledgement is received.  While either of these
448	   approaches gives acceptable performance, the variant specified in
449	   Section 3 recovers more smoothly when multiple packets are dropped
450	   from a window of data.  (The [FF96] behavior can be seen in the NS
451	   simulator by setting the variable "partial_window_deflation_" for
452	   "Agent/TCP/Newreno" to 0, and the behavior specified in Section 3 is
453	   achieved by setting "partial_window_deflation_" to 1.)

455	6. Avoiding Multiple Fast Retransmits

457	   This section describes the motivation for the sender's state variable
458	   "recover", and discusses possible heuristics for distinguishing
459	   between a retransmitted packet that was dropped, and three duplicate
460	   acknowledgements simply from the unnecessary retransmission of three
461	   packets.

463	   In the absence of the SACK option or timestamps, a duplicate
464	   acknowledgement carries no information to identify the data packet or
465	   packets at the TCP data receiver that triggered that duplicate
466	   acknowledgement.  In this case, the TCP data sender is unable to
467	   distinguish between a duplicate acknowledgement that results from a
468	   lost or delayed data packet, and a duplicate acknowledgement that
469	   results from the sender's unnecessary retransmission of a data packet
470	   that had already been received at the TCP data receiver.  Because of
471	   this, with the Retransmit and Fast Recovery algorithms in Reno TCP,
472	   multiple segment losses from a single window of data can sometimes
473	   result in unnecessary multiple Fast Retransmits (and multiple
474	   reductions of the congestion window) [F94].

476	   With the Fast Retransmit and Fast Recovery algorithms in Reno TCP,
477	   the performance problems caused by multiple Fast Retransmits are
478	   relatively minor compared to the potential problems with Tahoe TCP,
479	   which does not implement Fast Recovery.  Nevertheless, unnecessary
480	   Fast Retransmits can occur with Reno TCP unless some explicit
481	   mechanism is added to avoid this, such as the use of the "recover"
482	   variable.  (This modification is called "bugfix" in [F98], and is
483	   illustrated on pages 7 and 9 of that document.  Unnecessary Fast
484	   Retransmits for Reno without "bugfix" is illustrated on page 6 of
485	   [F98].)

487	   Section 3 of RFC 2582 defined a default variant of NewReno TCP that
488	   did not use the variable "recover", and did not check if duplicate
489	   ACKs cover the variable "recover" before invoking Fast Retransmit.
490	   With this default variant from RFC 2582, the problem of multiple Fast
491	   Retransmits from a single window of data can occur after a Retransmit
492	   Timeout (as in page 8 of [F98]) or in scenarios with reordering (as
493	   in the validation test "./test-all-newreno newreno5_noBF" in
494	   directory "tcl/test" of the NS simulator.  This gives performance
495	   similar to that on page 8 of [F03].)  RFC 2582 also defined Careful
496	   and Less Careful variants of the NewReno algorithm, and recommended
497	   the Careful variant.

499	   The algorithm specified in Section 3 of this document corresponds to
500	   the Careful variant of NewReno TCP from RFC 2582, and eliminates the
501	   problem of multiple Fast Retransmits.  This algorithm uses the
502	   variable "recover", whose initial value is the initial send sequence
503	   number.  After each retransmit timeout, the highest sequence number
504	   transmitted so far is recorded in the variable "recover".

506	   If, after a retransmit timeout, the TCP data sender retransmits three
507	   consecutive packets that have already been received by the data
508	   receiver, then the TCP data sender will receive three duplicate
509	   acknowledgements that do not cover more than "recover".  In this
510	   case, the duplicate acknowledgements are not an indication of a new
511	   instance of congestion.  They are simply an indication that the
512	   sender has unnecessarily retransmitted at least three packets.

514	   However, when a retransmitted packet is itself dropped, the sender
515	   can also receive three duplicate acknowledgements that do not cover
516	   more than "recover", and in this case the sender would have been
517	   better off if it had initiated Fast Retransmit.  For a TCP that
518	   implements the algorithm specified in Section 3 of this document, the
519	   sender does not infer a packet drop from duplicate acknowledgements
520	   in this scenario.  As always, the retransmit timer is the backup
521	   mechanism for inferring packet loss in this case.

523	   There are several heuristics, based on timestamps or on the amount of
524	   advancement of the cumulative acknowledgement field, that allow the
525	   sender to distinguish in some cases between three duplicate
526	   acknowledgements following a retransmitted packet that was dropped,
527	   and three duplicate acknowledgements simply from the unnecessary
528	   retransmission of three packets [Gur03].  The TCP sender MAY use such
529	   a heuristic to decide to invoke a Fast Retransmit in some cases even
530	   when the three duplicate acknowledgements do not cover more than
531	   "recover".

533	   For example, when three duplicate acknowledgements are caused by the
534	   unnecessary retransmission of three packets, this is likely to be
535	   accompanied by the cumulative acknowledgement field advancing by at
536	   least four segments.  Similarly, a heuristic based on timestamps uses
537	   the fact that when there is a hole in the sequence space, the
538	   timestamp echoed in the duplicate acknowledgement is the timestamp of
539	   the most recent data packet that advanced the cumulative
540	   acknowledgement field [RFC1323].  If timestamps are used, and the
541	   sender stores the timestamp of the last acknowledged segment, then
542	   the timestamp echoed by duplicate acknowledgements can be used to
543	   distinguish between a retransmitted packet that was dropped, and
544	   three duplicate acknowledgements simply from the unnecessary
545	   retransmission of three packets.  The heuristics are illustrated in
546	   the NS simulator in the validation test "./test-all-newreno".

548	6.1.  ACK Heuristic.

550	   If the ACK-based heuristic is used, then following the advancement of
551	   the cumulative acknowledgement field, the sender stores the value of
552	   previous cumulative acknowledgement as prev_highest_ack and stores
553	   the latest cumulative ACK as highest_ack.  In addition, the following
554	   step is performed if Step 1 in Section 3 fails, before proceeding to
555	   Step 1B.

557	   1*)  If the Cumulative Acknowledgement field didn't cover more than
558	        "recover", check to see if the congestion window is greater
559	        than SMSS bytes and the difference between highest_ack and
560	        prev_highest_ack is at most 4*SMSS bytes.  If true, duplicate
561	        ACKs indicate a lost segment (proceed to Step 1A in Section
562	        3). Otherwise, duplicate ACKs likely result from unnecessary
563	        retransmissions (proceed to Step 1B in Section 3).

565	   The congestion window check serves to protect against fast retransmit
566	   immediately after a retransmit timeout, similar to the
567	   "exitFastRetrans_" variable in NS. Examples of applying the ACK
568	   heuristic are in validation tests "./test-all-newreno
569	   newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in
570	   directory "tcl/test" of the NS simulator.

572	   If several ACKs are lost, the sender can see a jump in the cumulative
573	   ACK of more than three segments and the heuristic can fail. A
574	   validation test for this scenario is "./test-all-newreno
575	   newreno_rto_loss_ackf".  RFC 2581 recommends that a receiver should
576	   send duplicate ACKs for every out-of-order data packet, such as a
577	   data packet received during Fast Recovery.  The ACK heuristic is more
578	   likely to fail if the receiver does not follow this advice, because
579	   then a smaller number of ACK losses are needed to produce a
580	   sufficient jump in the cumulative ACK.

582	6.2.  Timestamp Heuristic.

584	   If this heuristic is used, the sender stores the timestamp of the
585	   last acknowledged segment.  In addition, the second paragraph of step
586	   1 in Section 3 is replaced as follows:

588	   1**) If the Cumulative Acknowledgement field didn't cover more than
589	        "recover", check to see if the echoed timestamp equals the
590	        stored timestamp. If true, duplicate ACKs indicate a lost
591	        segment (proceed to Step 1A in Section 3). Otherwise, duplicate
592	        ACKs likely result from unnecessary retransmissions (proceed
593	        to Step 1B in Section 3).

595	   Examples of applying the timestamp heuristic are in validation tests
596	   "./test-all-newreno newreno_rto_loss_tsh" and "./test-all-newreno
597	   newreno_rto_dup_tsh". The timestamp heuristic works correctly both
598	   when the receiver echoes timestamps as specified by [RFC1323] or by
599	   its revision attempts. However, if the receiver arbitrarily echos
600	   timestamps, the heuristic can fail.  The heuristic can also fail if a
601	   timeout was spurious and returning ACKs are not from retransmitted
602	   segments. This can be prevented by detection algorithms such as
603	   [RFC3522].

605	7. Implementation Issues for the Data Receiver

607	   [RFC2581] specifies that "Out-of-order data segments SHOULD be
608	   acknowledged immediately, in order to accelerate loss recovery."
609	   Neal Cardwell has noted that some data receivers do not send an
610	   immediate acknowledgement when they send a partial acknowledgment,
611	   but instead wait first for their delayed acknowledgement timer to
612	   expire [C98].  As [C98] notes, this severely limits the potential
613	   benefit from NewReno by delaying the receipt of the partial
614	   acknowledgement at the data sender.  Echoing RFC 2581, our
615	   recommendation is that the data receiver send an immediate
616	   acknowledgement for an out-of-order segment, even when that out-of-
617	   order segment fills a hole in the buffer.

619	8. Implementation Issues for the Data Sender

621	   In Section 3, Step 5 above, it is noted that implementations should
622	   take measures to avoid a possible burst of data when leaving Fast
623	   Recovery, in case the amount of new data that the sender is eligible
624	   to send due to the new value of the congestion window is large.  This
625	   can arise during NewReno when ACKs are lost or treated as pure window
626	   updates, thereby causing the sender to underestimate the number of
627	   new segments that can be sent during the recovery procedure.
628	   Specifically, bursts can occur when the FlightSize is much less than
629	   the new congestion window when exiting from Fast Recovery.  One
630	   simple mechanism to avoid a burst of data when leaving Fast Recovery
631	   is to limit the number of data packets that can be sent in response
632	   to a single acknowledgment.  (This is known as "maxburst_" in the ns
633	   simulator.)  Other possible mechanisms for avoiding bursts include
634	   rate-based pacing, or setting the slow-start threshold to the
635	   resultant congestion window and then resetting the congestion window
636	   to FlightSize.  A recommendation on the general mechanism to avoid
637	   excessively bursty sending patterns is outside the scope of this
638	   document.

640	   An implementation may want to use a separate flag to record whether
641	   or not it is presently in the Fast Recovery procedure.  The use of
642	   the value of the duplicate acknowledgment counter as such a flag is
643	   not reliable because it can be reset upon window updates and out-of-
644	   order acknowledgments.

646	   When not in Fast Recovery, the value of the state variable "recover"
647	   should be pulled along with the value of the state variable for
648	   acknowledgments (typically, "snd_una") so that, when large amounts of
649	   data has been sent and acked, the sequence space does not wrap and
650	   falsely indicate that Fast Recovery should not be entered (Section 3,
651	   step 1, last paragraph).

653	   It is important for the sender to respond correctly to duplicate ACKs
654	   received when the sender is no longer in Fast Recovery (e.g., because
655	   of a Retransmit Timeout).  The Limited Transmit procedure [RFC3042]
656	   describes possible responses to the first and second duplicate
657	   acknowledgements.  When three or more duplicate acknowledgements are
658	   received, the Cumulative Acknowledgement field doesn't cover more
659	   than "recover", and a new Fast Recovery is not invoked, it is
660	   important that the sender not execute the Fast Recovery steps (3) and
661	   (4) in Section 3.  Otherwise, the sender could end up in a chain of
662	   spurious timeouts.  We mention this only because several NewReno
663	   implementations had this bug, including the implementation in the NS
664	   simulator.  (This bug in the NS simulator was fixed in July 2003,
665	   with the variable "exitFastRetrans_".)

667	9. Simulations

669	   Simulations with NewReno are illustrated with the validation test
670	   "tcl/test/test-all-newreno" in the NS simulator.  The command
671	   "../../ns test-suite-newreno.tcl reno" shows a simulation with Reno
672	   TCP, illustrating the data sender's lack of response to a partial
673	   acknowledgement.  In contrast, the command "../../ns test-suite-
674	   newreno.tcl newreno_B" shows a simulation with the same scenario
675	   using the NewReno algorithms described in this paper.

677	10. Comparisons between Reno and NewReno TCP

679	   As we stated in the introduction, we believe that the NewReno
680	   modification described in this document improves the performance of
681	   the Fast Retransmit and Fast Recovery algorithms of Reno TCP in a
682	   wide variety of scenarios.  This has been discussed in some depth in
683	   [FF96], which illustrates Reno TCP's poor performance when multiple
684	   packets are dropped from a window of data and also illustrates
685	   NewReno TCP's good performance in that scenario.

687	   We do, however, know of one scenario where Reno TCP gives better
688	   performance than NewReno TCP, that we describe here for the sake of
689	   completeness.  Consider a scenario with no packet loss, but with
690	   sufficient reordering that the TCP sender receives three duplicate
691	   acknowledgements.  This will trigger the Fast Retransmit and Fast
692	   Recovery algorithms.  With Reno TCP or with Sack TCP, this will
693	   result in the unnecessary retransmission of a single packet, combined
694	   with a halving of the congestion window (shown on pages 4 and 6 of
695	   [F03]).  With NewReno TCP, however, this reordering will also result
696	   in the unnecessary retransmission of an entire window of data (shown
697	   on page 5 of [F03]).

699	   While Reno TCP performs better than NewReno TCP in the presence of
700	   reordering, NewReno's superior performance in the presence of
701	   multiple packet drops generally outweighs its less optimal
702	   performance in the presence of reordering.  (Sack TCP is the
703	   preferred solution, with good performance in both scenarios.) This
704	   document recommends the Fast Retransmit and Fast Recovery algorithms
705	   of NewReno TCP instead of those of Reno TCP for those TCP connections
706	   that do not support SACK.  We would also note that NewReno's Fast
707	   Retransmit and Fast Recovery mechanisms are widely deployed in TCP
708	   implementations in the Internet today, as documented in [PF01].  For
709	   example, tests of TCP implementations in several thousand web servers
710	   in 2001 showed that for those TCP connections where the web browser
711	   was not SACK-capable, more web servers used the Fast Retransmit and
712	   Fast Recovery algorithms of NewReno than those of Reno or Tahoe TCP
713	   [PF01].

715	11. Changes Relative to RFC 2582

717	   The purpose of this document is to advance the NewReno's Fast
718	   Retransmit and Fast Recovery algorithms in RFC 2582 to Proposed
719	   Standard.

721	   The main change in this document relative to RFC 2582 is to specify
722	   the Careful variant of NewReno's Fast Retransmit and Fast Recovery
723	   algorithms.  The base algorithm described in RFC 2582 did not attempt
724	   to avoid unnecessary multiple Fast Retransmits that can occur after a
725	   timeout (described in more detail in the section above).  However,
726	   RFC 2582 also defined "Careful" and "Less Careful" variants that
727	   avoid these unnecessary Fast Retransmits, and recommended the Careful
728	   variant.  This document specifies the previously-named "Careful"
729	   variant as the basic version of NewReno.  As described below, this
730	   algorithm uses a variable "recover", whose initial value is the send
731	   sequence number.

733	   The algorithm specified in Section 3 checks whether the
734	   acknowledgement field of a partial acknowledgement covers *more* than
735	   "recover".  Another possible variant would be to require simply that
736	   the acknowledgement field *cover* "recover" before initiating another
737	   Fast Retransmit.  We called this the Less Careful variant in RFC
738	   2582.

740	   There are two separate scenarios in which the TCP sender could
741	   receive three duplicate acknowledgements acknowledging "recover" but
742	   no more than "recover".  One scenario would be that the data sender
743	   transmitted four packets with sequence numbers higher than "recover",
744	   that the first packet was dropped in the network, and the following
745	   three packets triggered three duplicate acknowledgements
746	   acknowledging "recover".  The second scenario would be that the
747	   sender unnecessarily retransmitted three packets below "recover", and
748	   that these three packets triggered three duplicate acknowledgements
749	   acknowledging "recover".  In the absence of SACK, the TCP sender in
750	   unable to distinguish between these two scenarios.

752	   For the Careful variant of Fast Retransmit, the data sender would
753	   have to wait for a retransmit timeout in the first scenario, but
754	   would not have an unnecessary Fast Retransmit in the second scenario.
755	   For the Less Careful variant to Fast Retransmit, the data sender
756	   would Fast Retransmit as desired in the first scenario, and would
757	   unnecessarily Fast Retransmit in the second scenario.  This document
758	   only specifies the Careful variant in Section 3.  Unnecessary Fast
759	   Retransmits with the Less Careful variant in scenarios with
760	   reordering are illustrated in page 8 of [F03].

762	   The document also specifies two heuristics that the TCP sender MAY
763	   use to decide to invoke Fast Retransmit even when the three duplicate
764	   acknowledgements do not cover more than "recover".  These heuristics,
765	   an ACK-based heuristic and a timestamp heuristic, are described in
766	   Sections 6.1 and 6.2 respectively.

768	12. Conclusions

770	   This document specifies the NewReno Fast Retransmit and Fast Recovery
771	   algorithms for TCP.  This NewReno modification to TCP can be
772	   important even for TCP implementations that support the SACK option,
773	   because the SACK option can only be used for TCP connections when
774	   both TCP end-nodes support the SACK option.  NewReno performs better
775	   than Reno (RFC 2581) in a number of scenarios discussed herein.

777	   A number of options to the basic algorithm presented in Section 3 are
778	   also described.  These include the handling of the retransmission
779	   timer (Section 4), the response to partial acknowledgments (Section
780	   5), and the value of the congestion window when leaving Fast Recovery
781	   (section 3, step 5).  Our belief is that the differences between
782	   these variants of NewReno are small compared to the differences
783	   between Reno and NewReno.  That is, the important thing is to
784	   implement NewReno instead of Reno, for a TCP connection without SACK;
785	   it is less important exactly which of the variants of NewReno is
786	   implemented.

788	13. Acknowledgements

790	   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu,
791	   Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed
792	   feedback on this document or on its precursor RFC 2582.

794	14. References

796	   Normative References

798	   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective
799	   Acknowledgement Options", RFC 2018, October 1996.

801	   [RFC2581] W. Stevens, M. Allman, and V. Paxson, "TCP Congestion
802	   Control", RFC 2581, April 1999.

804	   [RFC2582] S. Floyd and T. Henderson, The NewReno Modification to
805	   TCP's Fast Recovery Algorithm, RFC 2582, April 1999.

807	   [RFC2988] V. Paxson and M. Allman, Computing TCP's Retransmission
808	   Timer, RFC 2988, November 2000.

810	   [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, Enhancing TCP's
811	   Loss Recovery Using Limited Transmit, RFC 3042, January 2001.

813	   Informative References

815	   [C98] N. Cardwell, "delayed ACKs for retransmitted packets: ouch!".
816	   November 1998,  Email to the tcpimpl mailing list, Message-ID
817	   "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu",
818	   archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

820	   [F98] S. Floyd, Revisions to RFC 2001, "Presentation to the TCPIMPL
821	   Working Group", August 1998.  URLs "ftp://ftp.ee.lbl.gov/talks/sf-
822	   tcpimpl-aug98.ps" and "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-
823	   aug98.pdf".

825	   [F03] S. Floyd, "Moving NewReno from Experimental to Proposed
826	   Standard?  Presentation to the TSVWG Working Group", March 2003.
827	   URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and
828	   "http://www.icir.org/floyd/talks/newreno-Mar03.pdf".

830	   [FF96] K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe,
831	   Reno and SACK TCP", Computer Communication Review, July 1996.  URL
832	   "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

834	   [F94] S. Floyd, "TCP and Successive Fast Retransmits", Technical
835	   report, October 1994.  URL
836	   "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

838	   [Gur03] A. Gurtov, "[Tsvwg] resolving the problem of unnecessary fast
839	   retransmits in go-back-N", email to the tsvwg mailing list, message
840	   ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003.  URL
841	   "http://www1.ietf.org/mail-archive/working-
842	   groups/tsvwg/current/msg04334.html".

844	   [Hen98] T. Henderson, Re: NewReno and the 2001 Revision. September
845	   1998.  Email to the tcpimpl mailing list, Message ID
846	   "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU",
847	   archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

849	   [Hoe95] J. Hoe, "Startup Dynamics of TCP's Congestion Control and
850	   Avoidance Schemes", Master's Thesis, MIT, 1995.  URL "http://ana-
851	   www.lcs.mit.edu/anaweb/ps-papers/hoe-thesis.ps".

853	   [Hoe96] J. Hoe, "Improving the Start-up Behavior of a Congestion
854	   Control Scheme for TCP", ACM SIGCOMM, August 1996.  URL
855	   "http://www.acm.org/sigcomm/sigcomm96/program.html".

857	   [LM97] D. Lin and R. Morris, "Dynamics of Random Early Detection",
858	   SIGCOMM 97, September 1997.  URL
859	   "http://www.acm.org/sigcomm/sigcomm97/program.html".

861	   [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/".

863	   [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web
864	   Servers", June 2001, SIGCOMM 2001.

866	   [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
867	   High Performance,", RFC 1323, May 1992.

869	   [RFC3517] E. Blanton, M. Allman, K. Fall, and L. Wang, "A
870	   Conservative Selective Acknowledgment (SACK)-based Loss Recovery
871	   Algorithm for TCP", RFC 3517, April 2003.

873	   [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for
874	   TCP, RFC 3522, April 2003.

876	15. Security Considerations

878	   RFC 2581 discusses general security considerations concerning TCP
879	   congestion control.  This document describes a specific algorithm
880	   that conforms with the congestion control requirements of RFC 2581,
881	   and so those considerations apply to this algorithm, too.  There are
882	   no known additional security concerns for this specific algorithm.

884	AUTHORS' ADDRESSES

886	   Sally Floyd
887	   International Computer Science Institute
888	   Phone: +1 (510) 666-2989
889	   Email: floyd@acm.org
890	   URL: http://www.icir.org/floyd/

892	   Tom Henderson
893	   The Boeing Company

895	   Email: thomas.r.henderson@boeing.com

897	   Andrei Gurtov
898	   U. Helsinki

900	   Email: gurtov@cs.helsinki.fi