idnits 2.17.1 

draft-ietf-tsvwg-newreno-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 19
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 20 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2018], [RFC2581], [FF96],
     [RFC2582], [Hoe95]), which it shouldn't.  Please replace those with
     straight textual mentions of the documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 517: '...   retransmission of three packets [Gur03].  The TCP sender MAY use such...'
     RFC 2119 keyword, line 593: '...   [RFC2581] specifies that "Out-of-order data segments SHOULD be...'
     RFC 2119 keyword, line 748: '...two heuristics that the TCP sender MAY...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 2003) is 7529 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2988' is defined on line 793, but no explicit
     reference was found in the text

  == Unused Reference: 'Hen98' is defined on line 830, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  -- Obsolete informational reference (is this intentional?): RFC 2001 (ref.
     'F98') (Obsoleted by RFC 2581)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)


     Summary: 9 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                 S. Floyd
3	INTERNET DRAFT                                                      ICSI
4	draft-ietf-tsvwg-newreno-01.txt                             T. Henderson
5	                                                                  Boeing
6	                                                               A. Gurtov
7	                                                             U. Helsinki
8	                                                          September 2003

10	       The NewReno Modification to TCP's Fast Recovery Algorithm

12	                          Status of this Memo

14	   This document is an Internet-Draft and is in full conformance with
15	   all provisions of Section 10 of RFC2026.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	Abstract

35	   RFC 2581 [RFC2581] documents the following four intertwined TCP
36	   congestion control algorithms: Slow Start, Congestion Avoidance, Fast
37	   Retransmit, and Fast Recovery.  RFC 2581 [RFC2581] explicitly allows
38	   certain modifications of these algorithms, including modifications
39	   that use the TCP Selective Acknowledgement (SACK) option [RFC2018],
40	   and modifications that respond to "partial acknowledgments" (ACKs
41	   which cover new data, but not all the data outstanding when loss was
42	   detected) in the absence of SACK.  The NewReno mechanism uses an
43	   algorithm for responding to partial acknowledgments that was first
44	   proposed by Janey Hoe in [Hoe95].

46	   RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental
47	   in 1999.  This document is a small revision of RFC 2582 intended to
48	   advance the NewReno mechanisms to Proposed Standard.  RFC 2581 notes
49	   that the Fast Retransmit/Fast Recovery algorithm specified in that
50	   document does not recover very efficiently from multiple losses in a
51	   single flight of packets, and that RFC 2582 contains one set of
52	   modifications to address this problem.

54	   NOTE TO THE RFC EDITOR:  PLEASE REMOVE THIS SECTION UPON PUBLICATION.

56	   Changes from draft-ietf-tsvwg-newreno-00.txt:

58	   * In Section 8, added a cautionary note about using the duplicate
59	   acknowledgment counter as a flag for whether Fast Recovery is in
60	   effect.

62	   * In Section 8, added a note about pulling along "recover" with
63	   "snd_una" when Fast Recovery is not in effect.

65	   * Added a discussion in Section 6 about heuristics for distinguishing
66	   between a retransmitted packet that was dropped, and three duplicate
67	   acknowledgements simply from the unnecessary retransmission of three
68	   packets.

70	   * Added more text and examples for comparing the Impatient and the
71	   Slow-but-Steady variants.

73	   * In Section 8, added a cautionary note saying that when the sender
74	   is not in Fast Retransmit, the sender should not use the Fast
75	   Recovery response to multiple duplicate acknowledgements.

77	   Changes from draft-floyd-newreno-00.txt:

79	   * In Section 8 on "Implementation issues for the data sender",
80	   mentioned alternate methods for limiting bursts when exiting Fast
81	   Recovery.

83	   * Changed draft from draft-floyd-newreno to draft-ietf-tsvwg-newreno

85	   Changes from RFC 2582:

87	   * Rephrasing and rearrangements of the text.

89	   * RFC 2582 described the Careful and Less Careful variants of
90	   NewReno, along with a default version that was neither Careful nor
91	   Less Careful, and recommended the Careful variant.  This document
92	   only specifies the Careful version.

94	   * RFC 2582 used two separate variables, "send_high" and "recover",
95	   and this document has merged them into a single variable "recover".

97	   * Added sections on "Comparisons between Reno and NewReno TCP", and
98	   on "Changes relative to RFC 2582".  The section on "Comparisons
99	   between Reno and NewReno TCP" includes a discussion of the one area
100	   where NewReno is known to perform worse than Reno or SACK, and that
101	   is in the response to reordering.

103	   * Moved all of the discussions of the Impatient and Slow-but-Steady
104	   variants to one place, and recommended the Impatient variant (as in
105	   the default version in RFC 2582).

107	   * Added a section on Implementation issues for the data sender,
108	   mentioning maxburst_.

110	   * Added a paragraph about differences between RFC 2582 and [FF96].

112	   END OF NOTE TO RFC EDITOR

114	1. Introduction

116	   For the typical implementation of the TCP Fast Recovery algorithm
117	   described in [RFC2581] (first implemented in the 1990 BSD Reno
118	   release, and referred to as the Reno algorithm in [FF96]), the TCP
119	   data sender only retransmits a packet after a retransmit timeout has
120	   occurred, or after three duplicate acknowledgements have arrived
121	   triggering the Fast Retransmit algorithm.  A single retransmit
122	   timeout might result in the retransmission of several data packets,
123	   but each invocation of the Fast Retransmit algorithm in RFC 2581
124	   leads to the retransmission of only a single data packet.

126	   Problems can arise, therefore, when multiple packets have been
127	   dropped from a single window of data and the Fast Retransmit and Fast
128	   Recovery algorithms are invoked.  In this case, if the SACK option is
129	   available, the TCP sender has the information to make intelligent
130	   decisions about which packets to retransmit and which packets not to
131	   retransmit during Fast Recovery.  This document applies only for TCP
132	   connections that are unable to use the TCP Selective Acknowledgement
133	   (SACK) option, either because the option is not locally supported or
134	   because the TCP peer did not indicate a willingness to use SACK.

136	   In the absence of SACK, there is little information available to the
137	   TCP sender in making retransmission decisions during Fast Recovery.
138	   From the three duplicate acknowledgements, the sender infers a packet
139	   loss, and retransmits the indicated packet.  After this, the data
140	   sender could receive additional duplicate acknowledgements, as the
141	   data receiver acknowledges additional data packets that were already
142	   in flight when the sender entered Fast Retransmit.

144	   In the case of multiple packets dropped from a single window of data,
145	   the first new information available to the sender comes when the
146	   sender receives an acknowledgement for the retransmitted packet (that
147	   is, the packet retransmitted when Fast Retransmit was first entered).
148	   If there had been a single packet drop and no reordering, then the
149	   acknowledgement for this packet will acknowledge all of the packets
150	   transmitted before Fast Retransmit was entered.  However, when there
151	   were multiple packet drops, then the acknowledgement for the
152	   retransmitted packet will acknowledge some but not all of the packets
153	   transmitted before the Fast Retransmit.  We call this acknowledgement
154	   a partial acknowledgment.

156	   Along with several other suggestions, [Hoe95] suggested that during
157	   Fast Recovery the TCP data sender responds to a partial
158	   acknowledgment by inferring that the next in-sequence packet has been
159	   lost, and retransmitting that packet.  This document describes a
160	   modification to the Fast Recovery algorithm in RFC 2581 that
161	   incorporates a response to partial acknowledgements received during
162	   Fast Recovery.  We call this modified Fast Recovery algorithm
163	   NewReno, because it is a slight but significant variation of the
164	   basic Reno algorithm in RFC 2581.  This document does not discuss the
165	   other suggestions in [Hoe95] and [Hoe96], such as a change to the
166	   ssthresh parameter during Slow-Start, or the proposal to send a new
167	   packet for every two duplicate acknowledgements during Fast Recovery.
168	   The version of NewReno in this document also draws on other
169	   discussions of NewReno in the literature [LM97].

171	   We do not claim that the NewReno version of Fast Recovery described
172	   here is an optimal modification of Fast Recovery for responding to
173	   partial acknowledgements, for TCP connections that are unable to use
174	   SACK.  Based on our experiences with the NewReno modification in the
175	   NS simulator [NS] and with numerous implementations of NewReno, we
176	   believe that this modification improves the performance of the Fast
177	   Retransmit and Fast Recovery algorithms in a wide variety of
178	   scenarios.

180	2. Terminology and Definitions

182	   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
183	   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
184	   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
185	   and indicate requirement levels for compliant TCP implementations
186	   implementing the NewReno Fast Retransmit and Fast Recovery algorithms
187	   described in this document.

189	   This document assumes that the reader is familiar with the terms
190	   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
191	   FLIGHT SIZE (FlightSize) defined in [RFC2581].  FLIGHT SIZE is
192	   defined as in [RFC2581] as follows:

194	      FLIGHT SIZE:
195	         The amount of data that has been sent but not yet acknowledged.

197	3. The Fast Retransmit and Fast Recovery Algorithms in NewReno

199	   The standard implementation of the Fast Retransmit and Fast Recovery
200	   algorithms is given in [RFC2581].  This section specifies the basic
201	   NewReno algorithm.  Sections 4 through 6 describe some optional
202	   variants, and the motivations behind them, that an implementor may
203	   want to consider when tuning performance for certain network
204	   scenarios.  Sections 7 and 8 provide some guidance to implementors
205	   based on experience with NewReno implementations.

207	   The NewReno modification concerns the Fast Recovery procedure that
208	   begins when three duplicate ACKs are received and ends when either a
209	   retransmission timeout occurs or an ACK arrives that acknowledges all
210	   of the data up to and including the data that was outstanding when
211	   the Fast Recovery procedure began.

213	   The NewReno algorithm specified in this document differs from the
214	   implementation in [RFC2581] in the introduction of the variable
215	   "recover" in step 1, in the response to a partial or new
216	   acknowledgement in step 5, and in modifications to step 1 and the
217	   addition of step 6 for avoiding multiple Fast Retransmits caused by
218	   the retransmission of packets already received by the receiver.

220	   The algorithm specified in this document uses a variable "recover",
221	   whose initial value is the initial send sequence number.

223	   1)  Three duplicate ACKs:
224	       When the third duplicate ACK is received and the sender is not
225	       already in the Fast Recovery procedure, check to see if the
226	       Cumulative Acknowledgement field covers more than "recover".
227	       If so, go to Step 1A.  Otherwise, go to Step 1B.

229	   1A) Invoking Fast Retransmit:
230	       If so, then set ssthresh to no more than the value given in
231	       equation 1 below.  (This is equation 3 from [RFC2581]).

233	         ssthresh = max (FlightSize / 2, 2*SMSS)           (1)

235	       In addition, record the highest sequence number transmitted in
236	       the variable "recover", and go to Step 2.

238	   1B) Not invoking Fast Retransmit:
239	       Do not enter the Fast Retransmit and Fast Recovery procedure.
240	       In particular, do not change ssthresh, do not go to Step 2 to
241	       retransmit the "lost" segment, and do not execute Step 3 upon
242	       subsequent duplicate ACKs.

244	   2)  Entering Fast Retransmit:
245	       Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
246	       This artificially "inflates" the congestion window by the number
247	       of segments (three) that have left the network and which the
248	       receiver has buffered.

250	   3)  Fast Recovery:
251	       For each additional duplicate ACK received while in Fast
252	       Recovery, increment cwnd by SMSS.  This artificially inflates
253	       the congestion window in order to reflect the additional segment
254	       that has left the network.

256	   4)  Fast Recovery, continued:
257	       Transmit a segment, if allowed by the new value of cwnd and the
258	       receiver's advertised window.

260	   5)  When an ACK arrives that acknowledges new data, this ACK could be
261	       the acknowledgment elicited by the retransmission from step 2, or
262	       elicited by a later retransmission.

264	       Full acknowledgements:
265	       If this ACK acknowledges all of the data up to and including
266	       "recover", then the ACK acknowledges all the intermediate
267	       segments sent between the original transmission of the lost
268	       segment and the receipt of the third duplicate ACK.  Set cwnd to
269	       either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh,
270	       where ssthresh is the value set in step 1; this is termed
271	       "deflating" the window.  (We note that "FlightSize" in step 1
272	       referred to the amount of data outstanding in step 1, when Fast
273	       Recovery was entered, while "FlightSize" in step 5 refers to the
274	       amount of data outstanding in step 5, when Fast Recovery is
275	       exited.) If the second option is selected, the implementation
276	       should take measures to avoid a possible burst of data, in case
277	       the amount of data outstanding in the network was much less than
278	       the new congestion window allows.  A simple mechanism is to limit
279	       the number of data packets that can be sent in response to a
280	       single acknowledgement.  (This is known as "maxburst_" in the NS
281	       simulator).  Exit the Fast Recovery procedure.

283	       Partial acknowledgements:
284	       If this ACK does *not* acknowledge all of the data up to and
285	       including "recover", then this is a partial ACK.  In this case,
286	       retransmit the first unacknowledged segment.  Deflate the
287	       congestion window by the amount of new data acknowledged, then
288	       add back one SMSS (if the partial ACK acknowledges at least one
289	       SMSS of new data) and send a new segment if permitted by the new
290	       value of cwnd.  This "partial window deflation" attempts to
291	       ensure that, when Fast Recovery eventually ends, approximately
292	       ssthresh amount of data will be outstanding in the network.  Do
293	       not exit the Fast Recovery procedure (i.e., if any duplicate ACKs
294	       subsequently arrive, execute Steps 3 and 4 above).

296	       For the first partial ACK that arrives during Fast Recovery, also
297	       reset the retransmit timer.

299	   6)  Retransmit timeouts:
300	       After a retransmit timeout, record the highest sequence number
301	       transmitted in the variable "recover" and exit the Fast
302	       Recovery procedure if applicable.

304	   Step 1 specifies a check that the Cumulative Acknowledgement field
305	   covers more than "recover".  Because the acknowledgement field
306	   contains the sequence number that the sender next expects to receive,
307	   the acknowledgement "ack_number" covers more than "recover" when:

309	     ack_number - one > recover.

311	   Note that in Step 5, the congestion window is deflated after a
312	   partial acknowledgement is received.  The congestion window was
313	   likely to have been inflated considerably when the partial
314	   acknowledgement was received.  In addition, depending on the original
315	   pattern of packet losses, the partial acknowledgement might
316	   acknowledge nearly a window of data.  In this case, if the congestion
317	   window was not deflated, the data sender might be able to send nearly
318	   a window of data back-to-back.

320	   This document does not specify the sender's response to duplicate
321	   ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked.
322	   This is addressed in other documents, such as those describing the
323	   Limited Transmit procedure [RFC3042].  This document also does not
324	   address issues of adjusting the duplicate acknowledgement threshold,
325	   but assumes the threshold of three duplicate acknowledgements
326	   currently specified in RFC 2581.

328	   As a final note, we would observe that in the absence of the SACK
329	   option, the data sender is working from limited information.  When
330	   the issue of recovery from multiple dropped packets from a single
331	   window of data is of particular importance, the best alternative
332	   would be to use the SACK option.

334	4. Resetting the Retransmit Timer in Response to Partial
335	Acknowledgements.

337	   One possible variant to the response to partial acknowledgements
338	   specified in Section 3 concerns when to reset the retransmit timer
339	   after a partial acknowledgement.  The algorithm in Section 3, Step 5,
340	   resets the retransmit timer only after the first partial ACK.  In
341	   this case, if a large number of packets were dropped from a window of
342	   data, the TCP data sender's retransmit timer will ultimately expire,
343	   and the TCP data sender will invoke Slow-Start.  (This is illustrated
344	   on page 12 of [F98].)  We call this the Impatient variant of NewReno.
345	   We note that the Impatient variant in Section 3 doesn't follow the
346	   recommended algorithm in RFC 2988 of restarting the retransmit timer
347	   after every packet transmission or retransmission [RFC2988, Step
348	   5.1].

350	   In contrast, the NewReno simulations in [FF96] illustrate the
351	   algorithm described above with the modification that the retransmit
352	   timer is reset after each partial acknowledgement.  We call this the
353	   Slow-but-Steady variant of NewReno.  In this case, for a window with
354	   a large number of packet drops, the TCP data sender retransmits at
355	   most one packet per roundtrip time.  (This behavior is illustrated in
356	   the New-Reno TCP simulation of Figure 5 in [FF96], and on page 11 of
357	   [F98].

359	   When N packets have been dropped from a window of data for a large
360	   value of N, the Slow-but-Steady variant can remain in Fast Recovery
361	   for N round-trip times, retransmitting one more dropped packet each
362	   round-trip time; for these scenarios, the Impatient variant gives a
363	   faster recovery and better performance.  The tests "ns test-suite-
364	   newreno.tcl impatient1" and "ns test-suite-newreno.tcl slow1" in the
365	   NS simulator illustrate such a scenario, where the Impatient variant
366	   performs better than the Slow-but-Steady variant.  The Impatient
367	   variant can be particularly important for TCP connections with large
368	   congestion windows, as illustrated by the tests "ns test-suite-
369	   newreno.tcl impatient4" and "ns test-suite-newreno.tcl slow4" in the
370	   NS simulator.

372	   One can also construct scenarios where the Slow-but-Steady variant
373	   gives better performance than the Impatient variant.  As an example,
374	   this occurs then only a small number of packets are dropped, the RTO
375	   is sufficiently small that the retransmit timer expires, and
376	   performance would have been better without a retransmit timeout.  The
377	   tests "ns test-suite-newreno.tcl impatient2" and "ns test-suite-
378	   newreno.tcl slow2" in the NS simulator illustrate such a scenario.

380	   The Slow-but-Steady variant can also achieve higher goodput than the
381	   Impatient variant, by avoiding unnecessary retransmissions.  This
382	   could be of special interest for cellular links, where every
383	   transmission costs battery power and money.  The tests "ns test-
384	   suite-newreno.tcl impatient3" and "ns test-suite-newreno.tcl slow3"
385	   in the NS simulator illustrate such a scenario.  The Slow-but-Steady
386	   variant can also be more robust to delay variation in the network,
387	   where a delay spike might force the Impatient variant into a timeout
388	   and go-back-N recovery.

390	   Neither of the two variants discussed above are optimal.  Our
391	   recommendation is for the Impatient variant, as specified in Section
392	   3 of this document, because of the poor performance of the Slow-but-
393	   Steady variant for TCP connections with large congestion windows.

395	   One possibility for a more optimal algorithm would be one that
396	   recovered from multiple packet drops as quickly as does slow-start,
397	   while resetting the retransmit timers after each partial
398	   acknowledgement, as described in the section below.  We note,
399	   however, that there is a limitation to the potential performance in
400	   this case in the absence of the SACK option.

402	5. Retransmissions after a Partial Acknowledgement

404	   One possible variant to the response to partial acknowledgements
405	   specified in Section 3 would be to retransmit more than one packet
406	   after each partial acknowledgement, and to reset the retransmit timer
407	   after each retransmission.  The algorithm specified in Section 3
408	   retransmits a single packet after each partial acknowledgement.  This
409	   is the most conservative alternative, in that it is the least likely
410	   to result in an unnecessarily-retransmitted packet.  A variant that
411	   would recover faster from a window with many packet drops would be to
412	   effectively Slow-Start, retransmitting two packets after each partial
413	   acknowledgement.  Such an approach would take less than N roundtrip
414	   times to recover from N losses [Hoe96].  However, in the absence of
415	   SACK, recovering as quickly as slow-start introduces the likelihood
416	   of unnecessarily retransmitting packets, and this could significantly
417	   complicate the recovery mechanisms.

419	   We note that the response to partial acknowledgements specified in
420	   Section 3 of this document and in RFC 2582 differs from the response
421	   in [FF96], even though both approaches only retransmit one packet in
422	   response to a partial acknowledgement.  Step 5 of Section 3 specifies
423	   that the TCP sender responds to a partial ACK by deflating the
424	   congestion window by the amount of new data acknowledged, then adding
425	   back one SMSS if the partial ACK acknowledges at least one SMSS of
426	   new data, and sending a new segment if permitted by the new value of
427	   cwnd.  Thus, only one previously-sent packet is retransmitted in
428	   response to each partial acknowledgement, but additional new packets
429	   might be transmitted as well, depending on the amount of new data
430	   acknowledged by the partial acknowledgement.  In contrast, the
431	   variant of NewReno illustrated in [FF96] simply set the congestion
432	   window to ssthresh when a partial acknowledgement was received.  The
433	   approach in [FF96] is more conservative, and does not attempt to
434	   accurately track the actual number of outstanding packets after a
435	   partial acknowledgement is received.  While either of these
436	   approaches gives acceptable performance, the variant specified in
437	   Section 3 recovers more smoothly when multiple packets are dropped
438	   from a window of data.  (The [FF96] behavior can be seen in the NS
439	   simulator by setting the variable "partial_window_deflation_" for
440	   "Agent/TCP/Newreno" to 0, and the behavior specified in Section 3 is
441	   achieved by setting "partial_window_deflation_" to 1.)

443	6. Avoiding Multiple Fast Retransmits

445	   This section describes the motivation for the sender's state variable
446	   "recover", and discusses possible heuristics for distinguishing
447	   between a retransmitted packet that was dropped, and three duplicate
448	   acknowledgements simply from the unnecessary retransmission of three
449	   packets.

451	   In the absence of the SACK option or timestamps, a duplicate
452	   acknowledgement carries no information to identify the data packet or
453	   packets at the TCP data receiver that triggered that duplicate
454	   acknowledgement.  In this case, the TCP data sender is unable to
455	   distinguish between a duplicate acknowledgement that results from a
456	   lost or delayed data packet, and a duplicate acknowledgement that
457	   results from the sender's unnecessary retransmission of a data packet
458	   that had already been received at the TCP data receiver.  Because of
459	   this, with the Retransmit and Fast Recovery algorithms in Reno TCP,
460	   multiple segment losses from a single window of data can sometimes
461	   result in unnecessary multiple Fast Retransmits (and multiple
462	   reductions of the congestion window) [F94].

464	   With the Fast Retransmit and Fast Recovery algorithms in Reno TCP,
465	   the performance problems caused by multiple Fast Retransmits are
466	   relatively minor compared to the potential problems with Tahoe TCP,
467	   which does not implement Fast Recovery.  Nevertheless, unnecessary
468	   Fast Retransmits can occur with Reno TCP unless some explicit
469	   mechanism is added to avoid this, such as the use of the "recover"
470	   variable.  (This modification is called "bugfix" in [F98], and is
471	   illustrated on pages 7 and 9 of that document.  Unnecessary Fast
472	   Retransmits for Reno without "bugfix" is illustrated on page 6 of
473	   [F98].)

475	   Section 3 of RFC 2582 defined a default variant of NewReno TCP that
476	   did not use the variable "recover", and did not check if duplicate
477	   ACKs cover the variable "recover" before invoking Fast Retransmit.

479	   With this default variant from RFC 2582, the problem of multiple Fast
480	   Retransmits from a single window of data can occur after a Retransmit
481	   Timeout (as in page 8 of [F98]) or in scenarios with reordering (as
482	   in the validation test "./test-all-newreno newreno5_noBF" in
483	   directory "tcl/test" of the NS simulator.  This gives performance
484	   similar to that on page 8 of [F03].)  RFC 2582 also defined Careful
485	   and Less Careful variants of the NewReno algorithm, and recommended
486	   the Careful variant.

488	   The algorithm specified in Section 3 of this document corresponds to
489	   the Careful variant of NewReno TCP from RFC 2582, and eliminates the
490	   problem of multiple Fast Retransmits.  This algorithm uses the
491	   variable "recover", whose initial value is the initial send sequence
492	   number.  After each retransmit timeout, the highest sequence number
493	   transmitted so far is recorded in the variable "recover".

495	   If, after a retransmit timeout, the TCP data sender retransmits three
496	   consecutive packets that have already been received by the data
497	   receiver, then the TCP data sender will receive three duplicate
498	   acknowledgements that do not cover more than "recover".  In this
499	   case, the duplicate acknowledgements are not an indication of a new
500	   instance of congestion.  They are simply an indication that the
501	   sender has unnecessarily retransmitted at least three packets.

503	   However, when a retransmitted packet is itself dropped, the sender
504	   can also receive three duplicate acknowledgements that do not cover
505	   more than "recover", and in this case the sender would have been
506	   better off if it had initiated Fast Retransmit.  For a TCP that
507	   implements the algorithm specified in Section 3 of this document, the
508	   sender does not infer a packet drop from duplicate acknowledgements
509	   in this scenario.  As always, the retransmit timer is the backup
510	   mechanism for inferring packet loss in this case.

512	   There are several heuristics, based on timestamps or on the amount of
513	   advancement of the cumulative acknowledgement field, that allow the
514	   sender to distinguish in some cases between three duplicate
515	   acknowledgements following a retransmitted packet that was dropped,
516	   and three duplicate acknowledgements simply from the unnecessary
517	   retransmission of three packets [Gur03].  The TCP sender MAY use such
518	   a heuristic to decide to invoke a Fast Retransmit in some cases even
519	   when the three duplicate acknowledgements do not cover more than
520	   "recover".

522	   For example, when three duplicate acknowledgements are caused by the
523	   unnecessary retransmission of three packets, this is likely to be
524	   accompanied by the cumulative acknowledgement field advancing by at
525	   least four segments.  Similarly, a heuristic based on timestamps uses
526	   the fact that when there is a hole in the sequence space, the
527	   timestamp echoed in the duplicate acknowledgement is the timestamp of
528	   the most recent data packet that advanced the cumulative
529	   acknowledgement field [RFC1323].  If timestamps are used, and the
530	   sender stores the timestamp of the last acknowledged segment, then
531	   the timestamp echoed by duplicate acknowledgements can be used to
532	   distinguish between a retransmitted packet that was dropped, and
533	   three duplicate acknowledgements simply from the unnecessary
534	   retransmission of three packets.  The heuristics are illustrated in
535	   the NS simulator in the validation test "./test-all-newreno".

537	6.1.  ACK Heuristic.

539	   If the ACK-based heuristic is used, then following the advancement of
540	   the cumulative acknowledgement field, the sender stores the value of
541	   previous cumulative acknowledgement as prev_highest_ack and stores
542	   the latest cumulative ACK as highest_ack.  In addition, the following
543	   step is performed if Step 1 in Section 3 fails, before proceeding to
544	   Step 1B.

546	   1*)  If the Cumulative Acknowledgement field didn't cover more than
547	        "recover", check to see if the congestion window is greater
548	        than one SMSS and the difference between highest_ack and
549	        prev_highest_ack is at most four SMSS.  If true, duplicate
550	        ACKs indicate a lost segment (proceed to Step 1A in Section
551	        3). Otherwise, duplicate ACKs likely result from unnecessary
552	        retransmissions (proceed to Step 1B in Section 3).

554	   The congestion window check serves to protect against fast retransmit
555	   immediately after a retransmit timeout, similar to the
556	   "exitFastRetrans_" variable in NS. Examples of applying the ACK
557	   heuristic are in validation tests "./test-all-newreno
558	   newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in
559	   directory "tcl/test" of the NS simulator.

561	   If several ACKs are lost, the sender can see a jump in the cumulative
562	   ACK of more than three segments and the heuristic can fail. A
563	   validation test for this scenario is "./test-all-newreno
564	   newreno_rto_loss_ackf". The ACK heuristic is more likely to fail if
565	   the receiver uses delayed ACKs, because then a smaller number of ACK
566	   losses are needed to produce a sufficient jump in the cumulative ACK.

568	6.2.  Timestamp Heuristic.

570	   If this heuristic is used, the sender stores the timestamp of the
571	   last acknowledged segment.  In addition, the second paragraph of step
572	   1 in Section 3 is replaced as follows:

574	   1**) If the Cumulative Acknowledgement field didn't cover more than
575	        "recover", check to see if the echoed timestamp equals the
576	        stored timestamp. If true, duplicate ACKs indicate a lost
577	        segment (proceed to Step 1A in Section 3). Otherwise, duplicate
578	        ACKs likely result from unnecessary retransmissions (proceed
579	        to Step 1B in Section 3).

581	   Examples of applying the timestamp heuristic are in validation tests
582	   "./test-all-newreno newreno_rto_loss_tsh" and "./test-all-newreno
583	   newreno_rto_dup_tsh". The timestamp heuristic works correctly both
584	   when the receiver echoes timestamps as specified by [RFC1323] or by
585	   its revision attempts. However, if the receiver arbitrarily echos
586	   timestamps, the heuristic can fail.  The heuristic can also fail if a
587	   timeout was spurious and returning ACKs are not from retransmitted
588	   segments. This can be prevented by detection algorithms such as
589	   [RFC3522].

591	7. Implementation Issues for the Data Receiver

593	   [RFC2581] specifies that "Out-of-order data segments SHOULD be
594	   acknowledged immediately, in order to accelerate loss recovery."
595	   Neal Cardwell has noted that some data receivers do not send an
596	   immediate acknowledgement when they send a partial acknowledgment,
597	   but instead wait first for their delayed acknowledgement timer to
598	   expire [C98].  As [C98] notes, this severely limits the potential
599	   benefit from NewReno by delaying the receipt of the partial
600	   acknowledgement at the data sender.  Our recommendation is that the
601	   data receiver send an immediate acknowledgement for an out-of-order
602	   segment, even when that out-of-order segment fills a hole in the
603	   buffer.

605	8. Implementation Issues for the Data Sender

607	   In Section 3, Step 5 above, it is noted that implementations should
608	   take measures to avoid a possible burst of data when leaving Fast
609	   Recovery, in case the amount of new data that the sender is eligible
610	   to send due to the new value of the congestion window is large.  This
611	   can arise during NewReno when ACKs are lost or treated as pure window
612	   updates, thereby causing the sender to underestimate the number of
613	   new segments that can be sent during the recovery procedure.
614	   Specifically, bursts can occur when the FlightSize is much less than
615	   the new congestion window when exiting from Fast Recovery.  One
616	   simple mechanism to avoid a burst of data when leaving Fast Recovery
617	   is to limit the number of data packets that can be sent in response
618	   to a single acknowledgment.  (This is known as "maxburst_" in the ns
619	   simulator.)  Other possible mechanisms for avoiding bursts include
620	   rate-based pacing, or setting the slow-start threshold to the
621	   resultant congestion window and then resetting the congestion window
622	   to FlightSize.  A recommendation on the general mechanism to avoid
623	   excessively bursty sending patterns is outside the scope of this
624	   document.

626	   An implementation may want to use a separate flag to record whether
627	   or not it is presently in the Fast Recovery procedure.  The use of
628	   the value of the duplicate acknowledgment counter as such a flag is
629	   not reliable because it can be reset upon window updates and out-of-
630	   order acknowledgments.

632	   When not in Fast Recovery, the value of the state variable "recover"
633	   should be pulled along with the value of the state variable for
634	   acknowledgments (typically, "snd_una") so that, when large amounts of
635	   data has been sent and acked, the sequence space does not wrap and
636	   falsely indicate that Fast Recovery should not be entered (Section 3,
637	   step 1, last paragraph).

639	   It is important for the sender to respond correctly to duplicate ACKs
640	   received when the sender is no longer in Fast Recovery (e.g., because
641	   of a Retransmit Timeout).  The Limited Transmit procedure [RFC3042]
642	   describes possible responses to the first and second duplicate
643	   acknowledgements.  When three or more duplicate acknowledgements are
644	   received, the Cumulative Acknowledgement field doesn't cover more
645	   than "recover", and a new Fast Recovery is not invoked, it is
646	   important that the sender not execute the Fast Recovery steps (3) and
647	   (4) in Section 3.  Otherwise, the sender could end up in a chain of
648	   spurious timeouts.  We mention this only because several NewReno
649	   implementations had this bug, including the implementation in the NS
650	   simulator.  (This bug in the NS simulator was fixed in July 2003,
651	   with the variable "exitFastRetrans_".)

653	9. Simulations

655	   Simulations with NewReno are illustrated with the validation test
656	   "tcl/test/test-all-newreno" in the NS simulator.  The command
657	   "../../ns test-suite-newreno.tcl reno" shows a simulation with Reno
658	   TCP, illustrating the data sender's lack of response to a partial
659	   acknowledgement.  In contrast, the command "../../ns test-suite-
660	   newreno.tcl newreno_B" shows a simulation with the same scenario
661	   using the NewReno algorithms described in this paper.

663	10. Comparisons between Reno and NewReno TCP

665	   As we stated in the introduction, we believe that the NewReno
666	   modification described in this document improves the performance of
667	   the Fast Retransmit and Fast Recovery algorithms of Reno TCP in a
668	   wide variety of scenarios.  This has been discussed in some depth in
669	   [FF96], which illustrates Reno TCP's poor performance when multiple
670	   packets are dropped from a window of data and also illustrates
671	   NewReno TCP's good performance in that scenario.

673	   We do, however, know of one scenario where Reno TCP gives better
674	   performance than NewReno TCP, that we describe here for the sake of
675	   completeness.  Consider a scenario with no packet loss, but with
676	   sufficient reordering that the TCP sender receives three duplicate
677	   acknowledgements.  This will trigger the Fast Retransmit and Fast
678	   Recovery algorithms.  With Reno TCP or with Sack TCP, this will
679	   result in the unnecessary retransmission of a single packet, combined
680	   with a halving of the congestion window (shown on pages 4 and 6 of
681	   [F03]).  With NewReno TCP, however, this reordering will also result
682	   in the unnecessary retransmission of an entire window of data (shown
683	   on page 5 of [F03]).

685	   While Reno TCP performs better than NewReno TCP in the presence of
686	   reordering, NewReno's superior performance in the presence of
687	   multiple packet drops generally outweighs its less optimal
688	   performance in the presence of reordering.  (Sack TCP is the
689	   preferred solution, with good performance in both scenarios.) This
690	   document recommends the Fast Retransmit and Fast Recovery algorithms
691	   of NewReno TCP instead of those of Reno TCP for those TCP connections
692	   that do not support SACK.  We would also note that NewReno's Fast
693	   Retransmit and Fast Recovery mechanisms are widely deployed in TCP
694	   implementations in the Internet today, as documented in [PF01].  For
695	   example, tests of TCP implementations in several thousand web servers
696	   in 2001 showed that for those TCP connections where the web browser
697	   was not SACK-capable, more web servers used the Fast Retransmit and
698	   Fast Recovery algorithms of NewReno than those of Reno or Tahoe TCP
699	   [PF01].

701	11. Changes Relative to RFC 2582

703	   The purpose of this document is to advance the NewReno's Fast
704	   Retransmit and Fast Recovery algorithms in RFC 2582 to Proposed
705	   Standard.

707	   The main change in this document relative to RFC 2582 is to specify
708	   the Careful variant of NewReno's Fast Retransmit and Fast Recovery
709	   algorithms.  The base algorithm described in RFC 2582 did not attempt
710	   to avoid unnecessary multiple Fast Retransmits that can occur after a
711	   timeout (described in more detail in the section above).  However,
712	   RFC 2582 also defined "Careful" and "Less Careful" variants that
713	   avoid these unnecessary Fast Retransmits, and recommended the Careful
714	   variant.  This document specifies the previously-named "Careful"
715	   variant as the basic version of NewReno.  As described below, this
716	   algorithm uses a variable "recover", whose initial value is the send
717	   sequence number.

719	   The algorithm specified in Section 3 checks whether the
720	   acknowledgement field of a partial acknowledgement covers *more* than
721	   "recover".  Another possible variant would be to require simply that
722	   the acknowledgement field *cover* "recover" before initiating another
723	   Fast Retransmit.  We called this the Less Careful variant in RFC
724	   2582.

726	   There are two separate scenarios in which the TCP sender could
727	   receive three duplicate acknowledgements acknowledging "recover" but
728	   no more than "recover".  One scenario would be that the data sender
729	   transmitted four packets with sequence numbers higher than "recover",
730	   that the first packet was dropped in the network, and the following
731	   three packets triggered three duplicate acknowledgements
732	   acknowledging "recover".  The second scenario would be that the
733	   sender unnecessarily retransmitted three packets below "recover", and
734	   that these three packets triggered three duplicate acknowledgements
735	   acknowledging "recover".  In the absence of SACK, the TCP sender in
736	   unable to distinguish between these two scenarios.

738	   For the Careful variant of Fast Retransmit, the data sender would
739	   have to wait for a retransmit timeout in the first scenario, but
740	   would not have an unnecessary Fast Retransmit in the second scenario.
741	   For the Less Careful variant to Fast Retransmit, the data sender
742	   would Fast Retransmit as desired in the first scenario, and would
743	   unnecessarily Fast Retransmit in the second scenario.  This document
744	   only specifies the Careful variant in Section 3.  Unnecessary Fast
745	   Retransmits with the Less Careful variant in scenarios with
746	   reordering are illustrated in page 8 of [F03].

748	   The document also specifies two heuristics that the TCP sender MAY
749	   use to decide to invoke Fast Retransmit even when the three duplicate
750	   acknowledgements do not cover more than "recover".  These heuristics,
751	   an ACK-based heuristic and a timestamp heuristic, are described in
752	   Sections 6.1 and 6.2 respectively.

754	12. Conclusions

756	   This document specifies the NewReno Fast Retransmit and Fast Recovery
757	   algorithms for TCP.  This NewReno modification to TCP can be
758	   important even for TCP implementations that support the SACK option,
759	   because the SACK option can only be used for TCP connections when
760	   both TCP end-nodes support the SACK option.  NewReno performs better
761	   than Reno (RFC 2581) in a number of scenarios discussed herein.

763	   A number of options to the basic algorithm presented in Section 3 are
764	   also described.  These include the handling of the retransmission
765	   timer (Section 4), the response to partial acknowledgments (Section
766	   5), and the value of the congestion window when leaving Fast Recovery
767	   (section 3, step 5).  Our belief is that the differences between
768	   these variants of NewReno are small compared to the differences
769	   between Reno and NewReno.  That is, the important thing is to
770	   implement NewReno instead of Reno, for a TCP connection without SACK;
771	   it is less important exactly which of the variants of NewReno is
772	   implemented.

774	13. Acknowledgements

776	   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu,
777	   Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed
778	   feedback on this document or on its precursor RFC 2582.

780	14. References

782	   Normative References

784	   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective
785	   Acknowledgement Options", RFC 2018, October 1996.

787	   [RFC2581] W. Stevens, M. Allman, and V. Paxson, "TCP Congestion
788	   Control", RFC 2581, April 1999.

790	   [RFC2582] S. Floyd and T. Henderson, The NewReno Modification to
791	   TCP's Fast Recovery Algorithm, RFC 2582, April 1999.

793	   [RFC2988] V. Paxson and M. Allman, Computing TCP's Retransmission
794	   Timer, RFC 2988, November 2000.

796	   [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, Enhancing TCP's
797	   Loss Recovery Using Limited Transmit, RFC 3042, January 2001.

799	   Informative References

801	   [C98] N. Cardwell, "delayed ACKs for retransmitted packets: ouch!".
802	   November 1998,  Email to the tcpimpl mailing list, Message-ID
803	   "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu",
804	   archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

806	   [F98] S. Floyd, Revisions to RFC 2001, "Presentation to the TCPIMPL
807	   Working Group", August 1998.  URLs "ftp://ftp.ee.lbl.gov/talks/sf-
808	   tcpimpl-aug98.ps" and "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-
809	   aug98.pdf".

811	   [F03] S. Floyd, "Moving NewReno from Experimental to Proposed
812	   Standard?  Presentation to the TSVWG Working Group", March 2003.
813	   URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and
814	   "http://www.icir.org/floyd/talks/newreno-Mar03.pdf".

816	   [FF96] K. Fall and S. Floyd, "Simulation-based Comparisons of Tahoe,
817	   Reno and SACK TCP", Computer Communication Review, July 1996.  URL
818	   "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

820	   [F94] S. Floyd, "TCP and Successive Fast Retransmits", Technical
821	   report, October 1994.  URL
822	   "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

824	   [Gur03] A. Gurtov, "[Tsvwg] resolving the problem of unnecessary fast
825	   retransmits in go-back-N", email to the tsvwg mailing list, message
826	   ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003.  URL
827	   "http://www1.ietf.org/mail-archive/working-
828	   groups/tsvwg/current/msg04334.html".

830	   [Hen98] T. Henderson, Re: NewReno and the 2001 Revision. September
831	   1998.  Email to the tcpimpl mailing list, Message ID
832	   "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU",
833	   archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

835	   [Hoe95] J. Hoe, "Startup Dynamics of TCP's Congestion Control and
836	   Avoidance Schemes", Master's Thesis, MIT, 1995.  URL "http://ana-
837	   www.lcs.mit.edu/anaweb/ps-papers/hoe-thesis.ps".

839	   [Hoe96] J. Hoe, "Improving the Start-up Behavior of a Congestion
840	   Control Scheme for TCP", ACM SIGCOMM, August 1996.  URL
841	   "http://www.acm.org/sigcomm/sigcomm96/program.html".

843	   [LM97] D. Lin and R. Morris, "Dynamics of Random Early Detection",
844	   SIGCOMM 97, September 1997.  URL
845	   "http://www.acm.org/sigcomm/sigcomm97/program.html".

847	   [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/".

849	   [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web
850	   Servers", June 2001, SIGCOMM 2001.

852	   [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
853	   High Performance,", RFC 1323, May 1992.

855	   [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for
856	   TCP, RFC 3522, April 2003.

858	15. Security Considerations

860	   RFC 2581 discusses general security considerations concerning TCP
861	   congestion control.  This document describes a specific algorithm
862	   that conforms with the congestion control requirements of RFC 2581,
863	   and so those considerations apply to this algorithm, too.  There are
864	   no known additional security concerns for this specific algorithm.

866	AUTHORS' ADDRESSES

868	   Sally Floyd
869	   International Computer Science Institute

871	   Phone: +1 (510) 666-2989
872	   Email: floyd@acm.org
873	   URL: http://www.icir.org/floyd/
874	   Tom Henderson
875	   The Boeing Company

877	   Email: thomas.r.henderson@boeing.com

879	   Andrei Gurtov
880	   U. Helsinki

882	   Email: gurtov@cs.helsinki.fi