idnits 2.17.1 

draft-ietf-tsvwg-newreno-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 14
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 15 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2018], [RFC2581], [FF96],
     [RFC2582], [Hoe95]), which it shouldn't.  Please replace those with
     straight textual mentions of the documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 448: '...   [RFC2581] specifies that "Out-of-order data segments SHOULD be...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2003) is 7620 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'Hen98' is defined on line 644, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782)

  -- Obsolete informational reference (is this intentional?): RFC 2001 (ref.
     'F98') (Obsoleted by RFC 2581)


     Summary: 7 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                 S. Floyd
3	INTERNET DRAFT                                                      ICSI
4	draft-ietf-tsvwg-newreno-00.txt                             T. Henderson
5	                                                                  Boeing
6	                                                               June 2003

8	       The NewReno Modification to TCP's Fast Recovery Algorithm

10	                          Status of this Memo

12	   This document is an Internet-Draft and is in full conformance with
13	   all provisions of Section 10 of RFC2026.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress."

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	Abstract

33	   RFC 2581 [RFC2581] documents the following four intertwined TCP
34	   congestion control algorithms: Slow Start, Congestion Avoidance, Fast
35	   Retransmit, and Fast Recovery.  RFC 2581 [RFC2581] explicitly allows
36	   certain modifications of these algorithms, including modifications
37	   that use the TCP Selective Acknowledgement (SACK) option [RFC2018],
38	   and modifications that respond to "partial acknowledgments" (ACKs
39	   which cover new data, but not all the data outstanding when loss was
40	   detected) in the absence of SACK.  This document describes a
41	   specific algorithm for responding to partial acknowledgments,
42	   referred to as NewReno.  This response to partial acknowledgments
43	   was first proposed by Janey Hoe in [Hoe95].

45	   RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental
46	   in 1999.  This document is a small revision of RFC 2582 intended to
47	   advance the NewReno mechanisms to Proposed Standard.  RFC 2581 notes
48	   that the Fast Retransmit/Fast Recovery algorithm specified in that
49	   document does not recover very efficiently from multiple losses in a
50	   single flight of packets, and that RFC 2582 contains one set of
51	   modifications to address this problem.

53	   NOTE TO THE RFC EDITOR:  PLEASE REMOVE THIS SECTION UPON PUBLICATION.

55	   Changes from draft-floyd-newreno-00.txt:

57	   * In Section 8 on "Implementation issues for the data sender",
58	   mentioned alternate methods for limiting bursts when exiting Fast
59	   Recovery.

61	   * Changed draft from draft-floyd-newreno to draft-ietf-tsvwg-newreno

63	   Changes from RFC 2582:

65	   * Rephrasing and rearrangements of the text.

67	   * RFC 2582 described the Careful and Less Careful variants of
68	   NewReno, along with a default version that was neither Careful nor
69	   Less Careful, and recommended the Careful variant.  This document
70	   only specifies the Careful version.

72	   * RFC 2582 used two separate variables, "send_high" and "recover",
73	   and this document has merged them into a single variable "recover".

75	   * Added sections on "Comparisons between Reno and NewReno TCP", and
76	   on "Changes relative to RFC 2582".  The section on "Comparisons
77	   between Reno and NewReno TCP" includes a discussion of the one area
78	   where NewReno is known to perform worse than Reno or SACK, and that
79	   is in the response to reordering.

81	   * Moved all of the discussions of the Impatient and Slow-but-Steady
82	   variants to one place, and specified the Impatient variant (as in the
83	   default version in RFC 2582).

85	   * Added a section on Implementation issues for the data sender,
86	   mentioning maxburst_.

88	   * Added a paragraph about differences between RFC 2582 and [FF96].

90	   END OF NOTE TO RFC EDITOR

92	1. Introduction

94	   For the typical implementation of the TCP Fast Recovery algorithm
95	   described in [RFC2581] (first implemented in the 1990 BSD Reno
96	   release, and referred to as the Reno algorithm in [FF96]), the TCP
97	   data sender only retransmits a packet after a retransmit timeout has
98	   occurred, or after three duplicate acknowledgements have arrived
99	   triggering the Fast Retransmit algorithm.  A single retransmit
100	   timeout might result in the retransmission of several data packets,
101	   but each invocation of the Fast Retransmit algorithm in RFC 2581
102	   leads to the retransmission of only a single data packet.

104	   Problems can arise, therefore, when multiple packets have been
105	   dropped from a single window of data and the Fast Retransmit and Fast
106	   Recovery algorithms are invoked.  In this case, if the SACK option is
107	   available, the TCP sender has the information to make intelligent
108	   decisions about which packets to retransmit and which packets not to
109	   retransmit during Fast Recovery.  This document applies only for TCP
110	   connections that are unable to use the TCP Selective Acknowledgement
111	   (SACK) option, either because the option is not locally supported or
112	   because the TCP peer did not indicate a willingness to use SACK.

114	   In the absence of SACK, there is little information available to the
115	   TCP sender in making retransmission decisions during Fast Recovery.
116	   From the three duplicate acknowledgements, the sender infers a packet
117	   loss, and retransmits the indicated packet.  After this, the data
118	   sender could receive additional duplicate acknowledgements, as the
119	   data receiver acknowledges additional data packets that were already
120	   in flight when the sender entered Fast Retransmit.

122	   In the case of multiple packets dropped from a single window of data,
123	   the first new information available to the sender comes when the
124	   sender receives an acknowledgement for the retransmitted packet (that
125	   is, the packet retransmitted when Fast Retransmit was first entered).
126	   If there had been a single packet drop and no reordering, then the
127	   acknowledgement for this packet will acknowledge all of the packets
128	   transmitted before Fast Retransmit was entered.  However, when there
129	   were multiple packet drops, then the acknowledgement for the
130	   retransmitted packet will acknowledge some but not all of the packets
131	   transmitted before the Fast Retransmit.  We call this acknowledgement
132	   a partial acknowledgment.

134	   Along with several other suggestions, [Hoe95] suggested that during
135	   Fast Recovery the TCP data sender respond to a partial acknowledgment
136	   by inferring that the next in-sequence packet has been lost, and
137	   retransmitting that packet.  This document describes a modification
138	   to the Fast Recovery algorithm in RFC 2581 that incorporates a
139	   response to partial acknowledgements received during Fast Recovery.

141	   We call this modified Fast Recovery algorithm NewReno, because it is
142	   a slight but significant variation of the basic Reno algorithm in RFC
143	   2581.  This document does not discuss the other suggestions in
144	   [Hoe95] and [Hoe96], such as a change to the ssthresh parameter
145	   during Slow-Start, or the proposal to send a new packet for every two
146	   duplicate acknowledgements during Fast Recovery.  The version of
147	   NewReno in this document also draws on other discussions of NewReno
148	   in the literature [LM97].

150	   We do not claim that the NewReno version of Fast Recovery described
151	   here is an optimal modification of Fast Recovery for responding to
152	   partial acknowledgements, for TCP connections that are unable to use
153	   SACK.  Based on our experiences with the NewReno modification in the
154	   NS simulator [NS] and with numerous implementations of NewReno, we
155	   believe that this modification improves the performance of the Fast
156	   Retransmit and Fast Recovery algorithms in a wide variety of
157	   scenarios.

159	2. Terminology and Definitions

161	   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
162	   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
163	   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
164	   and indicate requirement levels for compliant TCP implementations
165	   implementing the NewReno Fast Retransmit and Fast Recovery algorithms
166	   described in this document.

168	   This document assumes that the reader is familiar with the terms
169	   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
170	   FLIGHT SIZE (FlightSize) defined in [RFC2581].  FLIGHT SIZE is
171	   defined in [RFC2581] as follows:

173	      FLIGHT SIZE:
174	         The amount of data that has been sent but not yet acknowledged.

176	3. The Fast Retransmit and Fast Recovery algorithms in NewReno

178	   The standard implementation of the Fast Retransmit and Fast Recovery
179	   algorithms is given in [RFC2581].  The NewReno modification of these
180	   algorithms is given below.  The NewReno modification concerns the
181	   Fast Recovery procedure that begins when three duplicate ACKs are
182	   received and ends when either a retransmission timeout occurs or an
183	   ACK arrives that acknowledges all of the data up to and including the
184	   data that was outstanding when the Fast Recovery procedure began.

186	   The NewReno algorithm specified in this document differs from the
187	   implementation in [RFC2581] in the introduction of the variable
188	   "recover" in step 1, in the response to a partial or new
189	   acknowledgement in step 5, and in modifications to step 1 and the
190	   addition of step 6 for avoiding multiple Fast Retransmits caused by
191	   the retransmission of packets already received by the receiver.

193	   The algorithm specified in this document uses a variable "recover",
194	   whose initial value is the initial send sequence number.

196	   1)  When the third duplicate ACK is received and the sender is not
197	       already in the Fast Recovery procedure, check to see if the
198	       Cumulative Acknowledgement field covers more than "recover".
199	       If so, then set ssthresh to no more than the value given in
200	       equation 1 below.  (This is equation 3 from [RFC2581]).

202	         ssthresh = max (FlightSize / 2, 2*SMSS)           (1)

204	       In addition, record the highest sequence number transmitted in
205	       the variable "recover", and go to Step 2.

207	       If the Cumulative Acknowledgement field didn't cover more than
208	       "recover", then
209	       do not enter the Fast Retransmit and Fast Recovery procedure.
210	       In particular, do not change ssthresh, do not go to Step 2 to
211	       retransmit the "lost" segment, and do not execute Step 3 upon
212	       subsequent duplicate ACKs.

214	   2)  Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS.
215	       This artificially "inflates" the congestion window by the number
216	       of segments (three) that have left the network and which the
217	       receiver has buffered.

219	   3)  For each additional duplicate ACK received, increment cwnd by
220	       SMSS.  This artificially inflates the congestion window in order
221	       to reflect the additional segment that has left the network.

223	   4)  Transmit a segment, if allowed by the new value of cwnd and the
224	       receiver's advertised window.

226	   5)  When an ACK arrives that acknowledges new data, this ACK could be
227	       the acknowledgment elicited by the retransmission from step 2, or
228	       elicited by a later retransmission.

230	       If this ACK acknowledges all of the data up to and including
231	       "recover", then the ACK acknowledges all the intermediate
232	       segments sent between the original transmission of the lost
233	       segment and the receipt of the third duplicate ACK.  Set cwnd to
234	       either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh,
235	       where ssthresh is the value set in step 1; this is termed
236	       "deflating" the window.  (We note that "FlightSize" in step 1
237	       referred to the amount of data outstanding in step 1, when Fast
238	       Recovery was entered, while "FlightSize" in step 5 refers to the
239	       amount of data outstanding in step 5, when Fast Recovery is
240	       exited.) If the second option is selected, the implementation
241	       should take measures to avoid a possible burst of data, in case
242	       the amount of data outstanding in the network was much less than
243	       the new congestion window allows.  A simple mechanism is to limit
244	       the number of data packets that can be sent in response to a
245	       single acknowledgement.  (This is known as "maxburst_" in the NS
246	       simulator).  Exit the Fast Recovery procedure.

248	       If this ACK does *not* acknowledge all of the data up to and
249	       including "recover", then this is a partial ACK.  In this case,
250	       retransmit the first unacknowledged segment.  Deflate the
251	       congestion window by the amount of new data acknowledged, then
252	       add back one SMSS (if the partial ACK acknowledges at least one
253	       SMSS of new data) and send a new segment if permitted by the new
254	       value of cwnd.  This "partial window deflation" attempts to
255	       ensure that, when Fast Recovery eventually ends, approximately
256	       ssthresh amount of data will be outstanding in the network.  Do
257	       not exit the Fast Recovery procedure (i.e., if any duplicate ACKs
258	       subsequently arrive, execute Steps 3 and 4 above).

260	       For the first partial ACK that arrives during Fast Recovery, also
261	       reset the retransmit timer.

263	   6)  After a retransmit timeout, record the highest sequence number
264	       transmitted in the variable "recover" and exit the Fast
265	       Recovery procedure if applicable.

267	   Step 1 specifies a check that the Cumulative Acknowledgement field
268	   covers more than "recover".  Because the acknowledgement field
269	   contains the sequence number that the sender next expects to receive,
270	   the acknowledgement "ack_number" covers more than "recover" when:

272	     ack_number - 1 > recover.
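
   As an illustration only, Steps 1 through 6 and the check above could
   be sketched in Python as follows.  The class name, the field names,
   the event log, and the SMSS value of 1000 bytes are assumptions of
   this sketch and are not part of the specification.  Sequence numbers
   are treated as plain integers without wraparound, option (1) of
   Step 5 is used when leaving Fast Recovery, and the transmission of
   new segments (Step 4) is left to the caller.

```python
from dataclasses import dataclass, field

SMSS = 1000  # assumed segment size in bytes (illustrative value)


@dataclass
class NewRenoSender:
    """Toy model of Steps 1-6: window arithmetic and retransmit decisions only."""
    snd_una: int = 1      # oldest unacknowledged sequence number
    snd_nxt: int = 1      # next new sequence number; highest sent is snd_nxt - 1
    cwnd: int = 2 * SMSS
    ssthresh: int = 64 * SMSS
    recover: int = 0      # initial value: the initial send sequence number
    dupacks: int = 0
    in_recovery: bool = False
    partial_ack_seen: bool = False
    log: list = field(default_factory=list)  # retransmissions and timer resets

    def flight_size(self) -> int:
        return self.snd_nxt - self.snd_una

    def on_ack(self, ack: int) -> None:
        if ack > self.snd_una:
            self._new_ack(ack)
        elif ack == self.snd_una and self.flight_size() > 0:
            self._dup_ack(ack)

    def _dup_ack(self, ack: int) -> None:
        self.dupacks += 1
        if self.in_recovery:
            self.cwnd += SMSS  # Step 3 (Step 4 is left to the caller)
        elif self.dupacks == 3 and ack - 1 > self.recover:
            # Step 1: the ACK covers more than "recover".
            self.ssthresh = max(self.flight_size() // 2, 2 * SMSS)
            self.recover = self.snd_nxt - 1
            self.in_recovery = True
            self.partial_ack_seen = False
            # Step 2: retransmit the lost segment and inflate the window.
            self.log.append(("retransmit", self.snd_una))
            self.cwnd = self.ssthresh + 3 * SMSS
        # Otherwise: do not enter Fast Retransmit and Fast Recovery.

    def _new_ack(self, ack: int) -> None:
        acked = ack - self.snd_una
        self.snd_una = ack
        self.dupacks = 0
        if not self.in_recovery:
            return  # ordinary RFC 2581 processing, not modelled here
        if ack - 1 >= self.recover:
            # Step 5, full ACK: deflate with option (1), exit Fast Recovery.
            self.cwnd = min(self.ssthresh, self.flight_size() + SMSS)
            self.in_recovery = False
        else:
            # Step 5, partial ACK: retransmit, deflate partially, stay in.
            self.log.append(("retransmit", self.snd_una))
            self.cwnd -= acked
            if acked >= SMSS:
                self.cwnd += SMSS
            if not self.partial_ack_seen:
                self.partial_ack_seen = True
                self.log.append(("reset_timer",))

    def on_timeout(self) -> None:
        # Step 6: remember the highest sequence number transmitted.
        self.recover = self.snd_nxt - 1
        self.in_recovery = False
        self.dupacks = 0
```

   Feeding this model three duplicate ACKs followed by a partial ACK
   leaves it in Fast Recovery with a second retransmission logged; a
   later ACK that covers "recover" ends the procedure.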

274	   Note that in Step 5, the congestion window is deflated after a
275	   partial acknowledgement is received.  The congestion window was
276	   likely to have been inflated considerably when the partial
277	   acknowledgement was received.  In addition, depending on the original
278	   pattern of packet losses, the partial acknowledgement might
279	   acknowledge nearly a window of data.  In this case, if the congestion
280	   window was not deflated, the data sender might be able to send nearly
281	   a window of data back-to-back.
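
   The effect of this deflation can be checked with a small
   calculation.  The sketch below is illustrative only: the function
   names and the numbers used are assumptions, and the sender is
   assumed to be limited by the congestion window rather than by the
   receiver's advertised window.

```python
def usable_window(cwnd, snd_una, snd_nxt):
    """Bytes the congestion window still allows beyond the data in flight."""
    return max(0, cwnd - (snd_nxt - snd_una))


def burst_after_partial_ack(cwnd, snd_una, snd_nxt, ack, smss, deflate=True):
    """Segments that could be sent back-to-back right after a partial ACK."""
    acked = ack - snd_una
    if deflate:
        # Step 5: remove the newly acknowledged data, then add back one SMSS.
        cwnd = cwnd - acked + (smss if acked >= smss else 0)
    return usable_window(cwnd, ack, snd_nxt) // smss


# An inflated window of 13 segments, 13 segments in flight, and a partial
# ACK that acknowledges 8 of them (SMSS assumed to be 1000 bytes).
without = burst_after_partial_ack(13000, 1001, 14001, 9001, 1000, deflate=False)
with_deflation = burst_after_partial_ack(13000, 1001, 14001, 9001, 1000)
```

   With these numbers, the undeflated window would permit eight
   segments to be sent back-to-back, while the deflated window permits
   one.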

283	   This document does not specify the sender's response to duplicate
284	   ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked.
286	   This is addressed in other documents, such as those describing the
287	   Limited Transmit procedure [RFC3042].  This document also does not
288	   address issues of adjusting the duplicate acknowledgement threshold,
289	   but assumes the threshold of three duplicate acknowledgements
290	   currently specified in RFC 2581.

292	   As a final note, we would observe that in the absence of the SACK
293	   option, the data sender is working from limited information.  When
294	   the issue of recovery from multiple dropped packets from a single
295	   window of data is of particular importance, the best alternative
296	   would be to use the SACK option.

298	4. Resetting the retransmit timer in response to partial
299	acknowledgements.

301	   One possible variant to the response to partial acknowledgements
302	   specified in Section 3 concerns when to reset the retransmit timer
303	   after a partial acknowledgement.  The algorithm in Section 3, Step 5,
304	   resets the retransmit timer only after the first partial ACK.  In
305	   this case, if a large number of packets were dropped from a window of
306	   data, the TCP data sender's retransmit timer will ultimately expire,
307	   and the TCP data sender will invoke Slow-Start.  (This is illustrated
308	   on page 12 of [F98].)  We call this the Impatient variant of NewReno.

310	   In contrast, the NewReno simulations in [FF96] illustrate the
311	   algorithm described above with the modification that the retransmit
312	   timer is reset after each partial acknowledgement.  We call this the
313	   Slow-but-Steady variant of NewReno.  In this case, for a window with
314	   a large number of packet drops, the TCP data sender retransmits at
315	   most one packet per roundtrip time.  (This behavior is illustrated in
316	   the New-Reno TCP simulation of Figure 5 in [FF96], and on page 11 of
317	   [F98].  The tests "../../ns test-suite-newreno.tcl newreno1_B0" and
318	   "../../ns test-suite-newreno.tcl newreno1_B" in the NS simulator also
319	   illustrate the Slow-but-Steady and the Impatient variants of NewReno,
320	   respectively.)

322	   When N packets have been dropped from a window of data for a large
323	   value of N, the Slow-but-Steady variant can remain in Fast Recovery
324	   for N round-trip times, retransmitting one more dropped packet each
325	   round-trip time; for these scenarios, the Impatient variant gives a
326	   faster recovery and better performance.  One can also construct
327	   scenarios where the Slow-but-Steady variant would give better
328	   performance, where only a small number of packets are dropped, the
329	   RTO is sufficiently small that the retransmit timer expires, and
330	   performance would have been better without a retransmit timeout.
331	   Thus, neither of these variants is optimal; our recommendation is
332	   for the Impatient variant, as specified in Section 3 of this
333	   document.
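
   The trade-off between the two variants can be illustrated with a
   deliberately rough timing model.  The model below is an assumption
   of this sketch and is not taken from the simulations cited above: it
   supposes that exactly one dropped packet is repaired per round-trip
   time, that the retransmit timer is running with a full RTO when Fast
   Retransmit begins, and that the RTO is constant and larger than the
   round-trip time.  The function and parameter names are hypothetical.

```python
def recovery_outcome(n_drops, rtt, rto, variant):
    """Return ("recovered", t) or ("timeout", t); times are in milliseconds."""
    if variant not in ("impatient", "slow-but-steady"):
        raise ValueError(variant)
    deadline = rto  # timer assumed to start when Fast Retransmit begins (t = 0)
    t = 0
    for k in range(1, n_drops + 1):
        t += rtt  # the ACK for the k-th retransmission arrives
        if t >= deadline:
            return ("timeout", deadline)
        is_partial = k < n_drops  # the last ACK covers "recover"
        if is_partial and (variant == "slow-but-steady" or k == 1):
            deadline = t + rto  # Impatient resets only on the first partial ACK
    return ("recovered", t)


# Many drops: Impatient gives up early and falls back to Slow-Start.
many_impatient = recovery_outcome(10, rtt=100, rto=300, variant="impatient")
many_steady = recovery_outcome(10, rtt=100, rto=300, variant="slow-but-steady")

# Few drops and a small RTO: Slow-but-Steady avoids the timeout.
few_impatient = recovery_outcome(3, rtt=100, rto=150, variant="impatient")
few_steady = recovery_outcome(3, rtt=100, rto=150, variant="slow-but-steady")
```

   Under these assumptions, the first pair shows the Impatient variant
   timing out after 400 ms while Slow-but-Steady stays in Fast Recovery
   for ten round-trip times, and the second pair shows a case where
   only the Impatient variant suffers a retransmit timeout.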

335	   One possibility for a more optimal algorithm would be one that
336	   recovered from multiple packet drops as quickly as does slow-start,
337	   while resetting the retransmit timers after each partial
338	   acknowledgement, as described in the section below.  We note,
339	   however, that there is a limitation to the potential performance in
340	   this case in the absence of the SACK option.

342	5. Retransmissions after a partial acknowledgement.

344	   One possible variant to the response to partial acknowledgements
345	   specified in Section 3 would be to retransmit more than one packet
346	   after each partial acknowledgement, and to reset the retransmit timer
347	   after each retransmission.  The algorithm specified in Section 3
348	   retransmits a single packet after each partial acknowledgement.  This
349	   is the most conservative alternative, in that it is the least likely
350	   to result in an unnecessarily-retransmitted packet.  A variant that
351	   would recover faster from a window with many packet drops would be to
352	   effectively Slow-Start, retransmitting two packets after each partial
353	   acknowledgement.  Such an approach would take less than N roundtrip
354	   times to recover from N losses [Hoe96].  However, in the absence of
355	   SACK, recovering as quickly as slow-start introduces the likelihood
356	   of unnecessarily retransmitting packets, and this could significantly
357	   complicate the recovery mechanisms.

359	   We note that the response to partial acknowledgements specified in
360	   Section 3 of this document and in RFC 2582 differs from the response
361	   in [FF96], even though both approaches only retransmit one packet in
362	   response to a partial acknowledgement.  Step 5 of Section 3 specifies
363	   that the TCP sender responds to a partial ACK by deflating the
364	   congestion window by the amount of new data acknowledged, then adding
365	   back one SMSS if the partial ACK acknowledges at least one SMSS of
366	   new data, and sending a new segment if permitted by the new value of
367	   cwnd.  Thus, only one previously-sent packet is retransmitted in
368	   response to each partial acknowledgement, but additional new packets
369	   might be transmitted as well, depending on the amount of new data
370	   acknowledged by the partial acknowledgement.  In contrast, the
371	   variant of NewReno illustrated in [FF96] simply set the congestion
372	   window to ssthresh when a partial acknowledgement was received.  The
373	   approach in [FF96] is more conservative, and does not attempt to
374	   accurately track the actual number of outstanding packets after a
375	   partial acknowledgement is received.  While either of these
376	   approaches gives acceptable performance, the variant specified in
377	   Section 3 recovers more smoothly when multiple packets are dropped
378	   from a window of data.  (The [FF96] behavior can be seen in the NS
379	   simulator by setting the variable "partial_window_deflation_" for
380	   "Agent/TCP/Newreno" to 0, and the behavior specified in Section 3 is
381	   achieved by setting "partial_window_deflation_" to 1.)
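
   The two responses to a partial acknowledgement compared in this
   section can be placed side by side as follows.  This is a sketch
   only: the function name is hypothetical, and the boolean parameter
   is merely named after the NS variable mentioned above; it is not the
   simulator's code.

```python
def cwnd_after_partial_ack(cwnd, ssthresh, acked, smss,
                           partial_window_deflation=True):
    """Congestion window (bytes) after a partial ACK under each approach."""
    if partial_window_deflation:
        # Section 3, Step 5: deflate by the new data acknowledged, then add
        # back one SMSS if at least one SMSS of new data was acknowledged.
        cwnd -= acked
        if acked >= smss:
            cwnd += smss
        return cwnd
    # The approach illustrated in [FF96]: set the window to ssthresh.
    return ssthresh


# Inflated window of 13000 bytes, ssthresh of 5000 bytes, and a partial ACK
# for 2000 bytes of new data (SMSS assumed to be 1000 bytes).
section3 = cwnd_after_partial_ack(13000, 5000, 2000, 1000)
ff96 = cwnd_after_partial_ack(13000, 5000, 2000, 1000,
                              partial_window_deflation=False)
```

   In this example the Section 3 response leaves a window of 12000
   bytes, tracking the data still outstanding, whereas the [FF96]
   response drops the window directly to 5000 bytes.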

383	6. Avoiding Multiple Fast Retransmits

385	   This section describes the motivation for the sender's state variable
386	   "recover".

388	   In the absence of the SACK option, a duplicate acknowledgement
389	   carries no information to identify the data packet or packets at the
390	   TCP data receiver that triggered that duplicate acknowledgement.  The
391	   TCP data sender is unable to distinguish between a duplicate
392	   acknowledgement that results from a lost or delayed data packet, and
393	   a duplicate acknowledgement that results from the sender's
394	   retransmission of a data packet that had already been received at the
395	   TCP data receiver.  Because of this, multiple segment losses from a
396	   single window of data can sometimes result in unnecessary multiple
397	   Fast Retransmits (and multiple reductions of the congestion window)
398	   [F94].

400	   With the Fast Retransmit and Fast Recovery algorithms in Reno TCP,
401	   the performance problems caused by multiple Fast Retransmits are
402	   relatively minor compared to the potential problems with Tahoe TCP,
403	   which does not implement Fast Recovery.  Nevertheless, unnecessary
404	   Fast Retransmits can occur with Reno TCP unless some explicit
405	   mechanism is added to avoid this, such as the use of the "recover"
406	   variable.  (This modification is called "bugfix" in [F98], and is
407	   illustrated on pages 7 and 9.  Unnecessary Fast Retransmits for Reno
408	   without "bugfix" are illustrated on page 6 of [F98].)

410	   Section 3 of RFC 2582 defined a default variant of NewReno TCP that
411	   did not use the variable "recover", and did not check if duplicate
412	   ACKs cover the variable "recover" before invoking Fast Retransmit.
413	   With this default variant from RFC 2582, the problem of multiple Fast
414	   Retransmits from a single window of data can occur after a Retransmit
415	   Timeout (as in page 8 of [F98]) or in scenarios with reordering (as
416	   in the validation test "./test-all-newreno newreno5_noBF" in
417	   directory "tcl/test" of the NS simulator.  This gives performance
418	   similar to that on page 8 of [F03].)  RFC 2582 also defined Careful
419	   and Less Careful variants of the NewReno algorithm, and recommended
420	   the Careful variant.

422	   The algorithm specified in Section 3 of this document corresponds to
423	   the Careful variant of NewReno TCP from RFC 2582, and eliminates the
424	   problem of multiple Fast Retransmits.  This algorithm uses the
425	   variable "recover", whose initial value is the initial send sequence
426	   number.  After each retransmit timeout, the highest sequence number
427	   transmitted so far is recorded in the variable "recover".

429	   If, after a retransmit timeout, the TCP data sender retransmits three
430	   consecutive packets that have already been received by the data
431	   receiver, then the TCP data sender will receive three duplicate
432	   acknowledgements that do not cover more than "recover".  In this
433	   case, the duplicate acknowledgements are not an indication of a new
434	   instance of congestion.  They are simply an indication that the
435	   sender has unnecessarily retransmitted at least three packets.

437	   We note that if the TCP data sender receives three duplicate
438	   acknowledgements that do not cover more than "recover", the sender
439	   does not know whether these duplicate acknowledgements resulted from
440	   a new packet drop or not.  For a TCP that implements the algorithm
441	   specified in Section 3 of this document, the sender does not infer a
442	   packet drop from duplicate acknowledgements in these circumstances.
443	   As always, the retransmit timer is the backup mechanism for inferring
444	   packet loss in this case.
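
   The role of "recover" after a retransmit timeout can be shown with a
   small example.  The function name and the numbers are assumptions of
   this sketch; sequence numbers are plain integers without wraparound.

```python
def should_enter_fast_retransmit(dupacks, ack_number, recover, in_recovery):
    """Step 1 test: third duplicate ACK, and the ACK covers more than recover."""
    return dupacks == 3 and not in_recovery and ack_number - 1 > recover


# Ten 1000-byte segments (sequence numbers 1001..11000) were sent before a
# retransmit timeout, so Step 6 recorded recover = 11000.  The sender then
# retransmits three segments the receiver already holds, and each elicits a
# duplicate ACK carrying ack_number = 11001.
after_timeout = should_enter_fast_retransmit(3, 11001, 11000, False)

# Without the timeout, "recover" would still be below the ACKed data, and the
# same three duplicate ACKs would start Fast Retransmit.
no_timeout = should_enter_fast_retransmit(3, 11001, 0, False)
```

   Here the duplicate ACKs that follow the timeout do not cover more
   than "recover", so no second Fast Retransmit (and no further window
   reduction) takes place.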

446	7. Implementation issues for the data receiver.

448	   [RFC2581] specifies that "Out-of-order data segments SHOULD be
449	   acknowledged immediately, in order to accelerate loss recovery."
450	   Neal Cardwell has noted that some data receivers do not send an
451	   immediate acknowledgement when they send a partial acknowledgment,
452	   but instead wait first for their delayed acknowledgement timer to
453	   expire [C98].  As [C98] notes, this severely limits the potential
454	   benefit from NewReno by delaying the receipt of the partial
455	   acknowledgement at the data sender.  Our recommendation is that the
456	   data receiver send an immediate acknowledgement for an out-of-order
457	   segment, even when that out-of-order segment fills a hole in the
458	   buffer.
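
   One way a data receiver might express this recommendation is
   sketched below.  The function and parameter names are hypothetical,
   and the sketch ignores every other rule that governs delayed
   acknowledgements.

```python
def ack_immediately(seg_seq, rcv_nxt, has_buffered_out_of_order_data):
    """Decide whether to ACK at once instead of waiting for the delayed-ACK timer."""
    if seg_seq != rcv_nxt:
        # Out-of-order arrival: acknowledge immediately (a duplicate ACK).
        return True
    # In-sequence arrival: acknowledge immediately if it fills a hole in front
    # of data that is already buffered, since the sender may be waiting for
    # this partial (or full) acknowledgement.
    return has_buffered_out_of_order_data


fills_hole = ack_immediately(seg_seq=3001, rcv_nxt=3001,
                             has_buffered_out_of_order_data=True)
plain_in_order = ack_immediately(seg_seq=3001, rcv_nxt=3001,
                                 has_buffered_out_of_order_data=False)
```

   Only the second case, an in-sequence segment with nothing buffered
   behind it, is left to the receiver's normal delayed-acknowledgement
   behavior in this sketch.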

460	8. Implementation issues for the data sender.

462	   In Section 3, Step 5 above, it is noted that implementations should
463	   take measures to avoid a possible burst of data when leaving Fast
464	   Recovery, in case the amount of new data that the sender is eligible
465	   to send due to the new value of the congestion window is large.  This
466	   can arise during NewReno when ACKs are lost or treated as pure window
467	   updates, thereby causing the sender to underestimate the number of
468	   new segments that can be sent during the recovery procedure.
469	   Specifically, bursts can occur when the FlightSize is much less than
470	   the new congestion window when exiting from Fast Recovery.  One
471	   simple mechanism to avoid a burst of data when leaving Fast Recovery
472	   is to limit the number of data packets that can be sent in response
473	   to a single acknowledgment.  (This is known as "maxburst_" in the NS
474	   simulator.)  Other possible mechanisms for avoiding bursts include
475	   rate-based pacing, or setting the slow-start threshold to the
476	   resultant congestion window and then resetting the congestion window
477	   to FlightSize.  A recommendation on the general mechanism to avoid
478	   excessively bursty sending patterns is outside the scope of this
479	   document.
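
   Two of the mechanisms mentioned above can be sketched as follows.
   The function names, the "maxburst" parameter, and the example values
   are assumptions of this sketch; it is not a recommendation of either
   mechanism.

```python
def segments_to_send(cwnd, flight_size, smss, maxburst=None):
    """Segments to send in response to one ACK, optionally capped by maxburst."""
    allowed = max(0, cwnd - flight_size) // smss
    if maxburst is not None:
        allowed = min(allowed, maxburst)
    return allowed


def exit_recovery_without_burst(cwnd, flight_size):
    """Set ssthresh to the resultant cwnd, then reset cwnd to FlightSize."""
    ssthresh = cwnd
    cwnd = flight_size
    return cwnd, ssthresh


# Leaving Fast Recovery with cwnd = 5000 bytes but only 1000 bytes in flight
# (SMSS assumed to be 1000 bytes).
uncapped = segments_to_send(5000, 1000, 1000)
capped = segments_to_send(5000, 1000, 1000, maxburst=2)
new_cwnd, new_ssthresh = exit_recovery_without_burst(5000, 1000)
```

   In this example the uncapped sender could emit four segments at
   once, the capped sender two, and the second mechanism lets the
   window grow back from 1000 bytes toward the saved threshold of 5000
   bytes.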

481	9. Simulations

483	   Simulations with NewReno are illustrated with the validation test
484	   "tcl/test/test-all-newreno" in the NS simulator.  The command
485	   "../../ns test-suite-newreno.tcl reno" shows a simulation with Reno
486	   TCP, illustrating the data sender's lack of response to a partial
487	   acknowledgement.  In contrast, the command "../../ns test-suite-
488	   newreno.tcl newreno_B" shows a simulation with the same scenario
489	   using the NewReno algorithms described in this document.

491	10. Comparisons between Reno and NewReno TCP.

493	   As we stated in the introduction, we believe that the NewReno
494	   modification described in this document improves the performance of
495	   the Fast Retransmit and Fast Recovery algorithms of Reno TCP in a
496	   wide variety of scenarios.  This has been discussed in some depth in
497	   [FF96], which illustrates Reno TCP's poor performance when multiple
498	   packets are dropped from a window of data and also illustrates
499	   NewReno TCP's good performance in that scenario.

501	   We do, however, know of one scenario where Reno TCP gives better
502	   performance than NewReno TCP, which we describe here for the sake
503	   of completeness.  Consider a scenario with no packet loss, but with
504	   sufficient reordering that the TCP sender receives three duplicate
505	   acknowledgements.  This will trigger the Fast Retransmit and Fast
506	   Recovery algorithms.  With Reno TCP or with Sack TCP, this will
507	   result in the unnecessary retransmission of a single packet, combined
508	   with a halving of the congestion window (shown on pages 4 and 6 of
509	   [F03]).  With NewReno TCP, however, this reordering will also result
510	   in the unnecessary retransmission of an entire window of data (shown
511	   on page 5 of [F03]).

513	   While Reno TCP performs better than NewReno TCP in the presence of
514	   reordering, NewReno's superior performance in the presence of
515	   multiple packet drops generally outweighs its less optimal
516	   performance in the presence of reordering.  (Sack TCP is the
517	   preferred solution, with good performance in both scenarios.) This
518	   document recommends the Fast Retransmit and Fast Recovery algorithms
519	   of NewReno TCP instead of those of Reno TCP for those TCP connections
520	   that do not support SACK.  We would also note that NewReno's Fast
521	   Retransmit and Fast Recovery mechanisms are widely deployed in TCP
522	   implementations in the Internet today, as documented in [PF01].  For
523	   example, tests of TCP implementations in several thousand web servers
524	   in 2001 showed that for those TCP connections where the web browser
525	   was not SACK-capable, more web servers used the Fast Retransmit and
526	   Fast Recovery algorithms of NewReno than those of Reno or Tahoe TCP
528	   [PF01].

530	11. Changes relative to RFC 2582

532	   The purpose of this document is to advance NewReno's Fast
533	   Retransmit and Fast Recovery algorithms in RFC 2582 to Proposed
534	   Standard.

536	   The main change in this document relative to RFC 2582 is to specify
537	   the Careful variant of NewReno's Fast Retransmit and Fast Recovery
538	   algorithms.  The base algorithm described in RFC 2582 did not attempt
539	   to avoid unnecessary multiple Fast Retransmits that can occur after a
540	   timeout (described in more detail in the section above).  However,
541	   RFC 2582 also defined "Careful" and "Less Careful" variants that
542	   avoid these unnecessary Fast Retransmits, and recommended the Careful
543	   variant.  This document specifies the previously-named "Careful"
544	   variant as the basic version of NewReno.  As described in Section 3,
545	   this algorithm uses a variable "recover", whose initial value is the
546	   initial send sequence number.

548	   The algorithm specified in Section 3 checks whether the
549	   acknowledgement field of a partial acknowledgement covers *more* than
550	   "recover".  Another possible variant would be to require simply that
551	   the acknowledgement field *cover* "recover" before initiating another
552	   Fast Retransmit.  We called this the Less Careful variant in RFC
553	   2582.

555	   There are two separate scenarios in which the TCP sender could
556	   receive three duplicate acknowledgements acknowledging "recover" but
557	   no more than "recover".  One scenario would be that the data sender
558	   transmitted four packets with sequence numbers higher than "recover",
559	   that the first packet was dropped in the network, and the following
560	   three packets triggered three duplicate acknowledgements
561	   acknowledging "recover".  The second scenario would be that the
562	   sender unnecessarily retransmitted three packets below "recover", and
563	   that these three packets triggered three duplicate acknowledgements
564	   acknowledging "recover".  In the absence of SACK, the TCP sender is
565	   unable to distinguish between these two scenarios.

567	   For the Careful variant of Fast Retransmit, the data sender would
568	   have to wait for a retransmit timeout in the first scenario, but
569	   would not have an unnecessary Fast Retransmit in the second scenario.
570	   For the Less Careful variant of Fast Retransmit, the data sender
571	   would Fast Retransmit as desired in the first scenario, and would
572	   unnecessarily Fast Retransmit in the second scenario.  This document
573	   only specifies the Careful variant in Section 3.  Unnecessary Fast
574	   Retransmits with the Less Careful variant in scenarios with
575	   reordering are illustrated on page 8 of [F03].
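
   The difference between the two variants can be sketched as a single
   comparison.  The following is an illustrative model only: it assumes
   ns-style packet numbers rather than byte sequence numbers, ignores
   sequence-number wraparound, and uses hypothetical function names that
   do not appear in this document.  Here "highest_acked" is the highest
   packet number cumulatively acknowledged by the three duplicate
   acknowledgements.

```python
def allows_fast_retransmit(highest_acked, recover, variant):
    """Return True if three duplicate ACKs may trigger a Fast Retransmit."""
    if variant == "careful":
        # The acknowledgement must cover *more* than "recover".
        return highest_acked > recover
    if variant == "less_careful":
        # The acknowledgement need only *cover* "recover".
        return highest_acked >= recover
    raise ValueError("unknown variant: " + variant)


def outcome(packet_was_lost, variant, recover=10):
    """Describe the sender's behavior in the two scenarios above.

    In both scenarios the duplicate ACKs acknowledge "recover" but no
    more, so the sender sees the same input either way.
    """
    highest_acked = recover
    retransmits = allows_fast_retransmit(highest_acked, recover, variant)
    if packet_was_lost:   # first scenario: a new packet was dropped
        return "fast retransmit" if retransmits else "wait for timeout"
    # second scenario: duplicate ACKs came from unnecessary retransmissions
    return "unnecessary fast retransmit" if retransmits else "no retransmit"


for variant in ("careful", "less_careful"):
    for lost in (True, False):
        print(variant, "lost" if lost else "not lost", "->",
              outcome(lost, variant))
```

   Because the input to the comparison is identical in both scenarios,
   neither variant can behave ideally in both; the choice is only about
   which of the two errors to accept.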

577	12. Conclusions

579	   This document specifies the NewReno Fast Retransmit and Fast Recovery
580	   algorithms for TCP.  This NewReno modification to TCP can be
581	   important even for TCP implementations that support the SACK option,
582	   because the SACK option can only be used for TCP connections when
583	   both TCP end-nodes support the SACK option.  NewReno performs better
584	   than Reno (RFC 2581) in a number of scenarios discussed herein.

586	   A number of options to the basic algorithm presented in Section 3 are
587	   also described.  These include the handling of the retransmission
588	   timer (Section 4), the response to partial acknowledgments (Section
589	   5), and the value of the congestion window when leaving Fast Recovery
590	   (Section 3, step 5).  Our belief is that the differences between
591	   these variants of NewReno are small compared to the differences
592	   between Reno and NewReno.  That is, the important thing is to
593	   implement NewReno instead of Reno, for a TCP connection without SACK;
594	   it is less important exactly which of the variants of NewReno is
595	   implemented.

597	13. Acknowledgements

599	   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Vern Paxson,
600	   Kacheong Poon, Keyur Shah, and Bernie Volz for detailed feedback on
601	   this document or on its precursor RFC 2582.

603	14. References

605	   Normative References

607	   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective
608	   Acknowledgement Options", RFC 2018, October 1996.

610	   [RFC2581] W. Stevens, M. Allman, and V. Paxson, "TCP Congestion
611	   Control", RFC 2581, April 1999.

613	   [RFC2582] S. Floyd and T. Henderson, "The NewReno Modification to
614	   TCP's Fast Recovery Algorithm", RFC 2582, April 1999.

616	   [RFC3042] M. Allman, H. Balakrishnan, and S. Floyd, "Enhancing TCP's
617	   Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

619	   Informative References

621	   [C98] Neal Cardwell, "delayed ACKs for retransmitted packets: ouch!".
622	   November 1998.  Email to the tcpimpl mailing list, Message-ID
623	   "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu",
624	   archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

626	   [F98] Sally Floyd.  Revisions to RFC 2001.  Presentation to the
627	   TCPIMPL Working Group, August 1998.  URLs
628	   "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and
629	   "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf".

631	   [F03] Sally Floyd.  Moving NewReno from Experimental to Proposed
632	   Standard?  Presentation to the TSVWG Working Group, March 2003.  URLs
633	   "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and
634	   "http://www.icir.org/floyd/talks/newreno-Mar03.pdf".

636	   [FF96] Kevin Fall and Sally Floyd.  Simulation-based Comparisons of
637	   Tahoe, Reno and SACK TCP.  Computer Communication Review, July 1996.
638	   URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

640	   [F94] S. Floyd, TCP and Successive Fast Retransmits. Technical
641	   report, October 1994.  URL
642	   "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

644	   [Hen98] Tom Henderson, Re: NewReno and the 2001 Revision. September
645	   1998.  Email to the tcpimpl mailing list, Message ID
646	   "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU",
647	   archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

649	   [Hoe95] J. Hoe, Startup Dynamics of TCP's Congestion Control and
650	   Avoidance Schemes. Master's Thesis, MIT, 1995.  URL
651	   "http://ana-www.lcs.mit.edu/anaweb/ps-papers/hoe-thesis.ps".

653	   [Hoe96] J. Hoe, Improving the Start-up Behavior of a Congestion
654	   Control Scheme for TCP.  In ACM SIGCOMM, August 1996.  URL
655	   "http://www.acm.org/sigcomm/sigcomm96/program.html".

657	   [LM97] Dong Lin and Robert Morris, "Dynamics of Random Early
658	   Detection", SIGCOMM 97, September 1997.  URL
659	   "http://www.acm.org/sigcomm/sigcomm97/program.html".

661	   [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/".

663	   [PF01] J. Padhye and S. Floyd,  Identifying the TCP Behavior of Web
664	   Servers.  June 2001, SIGCOMM 2001.

666	15. Security Considerations

668	   RFC 2581 discusses general security considerations concerning TCP
669	   congestion control.  This document describes a specific algorithm
670	   that conforms with the congestion control requirements of RFC 2581,
671	   and so those considerations apply to this algorithm, too.  There are
672	   no known additional security concerns for this specific algorithm.

674	AUTHORS' ADDRESSES

676	   Sally Floyd
677	   International Computer Science Institute

679	   Phone: +1 (510) 666-2989
680	   Email: floyd@acm.org
681	   URL: http://www.icir.org/floyd/

683	   Tom Henderson
684	   The Boeing Company

686	   Email: thomas.r.henderson@boeing.com