idnits 2.17.1 

draft-ietf-tcpm-rfc3782-bis-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 781 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 15 instances of too long lines in the document, the longest
     one being 10 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 5, 2011) is 4526 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC6298' is defined on line 512, but no explicit
     reference was found in the text

  == Unused Reference: 'F98' is defined on line 523, but no explicit
     reference was found in the text

  == Unused Reference: 'F03' is defined on line 528, but no explicit
     reference was found in the text

  == Unused Reference: 'PF01' is defined on line 572, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 2001 (ref.
     'F98') (Obsoleted by RFC 2581)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 2582
     (Obsoleted by RFC 3782)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	TCP Maintenance and Minor                                   T. Henderson
2	Extensions Working Group                                          Boeing
3	Internet-Draft                                                  S. Floyd
4	Obsoletes: 3782  (if approved)                                      ICSI
5	Intended status: Standards Track                               A. Gurtov
6	Expires:  June 5, 2012                                              HIIT
7	                                                              Y. Nishida
8	                                                            WIDE Project
9	                                                        December 5, 2011

11	       The NewReno Modification to TCP's Fast Recovery Algorithm
12	                   draft-ietf-tcpm-rfc3782-bis-04.txt

14	Abstract

16	   RFC 5681 documents the following four intertwined TCP
17	   congestion control algorithms: slow start, congestion avoidance, fast
18	   retransmit, and fast recovery.  RFC 5681 explicitly allows
19	   certain modifications of these algorithms, including modifications
20	   that use the TCP Selective Acknowledgment (SACK) option (RFC 2883),
21	   and modifications that respond to "partial acknowledgments" (ACKs
22	   which cover new data, but not all the data outstanding when loss was
23	   detected) in the absence of SACK.  This document describes a specific
24	   algorithm for responding to partial acknowledgments, referred to as
25	   NewReno.  This response to partial acknowledgments was first proposed
26	   by Janey Hoe.  This document obsoletes RFC 3782.

28	Status of this Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at http://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on June 5, 2012.

45	Copyright Notice

47	   Copyright (c) 2011 IETF Trust and the persons identified as
48	   the document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents
52	   (http://trustee.ietf.org/license-info) in effect on the date of
53	   publication of this document.  Please review these documents
54	   carefully, as they describe your rights and restrictions with respect
55	   to this document.  Code Components extracted from this document must
56	   include Simplified BSD License text as described in Section 4.e of
57	   the Trust Legal Provisions and are provided without warranty as
58	   described in the Simplified BSD License.

60	   This document may contain material from IETF Documents or IETF
61	   Contributions published or made publicly available before November
62	   10, 2008.  The person(s) controlling the copyright in some of this
63	   material may not have granted the IETF Trust the right to allow
64	   modifications of such material outside the IETF Standards Process.
65	   Without obtaining an adequate license from the person(s) controlling
66	   the copyright in such materials, this document may not be modified
67	   outside the IETF Standards Process, and derivative works of it may
68	   not be created outside the IETF Standards Process, except to format
69	   it for publication as an RFC or to translate it into languages other
70	   than English.

72	1.  Introduction

74	   For the typical implementation of the TCP Fast Recovery algorithm
75	   described in [RFC5681] (first implemented in the 1990 BSD Reno
76	   release, and referred to as the Reno algorithm in [FF96]), the TCP
77	   data sender only retransmits a packet after a retransmit timeout has
78	   occurred, or after three duplicate acknowledgments have arrived
79	   triggering the Fast Retransmit algorithm.  A single retransmit
80	   timeout might result in the retransmission of several data packets,
81	   but each invocation of the Fast Retransmit algorithm in RFC 5681
82	   leads to the retransmission of only a single data packet.

84	   Two problems arise with Reno TCP when multiple packet losses occur
85	   in a single window.  First, Reno will often take a timeout, as
86	   has been documented in [Hoe95].  Second, even if a retransmission
87	   timeout is avoided, multiple fast retransmits and window reductions
88	   can occur, as documented in [F94].  When multiple packet losses
89	   occur, if the SACK option [RFC2883] is available, the TCP sender
90	   has the information to make intelligent decisions about which packets
91	   to retransmit and which packets not to retransmit during Fast
92	   Recovery.  This document applies to TCP connections that are
93	   unable to use the TCP Selective Acknowledgment (SACK) option,
94	   either because the option is not locally supported or
95	   because the TCP peer did not indicate a willingness to use SACK.

97	   In the absence of SACK, there is little information available to the
98	   TCP sender in making retransmission decisions during Fast
99	   Recovery.  From the three duplicate acknowledgments, the sender
100	   infers a packet loss, and retransmits the indicated packet.  After
101	   this, the data sender could receive additional duplicate
102	   acknowledgments, as the data receiver acknowledges additional data
103	   packets that were already in flight when the sender entered Fast
104	   Retransmit.

106	   In the case of multiple packets dropped from a single window of data,
107	   the first new information available to the sender comes when the
108	   sender receives an acknowledgment for the retransmitted packet (that
109	   is, the packet retransmitted when Fast Retransmit was first
110	   entered).  If there is a single packet drop and no reordering, then
111	   the acknowledgment for this packet will acknowledge all of the
112	   packets transmitted before Fast Retransmit was entered.  However, if
113	   there are multiple packet drops, then the acknowledgment for the
114	   retransmitted packet will acknowledge some but not all of the packets
115	   transmitted before the Fast Retransmit.  We call this acknowledgment
116	   a partial acknowledgment.

118	   Along with several other suggestions, [Hoe95] suggested that during
119	   Fast Recovery the TCP data sender respond to a partial
120	   acknowledgment by inferring that the next in-sequence packet has been
121	   lost, and retransmitting that packet.  This document describes a
122	   modification to the Fast Recovery algorithm in RFC 5681 that
123	   incorporates a response to partial acknowledgments received during
124	   Fast Recovery.  We call this modified Fast Recovery algorithm
125	   NewReno, because it is a slight but significant variation of the
126	   basic Reno algorithm in RFC 5681.  This document does not discuss the
127	   other suggestions in [Hoe95] and [Hoe96], such as a change to the
128	   ssthresh parameter during Slow-Start, or the proposal to send a new
129	   packet for every two duplicate acknowledgments during Fast
130	   Recovery.  The version of NewReno in this document also draws on
131	   other discussions of NewReno in the literature [LM97, Hen98].

133	   We do not claim that the NewReno version of Fast Recovery described
134	   here is an optimal modification of Fast Recovery for responding to
135	   partial acknowledgments, for TCP connections that are unable to use
136	   SACK.  Based on our experiences with the NewReno modification in the
137	   NS simulator [NS] and with numerous implementations of NewReno, we
138	   believe that this modification improves the performance of the Fast
139	   Retransmit and Fast Recovery algorithms in a wide variety of
140	   scenarios.  Previous versions of this RFC [RFC2582, RFC3782] provide
141	   simulation-based evidence of the possible performance gains.

143	2.  Terminology and Definitions

145	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
146	   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
147	   "OPTIONAL" in this document are to be interpreted as described in
148	   RFC 2119 [RFC2119].

150	   This document assumes that the reader is familiar with the terms
151	   SENDER MAXIMUM SEGMENT SIZE (SMSS), CONGESTION WINDOW (cwnd), and
152	   FLIGHT SIZE (FlightSize) defined in [RFC5681].  FLIGHT SIZE is
153	   defined in [RFC5681] as follows:

155	      FLIGHT SIZE:
156	         The amount of data that has been sent but not yet cumulatively
157	         acknowledged.

159	   This document defines an additional sender-side state variable
160	   called RECOVER:

162	      RECOVER:
163	         When in Fast Recovery, this variable records the send sequence
164	         number that must be acknowledged before the Fast Recovery
165	         procedure is declared to be over.

167	3.  The Fast Retransmit and Fast Recovery Algorithms in NewReno

169	3.1.  Protocol Overview

171	   The basic idea of these extensions to the Fast Retransmit and
172	   Fast Recovery algorithms described in Section 3.2 of [RFC5681]
173	   is as follows.  The TCP sender can infer, from the arrival of
174	   duplicate acknowledgments, whether multiple losses in the same
175	   window of data have most likely occurred, and avoid taking a
176	   retransmit timeout or making multiple congestion window reductions
177	   due to such an event.

179	   The NewReno modification applies to the Fast Recovery procedure that
180	   begins when three duplicate ACKs are received and ends when either a
181	   retransmission timeout occurs or an ACK arrives that acknowledges all
182	   of the data up to and including the data that was outstanding when
183	   the Fast Recovery procedure began.

185	3.2.  Specification

187	   The procedures specified in Section 3.2 of [RFC5681] are followed
188	   with the following modifications.

190	   1)  Initialization of TCP protocol control block:
191	       When the TCP protocol control block is initialized, Recover is
192	       set to the initial send sequence number.

194	   2)  Three duplicate ACKs:
195	       When the third duplicate ACK is received, the TCP sender first
196	       checks the value of Recover to see if the Cumulative
197	       Acknowledgment field covers more than Recover.  If so, the value
198	       of Recover is incremented to the value of the highest sequence
199	       number transmitted by the TCP so far.  The TCP then enters Fast
200	       Retransmit (step 2 of Section 3.2 of [RFC5681]).  If not, the TCP
201	       does not enter Fast Retransmit and does not reset ssthresh.

203	   3)  Response to newly acknowledged data:
204	       Step 6 of [RFC5681] specifies the response to the next ACK that
205	       acknowledges previously unacknowledged data.  When an ACK
206	       arrives that acknowledges new data, this ACK could be the
207	       acknowledgment elicited by the retransmission from step 2, or
208	       elicited by a later retransmission.  There are two cases.

210	       Full acknowledgments:
211	       If this ACK acknowledges all of the data up to and including
212	       Recover, then the ACK acknowledges all the intermediate
213	       segments sent between the original transmission of the lost
214	       segment and the receipt of the third duplicate ACK.  Set cwnd to
215	       either (1) min (ssthresh, max(FlightSize, SMSS) + SMSS) or
216	       (2) ssthresh, where ssthresh is the value set when Fast Retransmit
217	       was entered, and where FlightSize in (1) is the amount of data
218	       presently outstanding.  This is termed "deflating" the window.
219	       If the second option is selected, the implementation
220	       is encouraged to take measures to avoid a possible burst of
221	       data, in case the amount of data outstanding in the network is
222	       much less than the new congestion window allows.  A simple
223	       mechanism is to limit the number of data packets that can be sent
224	       in response to a single acknowledgment.  Exit the Fast Recovery
225	       procedure.

227	       Partial acknowledgments:
228	       If this ACK does *not* acknowledge all of the data up to and
229	       including Recover, then this is a partial ACK.  In this case,
230	       retransmit the first unacknowledged segment.  Deflate the
231	       congestion window by the amount of new data acknowledged by the
232	       cumulative acknowledgment field.  If the partial ACK
233	       acknowledges at least one SMSS of new data, then add back SMSS
234	       bytes to the congestion window.  This artificially
235	       inflates the congestion window in order to reflect the additional
236	       segment that has left the network.  Send a new segment if
237	       permitted by the new value of cwnd.  This "partial window
238	       deflation" attempts to ensure that, when Fast Recovery eventually
239	       ends, approximately ssthresh amount of data will be outstanding
240	       in the network.  Do not exit the Fast Recovery procedure (i.e.,
241	       if any duplicate ACKs subsequently arrive, execute Step 4 of
242	       Section 3.2 of [RFC5681]).

244	       For the first partial ACK that arrives during Fast Recovery, also
245	       reset the retransmit timer.  Timer management is discussed in
246	       more detail in Section 4.

248	   4)  Retransmit timeouts:
249	       After a retransmit timeout, record the highest sequence number
250	       transmitted in the variable Recover and exit the Fast
251	       Recovery procedure if applicable.
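
   The four steps above can be sketched as a small state machine.  This
   is a minimal illustration under stated assumptions, not a reference
   implementation: the class NewRenoSender, its field names, the action
   labels, and the SMSS value of 1000 bytes are all hypothetical.  The
   sketch ignores sequence-number wraparound, treats any ACK that does
   not advance snd_una as a duplicate, and does not model timers,
   actual transmission, or normal window growth.  The ssthresh and cwnd
   values set on entry follow Section 3.2 of [RFC5681].

```python
from dataclasses import dataclass, field

SMSS = 1000  # assumed segment size in bytes (illustrative value)


@dataclass
class NewRenoSender:
    """Toy sender state for steps 1) to 4); sequence wraparound is ignored."""
    iss: int = 0                 # initial send sequence number
    snd_una: int = 1             # oldest unacknowledged sequence number
    snd_nxt: int = 1             # next sequence number to be sent
    cwnd: int = 10 * SMSS
    ssthresh: int = 64 * SMSS
    dupacks: int = 0
    in_recovery: bool = False
    recover: int = field(init=False, default=0)

    def __post_init__(self):
        # Step 1: Recover starts at the initial send sequence number.
        self.recover = self.iss

    def flight_size(self) -> int:
        return self.snd_nxt - self.snd_una

    def on_ack(self, ack: int) -> str:
        """Process a cumulative ACK and return a label for the action taken."""
        if ack < self.snd_una:
            return "old_ack"                         # stale ACK, nothing to do
        if ack == self.snd_una:                      # duplicate ACK
            self.dupacks += 1
            if self.in_recovery:
                self.cwnd += SMSS                    # RFC 5681 step 4 inflation
                return "inflate"
            if self.dupacks != 3:
                return "dupack"
            if ack - 1 > self.recover:               # Step 2: covers more than Recover
                self.recover = self.snd_nxt - 1      # highest sequence number sent
                self.ssthresh = max(self.flight_size() // 2, 2 * SMSS)
                self.cwnd = self.ssthresh + 3 * SMSS
                self.in_recovery = True
                return "fast_retransmit"
            return "ignore"                          # ssthresh is left untouched
        acked = ack - self.snd_una                   # amount of new data acknowledged
        self.snd_una = ack
        self.dupacks = 0
        if not self.in_recovery:
            return "new_ack"                         # normal window growth not modelled
        if ack - 1 >= self.recover:                  # Step 3, full acknowledgment
            self.cwnd = min(self.ssthresh, max(self.flight_size(), SMSS) + SMSS)
            self.in_recovery = False
            return "full_ack"
        self.cwnd -= acked                           # Step 3, partial acknowledgment
        if acked >= SMSS:
            self.cwnd += SMSS
        return "partial_ack_retransmit"

    def on_timeout(self) -> None:
        # Step 4: remember the highest sequence number sent and leave recovery.
        self.recover = self.snd_nxt - 1
        self.in_recovery = False
        self.dupacks = 0
```

   With ten segments outstanding (snd_una=1001, snd_nxt=11001), three
   duplicate ACKs for 1001 start Fast Retransmit, an ACK for 4001 is
   handled as a partial acknowledgment, and an ACK for 11001 ends the
   procedure using option (1) for the final cwnd.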

253	   Step 2 above specifies a check that the Cumulative Acknowledgment
254	   field covers more than Recover.  Because the acknowledgment field
255	   contains the sequence number that the sender next expects to receive,
256	   the acknowledgment "ack_number" covers more than Recover when:

258	      ack_number - 1 > Recover;

260	   i.e., at least one byte more of data is acknowledged beyond the
261	   highest byte that was outstanding when Fast Retransmit was last
262	   entered.
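
   TCP sequence numbers are 32-bit values that wrap around, so an
   implementation would normally evaluate this comparison in modular
   arithmetic rather than as a plain integer comparison.  The following
   sketch shows one way to do so; the helper names seq_gt and
   covers_more_than_recover are hypothetical and are not defined by
   this document.

```python
SEQ_MOD = 1 << 32        # TCP sequence numbers are 32-bit values


def seq_gt(a: int, b: int) -> bool:
    """Return True if sequence number a is later than b, modulo 2**32."""
    return 0 < (a - b) % SEQ_MOD < (1 << 31)


def covers_more_than_recover(ack_number: int, recover: int) -> bool:
    """Check 'ack_number - 1 > Recover' in wraparound-safe form."""
    return seq_gt((ack_number - 1) % SEQ_MOD, recover)
```

   For example, with Recover = 0xFFFFFF00 and ack_number = 0x10 the
   sequence space has wrapped; a plain integer comparison would report
   that the ACK does not cover Recover, whereas the modular comparison
   reports that it does.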

264	   Note that in Step 3 above, the congestion window is deflated after
265	   a partial acknowledgment is received.  The congestion window was
266	   likely to have been inflated considerably when the partial
267	   acknowledgment was received.  In addition, depending on the original
268	   pattern of packet losses, the partial acknowledgment might
269	   acknowledge nearly a window of data.  In this case, if the congestion
270	   window was not deflated, the data sender might be able to send nearly
271	   a window of data back-to-back.
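
   The effect of deflation can be illustrated with a small calculation.
   The function below is a hypothetical helper, not part of the
   specification; it counts only the room that the window leaves for
   new segments immediately after a partial acknowledgment and ignores
   the retransmission itself.

```python
def sendable_after_partial_ack(cwnd, flight_size, acked, smss, deflate=True):
    """Number of new segments the window permits right after a partial ACK."""
    remaining = flight_size - acked          # data still outstanding
    if deflate:
        cwnd -= acked                        # deflate by the newly acknowledged data
        if acked >= smss:
            cwnd += smss                     # add back one SMSS
    return max(0, (cwnd - remaining) // smss)
```

   Assuming SMSS = 1000 bytes, an inflated cwnd of 14000 bytes, 14000
   bytes outstanding, and a partial acknowledgment covering 7000 bytes,
   the window would permit seven back-to-back segments without
   deflation but only one segment with it.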

273	   This document does not specify the sender's response to duplicate
274	   ACKs when the Fast Retransmit/Fast Recovery algorithm is not
275	   invoked.  This is addressed in other documents, such as those
276	   describing the Limited Transmit procedure [RFC3042].  This document
277	   also does not address issues of adjusting the duplicate
278	   acknowledgment threshold, but assumes the threshold specified in
279	   the IETF standards; the current standard is [RFC5681], which
280	   specifies a threshold of three duplicate acknowledgments.

282	   As a final note, we would observe that in the absence of the SACK
283	   option, the data sender is working from limited information.  When
284	   the issue of recovery from multiple dropped packets from a single
285	   window of data is of particular importance, the best alternative
286	   would be to use the SACK option.

288	4.  Handling Duplicate Acknowledgments After A Timeout

290	   After each retransmit timeout, the highest sequence number
291	   transmitted so far is recorded in the variable "recover".
292	   If, after a retransmit timeout, the TCP data sender retransmits three
293	   consecutive packets that have already been received by the data
294	   receiver, then the TCP data sender will receive three duplicate
295	   acknowledgments that do not cover more than "recover".  In this
296	   case, the duplicate acknowledgments are not an indication of a new
297	   instance of congestion.  They are simply an indication that the
298	   sender has unnecessarily retransmitted at least three packets.

300	   However, when a retransmitted packet is itself dropped, the sender
301	   can also receive three duplicate acknowledgments that do not cover
302	   more than "recover".  In this case, the sender would have been
303	   better off if it had initiated Fast Retransmit.  For a TCP that
304	   implements the algorithm specified in Section 3.2 of this document, the
305	   sender does not infer a packet drop from duplicate acknowledgments
306	   in this scenario.  As always, the retransmit timer is the backup
307	   mechanism for inferring packet loss in this case.

309	   There are several heuristics, based on timestamps or on the amount of
310	   advancement of the cumulative acknowledgment field, that allow the
311	   sender to distinguish, in some cases, between three duplicate
312	   acknowledgments following a retransmitted packet that was dropped,
313	   and three duplicate acknowledgments from the unnecessary
314	   retransmission of three packets [Gur03, GF04].  The TCP sender MAY
315	   use such a heuristic to decide to invoke a Fast Retransmit in some
316	   cases, even when the three duplicate acknowledgments do not cover
317	   more than "recover".

319	   For example, when three duplicate acknowledgments are caused by the
320	   unnecessary retransmission of three packets, this is likely to be
321	   accompanied by the cumulative acknowledgment field advancing by at
322	   least four segments.  Similarly, a heuristic based on timestamps uses
323	   the fact that when there is a hole in the sequence space, the
324	   timestamp echoed in the duplicate acknowledgment is the timestamp of
325	   the most recent data packet that advanced the cumulative
326	   acknowledgment field [RFC1323].  If timestamps are used, and the
327	   sender stores the timestamp of the last acknowledged segment, then
328	   the timestamp echoed by duplicate acknowledgments can be used to
329	   distinguish between a retransmitted packet that was dropped and
330	   three duplicate acknowledgments from the unnecessary
331	   retransmission of three packets.

333	4.1.  ACK Heuristic

335	   If the ACK-based heuristic is used, then following the advancement of
336	   the cumulative acknowledgment field, the sender stores the value of
337	   the previous cumulative acknowledgment as prev_highest_ack, and
338	   stores the latest cumulative ACK as highest_ack.  In addition, the
339	   following check is performed if, in Step 2 of Section 3.2, the
340	   Cumulative Acknowledgment field does not cover more than "recover".

342	   1*)  If the Cumulative Acknowledgment field didn't cover more than
343	        "recover", check to see if the congestion window is greater
344	        than SMSS bytes and the difference between highest_ack and
345	        prev_highest_ack is at most 4*SMSS bytes.  If true, duplicate
346	        ACKs indicate a lost segment (enter Fast Retransmit).  Otherwise,
347	        duplicate ACKs likely result from unnecessary retransmissions
348	        (do not enter Fast Retransmit).
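
   Check 1*) can be written as a single predicate.  The sketch below
   assumes that highest_ack and prev_highest_ack are plain byte counts
   without wraparound; the function name is hypothetical.

```python
def ack_heuristic_indicates_loss(cwnd, highest_ack, prev_highest_ack, smss):
    """Return True if duplicate ACKs should be treated as a lost segment."""
    advance = highest_ack - prev_highest_ack     # size of the last ACK jump
    return cwnd > smss and advance <= 4 * smss
```

   For instance, with SMSS = 1000 bytes and cwnd = 5000 bytes, an
   advance of 1000 bytes leads to Fast Retransmit, while an advance of
   5000 bytes, or a cwnd of exactly one SMSS, does not.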

350	   The congestion window check serves to protect against fast retransmit
351	   immediately after a retransmit timeout.

353	   If several ACKs are lost, the sender can see a jump in the cumulative
354	   ACK of more than three segments, and the heuristic can fail.
355	   [RFC5681] recommends that a receiver send a duplicate ACK for every
356	   out-of-order data packet, such as a data packet received during
357	   Fast Recovery.  The ACK heuristic is more likely to fail if the
358	   receiver does not follow this advice, because then fewer ACK
359	   losses are needed to produce a
360	   sufficient jump in the cumulative ACK.

362	4.2.  Timestamp Heuristic

364	   If this heuristic is used, the sender stores the timestamp of the
365	   last acknowledged segment.  In addition, the last sentence of step
366	   2 in Section 3.2 is replaced as follows:

368	   1**) If the Cumulative Acknowledgment field didn't cover more than
369	        "recover", check to see if the echoed timestamp in the last
370	        non-duplicate acknowledgment equals the
371	        stored timestamp.  If true, duplicate ACKs indicate a lost
372	        segment (enter Fast Retransmit).  Otherwise, duplicate
373	        ACKs likely result from unnecessary retransmissions (do not enter
374	        Fast Retransmit).
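
   Combined with the check in step 2 of Section 3.2, rule 1**) might be
   sketched as follows.  The function and parameter names are
   hypothetical: echoed_ts stands for the timestamp echoed in the last
   non-duplicate acknowledgment, and stored_ts for the timestamp that
   the sender stored for the last acknowledged segment.

```python
def enter_fast_retransmit(covers_recover: bool, echoed_ts: int, stored_ts: int) -> bool:
    """Decide on the third duplicate ACK whether to enter Fast Retransmit."""
    if covers_recover:
        return True                      # normal case from step 2
    # Rule 1**): equal timestamps suggest that a retransmission was lost.
    return echoed_ts == stored_ts
```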

376	   The timestamp heuristic works correctly when the receiver echoes
377	   timestamps as specified by [RFC1323] or by its proposed
378	   revisions.  However, if the receiver arbitrarily echoes timestamps,
379	   the heuristic can fail.  The heuristic can also fail if a timeout was
380	   spurious and returning ACKs are not from retransmitted segments.
381	   This can be prevented by detection algorithms such as [RFC3522].

383	5.  Implementation Issues for the Data Receiver

385	   [RFC5681] specifies that "Out-of-order data segments SHOULD be
386	   acknowledged immediately, in order to accelerate loss recovery."
387	   Neal Cardwell has noted that some data receivers do not send an
388	   immediate acknowledgment when they send a partial acknowledgment,
389	   but instead wait first for their delayed acknowledgment timer to
390	   expire [C98].  As [C98] notes, this severely limits the potential
391	   benefit of NewReno by delaying the receipt of the partial
392	   acknowledgment at the data sender.  Echoing [RFC5681], our
393	   recommendation is that the data receiver send an immediate
394	   acknowledgment for an out-of-order segment, even when that
395	   out-of-order segment fills a hole in the buffer.

397	6.  Implementation Issues for the Data Sender

399	   In Section 3.2, Step 3 above, it is noted that implementations should
400	   take measures to avoid a possible burst of data when leaving Fast
401	   Recovery, in case the amount of new data that the sender is eligible
402	   to send due to the new value of the congestion window is large.  This
403	   can arise during NewReno when ACKs are lost or treated as pure window
404	   updates, thereby causing the sender to underestimate the number of
405	   new segments that can be sent during the recovery procedure.
406	   Specifically, bursts can occur when the FlightSize is much less than
407	   the new congestion window when exiting from Fast Recovery.  One
408	   simple mechanism to avoid a burst of data when leaving Fast Recovery
409	   is to limit the number of data packets that can be sent in response
410	   to a single acknowledgment.  (This is known as "maxburst_" in the ns
411	   simulator.)  Other possible mechanisms for avoiding bursts include
412	   rate-based pacing, or setting the slow-start threshold to the
413	   resultant congestion window and then resetting the congestion window
414	   to FlightSize.  A recommendation on the general mechanism to avoid
415	   excessively bursty sending patterns is outside the scope of this
416	   document.
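
   The two choices for cwnd on exit, together with a simple per-ACK
   burst limit, can be sketched as follows.  This is only an
   illustration: the function names and the max_burst parameter are
   hypothetical, and the sketch does not implement pacing.

```python
def cwnd_on_exit(ssthresh, flight_size, smss, conservative=True):
    """cwnd after a full acknowledgment: option (1) or option (2)."""
    if conservative:
        return min(ssthresh, max(flight_size, smss) + smss)   # option (1)
    return ssthresh                                           # option (2)


def segments_allowed(cwnd, flight_size, smss, max_burst=None):
    """New segments that may be sent in response to a single ACK."""
    allowed = max(0, (cwnd - flight_size) // smss)
    if max_burst is not None:
        allowed = min(allowed, max_burst)      # cap the burst per ACK
    return allowed
```

   Assuming SMSS = 1000 bytes, ssthresh = 5000 bytes, and a FlightSize
   of 0 on exit, option (1) yields a cwnd of 2000 bytes, while option
   (2) yields 5000 bytes and would permit a five-segment burst unless a
   limit such as max_burst=2 is applied.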

418	   An implementation may want to use a separate flag to record whether
419	   or not it is presently in the Fast Recovery procedure.  The use of
420	   the value of the duplicate acknowledgment counter for this purpose is
421	   not reliable because it can be reset upon window updates and
422	   out-of-order acknowledgments.

424	   When updating the Cumulative Acknowledgment field outside of
425	   Fast Recovery, the "recover" state variable may also need to be
426	   updated in order to continue to permit possible entry into Fast
427	   Recovery (Section 3.2, step 2).  This issue arises when an update
428	   of the Cumulative Acknowledgment field results in a sequence
429	   wraparound that affects the ordering between the Cumulative
430	   Acknowledgment field and the "recover" state variable.  Entry
431	   into Fast Recovery is only possible when the Cumulative
432	   Acknowledgment field covers more than the "recover" state variable.

434	   It is important for the sender to respond correctly to duplicate ACKs
435	   received when the sender is no longer in Fast Recovery (e.g., because
436	   of a Retransmit Timeout).  The Limited Transmit procedure [RFC3042]
437	   describes possible responses to the first and second duplicate
438	   acknowledgments.  When three or more duplicate acknowledgments are
439	   received, the Cumulative Acknowledgment field doesn't cover more
440	   than "recover", and a new Fast Recovery is not invoked, it is
441	   important that the sender not execute the Fast Recovery steps (3) and
442	   (4) in Section 3.  Otherwise, the sender could end up in a chain of
443	   spurious timeouts.  We mention this only because several NewReno
444	   implementations had this bug, including the implementation in the NS
445	   simulator.

447	   It has been observed that some TCP implementations enter a slow start
448	   or congestion avoidance window updating algorithm immediately after
449	   the cwnd is set by the equation found in Section 3.2, step 3, even
450	   without a new external event generating the cwnd change.  Note that
451	   after cwnd is set based on the procedure for exiting Fast Recovery
452	   (Section 3.2, step 3), cwnd SHOULD NOT be updated until a further
453	   event occurs (e.g., arrival of an ack, or timeout) after this
454	   adjustment.

456	7.  Security Considerations

458	   [RFC5681] discusses general security considerations concerning TCP
459	   congestion control.  This document describes a specific algorithm
460	   that conforms with the congestion control requirements of [RFC5681],
461	   and so those considerations apply to this algorithm, too.  There are
462	   no known additional security concerns for this specific algorithm.

464	8.  IANA Considerations

466	   This document has no actions for IANA.

468	9.  Conclusions

470	   This document specifies the NewReno Fast Retransmit and Fast Recovery
471	   algorithms for TCP.  This NewReno modification to TCP can even be
472	   important for TCP implementations that support the SACK option,
473	   because the SACK option can only be used for TCP connections when
474	   both TCP end-nodes support the SACK option.  NewReno performs better
475	   than Reno [RFC5681] in a number of scenarios discussed in
476	   previous versions of this RFC ([RFC2582], [RFC3782]).

478	   A number of options to the basic algorithm presented in Section 3 are
479	   also referenced in Appendix A to this document.  These include the
480	   handling of the retransmission timer, the response to partial
481	   acknowledgments, and whether or not the sender must maintain a state
482	   variable called Recover.  Our belief is that the differences
483	   between these variants of NewReno are small compared to the
484	   differences between Reno and NewReno.  That is, the important thing
485	   is to implement NewReno instead of Reno, for a TCP connection
486	   without SACK; it is less important exactly which of the variants of
487	   NewReno is implemented.

489	10.  Acknowledgments

491	   Many thanks to Anil Agarwal, Mark Allman, Armando Caro, Jeffrey Hsu,
492	   Vern Paxson, Kacheong Poon, Keyur Shah, and Bernie Volz for detailed
493	   feedback on this document or on its precursor, RFC 2582.  Jeffrey
494	   Hsu provided clarifications on the handling of the recover variable
495	   that were applied to RFC 3782 as errata, and now are in Section 6
496	   of this document.  Yoshifumi Nishida contributed a modification
497	   to the fast recovery algorithm to account for the case in which
498	   FlightSize is 0 when the TCP sender leaves fast recovery, and the
499	   TCP receiver uses delayed acknowledgments.  Alexander Zimmermann
500	   provided several suggestions to improve the clarity of the document.

502	11.  References

504	11.1.  Normative References

506	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
507	             Requirement Levels", BCP 14, RFC 2119, March 1997.

509	   [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
510	             Control", RFC 5681, September 2009.

512	   [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing
513	             TCP's Retransmission Timer", RFC 6298, June 2011.

515	11.2.  Informative References

517	   [C98]     Cardwell, N., "delayed ACKs for retransmitted packets:
518	             ouch!", November 1998.  Email to the tcpimpl mailing list,
519	             Message-ID
520	             "Pine.LNX.4.02A.9811021421340.26785-100000@sake.cs.washington.edu",
521	             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

523	   [F98]     Floyd, S., "Revisions to RFC 2001", Presentation to the
524	             TCPIMPL Working Group, August 1998.  URLs
525	             "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps" and
526	             "ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.pdf".

528	   [F03]     Floyd, S., "Moving NewReno from Experimental to Proposed
529	             Standard?", Presentation to the TSVWG Working Group, March 2003.
530	             URLs "http://www.icir.org/floyd/talks/newreno-Mar03.ps" and
531	             "http://www.icir.org/floyd/talks/newreno-Mar03.pdf".

533	   [FF96]    Fall, K. and S. Floyd, "Simulation-based Comparisons of
534	             Tahoe, Reno and SACK TCP", Computer Communication Review,
535	             July 1996.  URL "ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z".

537	   [F94]     Floyd, S., "TCP and Successive Fast Retransmits", Technical
538	             report, October 1994.  URL
539	             "ftp://ftp.ee.lbl.gov/papers/fastretrans.ps".

541	   [GF04]    Gurtov, A. and S. Floyd, "Resolving Acknowledgment
542	             Ambiguity in non-SACK TCP", Next Generation Teletraffic and
543	             Wired/Wireless Advanced Networking (NEW2AN'04), February
544	             2004.  URL "http://www.cs.helsinki.fi/u/gurtov/papers/
545	             heuristics.html".

547	   [Gur03]   Gurtov, A., "[Tsvwg] resolving the problem of unnecessary
548	             fast retransmits in go-back-N", email to the tsvwg mailing list,
549	             message ID <3F25B467.9020609@cs.helsinki.fi>, July 28, 2003.  URL
550	             "http://www1.ietf.org/mail-archive/working-groups/tsvwg/current/
551	             msg04334.html".

553	   [Hen98]   Henderson, T., "Re: NewReno and the 2001 Revision", September
554	             1998.  Email to the tcpimpl mailing list, Message ID
555	             "Pine.BSI.3.95.980923224136.26134A-100000@raptor.CS.Berkeley.EDU",
556	             archived at "http://tcp-impl.lerc.nasa.gov/tcp-impl".

558	   [Hoe95]   Hoe, J., "Startup Dynamics of TCP's Congestion Control and
559	             Avoidance Schemes", Master's Thesis, MIT, 1995.

561	   [Hoe96]   Hoe, J., "Improving the Start-up Behavior of a Congestion
562	             Control Scheme for TCP", ACM SIGCOMM, August 1996.  URL
563	             "http://www.acm.org/sigcomm/sigcomm96/program.html".

565	   [LM97]    Lin, D. and R. Morris, "Dynamics of Random Early
566	             Detection", SIGCOMM 97, September 1997.  URL
567	             "http://www.acm.org/sigcomm/sigcomm97/program.html".

569	   [NS]      The Network Simulator (NS).
570	             URL "http://www.isi.edu/nsnam/ns/".

572	   [PF01]    Padhye, J. and S. Floyd, "Identifying the TCP Behavior of
573	             Web Servers", June 2001, SIGCOMM 2001.

575	   [RFC1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
576	             High Performance", RFC 1323, May 1992.

578	   [RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
579	             TCP's Fast Recovery Algorithm", RFC 2582, April 1999.

581	   [RFC2883] Floyd, S., J. Mahdavi, M. Mathis, and M. Podolsky, "An Extension to
582	             the Selective Acknowledgement (SACK) Option for TCP", RFC 2883, July 2000.

584	   [RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing TCP's
585	             Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

587	   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
588	             TCP", RFC 3522, April 2003.

590	   [RFC3782] Floyd, S., T. Henderson, and A. Gurtov, "The NewReno
591	             Modification to TCP's Fast Recovery Algorithm", RFC 3782, April 2004.

593	Appendix A.  Additional Information

595	   Previous versions of this RFC ([RFC2582], [RFC3782]) contained
596	   additional informative material on the subjects listed below.
597	   Readers who want more information about possible variants of the
598	   algorithm, or references to specific [NS] simulations that provide
599	   NewReno test cases, may consult those documents.

601	   Section 4 of [RFC3782] discusses some alternative behaviors for
602	   resetting the retransmit timer after a partial acknowledgment.

604	   Section 5 of [RFC3782] discusses some alternative behaviors for
605	   performing retransmission after a partial acknowledgment.

607	   Section 6 of [RFC3782] provides more information about the
608	   motivation for the sender's state variable Recover.

610	   Section 9 of [RFC3782] introduces some NS simulation test
611	   suites for NewReno.  In addition, references to simulation
612	   results can be found throughout [RFC3782].

614	   Section 10 of [RFC3782] provides a comparison of Reno and
615	   NewReno TCP.

617	   Section 11 of [RFC3782] lists changes relative to [RFC2582].

619	Appendix B.  Changes Relative to RFC 3782

621	   In [RFC3782], the cwnd after Full ACK reception is set to either
622	   (1) min (ssthresh, FlightSize + SMSS) or (2) ssthresh.  However,
623	   the first option carries a risk of performance degradation: if
624	   FlightSize is zero, the result is 1 SMSS.  The sender can then
625	   transmit only one segment at that moment, and the receiver's
626	   delayed ACK algorithm can delay the acknowledgment of that
627	   segment.

629	   The FlightSize on Full ACK reception can be zero in some situations.
630	   A typical example is when the sending window during fast recovery
631	   is small.  In this case, the retransmitted packet and new data
632	   packets can be transmitted within a short interval.  If all of these
633	   packets arrive successfully, the receiver may generate a Full ACK
634	   that acknowledges all outstanding data.  Even if the window is not
635	   small, loss of ACK packets or a shortage of receive buffer space
636	   during fast recovery can also make this situation more likely.

638	   The fix in this document sets cwnd to at least 2*SMSS when the
639	   implementation uses option 1 in the Full ACK case (Section 3.2,
640	   step 3, option 1), so that the sender TCP can transmit at least two
641	   segments on Full ACK reception.
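
	   As a minimal sketch of the arithmetic above (not normative text),
	   the Python fragment below compares option 1 as described for
	   [RFC3782] with one way of enforcing the 2*SMSS floor.  The
	   max(FlightSize, SMSS) form, the function names, and the 1460-byte
	   SMSS are assumptions made for illustration; the normative rule is
	   the one in Section 3.2, step 3.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def cwnd_option1_rfc3782(ssthresh, flight_size, smss=SMSS):
    """Option 1 as described above for [RFC3782]."""
    return min(ssthresh, flight_size + smss)


def cwnd_option1_with_floor(ssthresh, flight_size, smss=SMSS):
    """Assumed form of the fix: count FlightSize as at least one SMSS.

    The result is at least 2*SMSS whenever ssthresh >= 2*SMSS.
    """
    return min(ssthresh, max(flight_size, smss) + smss)


if __name__ == "__main__":
    ssthresh = 10 * SMSS  # illustrative value
    for flight_size in (0, SMSS, 5 * SMSS):
        old = cwnd_option1_rfc3782(ssthresh, flight_size)
        new = cwnd_option1_with_floor(ssthresh, flight_size)
        # Print FlightSize, old cwnd, and new cwnd in units of SMSS.
        print(flight_size // SMSS, old // SMSS, new // SMSS)
```

	   With FlightSize equal to zero, the first function yields 1 SMSS
	   and the second yields 2*SMSS; once FlightSize is at least one
	   SMSS, the two functions agree.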

643	   In addition, the errata for RFC 3782 (an editorial clarification to
644	   Section 8 of that RFC, which is now Section 6 of this document)
645	   have been applied.

647	   The specification text (Section 3.2 herein) was rewritten to more
648	   closely track Section 3.2 of [RFC5681].

650	   Sections 4, 5, and 9-11 of [RFC3782] were removed; Appendix A of
651	   this document was added instead to back-reference this informative
652	   material.

654	Appendix C.  Document Revision History

656	   To be removed upon publication

658	   +----------+--------------------------------------------------+
659	   | Revision | Comments                                         |
660	   +----------+--------------------------------------------------+
661	   | draft-00 | RFC3782 errata applied, and changes applied from |
662	   |          | draft-nishida-newreno-modification-02            |
663	   +----------+--------------------------------------------------+
664	   | draft-01 | Non-normative sections moved to appendices,      |
665	   |          | editorial clarifications applied as suggested    |
666	   |          | by Alexander Zimmermann.                         |
667	   +----------+--------------------------------------------------+
668	   | draft-02 | Better align specification text with RFC5681.    |
669	   |          | Replace informative appendices by a new appendix |
670	   |          | that just provides back-references to earlier    |
671	   |          | NewReno RFCs.                                    |
672	   +----------+--------------------------------------------------+

674	Authors' Addresses

676	   Tom Henderson
677	   The Boeing Company

679	   EMail: thomas.r.henderson@boeing.com

681	   Sally Floyd
682	   International Computer Science Institute

684	   Phone: +1 (510) 666-2989
685	   EMail: floyd@acm.org
686	   URL: http://www.icir.org/floyd/

688	   Andrei Gurtov
689	   HIIT
690	   Helsinki Institute for Information Technology
691	   P.O. Box 19215
692	   00076 Aalto
693	   Finland

695	   EMail: gurtov@hiit.fi

697	   Yoshifumi Nishida
698	   WIDE Project
699	   Endo 5322
700	   Fujisawa, Kanagawa  252-8520
701	   Japan

703	   EMail: nishida@wide.ad.jp